Proceedings Article
Integer division using reciprocals
R. Alverson
pp. 186-190
TL;DR: The author describes the design decisions behind integer division for a new 64-bit machine and proposes a fast, economical scheme for computing both unsigned and signed integer quotients that guarantees an exact answer without any correction.
Abstract:
By using a reciprocal approximation, integer division can be synthesized from a multiply followed by a shift. Unless the reciprocal is chosen carefully, however, the resulting quotient is often off by one, requiring a correction step. The author describes the design decisions made for integer division on a new 64-bit machine. The result is a fast and economical scheme for computing both unsigned and signed integer quotients that guarantees an exact answer without any correction. The reciprocal computation is fast, needing one table lookup and five multiplies, so the scheme is competitive with a dedicated divider while requiring much less division-specific hardware. The real strength of the proposed method is division by a constant, which takes only a single multiply and shift, one operation on the machine considered. The analysis shows that the computed quotient is always exact: no adjustment or correction is necessary.
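The multiply-and-shift idea in the abstract can be illustrated with a minimal Python sketch. It assumes a widely used round-up reciprocal construction, not necessarily the exact formulation in this paper: for N-bit unsigned operands and a divisor d, take l = ceil(log2 d), shift s = N + l, and m = ceil(2^s / d). Writing m = (2^s + e)/d with 0 <= e < d <= 2^l, and n = qd + r, the scaled product n*m/2^s equals q + (r + n*e/2^s)/d. Because n < 2^N makes n*e/2^s < 1 and r <= d - 1, the fractional part stays below 1, so the floor is exactly q and no correction step is needed. The helper names (`make_reciprocal`, `udiv`, `sdiv`) are hypothetical; the signed variant simply divides magnitudes and rounds toward zero; and m is computed here with exact integer division rather than the paper's table lookup and five multiplies. Note that m can need N + 1 bits, which Python's unbounded integers hide but real hardware must accommodate.

```python
# Sketch (assumed construction, not the paper's exact method): exact division
# by a fixed divisor via a precomputed reciprocal m and right shift s.

def make_reciprocal(d: int, n_bits: int = 64) -> tuple[int, int]:
    """Return (m, s) with n // d == (n * m) >> s for every 0 <= n < 2**n_bits."""
    if d <= 0:
        raise ValueError("divisor must be positive")
    l = (d - 1).bit_length()   # ceil(log2(d)); 0 when d == 1
    s = n_bits + l             # total right shift
    m = -(-(1 << s) // d)      # ceil(2**s / d); may need n_bits + 1 bits
    return m, s


def udiv(n: int, m: int, s: int) -> int:
    """Unsigned quotient: one multiply and one shift, no correction step."""
    return (n * m) >> s


def sdiv(n: int, d: int, n_bits: int = 64) -> int:
    """Signed quotient rounded toward zero, using the unsigned reciprocal of |d|."""
    m, s = make_reciprocal(abs(d), n_bits)
    q = udiv(abs(n), m, s)     # |n| <= 2**(n_bits - 1) fits the unsigned range
    return -q if (n < 0) != (d < 0) else q
```

For example, `make_reciprocal(7)` yields a shift of 67, after which `udiv(100, m, s)` returns 14 and `sdiv(-7, 2)` returns -3. Once (m, s) is precomputed for a constant divisor, each quotient costs a single multiply and shift, which is the case the abstract highlights.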
Citations
Book
Hacker's Delight
TL;DR: The term "hacker" in the title is meant in the original sense of an aficionado of computers: someone who enjoys making computers do new things, or do old things in a new and clever way.
Patent
Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts
TL;DR: A multiprocessing system with a plurality of thread contexts (TCs) and a virtual processing element (VPE) is described, in which interrupt requests are not specific to any one TC, and the VPE is configured to select a non-exempt TC to service each interrupt request.
Proceedings Article
Division by invariant integers using multiplication
TL;DR: This paper presents code sequences for division by arbitrary nonzero integer constants and run-time invariants using integer multiplication on a two's complement architecture, and treats unsigned division, signed division, and division where the result is known a priori to be exact.
Patent
Integrated mechanism for suspension and deallocation of computational threads of execution in a processor
TL;DR: A yield instruction for execution in a multithreaded microprocessor is disclosed. The instruction conditionally reschedules the thread based on qualifier inputs and bit-vector values; if the operand is zero or a positive integer, the microprocessor terminates the program thread containing the yield instruction.
Patent
Apparatus, method and instruction for initiation of concurrent instruction streams in a multithreading microprocessor
TL;DR: A fork instruction for a multithreaded microprocessor, occupying a single instruction issue slot, is disclosed. The fork instruction, executing in a parent thread, includes a first operand specifying the initial instruction address of a new thread and a second operand.
References
Proceedings Article
The Tera computer system
Robert Alverson, David Callahan, Daniel Cummings, Brian D. Koblenz, Allan Porterfield, Burton Smith
TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.
Proceedings Article
Tera computer system
Robert L. Alverson, David Callahan, Allan Porterfield, Daniel Cummings, Burton Smith, Brian D. Koblenz
TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.
Journal Article
Computation of elementary functions on the IBM RISC System/6000 processor
TL;DR: New results are obtained which avoid the necessity of doing special testing to get the last bit rounded correctly in accordance with all of the IEEE rounding modes in the case of division and square root.
Journal Article
Second-generation RISC floating point with multiply-add fused
TL;DR: Improved design techniques for logarithmic addition and higher order counters for multiplication complete this second-generation RISC floating-point unit design, and it allows for reduced overall system latency.