Proceedings Article

Integer division using reciprocals

R. Alverson
pp. 186-190
TLDR
The author describes the design decisions behind integer division on a new 64-bit machine and proposes a fast and economical scheme for computing both unsigned and signed integer quotients, one that guarantees an exact answer without any correction.
Abstract
By using a reciprocal approximation, integer division can be synthesized from a multiply followed by a shift. Without carefully selecting the reciprocal, however, the quotient obtained often suffers from off-by-one errors, requiring a correction step. The author describes the design decisions made when designing integer division for a new 64-bit machine. The result is a fast and economical scheme for computing both unsigned and signed integer quotients which guarantees an exact answer without any correction. The reciprocal computation is fast enough, with one table lookup and five multiplies, so that this scheme is competitive with a dedicated divider, while requiring much less hardware specific to division. The real strength of the proposed method is division by a constant, which takes only a single multiply and shift, one operation on the machine considered. The analysis shows that the computed quotient is always exact: no adjustment or correction is necessary.
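
The multiply-and-shift idea in the abstract can be illustrated with a minimal Python sketch. It is not the paper's scheme (the table lookup and five multiplies used to compute the reciprocal are not reproduced here). It uses the well-known rounded-up reciprocal construction for unsigned division, and the names `reciprocal_for` and `div_by_reciprocal` and the `bits` parameter are hypothetical, chosen only for illustration.

```python
def reciprocal_for(d, bits=64):
    """Return (m, shift) with n // d == (n * m) >> shift for 0 <= n < 2**bits.

    Assumption: this is the generic rounded-up reciprocal, not the
    reciprocal computation described in the paper.
    """
    if not 1 <= d < (1 << bits):
        raise ValueError("divisor out of range")
    l = (d - 1).bit_length()       # ceil(log2(d)); 0 when d == 1
    shift = bits + l
    m = -(-(1 << shift) // d)      # ceil(2**shift / d): reciprocal rounded up
    # Note: m can need bits + 1 bits, so it does not fit in one machine word.
    return m, shift


def div_by_reciprocal(n, m, shift):
    """Unsigned quotient from one multiply and one shift, no correction step."""
    return (n * m) >> shift


# Division by the constant 7: the reciprocal is computed once, then reused.
m, shift = reciprocal_for(7)
print(div_by_reciprocal(100, m, shift))  # 14
```

Rounding the reciprocal up rather than down is what removes the off-by-one error in this sketch: the excess in `m * d` over `2**shift` is at most `2**l`, which is too small to push any dividend below `2**bits` across a quotient boundary.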


Citations
Book

Hacker's Delight

TL;DR: The term "hacker" in the title is meant in the original sense of an aficionado of computers, someone who enjoys making computers do new things, or do old things in a new and clever way.
Patent

Symmetric multiprocessor operating system for execution on non-independent lightweight thread contexts

TL;DR: A multiprocessing system with a plurality of thread contexts (TCs) and a virtual processing element (VPE) is described, in which interrupt requests are non-specific to the plurality of TCs and the VPE is configured to select a non-exempt one of the TCs to service each interrupt request.
Proceedings Article

Division by invariant integers using multiplication

TL;DR: This paper presents code sequences for division by arbitrary nonzero integer constants and run-time invariants using integer multiplication, assuming a two's complement architecture, and treats unsigned division, signed division, and division where the result is known a priori to be exact.
Patent

Integrated mechanism for suspension and deallocation of computational threads of execution in a processor

TL;DR: A yield instruction for execution in a multithreaded microprocessor is disclosed. It conditionally reschedules the thread based on the qualifier inputs and bit vector values; if the operand is zero or a positive integer, the microprocessor terminates the program thread that includes the yield instruction.
Patent

Apparatus, method and instruction for initiation of concurrent instruction streams in a multithreading microprocessor

TL;DR: A fork instruction for a multithreaded microprocessor, occupying a single instruction issue slot, is disclosed. The fork instruction, executing in a parent thread, includes a first operand specifying the initial instruction address of a new thread and a second operand.
References
Proceedings Article

The Tera computer system

TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.
Proceedings Article

Tera computer system

TL;DR: The Tera architecture was designed with several goals in mind; it needed to be suitable for very high speed implementations, i.
Journal Article

Computation of elementary functions on the IBM RISC System/6000 processor

TL;DR: New results are obtained which avoid the necessity of doing special testing to get the last bit rounded correctly in accordance with all of the IEEE rounding modes in the case of division and square root.
Journal Article

Second-generation RISC floating point with multiply-add fused

TL;DR: Improved design techniques for logarithmic addition and higher order counters for multiplication complete this second-generation RISC floating-point unit design, and it allows for reduced overall system latency.