Author

J. Arjun Prabhu

Bio: J. Arjun Prabhu is an academic researcher from Sun Microsystems. The author has contributed to research in topics: Carry (arithmetic) & Quotient. The author has an h-index of 8 and has co-authored 9 publications receiving 215 citations.

Papers
Patent
15 May 1996
TL;DR: An enhanced quotient digit selection function prevents the working partial remainder from becoming negative when the result is exact by choosing a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. This saves one cycle, since negative partial remainders no longer need to be restored before the sticky bit is calculated.
Abstract: Quotient digit selection logic is modified so as to prevent a partial remainder equal to the negative divisor from occurring. An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact, choosing a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a five bit estimated partial remainder where the upper four bits are zero, a possible carry propagation into fourth most significant bit is detected. This can be accomplished by looking at the fifth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. This provides a one cycle savings since negative partial remainders no longer need to be restored before calculating the sticky bit. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because no additional cycle time is required to restore negative preliminary partial remainders. In an alternative embodiment, where the upper four bits of the estimated partial remainder are ones while the fifth most significant bit is zero, a quotient digit of negative one is chosen. This alternative embodiment allows correct exact results in all rounding modes including rounding toward plus or minus infinity.

69 citations

Patent
J. Arjun Prabhu
16 Aug 1999
TL;DR: To perform a double precision operation on single precision operands, a first merge instruction copies the first and second single precision operands from the first and second rows of a re-order buffer into the first and second portions of a fifth row, concatenating them into a first double precision operand. A second merge instruction likewise concatenates the third and fourth single precision operands into a sixth row to represent a second double precision operand.
Abstract: Where it is desired to perform a double precision operation using single precision operands, first and second single precision operands are loaded into first and second respective rows of a re-order buffer, and third and fourth single precision operands are loaded into third and fourth respective rows of the re-order buffer. A first merge instruction copies the first and second single precision operands from respective first and second rows of the re-order buffer into first and second portions of a fifth row of the re-order buffer, thereby concatenating the first and second single precision operands to represent a first double precision operand. A second merge instruction copies the third and fourth single precision operands from respective third and fourth rows of the re-order buffer into first and second portions of a sixth row of the re-order buffer, thereby concatenating the third and fourth single precision operands to represent a second double precision operand. The first and second double precision operands stored in the fifth and sixth rows, respectively, of the re-order buffer are then provided directly to an associated FPU for execution.
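Example: As a rough illustration of the merge step described in the abstract, the Python sketch below models the re-order buffer as a dictionary from row number to bit pattern. The names `merge` and `as_double`, the dictionary model, and the choice to treat the "first portion" of the destination row as the upper 32 bits are assumptions made for illustration; the abstract does not specify them.

```python
import struct

MASK32 = 0xFFFFFFFF


def merge(rob, src_first, src_second, dst):
    """Concatenate two 32-bit rows into one 64-bit row (hypothetical model).

    Assumption: the first source lands in the upper half of the destination.
    """
    first = rob[src_first] & MASK32
    second = rob[src_second] & MASK32
    rob[dst] = (first << 32) | second


def as_double(bits64):
    """Reinterpret a 64-bit pattern as an IEEE 754 double precision value."""
    return struct.unpack(">d", struct.pack(">Q", bits64))[0]


# Rows 1-4 hold the four single precision (32-bit) halves.
rob = {1: 0x3FF00000, 2: 0x00000000, 3: 0x40040000, 4: 0x00000000}
merge(rob, 1, 2, 5)  # first merge: rows 1 and 2 into row 5
merge(rob, 3, 4, 6)  # second merge: rows 3 and 4 into row 6
print(as_double(rob[5]), as_double(rob[6]))
```

In this model, rows 5 and 6 are the two double precision operands that would be handed to the FPU.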

39 citations

Patent
11 Sep 1997
TL;DR: In a hardware SRT division and square root unit, two carry save adders speculatively compute the possible resulting partial remainders corresponding to each possible value, -1, 0, and +1, of the quotient digit by adding the divisor, not adding anything, and adding the two's complement of the divisor, respectively, thus shortening the critical path of a single SRT iteration.
Abstract: In hardware SRT division and square root mantissa units maximal quotient selection overlapping for three quotient digits per cycle are used. An effective radix-8 implementation cascades three partial remainder computation circuits and overlaps three quotient selection circuits. Two carry save adders speculatively compute the possible resulting partial remainders corresponding to each possible value, -1, 0, and +1, of the quotient digit by adding the divisor, not adding anything, and adding the two's complement of the divisor, respectively, thus shortening the critical path of a single SRT iteration producing a single quotient digit. The propagation delays of two carry save adders which speculatively compute the possible resulting partial remainders are masked by a longer delay through quotient selection logic.
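Example: The speculative step in the abstract can be sketched in Python as a plain radix-2 SRT division loop. This is a simplified model under stated assumptions, not the patented circuit: it uses exact `Fraction` arithmetic instead of carry save adders and an estimated remainder, it produces one digit per iteration rather than three overlapped digits, and the selection thresholds of plus and minus 1/2 are a common textbook choice. The function name `srt_radix2_divide` is hypothetical.

```python
from fractions import Fraction


def srt_radix2_divide(x, d, n):
    """Return (q, r) such that x == q*d + r / 2**n, with digits in {-1, 0, +1}.

    Assumes 1/2 <= d < 1 and |x| <= d (normalized operands).
    """
    half = Fraction(1, 2)
    if not (half <= d < 1) or abs(x) > d:
        raise ValueError("operands must satisfy 1/2 <= d < 1 and |x| <= d")
    r = Fraction(x)
    q = Fraction(0)
    for i in range(1, n + 1):
        w = 2 * r
        # Speculatively form all three candidate remainders before the
        # digit is known: add the divisor (-1), add nothing (0),
        # or subtract the divisor (+1).
        candidates = {-1: w + d, 0: w, +1: w - d}
        # Quotient digit selection (assumed thresholds).
        if w >= half:
            digit = 1
        elif w < -half:
            digit = -1
        else:
            digit = 0
        r = candidates[digit]  # pick the precomputed remainder
        q += Fraction(digit, 2 ** i)
    return q, r
```

Because the three candidates do not depend on the selected digit, they can be computed while selection is still in progress, which is the source of the shortened critical path the abstract describes.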

27 citations

Patent
05 Jul 1995
TL;DR: An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact, and an optimized five-level circuit implementing the enhanced quotient selection function is shown.
Abstract: Quotient digit selection logic is modified so as to prevent a partial remainder equal to the negative divisor from occurring. An enhanced quotient digit selection function prevents the working partial remainder from becoming negative if the result is exact. The enhanced quotient digit selection logic chooses a quotient digit of zero instead of a quotient digit of one when the actual partial remainder is zero. Using a five bit estimated partial remainder where the upper four bits are zero, a possible carry propagation into fourth most significant bit is detected. This can be accomplished by looking at the fifth most significant sum and carry bits of the redundant partial remainder. If they are both zero, then a carry propagation out of that bit position into the least significant position of the estimated partial remainder is not possible, and a quotient digit of zero is chosen. In the alternative case in which one or both of the fifth most significant carry or sum bits of the redundant partial remainder are ones, a quotient digit of one is chosen. This provides a one cycle savings since negative partial remainders no longer need to be restored before calculating the sticky bit. Extra hardware is eliminated because it is no longer necessary to provide any extra mechanism for restoring the preliminary final partial remainder. Latency is improved because no additional cycle time is required to restore negative preliminary partial remainders. An optimized five-level circuit is shown which implements the enhanced quotient selection function.
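Example: The special case described in the abstract can be sketched in Python as follows. The sketch covers only the situation where the upper four bits of the estimated partial remainder are already known to be zero, and it takes that condition as a precondition rather than checking it. The function name, the representation of the redundant partial remainder as two unsigned integers, and the `width` parameter are assumptions for illustration.

```python
def select_digit_upper_four_zero(sum_bits, carry_bits, width):
    """Choose the quotient digit when the upper four estimate bits are zero.

    sum_bits, carry_bits: the redundant (carry save) partial remainder,
    modeled as unsigned integers of `width` bits (hypothetical layout).
    """
    fifth = width - 5  # index of the fifth most significant bit
    s5 = (sum_bits >> fifth) & 1
    c5 = (carry_bits >> fifth) & 1
    # Both zero: no carry can propagate into the estimate, so choose 0.
    # Otherwise (one or both set): choose 1.
    return 0 if (s5 | c5) == 0 else 1
```

With an exact result the actual partial remainder is zero, both inspected bits are zero, and the function returns a digit of zero, so the remainder never becomes negative.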

22 citations

Patent
13 Aug 1999
TL;DR: A method, apparatus, and computer program product for handling IEEE 754 standard exceptions for Single Instruction Multiple Data (SIMD) instructions, in which masked exceptions are accumulated in an accrued exception field and the actual sub-operation(s) causing an enabled exception are determined through software.
Abstract: A method, apparatus, and computer program product for handling IEEE 754 standard exceptions for Single Instruction Multiple Data (SIMD) instructions. Each SIMD sub-operation's corresponding IEEE 754 exception flag is bit-wise “ORed” with an accrued exception field if a trap enable mask field is configured to mask the exception, with the “ORed” result written back in the accrued exception field. If the trap enable mask field is configured to enable the exception, the accrued exception field and a current exception field are cleared, and an unfinished floating-point exception flag is set in a floating-point trap type field. The actual sub-operation(s) causing the exception is determined through software.
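Example: The flag handling described in the abstract can be sketched in Python as below. The class `FloatStatus`, the function `retire_simd`, the flag bit values, and the trap type label are all hypothetical names chosen for illustration; the abstract names the fields but not their encoding.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bit assignments for the five IEEE 754 exception flags.
INEXACT, DIV_BY_ZERO, UNDERFLOW, OVERFLOW, INVALID = 1, 2, 4, 8, 16
FTT_UNFINISHED = "unfinished_floating_point"  # hypothetical label


@dataclass
class FloatStatus:
    tem: int = 0                # trap enable mask field
    aexc: int = 0               # accrued exception field
    cexc: int = 0               # current exception field
    ftt: Optional[str] = None   # floating-point trap type field


def retire_simd(status, sub_op_flags):
    """Apply each sub-operation's exception flags; return True if a trap is taken."""
    for flags in sub_op_flags:
        if flags & status.tem:
            # Exception enabled: clear both fields and flag the trap;
            # software later works out which sub-operation raised it.
            status.aexc = 0
            status.cexc = 0
            status.ftt = FTT_UNFINISHED
            return True
        # Exception masked: OR into the accrued field and write it back.
        status.aexc |= flags
    return False
```

For instance, with only `OVERFLOW` enabled in `tem`, sub-operations raising `INEXACT` accumulate silently in `aexc`, while any sub-operation raising `OVERFLOW` clears the fields and sets the trap type.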

18 citations


Cited by
Patent
27 Feb 2002
TL;DR: A display has a first array of pixels connected to emit individually-controlled amounts of light onto a second array of pixels, and the second array is connected to pass individually-controllable portions of the light.
Abstract: A display has a first array of pixels connected to emit individually-controlled amounts of light onto a second array of pixels. The second array of pixels is connected to pass individually-controllable portions of the light. The display has a controller connected to control the pixels of the first and second arrays of pixels. The controller is configured to control the pixels of the first array of pixels to provide on the second array of pixels an approximation of an image specified by image data; determine a difference between the image and the approximation of the image at individual pixels of the second array of pixels; and control the pixels of the second array of pixels to modulate the approximation of the image according to the corresponding determined differences.
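Example: A one-dimensional Python sketch of the controller steps in the abstract is given below. Several choices are assumptions not stated in the abstract: each first-array pixel is taken to light a fixed block of second-array pixels, the approximation is the block maximum, and the "difference" between image and approximation is expressed as a ratio used as the pass fraction. The function name `dual_modulation` is hypothetical.

```python
def dual_modulation(image, block):
    """Split an image into first-array light levels and second-array pass fractions.

    image: list of target intensities in [0, 1].
    block: number of second-array pixels lit by one first-array pixel (assumed).
    """
    # Step 1: drive the first array so it approximates the image
    # (assumption: use the brightest target value in each block).
    backlight = [max(image[i:i + block]) for i in range(0, len(image), block)]
    approx = [backlight[i // block] for i in range(len(image))]
    # Step 2: compare image and approximation at each second-array pixel
    # (assumption: expressed as a ratio rather than a subtraction).
    transmit = [p / a if a > 0 else 0.0 for p, a in zip(image, approx)]
    # Step 3: the second array modulates the approximation accordingly.
    shown = [a * t for a, t in zip(approx, transmit)]
    return backlight, transmit, shown
```

Choosing the block maximum keeps every pass fraction at or below one, so the second array only ever attenuates the light it receives.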

356 citations

Patent
Jeffry E. Gonion
17 Nov 2011
TL;DR: A macroscalar processor architecture is described in which a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block. The processor includes multiple slices, each capable of executing an instruction of an iteration of the program loop substantially in parallel.
Abstract: A macroscalar processor architecture is described herein. In one embodiment, a processor receives instructions of a program loop having a vector block and a sequence block intended to be executed after the vector block, where the processor includes multiple slices and each of the slices is capable of executing an instruction of an iteration of the program loop substantially in parallel. For each iteration of the program loop, the processor executes an instruction of the sequence block using one of the slices while executing instructions of the vector block using a remainder of the slices substantially in parallel. Other methods and apparatuses are also described.

219 citations

Patent
23 Dec 1997
TL;DR: A very long instruction word (VLIW) processor exploits program level parallelism as well as instruction level parallelism by providing new instruction level mechanisms to separate processor execution into parallel threads.
Abstract: A very long instruction word (VLIW) processor exploits program level parallelism as well as instruction level parallelism. Unlike prior VLIW machines which obtain speed advantages using instruction level parallelism, the present processor exploits the parallelism inherent in a VLIW processor by providing new instruction level mechanisms to separate processor execution into parallel threads. This separation allows greater hardware use because more than one program can exploit instruction level parallelism on the system at the same time. A first program and a second program execute concurrently such that the second program executes using resources and cycles that would have been wasted by the first program. This construct is especially useful where the second program is an interrupt service routine because the interrupt service routine can be threaded through the machine with high or low priority while the functional units still process the first program stream. A superscalar version of the processor is also described.

211 citations

Patent
12 Jul 2004

138 citations