
Showing papers on "Arithmetic logic unit published in 2003"


Proceedings ArticleDOI
25 May 2003
TL;DR: A novel design of a 1-bit full adder cell featuring a hybrid CMOS logic style is proposed, which is very power efficient and has lower power-delay product over a wide range of voltages.
Abstract: A novel design of a 1-bit full adder cell featuring a hybrid CMOS logic style is proposed. The simultaneous generation of XOR and XNOR outputs by pass logic is advantageously exploited in a novel complementary CMOS stage to produce full-swing and balanced outputs so that adder cells can be cascaded without buffer insertion. The increase in transistor count of the complementary CMOS stage is compensated by its reduction in layout complexity. Compared with other 1-bit adder cells using different but uniform logic styles, simulation results show that it is very power efficient and has a lower power-delay product over a wide range of voltages.

147 citations
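
The logic function of the full adder cell in the entry above can be sketched at the bit level. This is only a behavioural illustration in Python, assuming the textbook formulation in which the XOR and XNOR of the two operand bits select the sum and carry; it says nothing about the transistor-level hybrid CMOS design, and the names `full_adder` and `ripple_add` are hypothetical.

```python
def full_adder(a, b, cin):
    """Return (sum, carry_out) for one-bit inputs a, b and cin."""
    x = a ^ b        # XOR of the operand bits
    xn = x ^ 1       # XNOR of the operand bits, available alongside XOR
    # Sum is x XOR cin, written as a selection controlled by x and xn.
    s = (x & (cin ^ 1)) | (xn & cin)
    # Carry is cin when the operand bits differ, otherwise it equals a.
    cout = (x & cin) | (xn & a)
    return s, cout


def ripple_add(a, b, width=8):
    """Cascade full_adder cells to add two width-bit unsigned integers."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        result |= s << i
    return result, carry
```

Cascading the cell in `ripple_add` mirrors, in software only, the abstract's point that cells can be chained directly.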


Journal ArticleDOI
01 Feb 2003
TL;DR: This work proposes efficient self-checking implementations valid for all existing adder and arithmetic and logic unit (ALU) schemes (e.g., ripple carry, carry lookahead, skip carry schemes) that are substantially better than any other known solution.
Abstract: In this paper, we present efficient self-checking implementations valid for all existing adder and arithmetic and logic unit (ALU) schemes (e.g., ripple carry, carry lookahead, skip carry schemes). Among all the known self-checking adder and ALU designs, the parity prediction scheme has the advantage that it requires the minimum hardware overhead for the adder/ALU and the minimum hardware overhead for the other data-path blocks. It also has the advantage of being compatible with memory systems checked by parity codes. The drawback of this scheme is that it is not fault secure for single faults. The scheme proposed in this work has all the advantages of the parity prediction scheme. In addition, the new scheme is totally self-checking for single faults. Thus, the new scheme is substantially better than any other known solution.

129 citations
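
The basic parity prediction idea mentioned in the entry above can be illustrated as follows. This is a minimal sketch of the standard identity only (each sum bit is the XOR of the two operand bits and the incoming carry, so the parity of the sum equals the XOR of the operand parities and the parity of the carries); it is not the fault-secure scheme the paper proposes, and all function names are hypothetical.

```python
def parity(x):
    """Return 1 if x has an odd number of set bits, else 0."""
    return bin(x).count("1") & 1


def add_with_carries(a, b, width, cin=0):
    """Ripple-add a and b; also return the carries INTO each bit position."""
    carries = 0
    carry = cin
    total = 0
    for i in range(width):
        carries |= carry << i
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        total |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (carry & (ai ^ bi))
    return total, carries


def predicted_parity(a, b, carries):
    """Predict the parity of the sum from operand and carry parities."""
    return parity(a) ^ parity(b) ^ parity(carries)


def check(a, b, width, observed_sum, carries):
    """Return True when the observed sum agrees with the predicted parity."""
    return parity(observed_sum) == predicted_parity(a, b, carries)
```

A single flipped sum bit changes the observed parity and is caught by `check`. A fault inside the carry chain can disturb both the sum and the prediction, which is the kind of weakness the abstract refers to when it says plain parity prediction is not fault secure.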


Journal ArticleDOI
09 Feb 2003
TL;DR: A shared N-well, dual-supply-voltage 64b ALU module in 0.18 µm, 1.8V 1P 5M CMOS technology operates at 1.16GHz on a 9 mm² die. For a target delay increase of 2.8%, energy savings are 25.3% using dual supplies.
Abstract: A shared n-well layout technique is developed for the design of dual-supply-voltage logic blocks. It is demonstrated on a design of a 64-bit arithmetic logic unit (ALU) module in domino logic. The second supply voltage is used to lower the power of noncritical paths in the sparse, radix-4 64-bit carry-lookahead adder and in the loopback bus. A 3 mm² test chip in 0.18-µm 1.8-V five-metal with local interconnect CMOS technology that contains six ALUs and test circuitry operates at 1.16 GHz at the nominal supply. For a target delay increase of 2.8%, energy savings are 25.3% using dual supplies, while for an 8.3% increase in delay, 33.3% can be saved.

49 citations


Proceedings ArticleDOI
15 Dec 2003
TL;DR: This work presents low cost FPGA floating-point arithmetic circuits for all the common operations, i.e. addition/subtraction, multiplication, division and square root, and considers the implementation of 64-bit double precision circuits that also provide rounding and exception handling.
Abstract: We present low cost FPGA floating-point arithmetic circuits for all the common operations, i.e. addition/subtraction, multiplication, division and square root. Such circuits can be extremely useful in the FPGA implementation of complex systems that benefit from the reprogrammability and parallelism of the FPGA device but also require a general purpose arithmetic unit. While previous work has considered circuits for low precision floating-point formats, we consider the implementation of 64-bit double precision circuits that also provide rounding and exception handling.

44 citations


Journal Article
TL;DR: In this article, a number of 4-bit, 8-operation arithmetic logic units (ALUs) are designed using the delay-insensitive NULL convention logic paradigm, and are characterized in terms of speed and area.
Abstract: In this paper, a number of 4-bit, 8-operation arithmetic logic units (ALUs) are designed using the delay-insensitive NULL convention logic paradigm, and are characterized in terms of speed and area. Both dual-rail and quad-rail, pipelined and non-pipelined versions are developed, and the tradeoffs and design considerations for each are discussed. Comparing the various architectures shows that the fastest dual-rail and quad-rail ALUs achieve average speedups of 1.72 and 1.59, respectively, over their non-pipelined counterparts, while requiring 133% and 119% more area, respectively. Overall, the dual-rail designs are both faster and require less area than their respective quad-rail counterparts; however, the quad-rail versions are expected to consume less power.

20 citations


Journal ArticleDOI
TL;DR: A VLSI design rule, namely, an embedded instruction code (EIC), for the discrete wavelet transform (DWT), derived from the essential computations of DWT, and a parallel arithmetic logic unit (PALU) with two multipliers and four adders, called 2M4A is proposed.
Abstract: The paper presents a VLSI design rule, namely, an embedded instruction code (EIC), for the discrete wavelet transform (DWT). Our approach derives from the essential computations of DWT, and we establish a set of multiplication instructions, MUL, and the addition instruction, ADD. In addition, we propose a parallel arithmetic logic unit (PALU) with two multipliers and four adders, called 2M4A. With these requirements, the DWT computation paths can be calculated more efficiently with limited PALUs. Furthermore, since the EIC is operated under the PALU, the number of needed inner registers depends on the wavelet filters' length. Besides, the boundary problem of DWT has also been resolved by the symmetric extension. Moreover, the two-dimensional inverse DWT (2D IDWT) can be completed using the same PALU as for 2D DWT; the only changes needed to be made are the instruction codes and coefficients. Our chip supports up to six levels of decomposition and versatile image specifications, e.g., VGA, MPEG-1, MPEG-2 and 1024×1024 image sizes.

18 citations


Patent
Martin Langhammer1, Nitin Prasad1
22 Jul 2003
TL;DR: In this paper, a function-specific block (FSB) is coupled to a subset of the logic regions to reduce the impact of use of the FSB on the general purpose interconnection resources.
Abstract: A programmable logic integrated circuit device has at least one function-specific circuit block (e.g., a parallel multiplier, a parallel barrel shifter, a parallel arithmetic logic unit, etc.) in addition to the usual multiple regions of programmable logic and the usual programmable interconnection circuit resources. To reduce the impact of use of the function-specific block (“FSB”) on the general purpose interconnection resources of the device, inputs and/or outputs of the FSB may be coupled relatively directly to a subset of the logic regions. In addition to conserving general purpose interconnect, resources of the logic regions to which the FSB is connected can be used by the FSB to reduce the amount of circuitry that must be dedicated to the FSB. If the FSB is a multiplier, additional features include facilitating accumulation of successive multiplier outputs (using either addition or subtraction and with sign extension if desired) and/or arithmetically combining the outputs of multiple multipliers.

12 citations



Patent
David Lewis1
24 Nov 2003
TL;DR: A logic device logic module includes multi-stage combinational logic circuitry (e.g., a four-input look-up table) into which EXCLUSIVE OR (“XOR”) circuitry is interposed to give the logic module arithmetic capabilities as mentioned in this paper.
Abstract: A logic device logic module includes multi-stage combinational logic circuitry (e.g., a four-input look-up table) into which EXCLUSIVE OR (“XOR”) circuitry is interposed to give the logic module arithmetic as well as combinational logic capabilities. The XOR circuitry is used to help form an arithmetic sum output signal (as an alternative to a combinational logic output signal) when arithmetic mode operation is desired. The logic module is also augmented with circuitry for providing a carry out signal in arithmetic mode. The logic module can perform such arithmetic operations as one digit or bit of binary addition, subtraction, or multiplication. In all cases a carry in signal is taken into account; and in the case of multiplication, a digit from another partial product or summation of other partial products is also taken into account.

11 citations


Journal ArticleDOI
TL;DR: In this paper, a superconducting 1-bit arithmetic logic unit (ALU) with the simplest function of AND, OR, ADD (addition), and SUB (subtraction) was proposed.
Abstract: In order to develop superconducting Digital Signal Processors (DSPs), we have been studying a superconducting 1-bit Arithmetic Logic Unit (ALU). This ALU has the simplest function of AND, OR, ADD (addition), and SUB (subtraction). The ALU operates in a 3-stage pipeline. All logic functions such as AND, OR, and SUM (summation) can be executed within a single stage of the pipeline. In order to achieve the high-speed operation of the ALU, we proposed and designed a novel 3-input XOR gate, which can operate in only one logic stage. Our simulation study showed that all components of the ALU can operate up to 50 GHz. These ALU components were fabricated and tested at low speed. Large bias margins of more than ±37% were achieved. The designed ALUs were laid out and fabricated with an Nb process. The ALU occupied an area of 1200 µm × 2600 µm, which contains 560 Josephson junctions (JJs).

11 citations


Patent
28 Feb 2003
TL;DR: In this article, a data processor has sixteen processing elements that each include a register file and an arithmetic logic unit, and a network unit connects between the register files of the processing elements and the arithmetic logic units of processing elements.
Abstract: A data processor has sixteen processing elements that each include a register file and an arithmetic logic unit. A network unit connects between the register files of the processing elements and the arithmetic logic units of the processing elements. The network unit has a selector for simultaneously performing a plurality of data transfers which are each made from a register file of one processing element to an operation unit of another processing element. With the provision of this selector that can perform such simultaneous data transfers, the processing efficiency of the processing elements can be maintained even if a change occurs in operand assignments and the like.

Proceedings ArticleDOI
25 May 2003
TL;DR: A 173-bit (m = 173) Type II Optimal Normal Basis (ONBII) representation is chosen in the implementation of the Galois Field GF(2^m) arithmetic logic unit by asynchronous architecture and is especially aimed at low power consumption by reducing the switching activities in the latches.
Abstract: Elliptic curve cryptography has become popular in recent decades due to its high security strength per bit, low memory requirements and low processing power, which make it attractive for energy-constrained applications such as contact-less smart cards. In this paper, a 173-bit (m = 173) Type II Optimal Normal Basis (ONBII) representation is chosen in the implementation of the Galois Field GF(2^m) arithmetic logic unit by asynchronous architecture. This proposed architecture uses the advantages of asynchronous properties and is especially aimed at low power consumption by reducing the switching activities in the latches, reducing the number of cycles to complete each multiplication process and reducing the number of squaring operations in each inversion process. The simulation results show that the resulting ALU consumes only 110.8 nW in 780 ns to complete each multiplication operation.

Patent
15 Aug 2003
TL;DR: In this paper, a crypto-engine for cryptographic processing has an arithmetic unit and an interface controller for managing communications between the arithmetic unit and a host processor; the arithmetic unit has a memory unit and arithmetic units that are controlled by an arithmetic controller.
Abstract: A crypto-engine for cryptographic processing has an arithmetic unit and an interface controller for managing communications between the arithmetic unit and a host processor. The arithmetic unit has a memory unit for storing and loading data and arithmetic units for performing arithmetic operations on the data. The memory and arithmetic units are controlled by an arithmetic controller.

Patent
Jon Skull1
01 Apr 2003
TL;DR: In this paper, an arithmetic logic unit (ALU) is proposed that combines the operation of a single-cycle ALU with the processing speed of a pipelined ALU.
Abstract: Methods and apparatus for improving the efficiency of an arithmetic logic unit (ALU) are provided. The ALU of the invention combines the operation of a single-cycle ALU with the processing speed of a pipelined ALU. Arithmetic operations are performed in two stages: a first stage that produces separate sum and carry results in a first cycle, and a second stage that produces a final result in one or more immediately subsequent cycles. While this produces final results in two or more clock cycles, useable partial results are produced each cycle, thus maintaining a one operation per clock cycle throughput.
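
One common way to obtain "separate sum and carry results" in a first stage and a final result in a later stage is the carry-save form. The sketch below illustrates that general technique in Python under the assumption of a 32-bit datapath; the patent abstract does not give its equations, so this should not be read as the patented circuit, and `stage1`, `stage2` and `fold_in` are hypothetical names.

```python
WIDTH = 32                 # assumed datapath width, for illustration only
MASK = (1 << WIDTH) - 1


def stage1(a, b):
    """First stage: produce a partial sum and a carry word, no propagation."""
    partial_sum = (a ^ b) & MASK
    carry = ((a & b) << 1) & MASK
    return partial_sum, carry


def fold_in(partial_sum, carry, x):
    """Add a further operand while staying in sum/carry form (3:2 compression)."""
    new_sum = (partial_sum ^ carry ^ x) & MASK
    majority = (partial_sum & carry) | (partial_sum & x) | (carry & x)
    return new_sum, (majority << 1) & MASK


def stage2(partial_sum, carry):
    """Second stage: resolve the carries into a conventional result."""
    return (partial_sum + carry) & MASK
```

Because `fold_in` accepts the unresolved pair directly, a dependent operation can start from the first-stage outputs without waiting for `stage2`, which is one way to read the abstract's statement that usable partial results are produced each cycle.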

Proceedings ArticleDOI
21 Jan 2003
TL;DR: This paper proposes two optimization methods for telescopic arithmetic unit based methodology, one of representative methodologies to design synchronous control units for variable delay datapaths, and analyzes their performance improvement effects explicitly.
Abstract: Nowadays, variable delay arithmetic units have been used for implementing a datapath of a target system in pursuit of performance improvement. However, the adoption of variable delay arithmetic units requires modification of a typical synchronous control unit design methodology. A telescopic arithmetic unit based methodology is one of the representative methodologies used to design synchronous control units for variable delay datapaths. In this paper, we propose two optimization methods for it. The proposed optimization techniques are analyzed in order to show their performance improvement effects explicitly.

Proceedings ArticleDOI
27 Dec 2003
TL;DR: An arithmetic logic unit for elliptic curve cryptosystems over GF(2^m) with the inclusion of a hardware inversion operation, which is as fast as a multiplication, is presented.
Abstract: In this paper, the authors present an arithmetic logic unit (ALU) for elliptic curve cryptosystems over GF(2^m). The novelty of this ALU is the inclusion of a hardware inversion operation, which is as fast as a multiplication, so faster algorithms can be used for the computation of kP. Although serial multiplication and inversion were used, a faster computation than other parallelized hardware implementations was achieved.
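
For readers unfamiliar with GF(2^m) arithmetic, the following Python sketch shows multiplication and inversion in a small binary field. It assumes a polynomial basis with m = 8 and the irreducible polynomial x^8 + x^4 + x^3 + x + 1 purely for illustration; the abstract does not state the field size, basis or inversion algorithm used in the ALU, and `gf_mul` and `gf_inv` are hypothetical names.

```python
M = 8            # assumed field degree, chosen small for illustration
POLY = 0x11B     # x^8 + x^4 + x^3 + x + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two field elements by shift-and-add with modular reduction."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a >> M:               # degree reached M: reduce by POLY
            a ^= POLY
    return result


def gf_inv(a):
    """Invert a nonzero element as a^(2^M - 2) by square-and-multiply."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    result = 1
    base = a
    exponent = (1 << M) - 2
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result
```

In this software formulation an inversion costs many multiplications, which gives some context for why an inversion that is as fast as a multiplication is highlighted as the novelty.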

Patent
11 Jun 2003
TL;DR: In this paper, a processor includes an instruction memory, arithmetic logic unit, finite field arithmetic unit, at least one digital storage device, and an instruction decoder, which is operably coupled to interpret the instruction.
Abstract: A processor includes an instruction memory, arithmetic logic unit, finite field arithmetic unit, at least one digital storage device, and an instruction decoder. The instruction memory temporarily stores an instruction that includes at least one of: an operational code, destination information, and source information. The instruction decoder is operably coupled to interpret the instruction to identify the arithmetic logic unit and/or the finite field arithmetic unit to perform the operational code of the corresponding instruction. The instruction decoder then identifies at least one destination location within the digital storage device based on the destination information contained within the corresponding instruction. The instruction decoder then identifies at least one source location within the digital storage device based on the source information of the corresponding instruction. When the finite field arithmetic unit is to perform the operational code, it performs a finite field arithmetic function upon data stored in the at least one source location in accordance with the operational code and provides the resultant to the destination location.

Patent
16 Jan 2003
TL;DR: In this article, the arithmetic processing unit is provided with a data memory storing a plurality of floating point numbers, an interval setting part acquiring two or more floating point numbers from the data memory and setting an upper limit and a lower limit within a range including a true value for each of the acquired numbers, and an ALU carrying out a predetermined arithmetic process on the upper and lower limits set by the interval setting part.
Abstract: PROBLEM TO BE SOLVED: To provide an arithmetic processing unit for reducing processing loads for guaranteeing correctness, and improving arithmetic speed. SOLUTION: The arithmetic processing unit is provided with a data memory 12 storing a plurality of floating point numbers, an interval setting part 13 acquiring two or more floating point numbers from the data memory 12 and setting an upper limit and a lower limit within a range including a true value in regard to each of the acquired two or more floating point numbers, and an ALU 14 carrying out a predetermined arithmetic process from the upper limit and lower limit of each of the two or more floating point numbers set by the interval setting part. By the interval setting part 13 exclusively carrying out range setting including the true values and the ALU 14 exclusively carrying out interval arithmetic, the processing load for the guarantee of accuracy can be reduced more than conventional mechanical interval arithmetic by software, and the arithmetic speed is improved. COPYRIGHT: (C)2004,JPO
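
The software interval arithmetic that the entry above compares against can be sketched as below. This is a simplified illustration in Python (3.9 or later, for `math.nextafter`), assuming outward rounding by one unit in the last place as a stand-in for directed rounding; it is not the patented hardware, and the function names are hypothetical.

```python
import math


def to_interval(x):
    """Enclose a float in a (lower, upper) pair one step wider on each side."""
    return (math.nextafter(x, -math.inf), math.nextafter(x, math.inf))


def interval_add(x, y):
    """Add two intervals, widening each bound outward by one step."""
    lower = math.nextafter(x[0] + y[0], -math.inf)
    upper = math.nextafter(x[1] + y[1], math.inf)
    return (lower, upper)


def interval_mul(x, y):
    """Multiply two intervals by taking the extremes of the four products."""
    products = [x[0] * y[0], x[0] * y[1], x[1] * y[0], x[1] * y[1]]
    lower = math.nextafter(min(products), -math.inf)
    upper = math.nextafter(max(products), math.inf)
    return (lower, upper)
```

Every operation here does several floating-point steps plus bound adjustments, which is the kind of per-operation overhead the abstract says is reduced by splitting range setting and interval arithmetic between dedicated hardware parts.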

Patent
10 Jan 2003
TL;DR: In this article, an arithmetic unit for carrying out an arithmetic operation with at least two coded operands is disclosed, comprising an arithmetic-logic unit provided with a first input (12) for the first coded operand (ak), a second input (14) for the second coded operand (bk), a third input (16) for a coding parameter (k), and an output (18) for a coded result of the operation.
Abstract: Disclosed is an arithmetic unit for carrying out an arithmetic operation with at least two coded operands, comprising an arithmetic-logic unit provided with a first input (12) for the first coded operand (ak), a second input (14) for the second coded operand (bk), a third input (16) for a coding parameter (k), and an output (18) for a coded result of the operation. The arithmetic-logic unit (10) is configured such that the first input (12), the second input (14), and the third input (16) are linked by means of arithmetic sub-operations while taking into account the manner in which the operands are coded in such a way that a coded result is obtained at the output, which is equal to the variable that would be obtained if the first and second operands were subjected to the arithmetic operation in an uncoded state and an obtained result was then coded without the operands being decoded in the arithmetic-logic unit (10). In that manner, a processor system can be obtained in which no data at all is displayed in plain text, i.e. in an uncoded form, because no decoding is required before an arithmetic-logic unit and no coding is required thereafter because the arithmetic-logic unit operates with coded input operands so as to obtain a coded result, whereby the transmission lines of the arithmetic unit cannot be tapped.

Patent
Paul Metzgen1
10 Jan 2003
TL;DR: In this article, an arithmetic logic unit with a minimum of routing delays is provided, whose signal propagation path includes no greater than two routing paths that connect non-immediately adjacent logic elements.
Abstract: An arithmetic logic unit is provided. The arithmetic logic unit preferably includes a minimum of routing delays. An arithmetic logic unit according to the invention preferably receives a plurality of operands from a plurality of operand registers, performs an arithmetic operation on the operands, obtains a result of the arithmetic operation and transmits the result to a result register. The arithmetic logic unit includes a signal propagation path that includes no greater than two routing paths that connect non-immediately adjacent logic elements.

Patent
26 Mar 2003
TL;DR: In this paper, a high speed information safety processor is presented, consisting of an embedded central processing unit, a soft password engine, an internal bus, a data transceiver, a PCI/PCMCIA bus interface, a control path and a data path.
Abstract: A high speed information safety processor of the utility model comprises an embedded central processing unit, a soft password engine, an internal bus, a data transceiver, a PCI/PCMCIA (Peripheral Component Interconnect / Personal Computer Memory Card International Association) bus interface, a control path and a data path. The soft password engine comprises a reconfigurable cipher arithmetic logic unit, a standard password arithmetic logic unit, a random number generator which is connected to the internal bus, a packet exploder, a password control register, an input queue and an output queue. The high speed information safety processor does not need to pass through an extra conversion circuit and supports various network protocols. The user can implement a self-defined password arithmetic logic unit through software programming. The utility model has the advantages that the application mode is flexible, the cryptographic algorithm can be more specialized, and security is easier to achieve.

Patent
25 Sep 2003
TL;DR: In this paper, the problem of providing an arithmetic unit and an encryption/decoding arithmetic unit for making common a part of a plurality of arithmetic processing including matrix operations, and for performing the partial matrix operations in parallel to realize a high speed operation is addressed.
Abstract: PROBLEM TO BE SOLVED: To provide an arithmetic unit and an encryption/decoding arithmetic unit for making common a part of a plurality of arithmetic processing including matrix operations, and for performing the partial matrix operations in parallel to realize a high speed operation. SOLUTION: This arithmetic unit for performing the arithmetic processing of both first arithmetic processing including a first matrix operation and second arithmetic processing including a second matrix operation is provided with a first arithmetic part 41 for performing the second matrix operation, at least one or more other arithmetic parts 42 for performing matrix operations in parallel with the first arithmetic part for performing the first matrix operation and a logic circuit 46 for performing the logical operation of each of arithmetic results between the first arithmetic part and the other arithmetic parts. Then, when the arithmetic result of the first matrix operation is requested, the arithmetic unit obtains it from the logic circuit 46. COPYRIGHT: (C)2005,JPO&NCIPI

Patent
07 Nov 2003
TL;DR: In this article, a vector unit includes a vector register file having a primary vector register and a secondary vector register, and a second input multiplexer, independent of the first input multiplexer, enabling data on the second data bus to be provided to the first data bus.
Abstract: A microprocessor includes a branch unit, a load/store unit (LSU), an arithmetic logic unit (ALU), and a vector unit to execute a vector instruction. The vector unit includes a vector register file having a primary vector register and a secondary vector register. The processor preferably further includes a first data bus and a second data bus wherein the first and second data busses couple the vector unit to the data memory. The vector unit includes a first input multiplexer enabling data on the first data bus to be provided to the primary register file or the secondary register file and a second input multiplexer, independent of the first input multiplexer enabling data on the second data bus to be provided to the second data bus. The first and second data busses may comprise first and second portions of a data memory bus.

Patent
01 Aug 2003
TL;DR: In this article, an arithmetic apparatus and an arithmetic method capable of executing arithmetic by reconfigurable hardware, shortening a processing time of arithmetic including conditional branches causing a heavy processing load and improving a processing speed even when conditional branches exist in a loop of performing repeating arithmetic processing.
Abstract: An arithmetic apparatus and an arithmetic method capable of executing arithmetic by reconfigurable hardware, shortening a processing time of arithmetic including conditional branches causing a heavy processing load and improving a processing speed even when conditional branches exist in a loop of performing repeating arithmetic processing, wherein arithmetic processing including conditional branches is divided to first processing of unconditional branches and second processing with conditional branches, the first processing of unconditional branches is assigned to reconfigurable arithmetic means, configuration information of hardware is generated based on the first processing, the first processing is executed by the reconfigured arithmetic means based on the configuration information, the second processing with conditional branches is assigned to a CPU or other arithmetic means, the assigned second processing with conditional branches is executed by the CPU, and a result of the processing is used for correcting a result of said first processing, so that a result of arithmetic processing including conditional branches is obtained.

Proceedings ArticleDOI
14 Oct 2003
TL;DR: This work simulates floating-point operations using different formats to check whether reduced-format adders and multipliers could dissipate less power, and discusses multimedia systems operating in different scenarios, such as wireless communication and image manipulation.
Abstract: The IEEE defined a standard for floating-point arithmetic used by processing systems (ANSI/IEEE Std 754-1985). This standard encodes floating-point numbers using a maximum of 64 bits, with a 23-bit fraction in single precision format and a 52-bit fraction in double precision format. The new multimedia terminals require low-power applications; the most important floating-point units (adders and multipliers) account for a significant part of the total power dissipated by a modern system-on-chip. They might dissipate less power by using a reduced format representation. To verify this possibility, floating-point operations are simulated by real systems using different formats. We discuss multimedia systems operating in different scenarios, such as wireless communication and image manipulation.
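
The field widths quoted in the entry above (23 fraction bits in single precision, 52 in double) can be inspected with Python's standard `struct` module. The sketch below also emulates a reduced format by truncating low-order fraction bits of a double; truncation is an assumption made for illustration, since the abstract does not say how the reduced formats were formed or rounded, and all function names are hypothetical.

```python
import struct


def decode_single(x):
    """Split a value stored as IEEE 754 single into (sign, exponent, fraction)."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 31, (bits >> 23) & 0xFF, bits & ((1 << 23) - 1)


def decode_double(x):
    """Split a value stored as IEEE 754 double into (sign, exponent, fraction)."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def truncate_fraction(x, kept_bits):
    """Keep only the top kept_bits (0..52) fraction bits of a double."""
    bits = struct.unpack(">Q", struct.pack(">d", x))[0]
    low_mask = (1 << (52 - kept_bits)) - 1      # fraction bits to discard
    bits &= ~low_mask & 0xFFFFFFFFFFFFFFFF
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

Comparing results computed with `truncate_fraction` applied after each operation against full-precision results gives a rough feel for the accuracy cost of a narrower fraction; it does not model power.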

Patent
23 Apr 2003
TL;DR: In this article, an arithmetic processing apparatus is described that includes a first data storage unit, a two-dimensional arithmetic unit, and a main control unit; the two-dimensional arithmetic unit includes an input address calculation unit which calculates the addresses of a set of input data necessary for a designated type of operation.
Abstract: An arithmetic processing apparatus includes a first data storage unit, two-dimensional arithmetic unit, and main control unit. The first data storage unit stores data to be processed. The two-dimensional arithmetic unit performs two-dimensional operation. The main control unit controls the two-dimensional arithmetic unit. The two-dimensional arithmetic unit includes an input address calculation unit which calculates the addresses of a set of input data necessary for a designated type of operation in the first data storage unit in accordance with an execution start instruction which designates the type of operation and a parameter from the main control unit, and an arithmetic execution unit which performs the designated type of operation for the set of input data which are stored at the calculated addresses in the first data storage unit.

01 Jan 2003
TL;DR: In this paper, the authors acknowledge certain people who have encouraged, supported and helped me complete my thesis at LSU and extend a special thank-you to Srinivas, my constant companion and beloved friend.
Abstract: To my parents and in loving memory of my grandmother. Acknowledgements: I would like to acknowledge certain people who have encouraged, supported and helped me complete my thesis at LSU. I am very grateful to my advisor Dr. A. Srivastava for his guidance, patience and understanding throughout this work. His suggestions, discussions and constant encouragement have helped me to gain a deep insight into the field of VLSI design. I would like to thank Dr. M. Feldman and Dr. S. Kak for sparing their time to be a part of my thesis advisory committee. I am very thankful to the Electrical Engineering Department for supporting me financially during my stay at LSU. I take this opportunity to thank my friends Harish, Kavitha, Sajida, Sunitha and Anand for their help and encouragement. I would also like to thank all my friends here who made my stay at LSU an enjoyable and a memorable one. I extend a special thank-you to Srinivas, my constant companion and beloved friend. Last of all I thank the Almighty Lord for keeping me in good health and spirits throughout my stay at LSU.

Patent
David A. Luick1
05 Jun 2003
TL;DR: In this article, a high-frequency compound instruction mechanism and method is proposed that performs a common compare immediate function before an add function has completed, thereby reducing the number of cycles needed to perform the add-compare function.
Abstract: A high-frequency compound instruction mechanism and method allows performing a common compare immediate function before an add function has completed, thereby reducing the number of cycles to perform the add-compare function. By increasing the speed of performing the add-compare function, a branch mispredict signal may be provided to an instruction pipeline before data registers are affected by the pipelined instructions. The compound instruction mechanism of the preferred embodiments may be implemented within space that is primarily unused within arithmetic logic units, resulting in an implementation that only marginally increases space requirements on an integrated circuit.