Showing papers on "Arithmetic logic unit published in 2012"


Journal ArticleDOI
TL;DR: The ALU design is based on a Kogge-Stone adder and employs an asynchronous wave-pipelined approach scalable for wide datapath processors, and chip design and high-speed test results for the 8-bit ALU circuit are presented.
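
For readers unfamiliar with the adder named in the summary above, the following is a minimal bit-level sketch of a Kogge-Stone parallel-prefix adder. It models only the generate/propagate logic in software; the asynchronous wave-pipelining and circuit technology of the cited design are not modeled, and the function name and `width` parameter are illustrative assumptions rather than anything taken from the paper.

```python
def kogge_stone_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Add two unsigned `width`-bit integers with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out). Illustrative software model only.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b        # generate: bit i produces a carry by itself
    p = a ^ b        # propagate: bit i forwards an incoming carry
    half_sum = p     # bitwise XOR is reused in the final sum stage

    # log2(width) prefix stages; stage `dist` merges each bit's group
    # with the group that ends `dist` positions below it.
    dist = 1
    while dist < width:
        g = (g | (p & (g << dist))) & mask
        p = (p & (p << dist)) & mask
        dist <<= 1

    # With a carry-in of 0, g[i] is now the carry out of bit i.
    carries = (g << 1) & mask
    total = half_sum ^ carries
    carry_out = (g >> (width - 1)) & 1
    return total, carry_out
```

All carries are available after log2(width) stages (three stages for 8 bits), which is the property that makes this adder family attractive for wide datapaths.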

100 citations


Journal ArticleDOI
TL;DR: New MTJ-based spintronic logic units using spin transfer torque (STT) switching are proposed, including a basic STT-MTJ logic cell, direct communication between MTJ logic cells, a three-MTJ logic unit, and a spintronic logic circuit acting as an arithmetic logic unit.
Abstract: Magnetic tunneling junction (MTJ)-based programmable logic devices have been proposed and studied for future reconfigurable and nonvolatile computation devices and systems. Spin transfer torque (STT)-based switching has advantages in device scaling compared to the field-switching mechanism. However, the previously proposed MTJ logic devices have operated independently and, therefore, are limited to only basic logic operations. Consequently, the MTJ device has only been used as an ancillary device, rather than the main computation device. As a result, the full benefits of MTJ-based computation have not been explored. New designs are needed to accelerate the development of the MTJ-based logic devices. Specifically the realization of direct communication between the MTJ devices is crucial to fully utilize the MTJ devices in the circuits to implement more advanced logic functions. In this paper, new MTJ-based spintronic logic units (building blocks) for spintronic circuits using the STT switching mechanism have been proposed and investigated, which includes the designs of a basic STT-MTJ logic cell, a direct communication between the MTJ logic cells, a three-MTJ logic unit and a spintronic logic circuit acting as an arithmetic logic unit.

82 citations


Journal ArticleDOI
TL;DR: This work is communicating the trinary arithmetic and logic unit (TALU) in modified trinary number (MTN) system, which is suitable for the optical computation and other applications in multivalued logic system.
Abstract: The arithmetic logic unit (ALU) is the most important unit in any computing system. Optical computing is becoming increasingly popular because of its ultrahigh processing speed and huge data handling capability. Fast processing therefore calls for an optical TALU compatible with multivalued logic. In this regard we are communicating the trinary arithmetic and logic unit (TALU) in the modified trinary number (MTN) system, which is suitable for optical computation and other applications in multivalued logic systems. Here Savart plate and spatial light modulator (SLM) based optoelectronic circuits have been used to exploit the optical tree architecture (OTA) in an optical interconnection network.

32 citations


Proceedings ArticleDOI
01 Dec 2012
TL;DR: A new tree multiplication structure based architecture is proposed to design this Vedic multiplier and it is found that Vedic Urdhva Triyambakam multiplication algorithm is the best algorithm as it generates partial products in the parallel manner.
Abstract: Nowadays most circuits designed to perform specific or safety-critical operations are based on the digital domain, where microprocessors and microcontrollers play an important role. The ALU is the heart of these processors. By optimizing this co-processor, a highly efficient digital processor can be obtained. This paper is therefore devoted to designing a speed-, energy- and power-efficient Arithmetic Logic Unit. The speed of an ALU depends greatly on the speed of its multiplication unit. Many multiplication techniques have been devised at the algorithmic and structural level. After a thorough study and deep analysis we have found that the Vedic Urdhva Triyambakam multiplication algorithm is the best algorithm, as it generates partial products in a parallel manner. In this paper we propose a new architecture, based on a tree multiplication structure, to design this Vedic multiplier. To generate the partial products, a divide and conquer approach has been used. For the addition of the partial products, a new addition tree structure has been proposed. It provides better speed in comparison to Array, Booth, Wallace, Modified Booth Wallace, Karatsuba and Vedic Karatsuba multipliers, and it is also faster than the Vedic multipliers proposed by L Shriraman [2] and Devika Jaina [3]. To make the ALU energy and power efficient, a new reversible logic gate has been proposed which is similar to the Fredkin gate [18]. After integrating these modules we have obtained a speed-, energy- and power-efficient ALU. The proposed Arithmetic Logic Unit is coded in Verilog HDL, synthesized and simulated using Xilinx ISE 9.2i software.

24 citations


Proceedings ArticleDOI
19 Aug 2012
TL;DR: A 2*2 Swap gate which is a reduced implementation in terms of quantum cost and delay to the previous Swap gate is presented and its advantages over the Toffoli and Peres gates are discussed.
Abstract: Programmable reversible logic is gaining wide consideration as a logic design style for modern nanotechnology and quantum computing with minimal impact on circuit heat generation in improved computer architecture and arithmetic logic unit designs. In this paper, a 2*2 Swap gate which is a reduced implementation in terms of quantum cost and delay compared to the previous Swap gate is presented. Then, a novel 3*3 programmable UPG gate capable of calculating the universal logic calculations is presented and verified, and its advantages over the Toffoli and Peres gates are discussed. The UPG is then implemented in a reduced design for calculating n-bit AND, n-bit OR and n-bit ZERO calculations. Then, two 3*3 RMUX gates capable of multiplexing two input values with reduced quantum cost and delay compared to the previously existing Fredkin gate are presented and verified. Next, a novel 4*4 reversible programmable RC gate capable of nine unique logical calculations at low cost and delay is presented and verified. The UPG and RC are implemented in the design of novel sequential and tree-based comparators. These designs are compared to previously existing designs, and their advantages in terms of cost and delay are analyzed. Then, the RMUX is used to improve a reversible SRAM cell we previously presented. The memory cell and comparator are implemented in the design of a Min/Max Comparator device.
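
As background for the comparison gates named in this abstract, here is a small sketch of the standard Swap, Toffoli, Peres and Fredkin gates as truth-table functions, with a check that each is a bijection on its inputs. The paper's own UPG, RMUX and RC gates are not defined in the abstract and are deliberately not reproduced; the helper `is_reversible` is an illustrative name.

```python
from itertools import product

def swap(a, b):
    """2*2 Swap gate: exchanges its two inputs."""
    return b, a

def toffoli(a, b, c):
    """3*3 Toffoli gate: flips c when both controls a and b are 1."""
    return a, b, c ^ (a & b)

def peres(a, b, c):
    """3*3 Peres gate: outputs (a, a XOR b, c XOR ab)."""
    return a, a ^ b, c ^ (a & b)

def fredkin(ctrl, x, y):
    """3*3 Fredkin gate: controlled swap of x and y."""
    return (ctrl, y, x) if ctrl else (ctrl, x, y)

def is_reversible(gate, n_inputs):
    """A gate is reversible when distinct inputs give distinct outputs."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_inputs)}
    return len(outputs) == 2 ** n_inputs

# With the target line tied to 0, Toffoli computes AND on its third output.
and_result = toffoli(1, 1, 0)[2]
# Fredkin acts as a 2:1 multiplexer: the second output is y when ctrl is 1.
mux_result = fredkin(1, 0, 1)[1]
```

The multiplexer behaviour of the Fredkin gate is why it is the natural baseline for a reversible multiplexer design.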

24 citations


Patent
James M. Simkins, Wayne E. Wennekamp, John M. Thendean, Adam Elkins, Walke Richard L
09 Nov 2012
TL;DR: A digital signal processing (DSP) block is described that has a preadder-register block coupled to receive first through fourth input operands, a multiplier, two register blocks, and an ALU block with four input multiplexers.
Abstract: An apparatus is disclosed. This apparatus includes a digital signal processing (“DSP”) block having a preadder-register block coupled to receive first through fourth input operands. A multiplier is coupled to the preadder-register block to receive a multiplicand operand and a multiplier operand. A first register block is coupled to the multiplier to receive sets of partial products from the multiplier. A second register block is coupled to receive the third operand input. An arithmetic logic unit (“ALU”) block is coupled to the preadder-register block, the first register block and the second register block. The ALU block includes four input multiplexers and an ALU, where the ALU is coupled to receive outputs from each of the four input multiplexers.

21 citations


Patent
31 Oct 2012
TL;DR: An arithmetic logic unit (320) includes a first routing grid (408) connected to multiple data lanes (400) to drive first data to the data lanes, and a second routing grid (412) connected to the data lanes to drive second data to the data lanes.
Abstract: An arithmetic logic unit (320) including a first routing grid (408) connected to multiple data lanes (400) to drive first data to the data lanes (400). A second routing grid (412) is connected to the data lanes (400) to drive second data to the data lanes (400). Each of the data lanes (400) includes multiple, e.g. N, functional units with first inputs from the first routing grid and second inputs from the second routing grid. The functional units compute pairwise a function of the respective first data on the respective first inputs and the respective second data on the respective second inputs. Each of the data lanes includes a reduction unit with inputs adapted to receive K' bits per word from the functional units. The reduction unit is configured to perform a reduction operation configured to output an output result having a reduced number of J' bits per word, wherein J' is less than N multiplied by K'.
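
The lane structure described above (N functional units feeding one reduction unit) can be illustrated with a short software model. This is a sketch under stated assumptions: the abstract does not name the pairwise function or the reduction, so absolute difference and summation are chosen here purely as an example, and `lane`, `N` and `K_BITS` are hypothetical names.

```python
N = 8        # assumed number of functional units per lane
K_BITS = 8   # assumed K' bits per word produced by each functional unit

def lane(first, second, func=lambda x, y: abs(x - y), reduce_op=sum):
    """Model one data lane.

    `first` comes from the first routing grid and `second` from the second.
    Each functional unit applies `func` to one pair of words, and the
    reduction unit collapses the N partial results into a single word.
    """
    if len(first) != N or len(second) != N:
        raise ValueError("each routing grid must supply N words per lane")
    partial = [func(x, y) for x, y in zip(first, second)]
    return reduce_op(partial)

# Worst case for 8-bit words: every pair differs by 255.
worst = lane([255] * N, [0] * N)
output_bits = worst.bit_length()   # J', the width needed for the result
```

In this example the reduced result needs 11 bits, well below N × K' = 64 bits, which matches the abstract's condition that J' is less than N multiplied by K'.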

17 citations


01 Jan 2012
TL;DR: The main objective of this paper is to design and implement an 8-bit Reduced Instruction Set (RISC) processor using the XILINX Spartan 3E tool; the enhanced features of the Spartan-3E reduce the cost per logic cell.
Abstract: RISC or Reduced Instruction Set Computer is a design philosophy that has become mainstream in scientific and engineering applications. The increasing performance and gate capacity of recent FPGA devices permit complex logic systems to be implemented on a single programmable device. So the main objective of this paper is to design and implement an 8-bit Reduced Instruction Set (RISC) processor using the XILINX Spartan 3E tool. The enhanced features of the Spartan-3E reduce the cost per logic cell designed. The most important feature of the RISC processor is that it is very simple and supports a load/store architecture. The important components of this processor include the Arithmetic Logic Unit, Shifter, Rotator and Control unit. The module functionality and performance issues like area, power dissipation and propagation delay are analyzed at 90 nm process technology using the SPARTAN 3E XCS500E XILINX tool.

15 citations


01 Jan 2012
TL;DR: This work proved that Vedic multiplication technique is the best algorithm in terms of speed.
Abstract: This paper is devoted to designing a high speed arithmetic logic unit. An ALU is a module which performs arithmetic and logic operations. The reason for choosing this topic is that the ALU is the key element of digital processors such as microprocessors, microcontrollers and central processing units. Every digital domain based technology depends, partially or wholly, upon the operations performed by the ALU. That is why a high speed ALU is highly desirable: it can enhance the efficiency of the modules that rely on its operations. The speed of an ALU depends greatly upon the speed of its multiplier. Many multiplication algorithms exist at the algorithmic and structural level. Our work proved that the Vedic multiplication technique is the best algorithm in terms of speed. Further, we have seen that conventional Vedic multiplication hardware has some limitations. To overcome those limitations, a novel approach has been proposed to design the Vedic multiplier using a unique addition tree structure, which is used to add the partial products. For the two bit Vedic multiplier, the conventional Vedic multiplier hardware has been used. For the four and eight bit Vedic multipliers, a divide and conquer approach has been used. After designing the proposed Vedic multiplier, it has been integrated into an eight bit arithmetic logic unit module along with a conventional adder, subtractor, and basic logic gates. The proposed ALU is able to perform three different arithmetic and eight different logical operations at high speed. All of these operational sub-modules (adder, subtractor, multiplier and logic gates) have been designed as combinational circuits. For the synchronization of these operational sub-modules, the multiplexers used to integrate them in a single unit are triggered by a positive clock edge. The proposed arithmetic logic unit has been designed in the Verilog hardware description language (HDL). Data flow modeling has been used for the operational sub-modules and behavioral modeling for their integration. The target FPGA for this design belongs to the Virtex-2P family, device XC2VP2, package FG256, with a speed grade of -7. For synthesis, the Xilinx synthesis tool (XST) of Xilinx ISE-9.2i has been used. For behavioral simulation, the ISE simulator has been used. The maximum combinational path delay of the proposed multiplier is 11.886 ns, and the designed ALU can operate at a maximum frequency of 741.455 MHz.
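
The divide and conquer construction mentioned in this abstract (a two bit base multiplier, with four and eight bit multipliers built from smaller ones) can be sketched in software as follows. This is an illustrative model, not the authors' hardware: their addition tree structure is not described in the abstract, so the partial products are simply summed with ordinary integer addition, and the function names are assumptions.

```python
def vedic_2x2(a: int, b: int) -> int:
    """2-bit by 2-bit multiply using the vertical and crosswise pattern."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    vertical_low = a0 & b0                 # weight 1
    crosswise = (a1 & b0) + (a0 & b1)      # weight 2, may carry into weight 4
    vertical_high = a1 & b1                # weight 4
    return vertical_low + (crosswise << 1) + (vertical_high << 2)

def vedic_multiply(a: int, b: int, width: int = 8) -> int:
    """Multiply two unsigned `width`-bit values; width must be 2, 4, 8, ...

    Each level splits both operands into halves and combines four
    half-width products, which hardware can generate in parallel.
    """
    if width == 2:
        return vedic_2x2(a, b)
    half = width // 2
    mask = (1 << half) - 1
    a_lo, a_hi = a & mask, a >> half
    b_lo, b_hi = b & mask, b >> half
    low = vedic_multiply(a_lo, b_lo, half)
    cross1 = vedic_multiply(a_lo, b_hi, half)
    cross2 = vedic_multiply(a_hi, b_lo, half)
    high = vedic_multiply(a_hi, b_hi, half)
    # Stand-in for the addition tree: shift each partial product to its weight.
    return low + ((cross1 + cross2) << half) + (high << width)
```

Because the four half-width products do not depend on each other, the delay of the real circuit is dominated by how the final additions are arranged, which is where the abstract's addition tree comes in.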

14 citations


Patent
22 Aug 2012
TL;DR: A method is presented for executing applications in adaptation to the load of the CPU and GPU in a terminal equipped with a central processing unit (CPU) and a graphics processing unit (GPU).
Abstract: A terminal equipped with a central processing unit (CPU) and a graphics processing unit (GPU) performs a method for executing applications in adaptation to the load of the CPU and GPU. The application execution method of the present invention includes checking, when a code of application to be executed is input, workloads of a central processing unit and a graphics processing unit. The method also includes comparing the workloads of the central processing unit and the graphics processing unit with respective workload threshold values, and compiling the code according to comparison result. The method further includes generating a binary for executing the application at one of the central processing unit and the graphics processing unit using the compiled code, and executing the application with the generated binary. The method reduces application execution time by adjusting the workloads of the CPU and GPU according to the total workload, thereby saving power.

11 citations


Proceedings ArticleDOI
19 Dec 2012
TL;DR: The proposed designs of reversible ALU in QCA (Quantum-dot Cellular Automata) framework are verified and evaluated over the existing ALU designs and found to be more efficient in terms of design complexity and quantum cost.
Abstract: This work targets design of reversible ALU (arithmetic logic unit) in QCA (Quantum-dot Cellular Automata) framework. The design is based on the reversible QCA structure (RQCA) introduced in this paper. A fault tolerant architecture of reversible ALU is also synthesized. The proposed designs are verified and evaluated over the existing ALU designs and found to be more efficient in terms of design complexity and quantum cost.

01 Jan 2012
TL;DR: The standard cell gate library described here is a first investigation towards a computer aided design flow for reversible logic that includes cell placement and routing, and the connection between the standard cells and a combinator-based reversible functional language is described.
Abstract: This technical report shows the design and layout of a library of three reversible logic gates designed with the standard cell methodology. The reversible gates are based on complementary pass-transistor logic and have been validated with simulations, a layout vs. schematic check, and a design rule check. The standard cells have been used in the design and layout of a novel 4-bit reversible arithmetic logic unit. After validation this ALU has been fabricated and packaged in a DIL48 chip. The standard cell gate library described here is a first investigation towards a computer aided design flow for reversible logic that includes cell placement and routing. The connection between the standard cells and a combinator-based reversible functional language is described.

01 Jan 2012
TL;DR: This paper primarily deals with the construction of Arithmetic Logic Units (ALUs) in a Hardware Description Language (HDL) using Xilinx ISE 9.2i and their implementation on a SPARTAN 3E Field Programmable Gate Array (FPGA) board to analyze the design parameters.
Abstract: This paper primarily deals with the construction of Arithmetic Logic Units (ALUs) in a Hardware Description Language (HDL) using Xilinx ISE 9.2i and their implementation on Field Programmable Gate Arrays (FPGAs) to analyze the design parameters. The ALU of a digital computer is an aspect of logic design with the objective of developing appropriate algorithms in order to achieve an efficient utilization of the available hardware. The hardware can only perform a relatively simple and primitive set of Boolean and arithmetic operations, and more complex operations are built as a hierarchy by algorithms employing the hardware. Speed, power and utilization of the ALU are the measures of the efficiency of an algorithm. In this paper, we have simulated and synthesized the various parameters of ALUs by using VHDL on Xilinx ISE 9.2i and a SPARTAN 3E FPGA board.

Patent
Albert Meixner
21 Dec 2012
TL;DR: A graphics pipeline that employs a general processing unit as a programmable function unit, and a method of manufacturing a graphics processing unit, are disclosed.
Abstract: Employing a general processing unit as a programmable function unit of a graphics pipeline and a method of manufacturing a graphics processing unit are disclosed. In one embodiment, the graphics pipeline includes: (1) accelerators, (2) an input output interface coupled to each of the accelerators and (3) a general processing unit coupled to the input output interface and configured as a programmable function unit of the graphics pipeline, the general processing unit configured to issue vector instructions via the input output interface to vector data paths for the programmable function unit.

01 Jan 2012
TL;DR: The proposed fault tolerant reversible ALU is a versatile approach to the implementation of quantum computing, offering both remarkably low power consumption and nanoscale dimensions.
Abstract: Since the Arithmetic Logic Unit (ALU) is one of the essential components of the Central Processing Unit (CPU), its good performance is the most important factor in obtaining high reliability. Reversible logic has also attracted growing attention in nanotechnology, optical computing, quantum computing and low power CMOS design. In this paper we propose and analyze a basic model of a fault tolerant reversible ALU and show that the realization of an efficient fault tolerant reversible ALU is possible with both minimum constant inputs and garbage outputs. The proposed fault tolerant reversible ALU is a versatile approach to the implementation of quantum computing, offering both remarkably low power consumption and nanoscale dimensions.

Patent
12 Sep 2012
TL;DR: A microcontroller has a central processing unit (CPU), a plurality of peripherals, and a programmable scheduler unit in which an arithmetic logic unit, controlled by the output signal of a timer comparator, takes the timer register or the event register as its first input and a delta time register as its second input, with its output coupled to the event register.
Abstract: A microcontroller has a central processing unit (CPU), a plurality of peripherals, and a programmable scheduler unit with: - a timer being clocked by an independent clock signal; - a comparator coupled with a timer register of said timer and having an output generating an output signal; - an event register coupled with said comparator; - a delta time register; and - an arithmetic logic unit controlled by the output signal of the comparator and with first and second inputs and an output, wherein the first input is coupled with the timer register or the event register and the second input is coupled with the delta time register and the output is coupled with the event register.
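
The register arrangement in this abstract can be modeled in a few lines. This is a sketch under explicit assumptions: the abstract does not say which operation the ALU performs, so addition is assumed here (giving a periodic event), and the class name, field names and 16-bit width are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SchedulerUnit:
    """Software model of the timer, comparator, event and delta registers."""
    event: int                        # event register (next match value)
    delta: int                        # delta time register
    width: int = 16                   # assumed register width in bits
    timer: int = 0                    # timer register
    use_timer_as_input: bool = False  # selects the ALU's first input

    def tick(self) -> bool:
        """Advance the timer by one clock; return True when an event fires."""
        mask = (1 << self.width) - 1
        self.timer = (self.timer + 1) & mask
        match = self.timer == self.event      # comparator output signal
        if match:
            # Assumed ALU operation: first input + delta -> event register.
            base = self.timer if self.use_timer_as_input else self.event
            self.event = (base + self.delta) & mask
        return match

unit = SchedulerUnit(event=3, delta=4)
fire_times = [t for t in range(1, 16) if unit.tick()]
```

Under these assumptions the event register reloads itself in hardware, so the CPU does not need to service an interrupt just to schedule the next occurrence.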

Journal ArticleDOI
TL;DR: A VHDL implementation of an 8-bit arithmetic logic unit (ALU) is presented, designed with the Xilinx synthesis tool ISE 13.1 and targeted at a Spartan device.
Abstract: In this paper a VHDL implementation of an 8-bit arithmetic logic unit (ALU) is presented. The design was implemented in VHDL using the Xilinx synthesis tool ISE 13.1 and targeted at a Spartan device. The ALU was designed to perform arithmetic operations such as addition and subtraction using an 8-bit fast adder, logical operations such as AND, OR, XOR and NOT, 1's and 2's complement operations, and compare. The ALU consists of two input registers to hold the data during operation, one output register to hold the result of the operation, an 8-bit fast adder with a 2's complement circuit to perform subtraction, and logic gates to perform logical operations. The maximum propagation delay is 13.588 ns and the power dissipation is 38 mW. The ALU was designed for a controller used in a network interface card.
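
The operation set listed in this abstract can be illustrated with a behavioral model. This is a minimal sketch, not the paper's VHDL: the opcode names, the carry flag convention and the encoding of the compare result are assumptions made for the example.

```python
MASK = 0xFF  # 8-bit datapath

def alu8(op: str, a: int, b: int = 0) -> tuple[int, int]:
    """Behavioral 8-bit ALU model; returns (result, carry_out).

    Opcode names and the CMP encoding (0: equal, 1: a > b, 2: a < b)
    are hypothetical choices for illustration.
    """
    a &= MASK
    b &= MASK
    if op == "ADD":
        raw = a + b
    elif op == "SUB":
        raw = a + (~b & MASK) + 1     # a - b via the adder and 2's complement
    elif op == "AND":
        raw = a & b
    elif op == "OR":
        raw = a | b
    elif op == "XOR":
        raw = a ^ b
    elif op in ("NOT", "COMP1"):
        raw = ~a & MASK               # NOT and 1's complement coincide
    elif op == "COMP2":
        raw = (~a & MASK) + 1         # 2's complement
    elif op == "CMP":
        raw = 0 if a == b else (1 if a > b else 2)
    else:
        raise ValueError(f"unknown opcode: {op}")
    return raw & MASK, (raw >> 8) & 1
```

For example, `alu8("SUB", 5, 7)` yields 254 with no carry out, the 8-bit two's complement form of -2, which shows how a single adder serves both addition and subtraction.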

Proceedings ArticleDOI
10 Jul 2012
TL;DR: An innovative arithmetic logic unit (ALU) architecture that supports true dynamic precision operations on the fly is proposed and the overhead can be minimized if the ALU structure and configuration are chosen carefully for specific applications.
Abstract: Exploiting computational precision can improve performance significantly without losing accuracy in many applications. To enable this, we propose an innovative arithmetic logic unit (ALU) architecture that supports true dynamic precision operations on the fly. The proposed architecture targets both fixed-point and floating-point ALUs, but in this paper we focus mainly on the precision-controlling mechanism and the corresponding implementations for fixed-point adders and multipliers. We implemented the architecture on Xilinx Virtex-5 XC5VLX110T FPGAs, and the results show that the area and latency overheads are 1% ~ 24% depending on the structure and configuration. This implies the overhead can be minimized if the ALU structure and configuration are chosen carefully for specific applications. As a case study, we apply this architecture to binary cascade iterative refinement (BCIR). 4X speedup is observed in this case study.

Proceedings ArticleDOI
25 May 2012
TL;DR: A novel multi-path FAS unit is presented to accelerate the basic FAS architecture, and results in FP single precision data format show that the multi-path FAS is 48.7% faster and 16.9% smaller than the basic FAS unit.
Abstract: The task of computing both the summation and difference of a pair of Floating-Point (FP) data is often needed in some Digital Signal Processing (DSP) algorithms and other applications. A basic fused add-subtract unit (FAS) is introduced [1] to perform simultaneously both addition and subtraction operation for a couple of operands and has less hardware overhead than the general approach using two FP adders. In this paper, a novel multi-path FAS unit is presented to accelerate the basic FAS architecture. The implementation results in FP single precision data format show that the multi-path FAS is 48.7% faster and 16.9% smaller than the basic FAS unit.

Proceedings ArticleDOI
01 Dec 2012
TL;DR: A new design based on a Chain structure provides a green solution and an alternative way to solve the problems of power supply distribution, interconnection and interfacing in VLSI circuits, and a comparison experiment was performed over two circuit layouts of 16-bit ALU implementations.
Abstract: With the scaling of technology and the need for high performance and more functionality, power dissipation becomes a major bottleneck for microprocessor systems design. Power dissipation has become an obsessive concern in the design and implementation of VLSI circuits with the development of the technology. In this paper, conventional low-power design approaches are discussed, and a new design based on a Chain structure is presented. It provides a green solution and an alternative way to solve the problems of power supply distribution, interconnection and interfacing in VLSI circuits. A comparison experiment performed over two circuit layouts of 16-bit ALU implementations using standard CMOS shows a saving of more than 30% of power consumption and 0.423 mg of CO2 emission.

Proceedings ArticleDOI
01 Sep 2012
TL;DR: It is demonstrated that using a universal circuit, basic Boolean functions like AND, NAND, OR, NOR, Exclusive-OR and Exclusive-NOR can be configured using Multiple-Input Floating-Gate (MIFG) Transistors or neu-MOS.
Abstract: A methodology is proposed for the design of a 4-Bit Arithmetic Logic Unit (ALU) based on Soft-Hardware-Logic (SHL). The core of the implementation is based on the device known as neu-MOS (ν-MOS), a floating-gate MOS transistor with more than one control gate used for the digital signal processing. This configuration is reconfigurable modifying only the external voltages applied to an intermediate stage of programmable CMOS inverters, without any circuitry change, in contrast with conventional digital implementations. Here it is demonstrated that using a universal circuit, basic Boolean functions like AND, NAND, OR, NOR, Exclusive-OR and Exclusive-NOR can be configured using Multiple-Input Floating-Gate (MIFG) Transistors or neu-MOS. Based on a graphical method called Floating-gate Potential Diagram (FPD), a very basic 4-Bit ALU was designed and simulated for a couple of arithmetic and logic functions taking advantage of the weighted sum performed at the floating gate of the neu-MOS. Weighted inputs can be obtained from the FPD and then converted to effective capacitances choosing a given CMOS technology, OnSemi's 0.5 μm technology, for instance. Results obtained from simulations of the proposed design are compared with experimental results of ALUs configured with a FPGA evaluation kit and Motorola's MC14581B ALU chip.

Patent
03 Oct 2012
TL;DR: A digital signal processor based on a parallel data channel is proposed, in which each stage of the arithmetic logic channel performs parallel arithmetic logic operations through a plurality of parallel arithmetic logic units, and each arithmetic logic unit performs addition, subtraction, comparison, displacement, or absolute value operations.
Abstract: The invention provides a digital signal processor based on a parallel data channel. The parallel data channel sequentially comprises a parallel multiplication unit, a parallel operation unit set and a parallel accumulating unit, wherein the parallel multiplication unit comprises a plurality of parallel multiplying units and has the capabilities of carrying out multiplex real multiplication or complex multiplication as well as implementing bypass operation; the parallel operation unit set comprises a plurality of arithmetic logic units, and is formed by connecting multiple stages of arithmetic logic channels and a switching network composed by the plurality of arithmetic logic units in each layer, wherein each stage of arithmetic logic channel performs the parallel arithmetic logic operation through the plurality of parallel arithmetic logic units, and the operating result of the previous stage of arithmetic logic channel can be transmitted to the next stage of arithmetic logic channel through the switching network; each arithmetic logic unit is used for performing addition, subtraction, comparison, displacement, or absolute value operation; the bypass operation can be carried out to the layer; and the parallel accumulating unit is formed by a plurality of parallel accumulating units and is used for performing accumulating and post-processing. The digital signal processor based on the parallel data channel improves the processing performance and efficiency of the digital signal processor.

Journal ArticleDOI
TL;DR: The architecture of a floating point RNS multiply and accumulate (MAC) unit is presented, motivated by the advantages of floating point RNS arithmetic units over the fixed point MAC units that are key to Digital Signal Processors.
Abstract: Execution of arithmetic operations at a very high speed in real time is the major concern in compute-intensive digital signal processing (DSP) algorithms. Residue Number Systems (RNS) are being considered as an alternative to the binary number system because of their capability of performing "carry free" arithmetic operations. However, RNS systems have so far been used to handle integer numbers only. Floating Point RNS arithmetic units have obvious advantages over fixed point multiply & accumulate (MAC) units, which are the key units in Digital Signal Processors. Keeping this in view, in this paper, the architecture of a floating point MAC unit is presented.
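
The "carry free" property mentioned above can be seen in a short integer example. This sketch covers only integer RNS arithmetic, not the paper's floating point MAC architecture; the moduli set (7, 8, 9) is an assumed textbook-style choice of pairwise coprime moduli, and the function names are illustrative.

```python
from math import prod

MODULI = (7, 8, 9)            # assumed pairwise coprime moduli
DYNAMIC_RANGE = prod(MODULI)  # 504 distinct representable integers

def to_rns(x: int) -> tuple[int, ...]:
    """Encode an integer as its residues modulo each modulus."""
    return tuple(x % m for m in MODULI)

def rns_add(x, y):
    """Each residue channel adds independently; no carry crosses channels."""
    return tuple((xi + yi) % m for xi, yi, m in zip(x, y, MODULI))

def rns_mul(x, y):
    """Each residue channel multiplies independently."""
    return tuple((xi * yi) % m for xi, yi, m in zip(x, y, MODULI))

def from_rns(residues) -> int:
    """Decode with the Chinese Remainder Theorem (needs Python 3.8+)."""
    total = 0
    for r, m in zip(residues, MODULI):
        partial = DYNAMIC_RANGE // m
        total += r * partial * pow(partial, -1, m)  # modular inverse of partial
    return total % DYNAMIC_RANGE

# One multiply-accumulate step, 12 * 34 + 50, done channel by channel.
mac = from_rns(rns_add(rns_mul(to_rns(12), to_rns(34)), to_rns(50)))
```

Results are only meaningful while they stay below the dynamic range (504 here), which is one reason magnitude-dependent operations such as scaling and comparison are harder in RNS than addition and multiplication.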

Patent
15 Feb 2012
TL;DR: A method for combating counterfeiting and tampering of integrated circuits includes the steps of providing a programmable logic device, the programmable logic device including an arithmetic circuit implemented into the substrate, and constructing an arithmetic feedback oscillator using the arithmetic circuit.
Abstract: A method for combating counterfeiting and tampering of integrated circuits includes the steps of providing a programmable logic device, the programmable logic device including an arithmetic circuit implemented into the substrate, and constructing an arithmetic feedback oscillator using the arithmetic circuit. The step of constructing the arithmetic feedback oscillator includes incorporating a feedback loop into the arithmetic circuit and feeding output bits back into an input of the arithmetic circuit. The method also includes the steps of selecting input values that produce repeating values in a lesser order bit of a product of the arithmetic circuit when first and second inputs are applied to the arithmetic circuit, monitoring the lesser order bit, and determining a predicted pattern.

Proceedings ArticleDOI
24 Sep 2012
TL;DR: An FPGA(Field Programmable Gate Array) based processor is implemented for ECC, which parallelizes the computing of ECC at bit-level and gains a considerable speed-up.
Abstract: As networks have come to be used broadly in areas such as national defense, the military and banking, the security of data transported over networks has become a hot issue. Public key cryptographic algorithms are widely applied in network communication. RSA has been used for a long time as a traditional public key cryptographic system, but it no longer seems able to meet users' higher security demands. In recent years, ECC (Elliptic Curve Cryptography) has been adopted more and more broadly because it offers higher security for the same key length. In addition, it has the advantages of lower computation overhead, lower bandwidth demand and so on. The speed of ECC encryption and decryption is greatly affected by point multiplication, which is very time-consuming. In this study, an FPGA (Field Programmable Gate Array) based processor is implemented for ECC, which parallelizes the computation of ECC at the bit level and gains a considerable speed-up. The ECC processor is fully implemented in hardware and supports key lengths of 113, 163 and 193 bits. Algorithms suitable for hardware implementation are applied to make the processor more efficient. There are four kinds of unit in the processor: arithmetic logic unit, controlling unit, and input/output system. The units communicate with each other through a bus in the FPGA device.

Journal ArticleDOI
TL;DR: This paper designs an ALU which mainly consists of two adders, takes advantage of the Adaptive Logic Module (ALM) architecture, and uses Verilog to describe the ALU.

Patent
14 Feb 2012
TL;DR: In this paper, a central processing unit (CPU) and a memory system module are used to coordinate retrieval of the set of encoded ingress data slices from memory and storage of encoded egress data slices in the memory.
Abstract: A computing device includes a central processing unit (CPU) and a memory system module. The CPU includes a data dispersed storage error coding (DSEC) module operable to DSEC decode a set of encoded ingress data slices to recapture ingress data and DSEC encode egress data to produce a set of encoded egress data slices, an instruction DSEC module operable to DSEC decode a set of encoded instruction slices to recapture an instruction, and an arithmetic logic unit (ALU) operable to execute the instruction on the ingress data and execute the instruction to produce the egress data. The memory system module is operable to coordinate retrieval of the set of encoded ingress data slices from memory, coordinate retrieval of the set of encoded instruction slices from the memory, and coordinate storage of the set of encoded egress data slices in the memory.

Journal ArticleDOI
TL;DR: A framework in which a design style, such as software-oriented or application-specific instruction-set processor (ASIP)-oriented design, can be specified is built and an exploration process is proposed that allows targeting the main aspects that limit acceleration and the actions that can be made to improve it.
Abstract: Design space exploration is a delicate process whose success rests on the designers' shoulders. It is often based on a trial-and-error approach. Some basic metrics can be used to guide this process. In this paper, we explore accelerating loops from C-based specifications. We built a framework in which a design style, such as software-oriented or application-specific instruction-set processor (ASIP)-oriented design, can be specified. We also propose an exploration process that allows targeting the main aspects that limit acceleration and the actions that can be taken to improve it. The process is based on new loop-oriented metrics that provide insight into key design issues. They help to determine which aspects of the design, between data accesses and arithmetic logic unit (ALU)/control operations, limit or allow leveraging loop acceleration opportunities. We profile some benchmarks from the signal and image processing fields, such as the Turbo Decoder and the JPEG algorithms, to illustrate how loop-oriented metrics help to point out aspects that limit or improve loop acceleration. The loop acceleration process was also used to explore design architectures that can leverage, as much as possible, the loop acceleration opportunities of the sum of absolute differences (SAD) algorithm.
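
For readers unfamiliar with the SAD kernel named at the end of this abstract, a minimal sketch follows. It is written in Python rather than the C specifications the paper works from, the function name is hypothetical, and it shows only the generic algorithm, not the paper's framework or metrics.

```python
def sum_of_absolute_differences(block_a, block_b):
    """Return the sum of |a - b| over two equally sized sequences of samples."""
    if len(block_a) != len(block_b):
        raise ValueError("blocks must have the same length")
    total = 0
    for a, b in zip(block_a, block_b):   # two data accesses per iteration
        total += abs(a - b)              # subtract, absolute value, accumulate (ALU work)
    return total


sad = sum_of_absolute_differences([1, 5, 3], [4, 2, 3])
```

Each iteration mixes two memory reads with a handful of ALU operations. That split between data accesses and ALU/control work is the balance the abstract's loop-oriented metrics are meant to expose.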

Dissertation
01 Dec 2012
TL;DR: Ayala designed and evaluated energy-efficient, high-speed 32-bit integer arithmetic logic units (ALUs) using superconductor logic based on Rapid Single Flux Quantum (RSFQ) logic.
Abstract: (Abstract of the dissertation "Energy-Efficient Wide Datapath Integer Arithmetic Logic Units Using Superconductor Logic" by Christopher Lawrence Ayala, Doctor of Philosophy in Computer Engineering, Stony Brook University, 2012.) Complementary Metal-Oxide-Semiconductor (CMOS) technology is currently the most widely used integrated circuit technology. As CMOS approaches the physical limitations of scaling, it is unclear whether it can provide long-term support for niche areas such as high-performance computing and telecommunication infrastructure, particularly with the emergence of cloud computing. Alternatively, superconductor technologies based on Josephson junction (JJ) switching elements, such as Rapid Single Flux Quantum (RSFQ) logic and especially its new variant, Energy-Efficient Rapid Single Flux Quantum (ERSFQ) logic, have the capability to provide an ultra-high-speed, low-power platform for digital systems. The objective of this research is to design and evaluate energy-efficient, high-speed 32-bit integer Arithmetic Logic Units (ALUs)

Patent
06 Jun 2012
TL;DR: An apparatus, method, and system for acoustic modeling are described, in which an arithmetic logic unit computes a one-dimensional score between a feature vector and a Gaussian probability distribution vector.
Abstract: Embodiments of the present invention include an apparatus, method, and system for acoustic modeling. In an embodiment, an arithmetic logic unit for computing a one-dimensional score between a feature vector and a Gaussian probability distribution vector is provided. The arithmetic logic unit includes a computational logic unit configured to compute a first value based on a mean value and a variance value associated with a dimension of the Gaussian probability distribution vector and a dimension of a feature vector, a look up table module configured to output a second value based on the variance value, and a combination module configured to combine the first value and the second value to generate the one-dimensional score.
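
The three-part structure described in this abstract (a computed first value, a table-supplied second value, and a combination step) can be sketched in software. The example below assumes the score is the per-dimension Gaussian log-likelihood, which the abstract does not state. Under that assumption the first value is the scaled squared distance from the mean, and the second value is a normalization term that depends only on the variance and can therefore be precomputed. All function names are hypothetical.

```python
import math


def build_variance_lut(variances):
    """Precompute the variance-only term 0.5 * ln(2 * pi * variance) for each variance."""
    return {v: 0.5 * math.log(2.0 * math.pi * v) for v in variances}


def one_dim_score(x, mean, variance, lut):
    """Combine the computed term and the looked-up term into one score (assumed log-likelihood)."""
    first = (x - mean) ** 2 / (2.0 * variance)   # depends on feature, mean and variance
    second = lut[variance]                       # depends on the variance only
    return -(first + second)


lut = build_variance_lut([1.0, 4.0])
score = one_dim_score(2.0, 0.0, 4.0, lut)
```

Keeping the logarithm in a table means the per-frame work reduces to a subtraction, a multiplication, a division (or a multiplication by a stored reciprocal) and an addition. That is one plausible motivation for the separate look-up module, though the patent abstract does not give its reasoning.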