Showing papers on "Adder published in 2009"


Journal ArticleDOI
TL;DR: The unique characteristics of QCA are used to design a carry flow adder that is fast and efficient, and the design of serial parallel multipliers is explored; simulations indicate very attractive performance.
Abstract: Quantum-dot cellular automata (QCA) is an emerging nanotechnology, with the potential for faster speed, smaller size, and lower power consumption than transistor-based technology. Quantum-dot cellular automata has a simple cell as the basic element. The cell is used as a building block to construct gates and wires. Previously, adder designs based on conventional designs were examined for implementation with QCA technology. That work demonstrated that the design trade-offs are very different in QCA. This paper utilizes the unique QCA characteristics to design a carry flow adder that is fast and efficient. Simulations indicate very attractive performance (i.e., complexity, area, and delay). This paper also explores the design of serial parallel multipliers. A serial parallel multiplier is designed and simulated with several different operand sizes.

342 citations
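
QCA logic is commonly built from three-input majority gates and inverters, which is one reason adder trade-offs differ from those in CMOS. The abstract above does not give the carry flow adder layout, so the following is only a minimal behavioral sketch of a full adder expressed with three majority gates and two inverters. The function names are illustrative and do not come from the paper.

```python
def maj(a: int, b: int, c: int) -> int:
    """Three-input majority vote on single bits."""
    return (a & b) | (b & c) | (a & c)


def full_adder_majority(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder built only from majority gates and inverters."""
    cout = maj(a, b, cin)
    # Sum = MAJ(NOT cout, cin, MAJ(a, b, NOT cin))
    s = maj(1 - cout, cin, maj(a, b, 1 - cin))
    return s, cout


def ripple_add(x: int, y: int, width: int) -> tuple[int, int]:
    """Chain the majority-based full adder over `width` bits."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder_majority((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result, carry
```

The carry path uses a single majority gate per bit, which is why the carry chain is the natural place to optimize in this style of logic.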


01 Dec 2009
TL;DR: In this article, a novel Error-Tolerant Adder, named Type II (ETAII), is proposed; it relaxes the strict accuracy requirement to some extent in order to achieve large improvements in power consumption and speed.
Abstract: The occurrence of errors is inevitable in modern VLSI technology, and overcoming all possible errors is an expensive task. It not only consumes a lot of power but also degrades speed performance. By adopting an emerging concept in VLSI design and test, Error-Tolerance (ET), we managed to develop a novel Error-Tolerant Adder which we named the Type II (ETAII). The circuit is able, to some extent, to ease the strict restriction on accuracy to achieve tremendous improvements in both power consumption and speed performance. When compared to its conventional counterparts, the proposed ETAII is able to achieve more than 60% improvement in the Power-Delay Product (PDP). The proposed ETAII is an enhancement of our earlier design, the ETAI, which has problems adding small-number inputs.

173 citations
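
The abstract above does not describe the ETAII circuit itself. The sketch below illustrates one common way to trade accuracy for a shorter carry path: the carry chain is cut into fixed-width blocks, and each block's carry-in is computed from the previous block alone. The block width, the default sizes, and the function name are assumptions made for illustration and should not be read as the authors' design.

```python
def segmented_approx_add(x: int, y: int, width: int = 16, block: int = 4) -> int:
    """Approximate addition with a carry chain limited to two blocks."""
    mask = (1 << block) - 1
    result = 0
    prev_carry = 0  # carry speculated from the previous block only
    for lo in range(0, width, block):
        xb = (x >> lo) & mask
        yb = (y >> lo) & mask
        # The block sum uses the speculated carry-in.
        result |= ((xb + yb + prev_carry) & mask) << lo
        # The carry handed on ignores this block's own carry-in,
        # so no carry ever ripples through more than two blocks.
        prev_carry = (xb + yb) >> block
    return result
```

Most operand pairs still add exactly; an error appears only when a carry would have to travel through an entire block, as in `0x00FF + 0x0001`.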


Journal ArticleDOI
TL;DR: Two novel low-power 1-bit Full Adder cells are proposed, based on majority-not gates, which are designed with new methods in each cell, and demonstrate improvement in terms of power consumption and power-delay product (PDP).

133 citations


Patent
26 Oct 2009
TL;DR: A system for fast determination of a horizontal minimum of multiple digital values is described, comprising a difference circuit and a compare circuit; in the difference circuit, a first adder compares the upper bits of two digital values and provides a first carry output and a propagate output, while a second adder compares their lower bits and provides a second carry output.
Abstract: A system for fast determination of a horizontal minimum of multiple digital values including a difference circuit and a compare circuit. The difference circuit may include first and second adders in which the first adder compares upper bits of a first digital value with upper bits of a second digital value and provides a first carry output and a propagate output. The second adder compares lower bits of the first digital value with lower bits of the second digital value and provides a second carry output. The compare circuit determines whether the first digital value is greater than the second digital value based on the carry and propagate outputs. Multiple difference circuits may be used to compare each of multiple digital values with every other digital value to provide corresponding compare bits, which are then used to determine a minimum one of the digital values and its corresponding location.

123 citations
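
The comparison scheme in the patent abstract above can be sketched behaviorally. The assumptions here are that the values are unsigned, that the width is even, and that each "adder" adds one operand to the one's complement of the other, so that its carry-out signals "greater than". The patent's actual circuit is not given in the abstract, and all names below are illustrative.

```python
def greater_than_split(a: int, b: int, width: int = 8) -> bool:
    """Decide a > b from two half-width adders (carry and propagate)."""
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = (a >> half) & mask, a & mask
    b_hi, b_lo = (b >> half) & mask, b & mask
    # x + NOT(y) produces a carry-out exactly when x > y.
    carry_hi = ((a_hi + (~b_hi & mask)) >> half) & 1
    carry_lo = ((a_lo + (~b_lo & mask)) >> half) & 1
    # The upper half propagates when its two operands are equal.
    propagate_hi = int(a_hi == b_hi)
    return bool(carry_hi | (propagate_hi & carry_lo))


def horizontal_min(values: list[int], width: int = 8) -> tuple[int, int]:
    """Compare every value with every other; return (minimum, index)."""
    n = len(values)
    for i in range(n):
        # values[i] is a minimum if it is greater than no other value.
        if not any(greater_than_split(values[i], values[j], width)
                   for j in range(n) if j != i):
            return values[i], i
    raise ValueError("values must not be empty")
```

In hardware all pairwise comparisons would run in parallel; the loops here only model the resulting compare bits.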


Journal ArticleDOI
TL;DR: Two new 4 × 4 bit reversible multiplier designs are presented which have lower hardware complexity, fewer garbage bits, lower quantum cost and fewer constant inputs than previous ones, and can be generalized to construct efficient reversible n × n bit multipliers.
Abstract: Reversible logic circuits have received significant attention in quantum computing, low power CMOS design, optical information processing, DNA computing, bioinformatics, and nanotechnology. This paper presents two new 4 × 4 bit reversible multiplier designs which have lower hardware complexity, fewer garbage bits, lower quantum cost and fewer constant inputs than previous ones, and can be generalized to construct efficient reversible n × n bit multipliers. An implementation of reversible HNG is also presented. This implementation shows that the full adder design using HNG is one of the best designs in terms of quantum cost. An implementation of MKG is also presented in order to have a fair comparison between our proposed reversible multiplier designs and the existing counterparts. The proposed reversible multipliers are optimized in terms of quantum cost, number of constant inputs, number of garbage outputs and hardware complexity. They can be used to construct more complex systems in nanotechnology.

114 citations
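
The abstract above names the HNG gate without defining it. The sketch below uses the 4 × 4 definition commonly given for HNG in the reversible-logic literature (P = A, Q = B, R = A ⊕ B ⊕ C, S = (A ⊕ B)·C ⊕ A·B ⊕ D), which is an assumption here rather than something stated in the abstract. With the fourth input tied to the constant 0, the gate acts as a full adder with two garbage outputs.

```python
from itertools import product


def hng(a: int, b: int, c: int, d: int) -> tuple[int, int, int, int]:
    """Assumed HNG mapping on four bits (see lead-in)."""
    p = a
    q = b
    r = a ^ b ^ c
    s = ((a ^ b) & c) ^ (a & b) ^ d
    return p, q, r, s


def reversible_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder from one HNG gate: constant input d = 0."""
    _garbage_p, _garbage_q, total, carry = hng(a, b, cin, 0)
    return total, carry


def is_reversible() -> bool:
    """A gate is reversible when its truth table is a bijection."""
    outputs = {hng(*bits) for bits in product((0, 1), repeat=4)}
    return len(outputs) == 16
```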


Journal ArticleDOI
TL;DR: A novel low-power majority function-based 1-bit full adder that uses MOS capacitors (MOSCAP) in its structure is presented; it can work reliably at low supply voltage, consumes 30% less power than the transmission function adder (TFA), and is 1.11 times faster.

107 citations


Journal ArticleDOI
TL;DR: A methodology for energy-delay optimization of digital circuits is presented and the result of the optimization is demonstrated on a design of the fastest adder found, a 240-ps Ling sparse domino adder in 1 V, 90 nm CMOS.
Abstract: A methodology for energy-delay optimization of digital circuits is presented. This methodology is applied to minimizing the delay of representative carry-lookahead adders under energy constraints. Impact of various design choices, including the carry-lookahead tree structure and logic style, are analyzed in the energy-delay space and verified through optimization. The result of the optimization is demonstrated on a design of the fastest adder found, a 240-ps Ling sparse domino adder in 1 V, 90 nm CMOS. The optimality of the results is assessed against the impact of technology scaling.

98 citations


Journal ArticleDOI
TL;DR: In this article, the authors studied the difficulty and challenges of implementing time-delay integration (TDI) functionality in a CMOS technology, including synchronization of the samples forming a TDI pixel, adder matrix outside the array, and addition noise.
Abstract: Difficulty and challenges of implementing time-delay-integration (TDI) functionality in a CMOS technology are studied: synchronization of the samples forming a TDI pixel, adder matrix outside the array, and addition noise. Existing and new TDI sensor architecture concepts with snapshot shutter, rolling shutter, or orthogonal readout are presented. An optimization method is then introduced to inject modulation transfer function and quantum efficiency specification in the architecture definition. Moderate spatial and temporal oversamplings are combined to achieve near charge-coupled device (CCD) class performances, resulting in an acceptable design complexity. Finally, CCD and CMOS dynamic range and signal-to-noise ratio are conceptually compared.

92 citations


Journal ArticleDOI
TL;DR: The linear transformer driver (LTD) is a new method for constructing high-current, high-voltage pulsed accelerators; it switches and inductively adds the pulses at low voltage straight out of the capacitors through low-inductance transfer and soft iron core isolation.
Abstract: The linear transformer driver (LTD) is a new method for constructing high-current, high-voltage pulsed accelerators. The salient feature of the approach is switching and inductively adding the pulses at low voltage straight out of the capacitors through low inductance transfer and soft iron core isolation. Sandia National Laboratories are actively pursuing the development of a new class of accelerator based on the LTD technology. Presently, the high current LTD experimental research is concentrated on two aspects: first, to study the repetition rate capabilities, reliability, reproducibility of the output pulses, switch prefires, jitter, electrical power and energy efficiency, and lifetime measurements of the cavity active components; second, to study how a multicavity linear array performs in a voltage adder configuration relative to current transmission, energy and power addition, and wall plug to output pulse electrical efficiency. Here we report the repetition rate and lifetime studies performed in the Sandia High Current LTD Laboratory. We first utilized the prototype ~0.4-MA, LTD I cavity which could be reliably operated up to ~90-kV capacitor charging. Later we obtained an improved 0.5-MA, LTD II version that can be operated at ~100 kV maximum charging voltage. The experimental results presented here were obtained with both cavities and pertain to evaluating the maximum achievable repetition rate and LTD cavity performance. The voltage adder experiments with a series of double sized cavities (1 MA, ~100 kV) will be reported in future publications.

87 citations


Journal ArticleDOI
TL;DR: The CMOS image sensor computes two-dimensional convolution of video frames with a programmable digital kernel of up to 8 × 8 pixels in parallel directly on the focal plane and is experimentally validated in discrete wavelet transform (DWT) video compression and frame differencing.
Abstract: The CMOS image sensor computes two-dimensional convolution of video frames with a programmable digital kernel of up to 8 × 8 pixels in parallel directly on the focal plane. Three operations, a temporal difference, a multiplication and an accumulation are performed for each pixel readout. A dual-memory pixel stores two video frames. Selective pixel output sampling controlled by binary kernel coefficients implements binary-analog multiplication. Cross-pixel column-parallel bit-level accumulation and frame differencing are implemented by switched-capacitor integrators. Binary-weighted summation and concurrent quantization is performed by a bank of column-parallel multiplying analog-to-digital converters (MADCs). A simple digital adder performs row-wise accumulation during ADC readout. A 128 × 128 active pixel array integrated with a bank of 128 MADCs was fabricated in a 0.35 µm standard CMOS technology. The 4.4 mm × 2.9 mm prototype is experimentally validated in discrete wavelet transform (DWT) video compression and frame differencing.

73 citations


Journal ArticleDOI
TL;DR: A Built-In Self-Test (BIST) method is proposed to accurately measure the combinatorial circuit delays on an FPGA, paving the way for matching timing requirements in designs to FPGAs as a means of combating the problem of process variations.
Abstract: This article proposes a Built-In Self-Test (BIST) method to accurately measure the combinatorial circuit delays on an FPGA. The flexibility of the on-chip clock generation capability found in modern FPGAs is employed to step through a range of frequencies until timing failure in the combinatorial circuit is detected. In this way, the delay of any combinatorial circuit can be determined with a timing resolution of the order of picoseconds. Parallel and optimized implementations of the method for self-characterization of the delay of all the LUTs on an FPGA are also proposed. The method was applied to Altera Cyclone II and III FPGAs. A complete self-characterization of LUTs on a Cyclone II was achieved in 2.5 seconds, utilizing only 13 kbit of block RAM to store the results. More extensive tests were carried out on the Cyclone III and the delays of adder circuits and embedded multiplier blocks were successfully measured. This self-measurement method paves the way for matching timing requirements in designs to FPGAs as a means of combating the problem of process variations.

Proceedings ArticleDOI
Dan Wang, Maofeng Yang, Wu Cheng, Xuguang Guan, Zhangming Zhu, Yintang Yang
25 May 2009
TL;DR: In comparison with Static Energy Recovery Full (SERF) adder cell module, the proposed four full adder cells demonstrate their advantages, including lower power consumption, smaller area, and higher speed.
Abstract: This paper proposes four low power adder cells using different XOR and XNOR gate architectures. Two sets of circuit designs are presented. One implements full adders with 3 transistors (3-T) XOR and XNOR gates. The other applies Gate-Diffusion-Input (GDI) technique to full adders. Simulations are performed by using Hspice based on 180nm CMOS technology. In comparison with Static Energy Recovery Full (SERF) adder cell module, the proposed four full adder cells demonstrate their advantages, including lower power consumption, smaller area, and higher speed.

Patent
Keiichi Sakurai, Takaaki Yui
08 Jan 2009
TL;DR: A digital camera is disclosed that includes an image adder 5d for synthesizing a plurality of continuously taken image frames to produce a synthesized image, an image processing apparatus 5 for executing image brightness adjusting processing, and a display device for displaying the image that is being synthesized by the image adder during the brightness adjusting processing.
Abstract: An image pickup apparatus is provided which informs a user of a state of brightness adjusting operation and which has excellent usability for the user. Disclosed is a digital camera including an image adder 5d for synthesizing a plurality of continuously taken image frames to produce a synthesized image, an image processing apparatus 5 for executing image brightness adjusting processing for synthesizing a required synthesis number of image frames and adjusting brightness of the synthesized image at the time of continuous picture-taking of a subject, and a display device for displaying an image which is being synthesized by the image adder in the image brightness adjusting processing.

Journal ArticleDOI
TL;DR: A low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed, which considerably lowers the switching activity of conventional multipliers.
Abstract: In this paper, a low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the multiplier which multiplies A by B include the removal of the shifting of the B register, direct feeding of A to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter and removal of the partial product shift. The architecture makes use of a low-power ring counter proposed in this work. Simulation results for 32-bit radix-2 multipliers show that the BZ-FAD architecture lowers the total switching activity up to 76% and power consumption up to 30% when compared to the conventional architecture. The proposed multiplier can be used for low-power applications where the speed is not a primary design parameter.
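
The "bypass the adder whenever possible" idea can be illustrated with a behavioral model of radix-2 shift-and-add multiplication that skips the addition for every zero bit of B. This sketch only counts adder activations; it does not model the ring counter, the register structure, or switching activity, and the function name is hypothetical.

```python
def shift_add_bypass_zero(a: int, b: int, width: int = 32) -> tuple[int, int]:
    """Multiply unsigned a by b; return (product, adder activations)."""
    product = 0
    additions = 0
    for i in range(width):
        if (b >> i) & 1:
            # The adder is used only for the 1 bits of B.
            product += a << i
            additions += 1
        # For a 0 bit the adder is bypassed and nothing is accumulated.
    return product, additions
```

For example, `shift_add_bypass_zero(13, 10)` needs only two additions because `10` has two set bits.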

Patent
06 Apr 2009
TL;DR: A CMOS image sensor with column-parallel ADCs, each including a comparator and an up/down counter, adds the digital values of pixels in a plurality of rows without additional circuits such as an adder and a line memory device, so the frame rate can be increased while maintaining constant sensitivity.
Abstract: A CMOS image sensor includes column-parallel ADCs. Each of the ADCs includes a comparator and an up/down counter. With this configuration, digital values of pixels in a plurality of rows can be added without using additional circuits, such as an adder and a line memory device, and the frame rate can be increased while maintaining constant sensitivity.

Proceedings ArticleDOI
13 May 2009
TL;DR: This work proposes new test methods for designs partitioned at the circuit level, in which the gates and transistors of individual circuits can be split across multiple die layers, and evaluates them on a bit-partitioned adder unit and a port-split register file.
Abstract: 3D integration is an emerging technology that allows for the vertical stacking of multiple silicon die. These stacked die are tightly integrated with through-silicon vias and promise significant power and area reductions by replacing long global wires with short vertical connections. This technology necessitates that neighboring logical blocks exist on different layers in the stack. However, such functional partitions disable intra-chip communication pre-bond and thus disrupt traditional test techniques. Previous work has described a general test architecture that enables pre-bond testability of an architecturally partitioned 3D processor and provided mechanisms for basic layer functionality. This work proposes new test methods for designs partitioned at the circuit level, in which the gates and transistors of individual circuits could be split across multiple die layers. We investigated a bit-partitioned adder unit and a port-split register file, which represents the most difficult circuit-partitioned design to test pre-bond but which is used widely in many circuits. Two layouts of each circuit, planar and 3D, are produced. Our experiments verify the performance and power results and examine the test coverage achieved.

Proceedings ArticleDOI
29 Sep 2009
TL;DR: This paper demonstrates that the carry chains can be used to build compressor trees, i.e., multi-input addition circuits used for parallel accumulation and partial product reduction for parallel multipliers implemented in FPGA logic.
Abstract: Fast carry chains featuring dedicated adder circuitry are a distinctive feature of modern FPGAs. The carry chains bypass the general routing network and are embedded in the logic blocks of FPGAs for fast addition. Conventional intuition is that such carry chains can be used only for implementing carry-propagate addition; state-of-the-art FPGA synthesizers can only exploit the carry chains for these specific circuits. This paper demonstrates that the carry chains can be used to build compressor trees, i.e., multi-input addition circuits used for parallel accumulation and partial product reduction for parallel multipliers implemented in FPGA logic. The key to our technique is to program the lookup tables (LUTs) in the logic blocks to stop the propagation of carry bits along the carry chain at appropriate points. This approach improves the area of compressor trees significantly compared to previous methods that synthesized compressor trees solely on LUTs, without compromising the performance gain over trees built from ternary carry-propagate adders.
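
For readers unfamiliar with compressor trees, the following is a generic behavioral sketch built from 3:2 carry-save stages followed by one carry-propagate addition. It does not model the LUT programming or carry-chain mapping that the paper describes; the function names are illustrative and operands are assumed to be non-negative integers.

```python
def carry_save_3to2(x: int, y: int, z: int) -> tuple[int, int]:
    """Compress three operands into a sum word and a carry word."""
    partial_sum = x ^ y ^ z
    carries = ((x & y) | (y & z) | (x & z)) << 1
    return partial_sum, carries


def compressor_tree_sum(operands: list[int]) -> int:
    """Reduce many operands to two, then add them once."""
    ops = list(operands)
    while len(ops) > 2:
        full = len(ops) - len(ops) % 3  # operands that form whole triples
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_3to2(ops[i], ops[i + 1], ops[i + 2]))
        reduced.extend(ops[full:])  # leftovers pass to the next level
        ops = reduced
    # Final carry-propagate addition of the remaining one or two words.
    return sum(ops)
```

No carry propagates inside a 3:2 stage, so the depth of the tree, not the word length, sets the delay until the final addition.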

Journal ArticleDOI
29 May 2009
TL;DR: This paper describes a reconfigurable 4-way SIMD engine fabricated in 45 nm high-k/metal-gate CMOS, targeted for on-die acceleration of vector processing in power-constrained mobile microprocessors.
Abstract: High-throughput parallel SIMD vector computations are the most performance and power-critical operations in multimedia, graphics and signal processing workloads. An array of SIMD vector processing engines delivers high-throughput short bit-width arithmetic operations on large data sets with orders of magnitude higher energy efficiencies vs. general-purpose cores [1, 2]. A reconfigurable 4-way SIMD engine targeted for on-die acceleration of vector processing in power-constrained mobile microprocessors is fabricated in 45nm High-K/Metal-gate CMOS [3]. The accelerator is reconfigured to perform 4-way 16b×16b multiplies, 32b×32b multiply, 4-way 16b additions, 2-way 32b additions and 72b addition with single-cycle throughput and wide dynamic supply voltage range of operation (1.3V to 230mV). A reconfigurable 2×2 tile of signed 2's complement 16b multipliers, with conditional carry gating in the 72b sparse tree adder, dual-supplies for voltage hopping, and fine-grained power-gating enables peak energy efficiency of 494GOPS/W (measured at 300mV, 50°C) with a dense layout occupying 0.081 mm² (Fig. 14.6.7) while achieving: (i) scalable performance up to 2.8GHz, 278mW measured at 1.3V, (ii) fast single-cycle switching between any operating/idle mode, (iii) configuration-dependent power consumption with 41% total power reduction and 6.5× active leakage power savings, (iv) 10× standby leakage reduction during sleep mode, (v) deep subthreshold operation measured at 230mV, 8.8MHz, 87µW, and (vi) compensation for up to 3× performance variation in ultra-low voltage mode.

Journal ArticleDOI
TL;DR: A new high-level design methodology for reconfigurable constant multipliers (RCM) is presented; by scheduling each mobile adder into a control step within its legitimate time window with the minimum opportunity cost, mutually exclusive adders can be merged with significantly reduced adder and multiplexing cost.
Abstract: Multiplying a signal by a known constant is an essential operation in digital signal processing algorithms. In many application scenarios, an input or output signal is repeatedly multiplied by several predefined constants at different instances. These temporal redundancies can be exploited for the design of an efficient reconfigurable constant multiplier (RCM). An RCM achieves greater hardware savings than the conventional multiple constant multiplication architecture, limited only by the available latency of the subsystem. Motivated by a number of lucrative examples, this paper presents a new high-level design methodology for RCM. Common subexpressions in the preset constants represented in minimum signed-digit system are first eliminated to obtain a minimum depth multiroot directed acyclic graph (DAG). The DAG is converted into a primitive data flow graph (DFG) where mobile adders are identified. By scheduling each mobile adder into a control step within its legitimate time window with the minimum opportunity cost, mutually exclusive adders can be merged with significantly reduced adder and multiplexing cost. The opportunity cost for each scheduling decision is assessed by the probability displacement and disparity measures of the scheduled node as well as its predecessors and successors in the DFG. The algorithm is runtime efficient as exhaustive search for the best fusion of independently optimized constant multipliers has been avoided. Simulation results on randomly generated 12-b constant sets show that the solutions generated by the proposed algorithm are on average 19% to 25% more area-time efficient than the best reported solutions.
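
To make the signed-digit starting point concrete, the sketch below recodes a constant into non-adjacent form (NAF), which is one minimum signed-digit representation, and multiplies by it using only shifts, additions and subtractions. It covers none of the paper's common subexpression elimination, DAG construction or adder scheduling. The function names are illustrative and the constant is assumed to be a positive integer.

```python
def signed_digits(c: int) -> list[int]:
    """Non-adjacent form digits in {-1, 0, 1}, least significant first."""
    digits = []
    while c != 0:
        if c & 1:
            d = 2 - (c & 3)  # +1 when c % 4 == 1, -1 when c % 4 == 3
            c -= d
        else:
            d = 0
        digits.append(d)
        c >>= 1
    return digits


def multiply_by_constant(x: int, c: int) -> int:
    """Multiply x by c with one add or subtract per nonzero digit."""
    acc = 0
    for shift, d in enumerate(signed_digits(c)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc
```

For the constant 7 the recoding is 8 − 1, so the multiplication needs one subtraction instead of the two additions required by the plain binary form 4 + 2 + 1.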

Journal ArticleDOI
TL;DR: A recently proposed class of high-radix redundant RNS based on the stored-unibit-transfer representation for modulo 2^n + 1 that improves the power-delay-product performance of conventional redundant RNS is discussed.
Abstract: The residue number system (RNS) is suitable for implementing high-speed digital processing devices because it supports parallel, modular, fault-tolerant, and carry-bounded arithmetic. The carry propagation is restricted to inside the modulus. The remaining intramoduli carry propagation limits the speed of arithmetic operation. Therefore, the carry-free property of a redundant arithmetic can be used. In this paper, we discuss a recently proposed class of high-radix redundant RNS based on the stored-unibit-transfer representation for modulo 2^n + 1 that improves the power-delay-product performance of conventional redundant RNS. In addition, subtraction and multiplication circuits are designed in the proposed system.
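
The carry-bounded property of RNS can be seen in a small sketch of conventional (non-redundant) residue arithmetic. It uses the moduli set {2^n − 1, 2^n, 2^n + 1} with n = 4 as an assumed example and does not implement the stored-unibit-transfer redundant representation discussed in the paper. Each residue channel is processed independently, and the Chinese remainder theorem recovers the result. It requires Python 3.8 or later for `math.prod` and modular inverses via `pow`.

```python
from math import prod

MODULI = (15, 16, 17)  # 2**4 - 1, 2**4, 2**4 + 1: pairwise coprime


def to_rns(x: int, moduli=MODULI) -> tuple[int, ...]:
    """Forward conversion: one residue per modulus."""
    return tuple(x % m for m in moduli)


def rns_add(u, v, moduli=MODULI) -> tuple[int, ...]:
    """Channel-wise addition; no carry crosses between channels."""
    return tuple((a + b) % m for a, b, m in zip(u, v, moduli))


def rns_mul(u, v, moduli=MODULI) -> tuple[int, ...]:
    """Channel-wise multiplication."""
    return tuple((a * b) % m for a, b, m in zip(u, v, moduli))


def from_rns(residues, moduli=MODULI) -> int:
    """Reverse conversion with the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse of m_i mod m
    return total % big_m
```

Results are exact as long as they stay below the dynamic range 15 · 16 · 17 = 4080.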

Journal ArticleDOI
TL;DR: A novel finite-impulse response (FIR) filter synthesis technique that allows for aggressive voltage scaling by exploiting the fact that all filter coefficients are not equally important to obtain a "reasonably accurate" filter response is presented.
Abstract: In this paper, we present a novel finite-impulse response (FIR) filter synthesis technique that allows for aggressive voltage scaling by exploiting the fact that all filter coefficients are not equally important to obtain a "reasonably accurate" filter response. Our technique implements a level-constrained common-subexpression-elimination algorithm, where we can constrain the number of adder levels (ALs) required to compute each of the coefficient outputs. By specifying a tighter constraint (in terms of the number of adders in the critical path) on the important coefficients, we ensure that the later computational steps compute only the less important coefficient outputs. In case of delay variations due to voltage scaling and/or process variations, only the less important outputs are affected, resulting in graceful degradation of filter quality. The proposed architecture, therefore, lends itself to aggressive voltage scaling for low-power dissipation even under process parameter variations. Under extreme process variation and supply voltage scaling (0.8 V), filters implemented in the predictive technology model (PTM) 70 nm technology show an average power savings of 25%-30% with minor degradation in filter response in terms of normalized passband/stopband ripple (0.02 at a scaled voltage of 0.8 V compared with 0.005 at a nominal supply).

Journal ArticleDOI
TL;DR: A hardware architecture for ANNs is presented that takes advantage of the dedicated adder blocks, commonly called MACs, to compute both the weighted sum and the activation function; it is as fast as existing ones, as it is massively parallel.

Proceedings ArticleDOI
20 Jun 2009
TL;DR: In this article, the area and latency of Shor's factoring were optimized by balancing the use of ancilla generators, error correction, and tuning the core adder circuits.
Abstract: We optimize the area and latency of Shor's factoring while simultaneously improving fault tolerance through: (1) balancing the use of ancilla generators, (2) aggressive optimization of error correction, and (3) tuning the core adder circuits. Our custom CAD flow produces detailed layouts of the physical components and utilizes simulation to analyze circuits in terms of area, latency, and success probability. We introduce a metric, called ADCR, which is the probabilistic equivalent of the classic Area-Delay product. Our error correction optimization can reduce ADCR by an order of magnitude or more. Contrary to conventional wisdom, we show that the area of an optimized quantum circuit is not dominated exclusively by error correction. Further, our adder evaluation shows that quantum carry-lookahead adders (QCLA) beat ripple-carry adders in ADCR, despite being larger and more complex. We conclude with what we believe is one of the most accurate estimates of the area and latency required for 1024-bit Shor's factorization: 7659 mm² for the smallest circuit and 6 × 10^8 seconds for the fastest circuit.

Journal ArticleDOI
TL;DR: Two ultra high speed carbon nanotube Full-Adder cells are presented and simulation results illustrate significant improvement in terms of speed and Power-Delay Product (PDP).
Abstract: In this paper, two ultra high speed carbon nanotube Full-Adder cells are presented. The first design uses two transistors, two resistors and seven capacitors, and the second one uses four transistors and seven capacitors. The first design is faster and the second one consumes less power. Simulation results illustrate significant improvement in terms of speed and Power-Delay Product (PDP).

Journal ArticleDOI
TL;DR: Galois field (GF) algebraic expressions have been found to be promising choices for reversible and quantum implementation of multivalued logic and, for the first time to the authors' knowledge, GF(4) adder multivalued (four-valued) logic circuits in an all-optical domain are developed.
Abstract: Galois field (GF) algebraic expressions have been found to be promising choices for reversible and quantum implementation of multivalued logic. For the first time to our knowledge, we developed GF(4) adder multivalued (four valued) logic circuits in an all-optical domain. The principle and possibilities of an all-optical GF(4) adder circuit are described. The theoretical model is presented and verified through numerical simulation. The quaternary inverter, successor, clockwise cycle, and counterclockwise cycle gates are proposed with the help of the all-optical GF(4) adder circuit. In this scheme different quaternary logical states are represented by different polarized light. A terahertz optical asymmetric demultiplexer interferometric switch plays an important role in this scheme.
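
As a reminder of what a GF(4) adder computes: if the four field elements are encoded as 2-bit polynomials over GF(2), addition reduces to a bitwise XOR. That integer encoding is an assumption made here for illustration; the paper represents the quaternary states by differently polarized light.

```python
def gf4_add(a: int, b: int) -> int:
    """Add two GF(4) elements encoded as integers 0..3."""
    if not (0 <= a < 4 and 0 <= b < 4):
        raise ValueError("GF(4) elements must be in the range 0..3")
    return a ^ b


# Full addition table: row index is a, column index is b.
ADD_TABLE = [[gf4_add(a, b) for b in range(4)] for a in range(4)]
```

Every element is its own additive inverse, which shows up as the zero diagonal of the table.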

01 Jan 2009
TL;DR: Designs of two different array multipliers are presented, one using carry-look-ahead (CLA) logic for addition of partial product terms and another introducing a Carry Save Adder (CSA) in the partial product lines.
Abstract: In this paper, designs of two different array multipliers are presented, one using carry-look-ahead (CLA) logic for addition of partial product terms and another introducing a Carry Save Adder (CSA) in the partial product lines. The multipliers presented in this paper were all modeled using VHDL (Very High Speed Integrated Circuit Hardware Description Language) for 32-bit unsigned data. The comparison is done on the basis of three performance parameters, i.e., area, speed and power consumption. Designing an integrated circuit that is efficient in terms of area, power and speed has become a challenging task in the modern VLSI design field. Previously in the literature, performance analysis was carried out between a multiplier using a ripple carry adder (RCA) and one using CLA. In this work, the same multiplier is designed using CSA logic and its performance is compared with that of the multiplier designed using CLA logic. The multiplier with CSA gives better results in terms of speed (78.3% improvement), area (reduced by 4.2%) and power consumption (decreased by 1.4%).

Posted Content
TL;DR: It is shown that the area of an optimized quantum circuit is not dominated exclusively by error correction, and quantum carry-lookahead adders (QCLA) beat ripple-carry adders in ADCR, despite being larger and more complex.
Abstract: We optimize the area and latency of Shor's factoring while simultaneously improving fault tolerance through: (1) balancing the use of ancilla generators, (2) aggressive optimization of error correction, and (3) tuning the core adder circuits. Our custom CAD flow produces detailed layouts of the physical components and utilizes simulation to analyze circuits in terms of area, latency, and success probability. We introduce a metric, called ADCR, which is the probabilistic equivalent of the classic Area-Delay product. Our error correction optimization can reduce ADCR by an order of magnitude or more. Contrary to conventional wisdom, we show that the area of an optimized quantum circuit is not dominated exclusively by error correction. Further, our adder evaluation shows that quantum carry-lookahead adders (QCLA) beat ripple-carry adders in ADCR, despite being larger and more complex. We conclude with what we believe is one of the most accurate estimates of the area and latency required for 1024-bit Shor's factorization: 7659 mm$^{2}$ for the smallest circuit and $6 \times 10^8$ seconds for the fastest circuit.

Journal ArticleDOI
TL;DR: A modular synthesis method to realize a reversible BCD-full adder (BCD-FA) and subtractor circuit using genetic algorithm and don't care concept is proposed and a binary to BCD converter is presented.
Abstract: Reversible logic and binary coded decimal (BCD) arithmetic are two concerning subjects of hardware. This paper proposes a modular synthesis method to realize a reversible BCD-full adder (BCD-FA) and subtractor circuit. We propose three approaches to design and optimize all parts of a BCD-FA circuit using genetic algorithm and don't care concept. Our first approach is based on the Hafiz's work, and the second one is based on the whole BCD-FA circuit design. In the third approach, a binary to BCD converter is presented. Optimizations are done in terms of number of gates, number of garbage inputs/outputs, and the quantum cost of the circuit. We present four designs for BCD-FA with four different goals: minimum garbage inputs/outputs, minimum quantum cost, minimum number of gates, and optimum circuit in terms of all the above parameters.
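
For context, the function that a BCD full adder must realize can be sketched as a conventional binary addition followed by an add-6 correction. This is a plain, non-reversible behavioral model, not the genetic-algorithm synthesis or any of the four reversible designs from the paper; the names are illustrative and digits are assumed to be valid BCD (0 to 9).

```python
def bcd_digit_add(a: int, b: int, cin: int = 0) -> tuple[int, int]:
    """Add two BCD digits; return (sum digit, carry out)."""
    binary_sum = a + b + cin
    if binary_sum > 9:
        # Add 6 to skip the six unused 4-bit codes, keep the low nibble.
        return (binary_sum + 6) & 0xF, 1
    return binary_sum, 0


def bcd_add(x_digits: list[int], y_digits: list[int]) -> tuple[list[int], int]:
    """Add equal-length digit lists, least significant digit first."""
    out, carry = [], 0
    for a, b in zip(x_digits, y_digits):
        digit, carry = bcd_digit_add(a, b, carry)
        out.append(digit)
    return out, carry
```

For example, 258 + 947 is passed as `[8, 5, 2]` and `[7, 4, 9]` and yields the digits of 205 with a carry out of 1, i.e. 1205.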

Proceedings ArticleDOI
02 Jun 2009
TL;DR: In this article, the authors explored the optimization of parallel multipliers for Quantum-Dot Cellular Automata (QCA) using coplanar layouts and compared with other QCA multipliers (bit-serial and array multipliers).
Abstract: This paper explores the optimization of parallel multipliers for Quantum-Dot Cellular Automata. To reduce the complexity, multipliers are designed with quasi-modularity to accommodate large word sizes. The regular quasi-modular product method is used to make n × n multipliers using 4 (n/2 × n/2) modules. This may be continued with further decomposition to 16 (n/4×n/4) modules, etc. The last two rows in Wallace or Dadda reduction trees are summed by an adder that is 3n/2 – 1 bits long to produce the final product. This design is constructed using coplanar layouts and compared with other QCA multipliers (bit-serial and array multipliers). The delay, area and complexity are compared for several different operand sizes using the QCADesigner simulator.

Journal ArticleDOI
TL;DR: An overview of DFP arithmetic in IEEE 754-2008 is given and novel designs for a DFP adder and a DFP multifunction unit (DFP MFU) that comply with the standard are presented; the proposed adder is roughly 21 percent faster and 1.6 percent smaller than a previous DFP adder design when implemented in the same technology.
Abstract: Decimal arithmetic is often used in commercial, financial, and Internet-based applications. Due to the growing importance of decimal floating-point (DFP) arithmetic, the IEEE 754-2008 Standard for Floating-Point Arithmetic (IEEE 754-2008) includes specifications for DFP arithmetic. IBM recently announced adding DFP instructions to their POWER6, z9, and z10 microprocessor architectures. As processor support for DFP arithmetic emerges, it is important to investigate efficient arithmetic algorithms and hardware designs for common DFP arithmetic operations. This paper gives an overview of DFP arithmetic in IEEE 754-2008 and discusses previous research on decimal fixed-point and floating-point addition. It also presents novel designs for a DFP adder and a DFP multifunction unit (DFP MFU) that comply with IEEE 754-2008. To reduce their delay, the DFP adder and MFU use decimal injection-based rounding, a new form of decimal operand alignment, and a fast flag-based method for rounding and overflow detection. Synthesis results indicate that the proposed DFP adder is roughly 21 percent faster and 1.6 percent smaller than a previous DFP adder design, when implemented in the same technology. Compared to the DFP adder, the DFP MFU provides six additional operations, yet only has 2.8 percent more delay and 9.7 percent more area. A pipelined version of the DFP MFU has a latency of six cycles, a throughput of one result per cycle, an estimated critical path delay of 12.9 fanout-of-four (FO4) inverter delays, and an estimated area of 45,681 NAND2 equivalent gates.
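
The motivation for decimal floating-point addition can be seen in software with Python's standard `decimal` module. The sketch below sets a 16-digit precision with round-half-even to resemble a decimal64-style format; that configuration and the helper name are assumptions for illustration, and the code says nothing about the hardware techniques (injection-based rounding, operand alignment) described in the paper.

```python
from decimal import Decimal, localcontext, ROUND_HALF_EVEN


def decimal64_style_add(x: str, y: str) -> Decimal:
    """Add two decimal strings with 16 significant digits."""
    with localcontext() as ctx:
        ctx.prec = 16                    # assumed decimal64-like precision
        ctx.rounding = ROUND_HALF_EVEN   # round ties to the even digit
        return Decimal(x) + Decimal(y)
```

With this helper, `decimal64_style_add("0.1", "0.2")` equals `Decimal("0.3")` exactly, whereas the binary floating-point expression `0.1 + 0.2 == 0.3` evaluates to `False`.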