Showing papers on "Adder published in 2009"

Journal Article•DOI•

Adder and Multiplier Design in Quantum-Dot Cellular Automata

H. Cho¹, Earl E. Swartzlander²•Institutions (2)

Qualcomm¹, University of Texas at Austin²

01 Jun 2009-IEEE Transactions on Computers

TL;DR: The unique QCA characteristics are utilized to design a fast and efficient carry flow adder, for which simulations indicate very attractive performance, and the design of serial parallel multipliers is explored.

Abstract: Quantum-dot cellular automata (QCA) is an emerging nanotechnology, with the potential for faster speed, smaller size, and lower power consumption than transistor-based technology. Quantum-dot cellular automata has a simple cell as the basic element. The cell is used as a building block to construct gates and wires. Previously, adder designs based on conventional designs were examined for implementation with QCA technology. That work demonstrated that the design trade-offs are very different in QCA. This paper utilizes the unique QCA characteristics to design a carry flow adder that is fast and efficient. Simulations indicate very attractive performance (i.e., complexity, area, and delay). This paper also explores the design of serial parallel multipliers. A serial parallel multiplier is designed and simulated with several different operand sizes.

342 citations

An enhanced low-power high-speed Adder For Error-Tolerant application

Ning Zhu¹, Wang Ling Goh¹, Kiat Seng Yeo¹•Institutions (1)

Nanyang Technological University¹

01 Dec 2009

TL;DR: A novel error-tolerant adder, the Error-Tolerant Adder Type II (ETAII), is proposed; by easing the strict restriction on accuracy, it achieves more than 60% improvement in power-delay product (PDP) over conventional adders.

Abstract: The occurrence of errors is inevitable in modern VLSI technology, and overcoming all possible errors is an expensive task. It not only consumes a lot of power but also degrades speed performance. By adopting an emerging concept in VLSI design and test, Error-Tolerance (ET), we developed a novel Error-Tolerant Adder, which we named the Type II (ETAII). By easing the strict restriction on accuracy to some extent, the circuit achieves tremendous improvements in both power consumption and speed performance. Compared with its conventional counterparts, the proposed ETAII achieves more than 60% improvement in the Power-Delay Product (PDP). The proposed ETAII is an enhancement of our earlier design, the ETAI, which has problems adding small-number inputs.

173 citations

Journal Article•DOI•

Two new low-power Full Adders based on majority-not gates

Keivan Navi, Mohammad Hossein Moaiyeri¹, Reza Faghih Mirzaee¹, Omid Hashemipour, Babak Mazloom Nezhad•Institutions (1)

Shahid Beheshti University¹

01 Jan 2009-Microelectronics Journal

TL;DR: Two novel low-power 1-bit Full Adder cells are proposed, based on majority-not gates, which are designed with new methods in each cell, and demonstrate improvement in terms of power consumption and power-delay product (PDP).

133 citations
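
As a rough illustration of the majority-based full adders in the entry above, the sketch below uses the general majority-function formulation of a 1-bit full adder, not the transistor-level cells from the paper. The carry-out is the 3-input majority of the inputs, and the sum is a 5-input majority of the inputs with the inverted carry counted twice. The function names are hypothetical.

```python
from itertools import product


def majority(*bits):
    """Return 1 if more than half of the input bits are 1, else 0."""
    return 1 if sum(bits) > len(bits) / 2 else 0


def majority_full_adder(a, b, cin):
    """1-bit full adder built only from majority and NOT operations."""
    cout = majority(a, b, cin)            # carry = Maj(a, b, cin)
    not_cout = 1 - cout                   # the "majority-not" output
    s = majority(a, b, cin, not_cout, not_cout)
    return s, cout


# Exhaustive check against ordinary integer addition.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = majority_full_adder(a, b, cin)
    assert a + b + cin == s + 2 * cout
```

The exhaustive loop covers all eight input combinations, so the identity can be checked directly rather than taken on trust.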

Patent•

System and method for determination of a horizontal minimum of digital values

Rochelle L. Stortz¹, Raymond A. Bertram¹•Institutions (1)

VIA Technologies¹

26 Oct 2009

TL;DR: A system for fast determination of a horizontal minimum of multiple digital values is described, comprising a difference circuit and a compare circuit; a first adder compares the upper bits of two digital values and provides a first carry output and a propagate output, and a second adder compares their lower bits and provides a second carry output.

Abstract: A system for fast determination of a horizontal minimum of multiple digital values including a difference circuit and a compare circuit. The difference circuit may include first and second adders in which the first adder compares upper bits of a first digital value with upper bits of a second digital value and provides a first carry output and a propagate output. The second adder compares lower bits of the first digital value with lower bits of the second digital value and provides a second carry output. The compare circuit determines whether the first digital value is greater than the second digital value based on the carry and propagate outputs. Multiple difference circuits may be used to compare each of multiple digital values with every other digital value to provide corresponding compare bits, which are then used to determine a minimum one of the digital values and its corresponding location.

123 citations
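
The comparison scheme in the patent abstract above can be sketched behaviourally as follows. This is a minimal model under stated assumptions, not the patented circuit: values are unsigned and 16 bits wide, each half is compared by adding one value to the bitwise complement of the other, and ties are resolved in favour of the lowest index. The names `greater_than` and `horizontal_minimum` are hypothetical.

```python
def greater_than(a, b, width=16):
    """Return True if a > b, using carry/propagate outputs of two half-width adders."""
    half = width // 2
    mask = (1 << half) - 1
    a_hi, a_lo = (a >> half) & mask, a & mask
    b_hi, b_lo = (b >> half) & mask, b & mask

    upper = a_hi + (~b_hi & mask)         # first adder: upper bits
    lower = a_lo + (~b_lo & mask)         # second adder: lower bits

    carry_hi = upper >> half              # 1 when a_hi > b_hi
    prop_hi = int((upper & mask) == mask)  # 1 when a_hi == b_hi
    carry_lo = lower >> half              # 1 when a_lo > b_lo
    return bool(carry_hi | (prop_hi & carry_lo))


def horizontal_minimum(values, width=16):
    """Compare every value with every other one; return (minimum, index)."""
    for i, v in enumerate(values):
        beaten = any(greater_than(v, w, width)
                     for j, w in enumerate(values) if j != i)
        if not beaten:                    # assumption: first such index wins ties
            return v, i
    raise ValueError("values must be non-empty")


print(horizontal_minimum([0x1234, 0x0034, 0x1200, 0x0034]))
```

In hardware the pairwise comparisons would run in parallel; the nested loops here only reproduce the resulting compare bits.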

Journal Article•DOI•

Optimized reversible multiplier circuit

Majid Haghparast¹, Majid Mohammadi², Keivan Navi², Mohammad Eshghi²•Institutions (2)

Islamic Azad University¹, Shahid Beheshti University²

01 Apr 2009-Journal of Circuits, Systems, and Computers

TL;DR: Two new 4 × 4 bit reversible multiplier designs are presented which have lower hardware complexity, fewer garbage bits, lower quantum cost, and fewer constant inputs than previous ones, and which can be generalized to construct efficient reversible n × n bit multipliers.

Abstract: Reversible logic circuits have received significant attention in quantum computing, low power CMOS design, optical information processing, DNA computing, bioinformatics, and nanotechnology. This paper presents two new 4 × 4 bit reversible multiplier designs which have lower hardware complexity, fewer garbage bits, lower quantum cost, and fewer constant inputs than previous ones, and which can be generalized to construct efficient reversible n × n bit multipliers. An implementation of reversible HNG is also presented. This implementation shows that the full adder design using HNG is one of the best designs in terms of quantum cost. An implementation of MKG is also presented in order to have a fair comparison between our proposed reversible multiplier designs and the existing counterparts. The proposed reversible multipliers are optimized in terms of quantum cost, number of constant inputs, number of garbage outputs, and hardware complexity. They can be used to construct more complex systems in nanotechnology.

114 citations

Journal Article•DOI•

A novel low-power full-adder cell for low voltage

Keivan Navi¹, Mehrdad Maeen¹, Vahid Foroutan¹, Somayeh Timarchi¹, Omid Kavehei²•Institutions (2)

Shahid Beheshti University¹, University of Adelaide²

01 Sep 2009-Integration

TL;DR: A novel low-power majority function-based 1-bit full adder is presented that uses MOS capacitors (MOSCAP) in its structure, works reliably at low supply voltage, consumes 30% less power than the transmission function adder (TFA), and is 1.11 times faster.

107 citations

Journal Article•DOI•

Energy–Delay Optimization of 64-Bit Carry-Lookahead Adders With a 240 ps 90 nm CMOS Design Example

R. Zlatanovici¹, Sean Kao, Borivoje Nikolic²•Institutions (2)

Cadence Design Systems¹, University of California, Berkeley²

27 Jan 2009-IEEE Journal of Solid-state Circuits

TL;DR: A methodology for energy-delay optimization of digital circuits is presented and the result of the optimization is demonstrated on a design of the fastest adder found, a 240-ps Ling sparse domino adder in 1 V, 90 nm CMOS.

Abstract: A methodology for energy-delay optimization of digital circuits is presented. This methodology is applied to minimizing the delay of representative carry-lookahead adders under energy constraints. The impact of various design choices, including the carry-lookahead tree structure and logic style, is analyzed in the energy-delay space and verified through optimization. The result of the optimization is demonstrated on a design of the fastest adder found, a 240-ps Ling sparse domino adder in 1 V, 90 nm CMOS. The optimality of the results is assessed against the impact of technology scaling.

98 citations

Journal Article•DOI•

Time-Delay-Integration Architectures in CMOS Image Sensors

G. Lepage, Jan Bogaerts, Guy Meynants

29 Sep 2009-IEEE Transactions on Electron Devices

TL;DR: In this article, the authors studied the difficulty and challenges of implementing time-delay integration (TDI) functionality in a CMOS technology, including synchronization of the samples forming a TDI pixel, adder matrix outside the array, and addition noise.

Abstract: Difficulty and challenges of implementing time-delay-integration (TDI) functionality in a CMOS technology are studied: synchronization of the samples forming a TDI pixel, adder matrix outside the array, and addition noise. Existing and new TDI sensor architecture concepts with snapshot shutter, rolling shutter, or orthogonal readout are presented. An optimization method is then introduced to inject modulation transfer function and quantum efficiency specification in the architecture definition. Moderate spatial and temporal oversamplings are combined to achieve near charge-coupled device (CCD) class performances, resulting in an acceptable design complexity. Finally, CCD and CMOS dynamic range and signal-to-noise ratio are conceptually compared.

92 citations

Journal Article•DOI•

High current, 0.5-MA, fast, 100-ns, linear transformer driver experiments

Michael G. Mazarakis¹, William E. Fowler¹, A.A. Kim, V.A. Sinebryukhov, S.T. Rogowski¹, R. A. Sharpe¹, Dillon H. McDaniel¹, Craig L. Olson¹, John L. Porter¹, Kenneth W. Struve¹, William A. Stygar¹, Joseph Ray Woodworth¹•Institutions (1)

Sandia National Laboratories¹

14 May 2009-Physical Review Special Topics-accelerators and Beams

TL;DR: The linear transformer driver (LTD) is a new method for constructing high-current, high-voltage pulsed accelerators; it switches and inductively adds the pulses at low voltage straight out of the capacitors through low inductance transfer and soft iron core isolation.

Abstract: The linear transformer driver (LTD) is a new method for constructing high current, high-voltage pulsed accelerators. The salient feature of the approach is switching and inductively adding the pulses at low voltage straight out of the capacitors through low inductance transfer and soft iron core isolation. Sandia National Laboratories are actively pursuing the development of a new class of accelerator based on the LTD technology. Presently, the high current LTD experimental research is concentrated on two aspects: first, to study the repetition rate capabilities, reliability, reproducibility of the output pulses, switch prefires, jitter, electrical power and energy efficiency, and lifetime measurements of the cavity active components; second, to study how a multicavity linear array performs in a voltage adder configuration relative to current transmission, energy and power addition, and wall plug to output pulse electrical efficiency. Here we report the repetition rate and lifetime studies performed in the Sandia High Current LTD Laboratory. We first utilized the prototype ~0.4-MA, LTD I cavity which could be reliably operated up to ±90-kV capacitor charging. Later we obtained an improved 0.5-MA, LTD II version that can be operated at ±100 kV maximum charging voltage. The experimental results presented here were obtained with both cavities and pertain to evaluating the maximum achievable repetition rate and LTD cavity performance. The voltage adder experiments with a series of double sized cavities (1 MA, ±100 kV) will be reported in future publications.

87 citations

Journal Article•DOI•

Focal-Plane Algorithmically-Multiplying CMOS Computational Image Sensor

Alireza Nilchi¹, J.N.Y. Aziz¹, Roman Genov¹•Institutions (1)

University of Toronto¹

27 May 2009-IEEE Journal of Solid-state Circuits

TL;DR: The CMOS image sensor computes two-dimensional convolution of video frames with a programmable digital kernel of up to 8 × 8 pixels in parallel directly on the focal plane and is experimentally validated in discrete wavelet transform (DWT) video compression and frame differencing.

Abstract: The CMOS image sensor computes two-dimensional convolution of video frames with a programmable digital kernel of up to 8 × 8 pixels in parallel directly on the focal plane. Three operations, a temporal difference, a multiplication and an accumulation are performed for each pixel readout. A dual-memory pixel stores two video frames. Selective pixel output sampling controlled by binary kernel coefficients implements binary-analog multiplication. Cross-pixel column-parallel bit-level accumulation and frame differencing are implemented by switched-capacitor integrators. Binary-weighted summation and concurrent quantization is performed by a bank of column-parallel multiplying analog-to-digital converters (MADCs). A simple digital adder performs row-wise accumulation during ADC readout. A 128 × 128 active pixel array integrated with a bank of 128 MADCs was fabricated in a 0.35 μm standard CMOS technology. The 4.4 mm × 2.9 mm prototype is experimentally validated in discrete wavelet transform (DWT) video compression and frame differencing.

73 citations

Journal Article•DOI•

Self-Measurement of Combinatorial Circuit Delays in FPGAs

Justin S. J. Wong¹, P. Sedcole¹, Peter Y. K. Cheung¹•Institutions (1)

Imperial College London¹

01 Jun 2009-ACM Transactions on Reconfigurable Technology and Systems

TL;DR: A Built-In Self-Test (BIST) method is proposed to accurately measure the combinatorial circuit delays on an FPGA, paving the way for matching timing requirements in designs to FPGAs as a means of combating the problem of process variations.

Abstract: This article proposes a Built-In Self-Test (BIST) method to accurately measure the combinatorial circuit delays on an FPGA. The flexibility of the on-chip clock generation capability found in modern FPGAs is employed to step through a range of frequencies until timing failure in the combinatorial circuit is detected. In this way, the delay of any combinatorial circuit can be determined with a timing resolution of the order of picoseconds. Parallel and optimized implementations of the method for self-characterization of the delay of all the LUTs on an FPGA are also proposed. The method was applied to Altera Cyclone II and III FPGAs. A complete self-characterization of LUTs on a Cyclone II was achieved in 2.5 seconds, utilizing only 13 kbit of block RAM to store the results. More extensive tests were carried out on the Cyclone III and the delays of adder circuits and embedded multiplier blocks were successfully measured. This self-measurement method paves the way for matching timing requirements in designs to FPGAs as a means of combating the problem of process variations.

Proceedings Article•DOI•

Novel low power full adder cells in 180nm CMOS technology

Dan Wang¹, Maofeng Yang¹, Wu Cheng¹, Xuguang Guan¹, Zhangming Zhu¹, Yintang Yang¹•Institutions (1)

Xidian University¹

25 May 2009

TL;DR: In comparison with Static Energy Recovery Full (SERF) adder cell module, the proposed four full adder cells demonstrate their advantages, including lower power consumption, smaller area, and higher speed.

Abstract: This paper proposes four low power adder cells using different XOR and XNOR gate architectures. Two sets of circuit designs are presented. One implements full adders with 3 transistors (3-T) XOR and XNOR gates. The other applies Gate-Diffusion-Input (GDI) technique to full adders. Simulations are performed by using Hspice based on 180nm CMOS technology. In comparison with Static Energy Recovery Full (SERF) adder cell module, the proposed four full adder cells demonstrate their advantages, including lower power consumption, smaller area, and higher speed.

Patent•

Image pickup apparatus

Keiichi Sakurai¹, Takaaki Yui¹•Institutions (1)

Casio¹

08 Jan 2009

TL;DR: A digital camera is described that includes an image adder for synthesizing a plurality of continuously taken image frames into a synthesized image, an image processing apparatus for executing image brightness adjusting processing, and a display device for displaying the image being synthesized by the image adder during that processing.

Abstract: An image pickup apparatus is provided which informs a user of a state of brightness adjusting operation and which has excellent usability for the user. Disclosed is a digital camera including an image adder 5d for synthesizing a plurality of continuously taken image frames to produce a synthesized image, an image processing apparatus 5 for executing image brightness adjusting processing for synthesizing a required synthesis number of image frames and adjusting brightness of the synthesized image at the time of continuous picture-taking of a subject, and a display device for displaying an image which is being synthesized by the image adder in the image brightness adjusting processing.

Journal Article•DOI•

BZ-FAD: A Low-Power Low-Area Multiplier Based on Shift-and-Add Architecture

M. Mottaghi-Dastjerdi¹, Ali Afzali-Kusha¹, Massoud Pedram²•Institutions (2)

University of Tehran¹, University of Southern California²

01 Feb 2009-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: A low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed, which considerably lowers the switching activity of conventional multipliers.

Abstract: In this paper, a low-power structure called bypass zero, feed A directly (BZ-FAD) for shift-and-add multipliers is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the multiplier, which multiplies A by B, include removing the shifting of the B register, feeding A directly to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter, and removing the partial product shift. The architecture makes use of a low-power ring counter proposed in this work. Simulation results for 32-bit radix-2 multipliers show that the BZ-FAD architecture lowers the total switching activity by up to 76% and power consumption by up to 30% when compared to the conventional architecture. The proposed multiplier can be used for low-power applications where speed is not a primary design parameter.
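
The ideas named in the BZ-FAD abstract above can be sketched in software as a radix-2 shift-and-add loop. This is a behavioural model under stated assumptions, not the authors' datapath: a one-hot selector stands in for the ring counter, B is read in place rather than shifted, and the adder is skipped whenever the selected bit of B is zero. The software still shifts the running high part, so it does not model the removal of the partial product shift. The function name and the returned adder-use count are hypothetical.

```python
def bz_fad_style_multiply(a, b, n=32):
    """Multiply two n-bit unsigned integers; return (product, adder activations)."""
    ring = 1          # one-hot selector, stands in for the ring counter
    high = 0          # running upper part of the partial product
    low = 0           # product bits that are already final
    adder_uses = 0

    for _ in range(n):
        if b & ring:              # B is never shifted; the selector picks its bit
            high += a             # A is fed directly to the adder
            adder_uses += 1
        # otherwise the adder is bypassed for this step
        if high & 1:              # the lowest bit of the partial product is final
            low |= ring
        high >>= 1
        ring <<= 1

    return (high << n) | low, adder_uses


product, uses = bz_fad_style_multiply(3, 5, n=4)
print(product, uses)
```

The adder-use count equals the number of 1 bits in B, which is a crude proxy for the switching-activity saving the abstract attributes to bypassing.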

Patent•

Solid-state image pickup device having analog-digital converters configured to sum values of multiple pixels in the array and method for driving the same

Yoshikazu Nitta¹, Noriyuki Fukushima¹, Yoshinori Muramatsu¹, Yukihiko Yasui¹•Institutions (1)

Sony Broadcast & Professional Research Laboratories¹

06 Apr 2009

TL;DR: Column-parallel ADCs, each including a comparator and an up/down counter, add the digital values of pixels in a plurality of rows without additional circuits such as an adder and a line memory device, so the frame rate can be increased while maintaining constant sensitivity.

Abstract: A CMOS image sensor includes column-parallel ADCs. Each of the ADCs includes a comparator and an up/down counter. With this configuration, digital values of pixels in a plurality of rows can be added without using additional circuits, such as an adder and a line memory device, and the frame rate can be increased while maintaining constant sensitivity.

Proceedings Article•DOI•

Testing Circuit-Partitioned 3D IC Designs

Dean L. Lewis¹, Hsien-Hsin S. Lee¹•Institutions (1)

Georgia Institute of Technology¹

13 May 2009

TL;DR: This work proposes new test methods for designs partitioned at the circuits level, in which the gates and transistors of individual circuits could be split across multiple die layers, and investigates a bit-partitioned adder unit and a port-split register file, which represents the most difficult circuit-partitioned design to test pre-bond.

Abstract: 3D integration is an emerging technology that allows for the vertical stacking of multiple silicon die. These stacked die are tightly integrated with through-silicon vias and promise significant power and area reductions by replacing long global wires with short vertical connections. This technology necessitates that neighboring logical blocks exist on different layers in the stack. However, such functional partitions disable intra-chip communication pre-bond and thus disrupt traditional test techniques. Previous work has described a general test architecture that enables pre-bond testability of an architecturally partitioned 3D processor and provided mechanisms for basic layer functionality. This work proposes new test methods for designs partitioned at the circuits level, in which the gates and transistors of individual circuits could be split across multiple die layers. We investigated a bit-partitioned adder unit and a port-split register file, which represents the most difficult circuit-partitioned design to test pre-bond but which is used widely in many circuits. Two layouts of each circuit, planar and 3D, are produced. Our experiments verify the performance and power results and examine the test coverage achieved.

Proceedings Article•DOI•

Exploiting fast carry-chains of FPGAs for designing compressor trees

Hadi Parandeh-Afshar¹, Philip Brisk¹, Paolo Ienne¹•Institutions (1)

École Polytechnique¹

29 Sep 2009

TL;DR: This paper demonstrates that the carry chains can be used to build compressor trees, i.e., multi-input addition circuits used for parallel accumulation and partial product reduction for parallel multipliers implemented in FPGA logic.

Abstract: Fast carry chains featuring dedicated adder circuitry are a distinctive feature of modern FPGAs. The carry chains bypass the general routing network and are embedded in the logic blocks of FPGAs for fast addition. Conventional intuition is that such carry chains can be used only for implementing carry-propagate addition; state-of-the-art FPGA synthesizers can only exploit the carry chains for these specific circuits. This paper demonstrates that the carry chains can be used to build compressor trees, i.e., multi-input addition circuits used for parallel accumulation and partial product reduction for parallel multipliers implemented in FPGA logic. The key to our technique is to program the lookup tables (LUTs) in the logic blocks to stop the propagation of carry bits along the carry chain at appropriate points. This approach improves the area of compressor trees significantly compared to previous methods that synthesized compressor trees solely on LUTs, without compromising the performance gain over trees built from ternary carry-propagate adders.
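
For readers unfamiliar with compressor trees, the sketch below shows the general idea of multi-input addition using 3:2 compressors (carry-save adders). It illustrates the concept only and does not model FPGA carry chains or the LUT programming technique from the paper; the function names are hypothetical and operands are assumed to be non-negative integers.

```python
def compress_3_to_2(x, y, z):
    """Reduce three operands to a sum word and a carry word with no carry propagation."""
    s = x ^ y ^ z                              # bitwise sum of each column
    c = ((x & y) | (x & z) | (y & z)) << 1     # bitwise carry, moved up one column
    return s, c


def compressor_tree_sum(operands):
    """Add many operands: compress to two words, then do one carry-propagate add."""
    ops = list(operands)
    while len(ops) > 2:
        nxt = []
        for i in range(0, len(ops) - 2, 3):    # each full group of three operands
            nxt.extend(compress_3_to_2(*ops[i:i + 3]))
        nxt.extend(ops[len(ops) - len(ops) % 3:])  # leftovers pass through unchanged
        ops = nxt
    return sum(ops)                            # final carry-propagate addition


print(compressor_tree_sum([1, 2, 3, 4, 5, 6, 7]))
```

Each compression level handles all bit columns independently, which is why such trees suit parallel accumulation and partial product reduction.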

Journal Article•DOI•

A 300 mV 494GOPS/W Reconfigurable Dual-Supply 4-Way SIMD Vector Processing Accelerator in 45 nm CMOS

Himanshu Kaul¹, Mark A. Anders¹, Sanu Mathew¹, Steven K. Hsu¹, Amit Agarwal¹, Ram Krishnamurthy¹, Shekhar Borkar¹•Institutions (1)

Intel¹

29 May 2009

TL;DR: This paper describes a reconfigurable 4-way SIMD engine fabricated in 45 nm high-k/metal-gate CMOS, targeted for on-die acceleration of vector processing in power-constrained mobile microprocessors.

Abstract: High-throughput parallel SIMD vector computations are the most performance and power-critical operations in multimedia, graphics and signal processing workloads. An array of SIMD vector processing engines delivers high-throughput short bit-width arithmetic operations on large data sets with orders of magnitude higher energy efficiencies vs. general-purpose cores [1, 2]. A reconfigurable 4-way SIMD engine targeted for on-die acceleration of vector processing in power-constrained mobile microprocessors is fabricated in 45nm High-K/Metal-gate CMOS [3]. The accelerator is reconfigured to perform 4-way 16b×16b multiplies, 32b×32b multiply, 4-way 16b additions, 2-way 32b additions and 72b addition with single-cycle throughput and wide dynamic supply voltage range of operation (1.3V to 230mV). A reconfigurable 2×2 tile of signed 2's complement 16b multipliers, with conditional carry gating in the 72b sparse tree adder, dual-supplies for voltage hopping, and fine-grained power-gating enables peak energy efficiency of 494GOPS/W (measured at 300mV, 50°C) with a dense layout occupying 0.081 mm² (Fig. 14.6.7) while achieving: (i) scalable performance up to 2.8GHz, 278mW measured at 1.3V, (ii) fast single-cycle switching between any operating/idle mode, (iii) configuration-dependent power consumption with 41% total power reduction and 6.5× active leakage power savings, (iv) 10× standby leakage reduction during sleep mode, (v) deep subthreshold operation measured at 230mV, 8.8MHz, 87µW, and (vi) compensation for up to 3× performance variation in ultra-low voltage mode.

Journal Article•DOI•

High-Level Synthesis Algorithm for the Design of Reconfigurable Constant Multiplier

Jiajia Chen, Chip-Hong Chang¹•Institutions (1)

Nanyang Technological University¹

01 Dec 2009-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A new high-level design methodology for reconfigurable constant multipliers (RCM) is presented: by scheduling each mobile adder into a control step within its legitimate time window with the minimum opportunity cost, mutually exclusive adders can be merged with significantly reduced adder and multiplexing cost.

Abstract: Multiplying a signal by a known constant is an essential operation in digital signal processing algorithms. In many application scenarios, an input or output signal is repeatedly multiplied by several predefined constants at different instances. These temporal redundancies can be exploited for the design of an efficient reconfigurable constant multiplier (RCM). An RCM achieves greater hardware savings than the conventional multiple constant multiplication architecture, limited only by the available latency of the subsystem. Motivated by a number of lucrative examples, this paper presents a new high-level design methodology for RCM. Common subexpressions in the preset constants represented in minimum signed-digit system are first eliminated to obtain a minimum depth multiroot directed acyclic graph (DAG). The DAG is converted into a primitive data flow graph (DFG) where mobile adders are identified. By scheduling each mobile adder into a control step within its legitimate time window with the minimum opportunity cost, mutually exclusive adders can be merged with significantly reduced adder and multiplexing cost. The opportunity cost for each scheduling decision is assessed by the probability displacement and disparity measures of the scheduled node as well as its predecessors and successors in the DFG. The algorithm is runtime efficient as exhaustive search for the best fusion of independently optimized constant multipliers has been avoided. Simulation results on randomly generated 12-b constant sets show that the solutions generated by the proposed algorithm are on average 19% to 25% more area-time efficient than the best reported solutions.

Journal Article•DOI•

Arithmetic Circuits of Redundant SUT-RNS

Somayeh Timarchi¹, Keivan Navi¹•Institutions (1)

Shahid Beheshti University¹

11 Aug 2009-IEEE Transactions on Instrumentation and Measurement

TL;DR: A recently proposed class of high-radix redundant RNS based on the stored-unibit-transfer representation for modulo 2^n + 1 is discussed; it improves the power-delay-product performance of conventional redundant RNS.

Abstract: The residue number system (RNS) is suitable for implementing high-speed digital processing devices because it supports parallel, modular, fault-tolerant, and carry-bounded arithmetic. The carry propagation is restricted to inside the modulus. The remaining intramoduli carry propagation limits the speed of arithmetic operation. Therefore, the carry-free property of a redundant arithmetic can be used. In this paper, we discuss a recently proposed class of high-radix redundant RNS based on the stored-unibit-transfer representation for modulo 2^n + 1 that improves the power-delay-product performance of conventional redundant RNS. In addition, subtraction and multiplication circuits are designed in the proposed system.
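
The carry-bounded property mentioned in the abstract above can be illustrated with plain (non-redundant) RNS arithmetic. This sketch does not implement the stored-unibit-transfer representation; it only shows that each residue channel is computed independently. The moduli set {2^n - 1, 2^n, 2^n + 1} with n = 4 is an assumption chosen for illustration, and the function names are hypothetical. It needs Python 3.8 or later for `math.prod` and the modular inverse form of `pow`.

```python
from math import prod

N_BITS = 4
MODULI = (2**N_BITS - 1, 2**N_BITS, 2**N_BITS + 1)   # assumed set: (15, 16, 17)


def to_rns(x, moduli=MODULI):
    """Represent x by its residues; valid for 0 <= x < product of the moduli."""
    return tuple(x % m for m in moduli)


def rns_add(xs, ys, moduli=MODULI):
    """Channel-wise addition: no carry crosses from one modulus to another."""
    return tuple((x + y) % m for x, y, m in zip(xs, ys, moduli))


def rns_mul(xs, ys, moduli=MODULI):
    """Channel-wise multiplication, again independent per modulus."""
    return tuple((x * y) % m for x, y, m in zip(xs, ys, moduli))


def from_rns(residues, moduli=MODULI):
    """Convert back to an integer with the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)   # modular inverse of m_i modulo m
    return total % big_m


print(from_rns(rns_add(to_rns(1234), to_rns(2345))))
print(from_rns(rns_mul(to_rns(50), to_rns(60))))
```

Results are only meaningful while they stay below the dynamic range, which is 15 × 16 × 17 = 4080 for the assumed moduli.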

Journal Article•DOI•

Variation-Aware Low-Power Synthesis Methodology for Fixed-Point FIR Filters

Jung Hwan Choi¹, N. Banerjee², Kaushik Roy¹•Institutions (2)

Purdue University¹, Intel²

01 Jan 2009-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: A novel finite-impulse response (FIR) filter synthesis technique that allows for aggressive voltage scaling by exploiting the fact that all filter coefficients are not equally important to obtain a "reasonably accurate" filter response is presented.

Abstract: In this paper, we present a novel finite-impulse response (FIR) filter synthesis technique that allows for aggressive voltage scaling by exploiting the fact that all filter coefficients are not equally important to obtain a "reasonably accurate" filter response. Our technique implements a level-constrained common-subexpression-elimination algorithm, where we can constrain the number of adder levels (ALs) required to compute each of the coefficient outputs. By specifying a tighter constraint (in terms of the number of adders in the critical path) on the important coefficients, we ensure that the later computational steps compute only the less important coefficient outputs. In case of delay variations due to voltage scaling and/or process variations, only the less important outputs are affected, resulting in graceful degradation of filter quality. The proposed architecture, therefore, lends itself to aggressive voltage scaling for low-power dissipation even under process parameter variations. Under extreme process variation and supply voltage scaling (0.8 V), filters implemented in the predictive technology model (PTM) 70 nm technology show an average power savings of 25%-30% with minor degradation in filter response in terms of normalized passband/stopband ripple (0.02 at a scaled voltage of 0.8 V compared with 0.005 at a nominal supply).

Journal Article•DOI•

Dynamic MAC-based architecture of artificial neural networks suitable for hardware implementation on FPGAs

Nadia Nedjah¹, R. M. da Silva¹, Luiza de Macedo Mourelle¹, M. V. C. da Silva¹•Institutions (1)

Rio de Janeiro State University¹

01 Jun 2009-Neurocomputing

TL;DR: A hardware architecture for ANNs is presented that takes advantage of dedicated adder blocks, commonly called MACs, to compute both the weighted sum and the activation function; because it is massively parallel, it is as fast as existing architectures.

Proceedings Article•DOI•

A fault tolerant, area efficient architecture for Shor's factoring algorithm

Mark Whitney¹, Nemanja Isailovic¹, Yatish Patel¹, John Kubiatowicz¹•Institutions (1)

University of California, Berkeley¹

20 Jun 2009

TL;DR: In this article, the area and latency of Shor's factoring were optimized by balancing the use of ancilla generators, error correction, and tuning the core adder circuits.

Abstract: We optimize the area and latency of Shor's factoring while simultaneously improving fault tolerance through: (1) balancing the use of ancilla generators, (2) aggressive optimization of error correction, and (3) tuning the core adder circuits. Our custom CAD flow produces detailed layouts of the physical components and utilizes simulation to analyze circuits in terms of area, latency, and success probability. We introduce a metric, called ADCR, which is the probabilistic equivalent of the classic Area-Delay product. Our error correction optimization can reduce ADCR by an order of magnitude or more. Contrary to conventional wisdom, we show that the area of an optimized quantum circuit is not dominated exclusively by error correction. Further, our adder evaluation shows that quantum carry-lookahead adders (QCLA) beat ripple-carry adders in ADCR, despite being larger and more complex. We conclude with what we believe is one of the most accurate estimates of the area and latency required for 1024-bit Shor's factorization: 7659 mm² for the smallest circuit and 6 × 10^8 seconds for the fastest circuit.

Journal Article•DOI•

Two novel ultra high speed carbon nanotube Full-Adder cells

Keivan Navi¹, Amir Momeni¹, Fazel Sharifi¹, Peiman Keshavarzian¹•Institutions (1)

Shahid Beheshti University¹

01 Jan 2009-IEICE Electronics Express

TL;DR: Two ultra high speed carbon nanotube Full-Adder cells are presented and simulation results illustrate significant improvement in terms of speed and Power-Delay Product (PDP).

Abstract: In this paper, two ultra high speed carbon nanotube Full-Adder cells are presented. The first design uses two transistors, two resistors and seven capacitors, and the second one uses four transistors and seven capacitors. The first design is faster and the second one consumes less power. Simulation results illustrate significant improvement in terms of speed and Power-Delay Product (PDP).

Journal Article•DOI•

Quaternary Galois field adder based all-optical multivalued logic circuits

Tanay Chattopadhyay¹, Chinmoy Taraphdar², Jitendra Nath Roy¹•Institutions (2)

College of Engineering and Management, Kolaghat¹, Bankura Christian College²

01 Aug 2009-Applied Optics

TL;DR: Galois field (GF) algebraic expressions have been found to be promising choices for reversible and quantum implementation of multivalued logic; for the first time to the authors' knowledge, GF(4) adder multivalued (four-valued) logic circuits are developed in an all-optical domain.

Abstract: Galois field (GF) algebraic expressions have been found to be promising choices for reversible and quantum implementation of multivalued logic. For the first time to our knowledge, we developed GF(4) adder multivalued (four-valued) logic circuits in an all-optical domain. The principle and possibilities of an all-optical GF(4) adder circuit are described. The theoretical model is presented and verified through numerical simulation. The quaternary inverter, successor, clockwise cycle, and counterclockwise cycle gates are proposed with the help of the all-optical GF(4) adder circuit. In this scheme, different quaternary logical states are represented by differently polarized light. A terahertz optical asymmetric demultiplexer interferometric switch plays an important role in this scheme.
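For readers unfamiliar with GF(4) addition, a minimal sketch follows. It assumes the four field elements are encoded as the integers 0 to 3, read as two-bit polynomial coefficients over GF(2); under that encoding, addition is a bitwise XOR. The encoding and the names `gf4_add` and `GF4_ADD_TABLE` are illustrative assumptions and say nothing about how the optical circuit in the paper represents or combines states.

```python
def gf4_add(x, y):
    """Add two GF(4) elements encoded as integers 0..3 (two GF(2) coefficients)."""
    if not (0 <= x <= 3 and 0 <= y <= 3):
        raise ValueError("GF(4) elements must be in the range 0..3")
    # Coefficient-wise addition modulo 2 is a bitwise XOR.
    return x ^ y


# Full addition table: row x, column y.
GF4_ADD_TABLE = [[gf4_add(x, y) for y in range(4)] for x in range(4)]

# Every element is its own additive inverse in a field of characteristic 2.
SELF_INVERSE = all(gf4_add(x, x) == 0 for x in range(4))

# Observation under this encoding only (not a claim about the paper's gates):
# adding the constant 3 maps x to 3 - x.
ADD_THREE_COMPLEMENTS = all(gf4_add(x, 3) == 3 - x for x in range(4))
```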

Performance Analysis of 32-Bit Array Multiplier with a Carry Save Adder and with a Carry-Look-Ahead Adder

Raminder Preet, Pal Singh, Parveen Kumar, Balwinder Singh, Sri Sai, Dept Ece

01 Jan 2009

TL;DR: Designs of two different array multipliers are presented, one using carry-look-ahead (CLA) logic for addition of partial product terms and another introducing a Carry Save Adder (CSA) in the partial product lines.

Abstract: In this paper, designs of two different array multipliers are presented, one using carry-look-ahead (CLA) logic for addition of partial product terms and another introducing a Carry Save Adder (CSA) in the partial product lines. The multipliers presented in this paper were all modeled using VHDL (Very High Speed Integrated Circuit Hardware Description Language) for 32-bit unsigned data. The comparison is done on the basis of three performance parameters, i.e., area, speed and power consumption. Designing an integrated circuit that is efficient in terms of area, power and speed has become a challenging task in the modern VLSI design field. Previously in the literature, performance analysis was carried out between a multiplier using a ripple carry adder (RCA) and one using CLA. In this work, the same multiplier is designed using CSA logic and its performance is compared with that of the multiplier designed using CLA logic. The multiplier with CSA gives better results in terms of speed (78.3% improvement), area (reduced by 4.2%) and power consumption (decreased by 1.4%).
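The carry-save idea in the abstract can be sketched behaviorally. The paper's multipliers are modeled in VHDL; the Python below is only an illustrative model, and the names `carry_save_add` and `multiply_csa` are assumptions. A carry-save adder reduces three operands to a sum word and a carry word without propagating carries across bit positions, so only the final addition needs a carry-propagating adder.

```python
def carry_save_add(x, y, z):
    """Reduce three operands to (sum, carry) with no carry propagation between bits."""
    partial_sum = x ^ y ^ z                        # per-bit sum without carries
    carry = ((x & y) | (x & z) | (y & z)) << 1     # per-bit majority, shifted left
    return partial_sum, carry


def multiply_csa(a, b, n=32):
    """Multiply two unsigned n-bit integers by accumulating partial products in carry-save form."""
    partial_products = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    s, c = 0, 0
    for pp in partial_products:
        # Invariant: s + c equals the sum of the partial products consumed so far.
        s, c = carry_save_add(s, c, pp)
    # A single carry-propagating addition produces the final product.
    return s + c
```

For example, `multiply_csa(12345, 6789)` agrees with `12345 * 6789`. In hardware the final `s + c` step is where a CLA or ripple-carry adder would sit.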

Posted Content•

A Fault Tolerant, Area Efficient Architecture for Shor's Factoring Algorithm

Mark Whitney¹, Nemanja Isailovic¹, Yatish Patel¹, John Kubiatowicz¹•Institutions (1)

University of California, Berkeley¹

11 Sep 2009-arXiv: Quantum Physics

TL;DR: It is shown that the area of an optimized quantum circuit is not dominated exclusively by error correction, and quantum carry-lookahead adders (QCLA) beat ripple-carry adders in ADCR, despite being larger and more complex.

Abstract: We optimize the area and latency of Shor's factoring while simultaneously improving fault tolerance through: (1) balancing the use of ancilla generators, (2) aggressive optimization of error correction, and (3) tuning the core adder circuits. Our custom CAD flow produces detailed layouts of the physical components and utilizes simulation to analyze circuits in terms of area, latency, and success probability. We introduce a metric, called ADCR, which is the probabilistic equivalent of the classic Area-Delay product. Our error correction optimization can reduce ADCR by an order of magnitude or more. Contrary to conventional wisdom, we show that the area of an optimized quantum circuit is not dominated exclusively by error correction. Further, our adder evaluation shows that quantum carry-lookahead adders (QCLA) beat ripple-carry adders in ADCR, despite being larger and more complex. We conclude with what we believe is one of the most accurate estimates of the area and latency required for 1024-bit Shor's factorization: 7659 mm$^{2}$ for the smallest circuit and $6 \times 10^8$ seconds for the fastest circuit.

Journal Article•DOI•

Minimization and optimization of reversible BCD-full adder/subtractor using genetic algorithm and don't care concept

Majid Mohammadi¹, Majid Haghparast², Mohammad Eshghi¹, Keivan Navi¹•Institutions (2)

Shahid Beheshti University¹, Islamic Azad University²

01 Aug 2009-International Journal of Quantum Information

TL;DR: A modular synthesis method to realize a reversible BCD-full adder (BCD-FA) and subtractor circuit using genetic algorithm and don't care concept is proposed and a binary to BCD converter is presented.

Abstract: Reversible logic and binary coded decimal (BCD) arithmetic are two subjects of concern in hardware design. This paper proposes a modular synthesis method to realize a reversible BCD-full adder (BCD-FA) and subtractor circuit. We propose three approaches to design and optimize all parts of a BCD-FA circuit using a genetic algorithm and the don't-care concept. Our first approach is based on Hafiz's work, and the second one is based on the whole BCD-FA circuit design. In the third approach, a binary to BCD converter is presented. Optimizations are done in terms of number of gates, number of garbage inputs/outputs, and the quantum cost of the circuit. We present four designs for BCD-FA with four different goals: minimum garbage inputs/outputs, minimum quantum cost, minimum number of gates, and optimum circuit in terms of all the above parameters.
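As background for the BCD-FA discussed above, the following sketch shows the conventional (irreversible) behavior of a one-digit BCD full adder: add in binary, then add 6 when the result exceeds 9. It also shows digit subtraction through the nine's complement. This is a functional model only; it does not reproduce the paper's reversible gate-level designs, and the function names are assumptions.

```python
def bcd_digit_add(a, b, carry_in=0):
    """Add two BCD digits (0..9) plus a carry-in; return (digit, carry_out)."""
    if not (0 <= a <= 9 and 0 <= b <= 9 and carry_in in (0, 1)):
        raise ValueError("inputs must be BCD digits and a 0/1 carry")
    s = a + b + carry_in            # plain binary sum, at most 19
    if s > 9:
        # Adding 6 skips the six unused 4-bit codes (1010..1111).
        return (s + 6) & 0xF, 1
    return s, 0


def bcd_digit_sub(a, b):
    """Subtract BCD digit b from a via nine's complement; return (digit, borrow_out)."""
    digit, carry = bcd_digit_add(a, 9 - b, 1)   # a + (9 - b) + 1 = a - b + 10
    return digit, 1 - carry                      # no carry out means a borrow occurred
```

For instance, `bcd_digit_add(7, 5)` yields digit 2 with a carry of 1, and `bcd_digit_sub(3, 5)` yields digit 8 with a borrow of 1.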

Proceedings Article•DOI•

Parallel multipliers for Quantum-Dot Cellular Automata

Seong-Wan Kim¹, Earl E. Swartzlander¹•Institutions (1)

University of Texas at Austin¹

02 Jun 2009

TL;DR: In this article, the authors explored the optimization of parallel multipliers for Quantum-Dot Cellular Automata (QCA) using coplanar layouts and compared the resulting designs with other QCA multipliers (bit-serial and array multipliers).

Abstract: This paper explores the optimization of parallel multipliers for Quantum-Dot Cellular Automata. To reduce the complexity, multipliers are designed with quasi-modularity to accommodate large word sizes. The regular quasi-modular product method is used to make n × n multipliers using 4 (n/2 × n/2) modules. This may be continued with further decomposition to 16 (n/4 × n/4) modules, etc. The last two rows in Wallace or Dadda reduction trees are summed by an adder that is 3n/2 – 1 bits long to produce the final product. This design is constructed using coplanar layouts and compared with other QCA multipliers (bit-serial and array multipliers). The delay, area and complexity are compared for several different operand sizes using the QCADesigner simulator.
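The decomposition of an n × n multiplier into four n/2 × n/2 modules can be sketched arithmetically. The function below is an illustrative model under stated assumptions: n is taken to be a power of two, the name `split_multiply` and the `base_bits` cutoff are hypothetical, and ordinary integer addition stands in for the Wallace or Dadda reduction and the final adder described in the abstract.

```python
def split_multiply(a, b, n, base_bits=4):
    """Multiply two unsigned n-bit integers using four half-width sub-multipliers."""
    if n <= base_bits:
        return a * b                 # stands in for a small base multiplier module
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, a >> h
    b_lo, b_hi = b & mask, b >> h
    # Four n/2 x n/2 module products; recursing gives 16 n/4 x n/4 modules, and so on.
    lo_lo = split_multiply(a_lo, b_lo, h, base_bits)
    lo_hi = split_multiply(a_lo, b_hi, h, base_bits)
    hi_lo = split_multiply(a_hi, b_lo, h, base_bits)
    hi_hi = split_multiply(a_hi, b_hi, h, base_bits)
    # Align each module output by its weight and sum.
    return lo_lo + ((lo_hi + hi_lo) << h) + (hi_hi << (2 * h))
```

As a usage example, `split_multiply(200, 123, 8)` uses four 4 × 4 products and returns 24600.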

Journal Article•DOI•

Hardware Designs for Decimal Floating-Point Addition and Related Operations

Liang-Kai Wang¹, Michael J. Schulte², J.D. Thompson³, N. Jairam⁴•Institutions (4)

Advanced Micro Devices¹, University of Wisconsin-Madison², Cray³, Intel⁴

01 Mar 2009-IEEE Transactions on Computers

TL;DR: An overview of DFP arithmetic in IEEE 754-2008 is given, and novel designs for a DFP adder and a DFP multifunction unit (DFP MFU) that comply with the standard are presented; the DFP adder is roughly 21 percent faster and 1.6 percent smaller than a previous DFP adder design when implemented in the same technology.

Abstract: Decimal arithmetic is often used in commercial, financial, and Internet-based applications. Due to the growing importance of decimal floating-point (DFP) arithmetic, the IEEE 754-2008 Standard for Floating-Point Arithmetic (IEEE 754-2008) includes specifications for DFP arithmetic. IBM recently announced adding DFP instructions to their POWER6, z9, and z10 microprocessor architectures. As processor support for DFP arithmetic emerges, it is important to investigate efficient arithmetic algorithms and hardware designs for common DFP arithmetic operations. This paper gives an overview of DFP arithmetic in IEEE 754-2008 and discusses previous research on decimal fixed-point and floating-point addition. It also presents novel designs for a DFP adder and a DFP multifunction unit (DFP MFU) that comply with IEEE 754-2008. To reduce their delay, the DFP adder and MFU use decimal injection-based rounding, a new form of decimal operand alignment, and a fast flag-based method for rounding and overflow detection. Synthesis results indicate that the proposed DFP adder is roughly 21 percent faster and 1.6 percent smaller than a previous DFP adder design, when implemented in the same technology. Compared to the DFP adder, the DFP MFU provides six additional operations, yet only has 2.8 percent more delay and 9.7 percent more area. A pipelined version of the DFP MFU has a latency of six cycles, a throughput of one result per cycle, an estimated critical path delay of 12.9 fanout-of-four (FO4) inverter delays, and an estimated area of 45,681 NAND2 equivalent gates.
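The motivation for decimal floating-point addition can be seen in software with Python's standard `decimal` module. This is a software illustration only, not the hardware design from the paper; the 16-digit precision is chosen here as an assumption to mirror the decimal64 format's significand length, and round-half-even is used as the rounding mode.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Assumed context: 16 significant digits with round-half-even.
ctx = Context(prec=16, rounding=ROUND_HALF_EVEN)

# Binary floating point cannot represent 0.1 or 0.2 exactly, so the sum is not 0.3.
binary_sum_is_exact = (0.1 + 0.2 == 0.3)

# Decimal addition of the same literals is exact.
decimal_sum = ctx.add(Decimal("0.1"), Decimal("0.2"))
decimal_sum_is_exact = (decimal_sum == Decimal("0.3"))

# When the exact sum needs 17 digits, the result is rounded to 16 digits.
# The exact sum ends in ...456.5, a tie, which rounds to the even digit 6.
rounded_sum = ctx.add(Decimal("1234567890123456"), Decimal("0.5"))
```

Here `binary_sum_is_exact` is `False`, `decimal_sum_is_exact` is `True`, and `rounded_sum` equals `Decimal("1234567890123456")`.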
