Showing papers on "Adder" published in 2017

Proceedings Article•DOI•

EvoApprox8b: Library of approximate adders and multipliers for circuit design and benchmarking of approximation methods

Vojtech Mrazek¹, Radek Hrbacek¹, Zdenek Vasicek¹, Lukas Sekanina¹•Institutions (1)

27 Mar 2017

TL;DR: The EvoApprox8b library provides Verilog, Matlab and C models of all approximate circuits and the error is given for seven different error metrics.

Abstract: Approximate circuits and approximate circuit design methodologies have attracted significant attention from researchers as well as industry in recent years. In order to accelerate the approximate circuit and system design process and to support a fair benchmarking of circuit approximation methods, we propose a library of approximate adders and multipliers called EvoApprox8b. This library contains 430 non-dominated 8-bit approximate adders created from 13 conventional adders and 471 non-dominated 8-bit approximate multipliers created from 6 conventional multipliers. These implementations were evolved by multi-objective Cartesian genetic programming. The EvoApprox8b library provides Verilog, Matlab and C models of all approximate circuits. In addition to standard circuit parameters, the error is given for seven different error metrics. The EvoApprox8b library is available at: www.fit.vutbr.cz/research/groups/ehw/approxlib
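
To make the idea of characterising an approximate adder by several error metrics concrete, here is a minimal Python sketch. It is not taken from EvoApprox8b: the circuit is a hypothetical lower-part OR adder chosen only for illustration, the three metrics shown (error rate, mean absolute error, worst-case error) are common choices rather than the library's exact seven, and all function names are assumptions.

```python
from itertools import product

WIDTH = 8   # operand width in bits
K = 4       # hypothetical number of low bits that are OR-ed instead of added


def loa_add(a: int, b: int, k: int = K) -> int:
    """Lower-part OR adder: exact addition on the high bits, bitwise OR on the low k bits."""
    mask = (1 << k) - 1
    low = (a | b) & mask                 # approximate low part, no carry is generated
    high = ((a >> k) + (b >> k)) << k    # exact high part, carry-out is kept
    return high | low


def error_metrics(adder, width: int = WIDTH) -> dict:
    """Exhaustively compare `adder` with exact addition over all input pairs."""
    errors = [abs((a + b) - adder(a, b))
              for a, b in product(range(1 << width), repeat=2)]
    total = len(errors)
    return {
        "error_rate": sum(e != 0 for e in errors) / total,
        "mean_abs_error": sum(errors) / total,
        "worst_case_error": max(errors),
    }


METRICS = error_metrics(loa_add)
```

For this toy adder the error on any input pair equals the bitwise AND of the two low nibbles, so exhaustive evaluation over all 65,536 input pairs is cheap and the worst-case error is bounded by 15.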

241 citations

Proceedings Article•DOI•

Low complexity schemes for the random access Gaussian channel

Or Ordentlich¹, Yury Polyanskiy¹•Institutions (1)

Massachusetts Institute of Technology¹

25 Jun 2017

TL;DR: An uncoordinated Gaussian multiple access channel with a relatively large number of active users within each block is considered, and a low complexity coding scheme is proposed, which is based on a combination of compute-and-forward and coding for a binary adder channel.

Abstract: We consider an uncoordinated Gaussian multiple access channel with a relatively large number of active users within each block. A low complexity coding scheme is proposed, which is based on a combination of compute-and-forward and coding for a binary adder channel. For a wide regime of parameters of practical interest, the energy-per-bit required by each user in the proposed scheme is significantly smaller than that required by popular solutions such as slotted-ALOHA and treating interference as noise.

216 citations

Journal Article•DOI•

Large-scale design of robust genetic circuits with multiple inputs and outputs for mammalian cells.

Benjamin H. Weinberg¹, N. T. Hang Pham¹, Leidy D. Caraballo¹, Thomas Lozanoski¹, Adrien Engel¹,², Swapnil Bhatia¹, Wilson W. Wong¹•Institutions (2)

Boston University¹, ETH Zurich²

27 Mar 2017-Nature Biotechnology

TL;DR: This work presents a robust, general, scalable system, called 'Boolean logic and arithmetic through DNA excision' (BLADE), to engineer genetic circuits with multiple inputs and outputs in mammalian cells with minimal optimization.

Abstract: Engineered genetic circuits for mammalian cells often require extensive fine-tuning to perform as intended. We present a robust, general, scalable system, called 'Boolean logic and arithmetic through DNA excision' (BLADE), to engineer genetic circuits with multiple inputs and outputs in mammalian cells with minimal optimization. The reliability of BLADE arises from its reliance on recombinases under the control of a single promoter, which integrates circuit signals on a single transcriptional layer. We used BLADE to build 113 circuits in human embryonic kidney and Jurkat T cells and devised a quantitative, vector-proximity metric to evaluate their performance. Of 113 circuits analyzed, 109 functioned (96.5%) as intended without optimization. The circuits, which are available through Addgene, include a 3-input, two-output full adder; a 6-input, one-output Boolean logic look-up table; circuits with small-molecule-inducible control; and circuits that incorporate CRISPR-Cas9 to regulate endogenous genes. BLADE enables execution of sophisticated cellular computation in mammalian cells, with applications in cell and tissue engineering.

209 citations

Journal Article•DOI•

A Review, Classification, and Comparative Evaluation of Approximate Arithmetic Circuits

Honglan Jiang¹, Cong Liu¹, Leibo Liu², Fabrizio Lombardi³, Jie Han¹•Institutions (3)

University of Alberta¹, Tsinghua University², Northeastern University³

11 Aug 2017-ACM Journal on Emerging Technologies in Computing Systems

TL;DR: A review and classification are presented for the current designs of approximate arithmetic circuits, including adders, multipliers, and dividers, together with a comparative evaluation of their error and circuit characteristics.

Abstract: Often as the most important arithmetic modules in a processor, adders, multipliers, and dividers determine the performance and energy efficiency of many computing tasks. The demand of higher speed and power efficiency, as well as the feature of error resilience in many applications (e.g., multimedia, recognition, and data analytics), have driven the development of approximate arithmetic design. In this article, a review and classification are presented for the current designs of approximate arithmetic circuits including adders, multipliers, and dividers. A comprehensive and comparative evaluation of their error and circuit characteristics is performed for understanding the features of various designs. By using approximate multipliers and adders, the circuit for an image processing application consumes as little as 47% of the power and 36% of the power-delay product of an accurate design while achieving similar image processing quality. Improvements in delay, power, and area are obtained for the detection of differences in images by using approximate dividers.

197 citations

Journal Article•DOI•

Light-Gated Memristor with Integrated Logic and Memory Functions

Hongwei Tan, Gang Liu, Huali Yang, Xiaohui Yi, Liang Pan, Jie Shang, Shibing Long¹, Ming Liu¹, Yihong Wu², Run-Wei Li•Institutions (2)

Chinese Academy of Sciences¹, National University of Singapore²

18 Oct 2017-ACS Nano

TL;DR: The memlogic (memory logic) is proposed and demonstrated as a nonvolatile switch of logic operations integrated with memory function in a single light-gated memristor, able to achieve optical and electrical mixed basic Boolean logic of reconfigurable "AND", "OR", and "NOT" operations.

Abstract: Memristive devices are able to store and process information, which offers several key advantages over the transistor-based architectures. However, most of the two-terminal memristive devices have fixed functions once made and cannot be reconfigured for other situations. Here, we propose and demonstrate a memristive device “memlogic” (memory logic) as a nonvolatile switch of logic operations integrated with memory function in a single light-gated memristor. Based on nonvolatile light-modulated memristive switching behavior, a single memlogic cell is able to achieve optical and electrical mixed basic Boolean logic of reconfigurable “AND”, “OR”, and “NOT” operations. Furthermore, the single memlogic cell is also capable of functioning as an optical adder and digital-to-analog converter. All the memlogic outputs are memristive for in situ data storage due to the nonvolatile resistive switching and persistent photoconductivity effects. Thus, as a memdevice, the memlogic has potential for not only simplifying ...

152 citations

Journal Article•DOI•

Probabilistic Error Modeling for Approximate Adders

Sana Mazahir¹, Osman Hasan¹, Rehan Hafiz², Muhammad Shafique³, Jorg Henkel⁴•Institutions (4)

University of the Sciences¹, Information Technology University², Vienna University of Technology³, Karlsruhe Institute of Technology⁴

01 Mar 2017-IEEE Transactions on Computers

TL;DR: A generic methodology for analytical modeling of probability of occurrence of error and the Probability Mass Function of error value in a selected class of approximate adders is presented, which can serve as performance metrics for the comparative analysis of various adders and their configurations.

Abstract: Approximate adders are widely being advocated as a means to achieve performance gain in error resilient applications. In this paper, a generic methodology for analytical modeling of probability of occurrence of error and the Probability Mass Function (PMF) of error value in a selected class of approximate adders is presented, which can serve as performance metrics for the comparative analysis of various adders and their configurations. The proposed model is applicable to approximate adders that comprise of sub-adder units of uniform as well as non-uniform lengths. Using a systematic methodology, we derive closed form expressions for the probability of error for a number of state-of-the-art high-performance approximate adders. The probabilistic analysis is carried out for arbitrary input distributions. It can be used to study the dependence of error statistics in an adder’s output on its configuration and input distribution. Moreover, it is shown that by building upon the proposed error model, we can estimate the probability of error in circuits with multiple approximate adders. We also demonstrate that, using the proposed analysis, the comparative performance of different approximate adders can be correctly predicted in practical applications of image processing.
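
As a rough illustration of what a probability mass function of the error value looks like for an adder built from sub-adder units, the following Python sketch enumerates a toy case. It is not the paper's analytical model: the segmented adder, the restriction to independent uniformly distributed inputs, and all names are assumptions made for this example.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def segmented_add(a: int, b: int, block: int = 4, width: int = 8) -> int:
    """Add block by block, discarding the carry between sub-adders (hypothetical design)."""
    mask = (1 << block) - 1
    result = 0
    for shift in range(0, width, block):
        part = ((a >> shift) & mask) + ((b >> shift) & mask)
        if shift + block < width:
            part &= mask          # the inter-block carry is dropped
        result |= part << shift   # the top block keeps its carry-out
    return result


def error_pmf(adder, width: int = 8) -> dict:
    """Exact PMF of (exact - approximate) for independent, uniform inputs."""
    counts = Counter((a + b) - adder(a, b)
                     for a, b in product(range(1 << width), repeat=2))
    total = 1 << (2 * width)
    return {err: Fraction(n, total) for err, n in sorted(counts.items())}


PMF = error_pmf(segmented_add)

# Closed form for this toy case: an error occurs exactly when the low 4-bit
# block produces a carry, which for uniform inputs has probability (2**4 - 1) / 2**5.
P_ERROR_CLOSED_FORM = Fraction(2**4 - 1, 2**5)
```

In this toy configuration the enumerated error probability agrees with the closed-form value, which is the kind of agreement an analytical error model aims for without the cost of exhaustive simulation.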

88 citations

Journal Article•DOI•

Towards coplanar quantum-dot cellular automata adders based on efficient three-input XOR gate

Moslem Balali, Abdalhossein Rezai, Haideh Balali¹, Faranak Rabiei², Saeid Emadi•Institutions (2)

Isfahan University of Technology¹, Universiti Putra Malaysia²

01 Jan 2017-Results in physics

TL;DR: A novel 3-input XOR gate structure based on half distance and cell interaction is proposed for quantum-dot cellular automata and used to build a one-bit full adder and a 4-bit ripple carry adder; simulation results indicate the efficiency and robustness of the proposed designs.

Abstract: Quantum-dot cellular automata (QCA), which is a candidate technology to replace CMOS technology, promises extra low-power, extremely dense and high-speed structures at a nano scale. In this paper, a novel 3-input XOR gate structure is proposed based on half distance and cell interaction. Accordingly, a low-complexity and high-speed QCA one-bit full adder is designed by employing the proposed 3-input QCA XOR gate. Then a new 4-bit QCA Ripple Carry Adder (RCA) is proposed based on the proposed 3-input QCA XOR gate. The proposed designs are simulated using both the coherence and bi-stable simulation engines of QCADesigner version 2.0.3. Our simulation results indicate the efficiency and robustness of the proposed designs. The simulation results show a 50% area improvement for the proposed 3-input XOR gate, 76% and 50% improvements in cell count and latency, respectively, for the proposed robust QCA full adder, and 58% and 52% improvements in latency and cost, respectively, for the 4-bit QCA RCA compared to previous designs.
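
The logic of a full adder built around a 3-input XOR, and of a 4-bit ripple carry adder chained from it, can be sketched at the Boolean level in Python. This says nothing about the QCA cell layout. Computing the carry with a 3-input majority function is an assumption here (it is a common choice in QCA designs, but the abstract does not state it), and the function names are illustrative.

```python
def xor3(a: int, b: int, c: int) -> int:
    """3-input XOR: the sum bit of a full adder."""
    return a ^ b ^ c


def majority(a: int, b: int, c: int) -> int:
    """3-input majority: 1 when at least two inputs are 1 (assumed carry logic)."""
    return (a & b) | (a & c) | (b & c)


def full_adder(a: int, b: int, cin: int) -> tuple:
    """One-bit full adder returning (sum, carry_out)."""
    return xor3(a, b, cin), majority(a, b, cin)


def ripple_carry_add(x: int, y: int, width: int = 4) -> int:
    """Chain `width` full adders; the carry ripples from bit 0 upward."""
    carry = 0
    result = 0
    for i in range(width):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= s << i
    return result | (carry << width)   # final carry becomes the top bit
```

Because the carry must pass through every stage in turn, latency grows with the word width, which is why per-stage latency improvements matter for a ripple carry design.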

82 citations

Journal Article•DOI•

MAD Gates—Memristor Logic Design Using Driver Circuitry

Lauren Guckert¹, Earl E. Swartzlander¹•Institutions (1)

University of Texas at Austin¹

01 Feb 2017-IEEE Transactions on Circuits and Systems II: Express Briefs

TL;DR: A new low-power gate design, memristors-as-drivers (MAD) gates, is proposed, which overcomes the scalability, applicability, completeness, and performance shortcomings of prior memristor-based gates by combining sense circuitry with the IMPLY operation.

Abstract: Memristors have recently begun to be explored in arithmetic applications. However, all prior designs for memristor-based gates have had shortcomings in terms of scalability, applicability, completeness, and performance. In this brief, a new low-power gate design, i.e., memristors-as-drivers gates, is proposed, which overcomes each of these issues by combining sense circuitry with the IMPLY operation. By sensing the values of the input memristors as the driver for the output memristor, the delay is reduced to a single step for any Boolean operation, including XOR. Each gate requires at most three memristors and consumes only 30 fJ. An $N$-bit ripple carry adder implementation is proposed, which uses these gates to achieve a total delay of $N+1$ with an area of $8N$ memristors and their drivers. The individual bits of the proposed adder can also be pipelined, reducing the latency to four steps per addition.

81 citations

Proceedings Article•DOI•

Energy-efficient hybrid stochastic-binary neural networks for near-sensor computing

Vincent T. Lee¹, Armin Alaghi¹, John P. Hayes², Visvesh S. Sathe¹, Luis Ceze¹•Institutions (2)

University of Washington¹, University of Michigan²

27 Mar 2017

TL;DR: This paper proposes a stochastic-binary hybrid design which splits the computation between the stochastic and binary domains for near-sensor NN applications, and shows that retraining the binary portion of the NN computation can compensate for precision losses introduced by shorter stochastic bit-streams.

Abstract: Recent advances in neural networks (NNs) exhibit unprecedented success at transforming large, unstructured data streams into compact higher-level semantic information for tasks such as handwriting recognition, image classification, and speech recognition. Ideally, systems would employ near-sensor computation to execute these tasks at sensor endpoints to maximize data reduction and minimize data movement. However, near-sensor computing presents its own set of challenges such as operating power constraints, energy budgets, and communication bandwidth capacities. In this paper, we propose a stochastic-binary hybrid design which splits the computation between the stochastic and binary domains for near-sensor NN applications. In addition, our design uses a new stochastic adder and multiplier that are significantly more accurate than existing adders and multipliers. We also show that retraining the binary portion of the NN computation can compensate for precision losses introduced by shorter stochastic bit-streams, allowing faster run times at minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary design can achieve 9.8x energy efficiency savings, and application-level accuracies within 0.05% compared to conventional all-binary designs.
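
For readers unfamiliar with the stochastic domain, the sketch below shows conventional unipolar stochastic multiplication, where a value in [0, 1] is encoded as the fraction of ones in a random bit-stream and a bitwise AND of two independent streams multiplies the encoded values. This is the textbook scheme, not the new adder and multiplier proposed in the paper; the function names, stream length, and seed are assumptions.

```python
import random


def to_bitstream(p: float, length: int, rng: random.Random) -> list:
    """Encode probability p as a random bit-stream whose mean approximates p."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_bitstream(bits: list) -> float:
    """Decode a bit-stream back to a value: the fraction of ones."""
    return sum(bits) / len(bits)


def stochastic_multiply(x_bits: list, y_bits: list) -> list:
    """Bitwise AND of two independent streams encodes the product of their values."""
    return [x & y for x, y in zip(x_bits, y_bits)]


def demo(p_x: float = 0.5, p_y: float = 0.8, length: int = 4096, seed: int = 0) -> float:
    """Multiply p_x by p_y in the stochastic domain and decode the result."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    x = to_bitstream(p_x, length, rng)
    y = to_bitstream(p_y, length, rng)
    return from_bitstream(stochastic_multiply(x, y))
```

The decoded product is only an estimate whose noise shrinks as the stream gets longer, which is the precision-versus-run-time trade-off the abstract refers to when it mentions shorter stochastic bit-streams.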

80 citations

Journal Article•DOI•

An ultra-compact all optical full adder based on nonlinear photonic crystal resonant cavities

F. Cheraghi¹, Mohammad Soroosh¹, Gholamreza Akbarizadeh¹•Institutions (1)

Shahid Chamran University of Ahvaz¹

01 Nov 2017-Superlattices and Microstructures

TL;DR: In this paper, the authors proposed and designed an all optical full adder based on photonic crystal, which used four nonlinear resonant cavities inside a two-dimensional photonic lattice.

79 citations

Journal Article•DOI•

Low-Barrier Nanomagnets as p-Bits for Spin Logic

Rafatul Faria¹, Kerem Y. Camsari¹, Supriyo Datta¹•Institutions (1)

Purdue University¹

21 Mar 2017-IEEE Magnetics Letters

TL;DR: In this article, the authors use simulations based on the stochastic Landau-Lifshitz-Gilbert (sLLG) equation to demonstrate that similar impressive functions can be performed using unstable nanomagnets with energy barriers as low as a fraction of a kT.

Abstract: It has recently been shown that a suitably interconnected network of tunable telegraphic noise generators or “p-bits” can be used to perform even precise arithmetic functions like a 32-bit adder. In this letter, we use simulations based on the stochastic Landau–Lifshitz–Gilbert (sLLG) equation to demonstrate that similar impressive functions can be performed using unstable nanomagnets with energy barriers as low as a fraction of a kT. This is surprising because the magnetization of low-barrier nanomagnets is not telegraphic with discrete values of $\pm 1$. Rather, it fluctuates randomly among all values between $-1$ and $+1$, and the output magnets are read with a thresholding device that translates all positive values to one and all negative values to zero. We present sLLG-based simulations demonstrating the operation of a 32-bit adder, with a network of several hundred nanomagnets, exhibiting a remarkably precise correlation: The input magnets $\{A\}$ and $\{B\}$ as well as the output magnets $\{S\}$ all fluctuate randomly and yet the quantity $A+B-S$ is sharply peaked around zero! If we fix $\{A\}$ and $\{B\}$, the sum magnets $\{S\}$ rapidly converge to a unique state with $S=A+B$ so that the system acts as an adder. But unlike standard adders, the operation is invertible. If we fix $\{S\}$ and $\{B\}$, the remaining magnets $\{A\}$ converge to the difference $A=S-B$. These examples emphasize a new direction for the field of nanomagnetics away from stable high-barrier magnets toward stochastic low-barrier magnets that not only operate with lower currents, but are also more promising for continued downscaling.

Proceedings Article•DOI•

Hardware-Software Codesign of Accurate, Multiplier-free Deep Neural Networks

Hokchhay Tann¹, Soheil Hashemi¹, R. Iris Bahar¹, Sherief Reda¹•Institutions (1)

Brown University¹

18 Jun 2017

TL;DR: In this article, the authors propose an approach to map floating-point based DNNs to 8-bit dynamic fixed-point networks with integer power-of-two weights with no change in network architecture.

Abstract: While Deep Neural Networks (DNNs) push the state-of-the-art in many machine learning applications, they often require millions of expensive floating-point operations for each input classification. This computation overhead limits the applicability of DNNs to low-power, embedded platforms and incurs high cost in data centers. This motivates recent interests in designing low-power, low-latency DNNs based on fixed-point, ternary, or even binary data precision. While recent works in this area offer promising results, they often lead to large accuracy drops when compared to the floating-point networks. We propose a novel approach to map floating-point based DNNs to 8-bit dynamic fixed-point networks with integer power-of-two weights with no change in network architecture. Our dynamic fixed-point DNNs allow different radix points between layers. During inference, power-of-two weights allow multiplications to be replaced with arithmetic shifts, while the 8-bit fixed-point representation simplifies both the buffer and adder design. In addition, we propose a hardware accelerator design to achieve low-power, low-latency inference with insignificant degradation in accuracy. Using our custom accelerator design with the CIFAR-10 and ImageNet datasets, we show that our method achieves significant power and energy savings while increasing the classification accuracy.
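
The claim that power-of-two weights let multiplications be replaced with arithmetic shifts can be illustrated with a short Python sketch. Rounding the weight to the nearest power of two in the log domain is an assumption made for this example, not necessarily the paper's quantization rule, and the function names are hypothetical.

```python
import math


def quantize_to_power_of_two(weight: float) -> tuple:
    """Return (sign, exponent) so that sign * 2**exponent approximates `weight`."""
    if weight == 0:
        raise ValueError("zero weights need separate handling in this sketch")
    sign = -1 if weight < 0 else 1
    exponent = round(math.log2(abs(weight)))   # assumed rounding rule
    return sign, exponent


def shift_multiply(activation: int, sign: int, exponent: int) -> int:
    """Multiply an integer fixed-point activation by sign * 2**exponent using shifts only."""
    if exponent >= 0:
        shifted = activation << exponent
    else:
        shifted = activation >> -exponent      # arithmetic shift: rounds toward -infinity
    return -shifted if sign < 0 else shifted
```

For example, a weight of 0.23 quantizes to 2⁻², so multiplying an activation of 64 by it becomes a right shift by two bits. No multiplier is needed, only a shifter and the adder that accumulates the results.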

Journal Article•DOI•

Photonic crystal based 1-bit full-adder optical circuit by using ring resonators in a nonlinear structure

Hamed Alipour-Banaei¹, Hamed Seif-Dargahi¹•Institutions (1)

Islamic Azad University¹

01 May 2017-Photonics and Nanostructures: Fundamentals and Applications

TL;DR: In this article, the authors proposed a novel design for realizing an all-optical 1-bit full adder based on photonic crystals, which was realized by cascading two optical 1-bit half adders.

Abstract: In this paper, we propose a novel design for realizing an all-optical 1-bit full adder based on photonic crystals. The proposed structure is realized by cascading two optical 1-bit half adders. The final structure consists of eight optical waveguides and two nonlinear resonant rings, created inside a rod-type two-dimensional photonic crystal with a square lattice. The structure has “X”, “Y” and “Z” as input ports and “SUM” and “CARRY” as output ports. The performance and functionality of the proposed structure were validated by means of the finite-difference time-domain method.
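
At the logic level, cascading two half adders to obtain a full adder can be sketched as follows in Python. The port names X, Y, Z, SUM and CARRY follow the abstract. Combining the two intermediate carries with an OR is the textbook construction and is an assumption here, since the abstract does not say how the optical structure merges them.

```python
def half_adder(a: int, b: int) -> tuple:
    """Half adder: returns (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(x: int, y: int, z: int) -> tuple:
    """Full adder from two cascaded half adders; returns (SUM, CARRY)."""
    partial, carry1 = half_adder(x, y)       # first stage adds X and Y
    total, carry2 = half_adder(partial, z)   # second stage adds Z to the partial sum
    return total, carry1 | carry2            # at most one of the two carries can be 1
```

The two stage carries can never both be 1, because a first-stage carry forces the partial sum to 0, so an OR is enough to merge them.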

Journal Article•DOI•

Design and Applications of Approximate Circuits by Gate-Level Pruning

Jeremy Schlachter¹, Vincent Camus¹, Krishna V. Palem², Christian Enz¹•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, Rice University²

01 May 2017-IEEE Transactions on Very Large Scale Integration Systems

TL;DR: In this paper, a CAD tool is built and integrated into a standard digital flow to offer a wide range of cost-accuracy tradeoffs for any conventional design, including area, power, and delay savings.

Abstract: Energy efficiency is a critical concern for many systems, ranging from Internet of things objects and mobile devices to high-performance computers. Moreover, after 40 years of prosperity, Moore’s law is starting to show its economic and technical limits. Noticing that many circuits are over-engineered and that many applications are error-resilient or require less precision than offered by the existing hardware, approximate computing has emerged as a potential solution to pursue improvements of digital circuits. In this regard, a technique to systematically trade off accuracy in exchange for area, power, and delay savings in digital circuits is proposed: gate-level pruning (GLP). A CAD tool is built and integrated into a standard digital flow to offer a wide range of cost-accuracy tradeoffs for any conventional design. The methodology is first demonstrated on adders, achieving up to 78% energy-delay-area reduction for 10% mean relative error. It is then detailed how this methodology can be applied to a more complex system composed of a multitude of arithmetic blocks and memory: the discrete cosine transform (DCT), which is a key building block for image and video processing applications. Even though arithmetic circuits represent less than 4% of the entire DCT area, it is shown that the GLP technique can lead to 21% energy-delay-area savings over the entire system for a reasonable image quality loss of 24 dB. This significant saving is achieved thanks to the pruned arithmetic circuits, which set some nodes at constant values, enabling the synthesis tool to further simplify the circuit and memory.

Journal Article•DOI•

An Integrated Row-Based Cell Placement and Interconnect Synthesis Tool for Large SFQ Logic Circuits

Soheil Nazar Shahsavani¹, Ting-Ru Lin¹, Alireza Shafaei¹, Coenrad J. Fourie², Massoud Pedram¹•Institutions (2)

University of Southern California¹, Stellenbosch University²

01 Mar 2017-IEEE Transactions on Applied Superconductivity

TL;DR: This paper presents a row-based design methodology covering cell placement, clock tree synthesis, and routing steps for large SFQ circuits; the overall chip area can be reduced by 27% compared with the results of a conventional CMOS placement accompanied by an H-tree clock network.

Abstract: This paper presents a row-based design methodology covering cell placement, clock tree synthesis, and routing steps for large SFQ circuits. The proposed placement tool initiates by running a state-of-the-art CMOS placer, which places fixed-height but variable-width cells in rows on the chip. Cells in each row are then grouped together such that each group contains at most $k$ cells with the same logic level. Next, for clock routing, this paper proposes HL-tree, which adopts an H-tree with passive transmission line connections to distribute the clock to groups, and within each group, a linear path composed of splitters and Josephson transmission lines (JTLs) provides the clock to cells. Increasing $k$ reduces the chip area, but also may incur a performance loss. To evaluate the effectiveness of the proposed approach, place-and-route results of a 32-bit Kogge–Stone adder for different values of $k$ are reported. By using this new design methodology, the overall chip area can be reduced by 27% compared with the results of a conventional CMOS placement accompanied by an H-tree clock network.
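
The benchmark circuit mentioned above, a 32-bit Kogge–Stone adder, is a parallel-prefix adder whose gates fall into a small number of logic levels. The Python sketch below models only its arithmetic behaviour with bit-parallel generate and propagate signals; it says nothing about SFQ cells, placement, or clocking, and the function name is an assumption.

```python
def kogge_stone_add(a: int, b: int, width: int = 32) -> int:
    """Add two `width`-bit integers with a Kogge-Stone style prefix computation."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b          # per-bit generate
    p = a ^ b          # per-bit propagate
    half_sum = p       # sum bits before carries are applied
    dist = 1
    while dist < width:                # log2(width) prefix levels
        g = g | (p & (g << dist))      # combine with the group `dist` bits below
        p = p & (p << dist)
        dist *= 2
    carries = (g << 1) & mask          # carry into bit i is the generate of bits 0..i-1
    total = half_sum ^ carries
    carry_out = (g >> (width - 1)) & 1
    return total | (carry_out << width)
```

For 32-bit operands the loop runs five times, one iteration per prefix level. Keeping track of which cells sit at which logic level is the information a level-aware grouping scheme, such as the one described in the abstract, relies on.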

Journal Article•DOI•

Hardware emulation of stochastic p-bits for invertible logic

Ahmed Zeeshan Pervaiz¹, Lakshmi Anirudh Ghantasala¹, Kerem Y. Camsari¹, Supriyo Datta¹•Institutions (1)

Purdue University¹

08 Sep 2017-Scientific Reports

TL;DR: This paper uses individual micro controllers to emulate p-bits, and presents results for a 4-bit ripple carry adder with 48 p-bits and a 4-bit multiplier with 46 p-bits working in inverted mode as a factorizer, a first step towards implementing p-bits with nano devices, like stochastic Magnetic Tunnel Junctions.

Abstract: The common feature of nearly all logic and memory devices is that they make use of stable units to represent 0’s and 1’s. A completely different paradigm is based on three-terminal stochastic units which could be called “p-bits”, where the output is a random telegraphic signal continuously fluctuating between 0 and 1 with a tunable mean. p-bits can be interconnected to receive weighted contributions from others in a network, and these weighted contributions can be chosen to not only solve problems of optimization and inference but also to implement precise Boolean functions in an inverted mode. This inverted operation of Boolean gates is particularly striking: They provide inputs consistent to a given output along with unique outputs to a given set of inputs. The existing demonstrations of accurate invertible logic are intriguing, but will these striking properties observed in computer simulations carry over to hardware implementations? This paper uses individual micro controllers to emulate p-bits, and we present results for a 4-bit ripple carry adder with 48 p-bits and a 4-bit multiplier with 46 p-bits working in inverted mode as a factorizer. Our results constitute a first step towards implementing p-bits with nano devices, like stochastic Magnetic Tunnel Junctions.

Journal Article•DOI•

Analysis of Noise Mechanisms in Cell-Size Control.

Saurabh Modi¹, Cesar Augusto Vargas-Garcia¹, Khem Raj Ghusinga¹, Abhyudai Singh¹•Institutions (1)

University of Delaware¹

06 Jun 2017-Biophysical Journal

TL;DR: Key insights are provided into the role of noise mechanisms in size homeostasis, and an inextricable link between timer-based models of size control and heavy-tailed cell-size distributions is suggested.

Journal Article•DOI•

New alternatives for analog implementation of fractional-order integrators, differentiators and PID controllers based on integer-order integrators

Carlos Muñiz-Montero¹, Luis V. García-Jiménez¹, Luis Abraham Sánchez-Gaspariano², Carlos Sánchez-López³, Victor R. Gonzalez-Diaz², Esteban Tlelo-Cuautle⁴•Institutions (4)

Polytechnic University of Puerto Rico¹, Benemérita Universidad Autónoma de Puebla², Autonomous University of Tlaxcala³, CINVESTAV⁴

17 Jul 2017-Nonlinear Dynamics

TL;DR: In this article, an alternative for the circuit realization of analog fractional-order differentiators and integrators without using ladder networks is presented, obtained by a mathematical manipulation of a rational function in a way similar to that reported for the synthesis of variable-state filters.

Abstract: In this work, we propose an alternative for the circuit realization of analog fractional-order differentiators and integrators without using ladder networks. This alternative is obtained by a mathematical manipulation of a rational function, in a way similar to that reported for the synthesis of variable-state filters. The advantage of the proposed implementation is that it requires only simple analog design blocks, such as integrators (of integer order), differential amplifiers and two-input adder amplifiers. Most importantly, contrary to other reported solutions, the proposed realization can be built using commercially available resistors and capacitors, with a reduced number of calculations, and without negative impedance converters or inductors. In addition, the orders of the fractional derivative and integral can be modified just by varying the gain of the differential amplifiers and adders. To validate the proposed implementation, and as an example of application, we present simulations (HSPICE, MATLAB) and experimental results of a first-order plus dead time plant controlled by fractional-order PI and PID controllers. The experimental results were obtained from a realization using field-programmable analog arrays. A comparative analysis highlights that the proposed implementation has advantages over a Cauer-network-based realization in terms of the number of active and passive elements, the number of passive elements with non-commercially available values, and design complexity.

Journal Article•DOI•

An Efficient and Flexible Hardware Implementation of the Dual-Field Elliptic Curve Cryptographic Processor

Zilong Liu¹, Dongsheng Liu¹, Xuecheng Zou¹•Institutions (1)

Huazhong University of Science and Technology¹

01 Mar 2017-IEEE Transactions on Industrial Electronics

TL;DR: An efficient and flexible dual-field ECC processor which can support arbitrary elliptic curve standards and algorithms using the hardware–software approach is presented.

Abstract: Elliptic curve cryptography (ECC) has been widely used for digital signatures to ensure security in communication. It is important for an ECC processor to support a variety of ECC standards to be compatible with different security applications. Thus, a flexible processor which can support different standards and algorithms is desired. In this paper, an efficient and flexible dual-field ECC processor using the hardware–software approach is presented. The proposed processor can support arbitrary elliptic curves. An elaborate modular arithmetic logic unit is designed. It can perform basic modular arithmetic operations and achieve high efficiency. Based on our designed instruction set, the processor can be programmed to perform various point operations based on different algorithms. To demonstrate the flexibility of our processor, a point multiplication algorithm with power analysis resistance is adopted. Our design is implemented on a field-programmable gate array platform and also as an application-specific integrated circuit. When implemented in a 55 nm CMOS process, the processor takes between 0.60 ms (163-bit ECC) and 6.75 ms (571-bit ECC) to finish one point multiplication. Compared to other related works, the merits of our ECC processor are its high hardware efficiency and flexibility.

Journal Article•DOI•

Power-Efficient Sum of Absolute Differences Hardware Architecture Using Adder Compressors for Integer Motion Estimation Design

Bianca Silveira¹, Guilherme Paim², Brunno Abreu², Mateus Grellert², Claudio Machado Diniz¹, Eduardo Costa¹, Sergio Bampi²•Institutions (2)

Universidade Católica de Pelotas¹, Universidade Federal do Rio Grande do Sul²

04 Aug 2017-IEEE Transactions on Circuits and Systems I: Regular Papers

TL;DR: In this paper, the authors exploit different adder compressors structures into the SAD hardware architecture and synthesize an 8-2 compressor with 4-2 compressors and Kogge-Stone adder in the recombination line.

Abstract: Sum of absolute differences (SAD) calculation is one of the most time-consuming operations of video encoders compatible with the high efficiency video coding standard. SAD hardware architectures employ an adder tree to accumulate the coefficients from the absolute differences between two video blocks. This paper applies different adder compressor structures to the SAD hardware architecture. The architectures were synthesized to 45-nm CMOS standard cells. Synthesis results show that the SAD architecture using an 8–2 compressor composed of 4–2 compressors, with a Kogge–Stone adder in the recombination line, reduces power dissipation by 25.5% on average when compared with the SAD architecture using conventional adders from a state-of-the-art synthesis tool. Our throughput analysis shows that the designed SAD units are capable of encoding full HD ($1920\times 1080$) videos in real time at 30 frames/s.
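
The compressor idea in this abstract can be illustrated with a small word-level sketch. It is not the authors' architecture: the function names (`csa`, `compress_4_2`, `sad`) are hypothetical, the 4-2 compressor is modelled as two chained carry-save adders, and a plain Python addition stands in for the final recombination adder (Kogge–Stone in the paper).

```python
def csa(a, b, c):
    """3:2 carry-save adder on non-negative integers: a + b + c == s + carry."""
    s = a ^ b ^ c                                   # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1      # bitwise majority, shifted left
    return s, carry


def compress_4_2(a, b, c, d):
    """4:2 compressor built from two chained 3:2 carry-save adders."""
    s1, c1 = csa(a, b, c)
    return csa(s1, c1, d)


def sad(block_a, block_b):
    """Sum of absolute differences, reduced with compressors (illustrative only)."""
    terms = [abs(x - y) for x, y in zip(block_a, block_b)]
    while len(terms) > 2:
        reduced = []
        i = 0
        while len(terms) - i >= 4:                  # compress groups of four operands
            reduced.extend(compress_4_2(*terms[i:i + 4]))
            i += 4
        rest = terms[i:]
        if len(rest) == 3:                          # three leftovers: one 3:2 stage
            reduced.extend(csa(*rest))
        else:
            reduced.extend(rest)
        terms = reduced
    return sum(terms)                               # stand-in for the recombination adder
```

Carries are only propagated once, in the last addition; every earlier stage is carry-free, which is the usual motivation for compressor trees.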

Posted Content•

Energy-Efficient Hybrid Stochastic-Binary Neural Networks for Near-Sensor Computing

Vincent T. Lee¹, Armin Alaghi¹, John P. Hayes², Visvesh S. Sathe¹, Luis Ceze¹•Institutions (2)

University of Washington¹, University of Michigan²

07 Jun 2017-arXiv: Hardware Architecture

TL;DR: In this article, the authors propose a stochastic-binary hybrid neural network architecture for near-sensor NN applications, which splits the computation between the stochastic and binary domains.

Abstract: Recent advances in neural networks (NNs) exhibit unprecedented success at transforming large, unstructured data streams into compact higher-level semantic information for tasks such as handwriting recognition, image classification, and speech recognition. Ideally, systems would employ near-sensor computation to execute these tasks at sensor endpoints to maximize data reduction and minimize data movement. However, near-sensor computing presents its own set of challenges such as operating power constraints, energy budgets, and communication bandwidth capacities. In this paper, we propose a stochastic-binary hybrid design which splits the computation between the stochastic and binary domains for near-sensor NN applications. In addition, our design uses a new stochastic adder and multiplier that are significantly more accurate than existing adders and multipliers. We also show that retraining the binary portion of the NN computation can compensate for precision losses introduced by shorter stochastic bit-streams, allowing faster run times at minimal accuracy losses. Our evaluation shows that our hybrid stochastic-binary design can achieve 9.8x energy efficiency savings, and application-level accuracies within 0.05% compared to conventional all-binary designs.
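
For readers unfamiliar with stochastic computing, the following sketch shows the conventional unipolar building blocks that such designs are usually compared against: an AND gate as multiplier and a multiplexer as scaled adder. It does not reproduce the new adder and multiplier proposed in the paper, and all function names are hypothetical.

```python
import random


def to_stream(p, length, rng):
    """Encode a probability p in [0, 1] as a random bit-stream of the given length."""
    return [1 if rng.random() < p else 0 for _ in range(length)]


def from_stream(bits):
    """Decode a bit-stream back to a value: the fraction of ones."""
    return sum(bits) / len(bits)


def sc_multiply(x_bits, y_bits):
    """Unipolar multiplication: bitwise AND of two independent streams."""
    return [a & b for a, b in zip(x_bits, y_bits)]


def sc_scaled_add(x_bits, y_bits, select_bits):
    """Scaled addition with a multiplexer: yields (x + y) / 2 when select encodes 0.5."""
    return [a if s else b for a, b, s in zip(x_bits, y_bits, select_bits)]
```

Accuracy depends on stream length and on the streams being uncorrelated, which is why shorter bit-streams trade precision for run time, as the abstract notes.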

Journal Article•DOI•

A 1.1-mW Ground Effect-Resilient Body-Coupled Communication Transceiver With Pseudo OFDM for Head and Body Area Network

Wala Saadeh¹, Muhammad Awais Bin Altaf¹, Haneen Alsuradi¹, Jerald Yoo¹•Institutions (1)

Masdar Institute of Science and Technology¹

07 Jul 2017-IEEE Journal of Solid-state Circuits

TL;DR: A body-coupled communication (BCC) transceiver (TRX) that mitigates all the practical impairments of the body channel at once is presented; it alleviates the impacts of variable ground effect and variable skin-electrode contact impedance, which have been the two major issues in BCC.

Abstract: This paper presents a body-coupled communication (BCC) transceiver (TRX) that mitigates all the practical impairments of the body channel at once. The proposed pseudo orthogonal frequency-division multiplexing (P-OFDM) TRX combines baseband BPSK–OFDM with frequency-shift keying (FSK) to alleviate the impacts of variable ground effect and variable skin-electrode contact impedance, which have been the two major issues in BCC. It can tolerate up to 20 dB of channel gain variation with measured bit error rate improvement of >70% compared to FSK modulation alone. The RC relaxed contact impedance monitor continuously monitors and compensates the variable skin-electrode contact impedance at both transmitter (TX) and receiver (RX). The proposed power-gated 8-point inverse fast Fourier transform/fast Fourier transform with no floating-point multipliers (FPMs) reduces the gate count and power by 54% and 30% compared to conventional FPMs, respectively. Additionally, the simple floating-point adder (FPA) reduces the gate count and energy consumption by 34% and 20% compared to conventional FPAs, respectively. A high input impedance glitch-free FSK demodulation RX with variable threshold limiter and all digital cycle correction is also proposed to support a scalable data rate (200 Kbps–2 Mbps). The 0.54 mm² TRX in 65-nm CMOS consumes 1.1 mW.

Journal Article•DOI•

A Three-Layer Full Adder/Subtractor Structure in Quantum-Dot Cellular Automata

Yashar Zirak Barughi¹, Saeed Rasouli Heikalabad¹•Institutions (1)

Islamic Azad University¹

29 Jun 2017-International Journal of Theoretical Physics

TL;DR: This paper designs and simulates a three-layer full adder/subtractor with a minimum number of cells and minimal complexity, based on quantum-dot cellular automata technology.

Abstract: Quantum-dot cellular automata (QCA) is one of the foremost modern technologies for designing logical structures at the nano-scale. This technology is used at the molecular level and is based on QCA cells. High-speed data transfer and low power consumption are the advantages of this technology. In this paper, we design and simulate a full adder/subtractor with a minimum number of cells and minimal complexity in three layers. QCADesigner software has been used to simulate the proposed design.

Journal Article•DOI•

Voltage-Based Concatenatable Full Adder Using Spin Hall Effect Switching

Arman Roohi¹, Ramtin Zand¹, Deliang Fan¹, Ronald F. DeMara¹•Institutions (1)

University of Central Florida¹

31 Jan 2017-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: Magnetic tunnel junction (MTJ) devices are leveraged to develop a novel full adder (FA) based on 3- and 5-input majority gates, with the spin Hall effect (SHE) used to change the MTJ states, resulting in low-energy switching behavior.

Abstract: Magnetic tunnel junction (MTJ)-based devices have been studied extensively as a promising candidate to implement hybrid energy-efficient computing circuits due to their nonvolatility, high integration density, and CMOS compatibility. In this paper, MTJs are leveraged to develop a novel full adder (FA) based on 3- and 5-input majority gates. The spin Hall effect (SHE) is utilized for changing the MTJ states, resulting in low-energy switching behavior. SHE-MTJ devices are modeled in Verilog-A using precise physical equations. A SPICE circuit simulator is used to validate the functionality of the 1-bit SHE-based FA. The simulation results show 76% and 32% improvement over a previous voltage-mode MTJ-based FA in terms of energy consumption and device count, respectively. The concatenatability of our proposed 1-bit SHE-FA is investigated through developing a 4-bit SHE-FA. Finally, the delay and power consumption of an $n$-bit SHE-based adder have been formulated to provide a basis for developing an energy-efficient SHE-based $n$-bit arithmetic logic unit.
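
A full adder built from 3- and 5-input majority gates can be sketched at the logic level as follows. This uses the well-known textbook decomposition (carry from a 3-input majority, sum from a 5-input majority that takes the inverted carry twice); whether the paper wires its gates exactly this way is an assumption, and the function names are hypothetical. The 4-bit chain mirrors the concatenation experiment only in spirit.

```python
def majority(*bits):
    """Majority gate over an odd number of 0/1 inputs."""
    return 1 if sum(bits) > len(bits) // 2 else 0


def full_adder(a, b, cin):
    """1-bit full adder from one 3-input and one 5-input majority gate."""
    cout = majority(a, b, cin)
    not_cout = 1 - cout
    s = majority(a, b, cin, not_cout, not_cout)
    return s, cout


def ripple_add_4bit(x, y):
    """Concatenate four 1-bit full adders; returns a 5-bit result."""
    carry, total = 0, 0
    for i in range(4):
        s, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        total |= s << i
    return total | (carry << 4)
```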

Journal Article•DOI•

Halving the cost of quantum addition

Craig Gidney¹•Institutions (1)

Google¹

19 Sep 2017-arXiv: Quantum Physics

TL;DR: The number of T gates needed for an n-bit adder is reduced from 8n + O(1) to 4n + O(1) via a temporary logical-AND construction; also presented are an n-bit controlled adder circuit with T-count of 8n + O(1), a temporary adder that can be computed for the same cost as the normal adder but whose result can be kept until it is later uncomputed without using T gates, and some other constructions whose T-count is improved by the temporary logical-AND.

Abstract: We improve the number of T gates needed to perform an n-bit adder from 8n + O(1) to 4n + O(1). We do so via a "temporary logical-AND" construction which uses four T gates to store the logical-AND of two qubits into an ancilla and zero T gates to later erase the ancilla. This construction is equivalent to one by Jones, except that our framing makes it clear that the technique is far more widely applicable than previously realized. Temporary logical-ANDs can be applied to integer arithmetic, modular arithmetic, rotation synthesis, the quantum Fourier transform, Shor's algorithm, Grover oracles, and many other circuits. Because T gates dominate the cost of quantum computation based on the surface code, and temporary logical-ANDs are widely applicable, this represents a significant reduction in projected costs of quantum computation. In addition to our n-bit adder, we present an n-bit controlled adder circuit with T-count of 8n + O(1), a temporary adder that can be computed for the same cost as the normal adder but whose result can be kept until it is later uncomputed without using T gates, and discuss some other constructions whose T-count is improved by the temporary logical-AND.
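
The 4n figure can be made plausible with a purely classical sketch. It assumes a ripple-carry structure in which each carry is obtained from the previous one with a single AND, using the identity maj(a, b, c) = c XOR ((a XOR c) AND (b XOR c)), and it charges four T gates per AND as stated in the abstract. It is not a quantum simulation, the O(1) term is not modelled, and the function names are hypothetical.

```python
T_PER_TEMPORARY_AND = 4  # per the abstract: four T gates to compute, zero to erase


def carry_out(a, b, c):
    """Majority of three bits using a single AND."""
    return c ^ ((a ^ c) & (b ^ c))


def ripple_add(x, y, n):
    """Add two n-bit integers bit by bit; returns (sum, number of ANDs used)."""
    carry, total, and_count = 0, 0, 0
    for i in range(n):
        a = (x >> i) & 1
        b = (y >> i) & 1
        total |= (a ^ b ^ carry) << i      # sum bit needs only XORs
        carry = carry_out(a, b, carry)     # one AND per bit position
        and_count += 1
    return total | (carry << n), and_count


def estimated_t_count(n):
    """Rough T-count under the assumptions above: 4 per AND, n ANDs."""
    return T_PER_TEMPORARY_AND * ripple_add(0, 0, n)[1]
```

Since XOR corresponds to CNOT gates, which cost no T gates, the ANDs are the only T-consuming step in this model.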

Journal Article•DOI•

A new method on designing and simulating CNTFET_based ternary gates and arithmetic circuits

Hadi Samadi¹, Ali Shahhoseini¹, Faramarz Aghaei-liavali¹•Institutions (1)

Islamic Azad University¹

01 May 2017-Microelectronics Journal

TL;DR: This paper presents a new design of three-valued logic gates based on carbon nanotube transistors; a comparison with existing circuits indicates that the proposed design outperforms them in terms of power and delay.

Journal Article•DOI•

High-Performance Ternary Adder Using CNTFET

Subhendu Kumar Sahoo¹, Gangishetty Akhilesh¹, Rasmita Sahoo, Manasi Muglikar²•Institutions (2)

Birla Institute of Technology and Science¹, Carnegie Mellon University²

01 May 2017-IEEE Transactions on Nanotechnology

TL;DR: Two new designs to implement a ternary half adder using carbon nanotube field-effect transistors (CNFETs) show delay and power advantages of up to 40% and 39% with a lower transistor count, so the use of these half adders in complex arithmetic circuits will be advantageous.

Abstract: Ternary logic is a promising alternative to conventional binary logic in VLSI design as it provides the advantages of reduced interconnects, higher operating speeds, and smaller chip area. This paper presents a pair of circuits for implementing a ternary half adder using carbon nanotube field-effect transistors. The proposed designs combine futuristic ternary and conventional binary logic design approaches. One of the proposed circuits, a ternary-to-binary decoder, simplifies further circuit implementation and provides excellent delay and power advantages in data path circuits such as adders. These circuits have been extensively simulated using HSPICE to obtain power, delay, and power-delay product. The circuit performances are compared with alternative designs reported in recent literature. One of the proposed ternary adders demonstrates power and power-delay product improvements of up to 63% and 66%, respectively, with a lower transistor count. So, the use of these half adders in complex arithmetic circuits will be advantageous.
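
The function of a ternary half adder, and the role a ternary-to-binary decoder can play in it, can be sketched behaviourally. The sketch below is an illustration only: `decode` and `ternary_half_adder` are hypothetical names, the decoder is modelled as a one-hot mapping, and nothing here reflects the transistor-level circuits of the paper.

```python
def decode(t):
    """Ternary-to-binary decoder: one-hot binary signals for a trit in {0, 1, 2}."""
    return t == 0, t == 1, t == 2


def ternary_half_adder(a, b):
    """Return (sum, carry) for two trits, computed with binary logic on decoded lines."""
    a0, a1, a2 = decode(a)
    b0, b1, b2 = decode(b)
    sum_is_1 = (a0 and b1) or (a1 and b0) or (a2 and b2)
    sum_is_2 = (a0 and b2) or (a2 and b0) or (a1 and b1)
    carry = (a1 and b2) or (a2 and b1) or (a2 and b2)
    return (1 if sum_is_1 else 2 if sum_is_2 else 0), int(carry)
```

Once the trits are decoded, the remaining logic is ordinary binary AND/OR, which is one way to read the abstract's combination of ternary and binary design approaches.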

Journal Article•DOI•

Comparative study of 16-order FIR filter design using different multiplication techniques

Anubhuti Mittal, Ashutosh Nandi, Disha Yadav

01 May 2017-Iet Circuits Devices & Systems

TL;DR: The comparison of simulation results of all the filters shows that the FIR filter with the WT multiplier is the best optimised filter.

Abstract: This study presents the design and implementation of a low-power, high-speed 16-order FIR filter. To optimise filter area, delay and power, different multiplication techniques such as the Vedic multiplier, the add and shift method and the Wallace tree (WT) multiplier are used to multiply the filter coefficients with the filter input. Various adders, namely the ripple carry adder, Kogge–Stone adder, Brent–Kung adder, Ladner–Fischer adder and Han–Carlson adder, are analysed for optimum performance for further use in the various multiplication techniques along with a barrel shifter. Secondly, the filter area and delay are optimised by using the add and shift method for multiplication, although it increases the power dissipation of the filter. To reduce the complexity of the filter, coefficients are represented in canonical signed digit representation as it is more efficient than traditional binary representation. The finite impulse-response (FIR) filter is designed in MATLAB using the equiripple method, and the same filter is synthesised on a Xilinx Spartan 3E XC3S500E target field-programmable gate array device using Very High Speed Integrated Circuit Hardware Description Language (VHDL). Subsequently, the total on-chip power is calculated in Vivado 2014.4. The comparison of simulation results of all the filters shows that the FIR filter with the WT multiplier is the best optimised filter.
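
The benefit of canonical signed digit (CSD) coefficients for add-and-shift multiplication can be seen in a short sketch. It assumes the coefficients have already been quantised to integers, and the function names are hypothetical; it is not taken from the paper's VHDL design.

```python
def to_csd(n):
    """CSD digits of an integer, least significant first, each in {-1, 0, 1}."""
    digits = []
    while n != 0:
        if n & 1:
            d = 2 - (n & 3)     # +1 if n % 4 == 1, -1 if n % 4 == 3
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, coeff):
    """Multiply x by coeff using one shift and one add/subtract per nonzero digit."""
    acc = 0
    for i, d in enumerate(to_csd(coeff)):
        if d == 1:
            acc += x << i
        elif d == -1:
            acc -= x << i
    return acc
```

For example, 7 is `111` in binary (three nonzero digits) but 8 - 1 in CSD (two nonzero digits), so the multiplier needs one fewer adder stage.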

Journal Article•DOI•

High-performance full adder architecture in quantum-dot cellular automata

Hamid Rashidi, Abdalhossein Rezai

01 Jul 2017-The Journal of Engineering

TL;DR: Two QCA full adder architectures are presented and evaluated: a new and efficient 1-bit QCA full adder architecture and a 4-bit QCA ripple carry adder (RCA) architecture that outperform most results so far in the literature.

Abstract: Quantum-dot cellular automata (QCA) is a new and promising computation paradigm, which can be a viable replacement for the complementary metal–oxide–semiconductor technology at nano-scale level. This technology provides a possible solution for improving the computation in various computational applications. Two QCA full adder architectures are presented and evaluated: a new and efficient 1-bit QCA full adder architecture and a 4-bit QCA ripple carry adder (RCA) architecture. The proposed architectures are simulated using QCADesigner tool version 2.0.1. These architectures are implemented with the coplanar crossover approach. The simulation results show that the proposed 1-bit QCA full adder and 4-bit QCA RCA architectures utilise 33 and 175 QCA cells, respectively. Our simulation results show that the proposed architectures outperform most results so far in the literature.

Journal Article•DOI•

Design of an ultra-efficient reversible full adder-subtractor in quantum-dot cellular automata

Elham Taherkhani¹, Mohammad Hossein Moaiyeri¹, Shaahin Angizi²•Institutions (2)

Shahid Beheshti University¹, University of Central Florida²

01 Aug 2017-Optik

TL;DR: A novel reversible full adder-subtractor circuit based on QCA is proposed which improves the cell count, area and total energy dissipation by almost 45%, 50% and 48%, respectively, as compared to the existing QCA-based single-layer and multilayer reversible full adders.
