
Showing papers on "Adder" published in 2020


Journal ArticleDOI
11 Mar 2020-Nature
TL;DR: This work provides a viable platform for scalable all-electric magnetic logic, paving the way for memory-in-logic applications and demonstrates electrical control of magnetic data and device interconnection in logic circuits.
Abstract: Spin-based logic architectures provide nonvolatile data retention, near-zero leakage, and scalability, extending the technology roadmap beyond complementary metal-oxide-semiconductor logic1-13. Architectures based on magnetic domain walls take advantage of the fast motion, high density, non-volatility and flexible design of domain walls to process and store information1,3,14-16. Such schemes, however, rely on domain-wall manipulation and clocking using an external magnetic field, which limits their implementation in dense, large-scale chips. Here we demonstrate a method for performing all-electric logic operations and cascading using domain-wall racetracks. We exploit the chiral coupling between neighbouring magnetic domains induced by the interfacial Dzyaloshinskii-Moriya interaction17-20, which promotes non-collinear spin alignment, to realize a domain-wall inverter, the essential basic building block in all implementations of Boolean logic. We then fabricate reconfigurable NAND and NOR logic gates, and perform operations with current-induced domain-wall motion. Finally, we cascade several NAND gates to build XOR and full adder gates, demonstrating electrical control of magnetic data and device interconnection in logic circuits. Our work provides a viable platform for scalable all-electric magnetic logic, paving the way for memory-in-logic applications.
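The gate cascading described above (NAND gates composed into XOR and a full adder) can be illustrated with a plain Boolean sketch in Python; this mirrors only the logic composition, not the magnetic domain-wall implementation itself.

def nand(a, b):
    return 1 - (a & b)

def xor(a, b):
    # classic four-NAND construction of XOR
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))

def full_adder(a, b, cin):
    # sum and carry-out built only from NAND-based gates
    p = xor(a, b)
    s = xor(p, cin)
    cout = nand(nand(a, b), nand(p, cin))  # (a AND b) OR (p AND cin)
    return s, cout

# exhaustive check against integer addition
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert 2 * cout + s == a + b + cin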

247 citations


Journal ArticleDOI
01 Jul 2020
TL;DR: It is shown that a homojunction device made from two-dimensional tungsten diselenide can exhibit diverse field-effect characteristics controlled by polarity combinations of the gate and drain voltage inputs, which suggests that the devices could be cascaded to create complex circuits.
Abstract: Reconfigurable logic and neuromorphic devices are crucial for the development of high-performance computing. However, creating reconfigurable devices based on conventional complementary metal–oxide–semiconductor technology is challenging due to the limited field-effect characteristics of the fundamental silicon devices. Here we show that a homojunction device made from two-dimensional tungsten diselenide can exhibit diverse field-effect characteristics controlled by polarity combinations of the gate and drain voltage inputs. These electrically tunable devices can achieve reconfigurable multifunctional logic and neuromorphic capabilities. With the same logic circuit, we demonstrate a 2:1 multiplexer, D-latch and 1-bit full adder and subtractor. These functions exhibit a full-swing output voltage and the same supply and signal voltage, which suggests that the devices could be cascaded to create complex circuits. We also show that synaptic circuits based on only three homojunction devices can achieve reconfigurable spiking-timing-dependent plasticity and pulse-tunable synaptic potentiation or depression characteristics; the same function using complementary metal–oxide–semiconductor devices would require more than ten transistors. A homojunction device made from two-dimensional tungsten diselenide can be used to create circuits that exhibit multifunctional logic and neuromorphic capabilities with simpler designs than conventional silicon-based systems.

159 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: This paper develops a special back-propagation approach for AdderNets by investigating the full-precision gradient, and proposes an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient.
Abstract: Compared with the cheap addition operation, multiplication is of much higher computational complexity. The widely used convolutions in deep neural networks are in fact cross-correlations that measure the similarity between input features and convolution filters, which involves massive multiplications between floating-point values. In this paper, we present adder networks (AdderNets) that trade these massive multiplications in deep neural networks, especially convolutional neural networks (CNNs), for much cheaper additions to reduce computation costs. In AdderNets, we take the L1-norm distance between filters and input features as the output response. The influence of this new similarity measure on the optimization of neural networks is thoroughly analyzed. To achieve better performance, we develop a special back-propagation approach for AdderNets by investigating the full-precision gradient. We then propose an adaptive learning rate strategy to enhance the training procedure of AdderNets according to the magnitude of each neuron's gradient. As a result, the proposed AdderNets achieve 74.9% Top-1 accuracy and 91.7% Top-5 accuracy using ResNet-50 on the ImageNet dataset without any multiplication in the convolutional layers. The code is publicly available at https://github.com/huaweinoah/AdderNet.
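A minimal NumPy sketch (illustrative names and shapes, not the authors' code) of the contrast the abstract draws: a conventional convolution response is a sum of products, whereas the AdderNet response is the negative L1 distance between the filter and the input patch, computed with additions and subtractions only.

import numpy as np

def conv_response(patch, filt):
    # standard cross-correlation: sum of element-wise products
    return float(np.sum(patch * filt))

def adder_response(patch, filt):
    # AdderNet-style response: negative L1-norm distance, no feature multiplications
    return float(-np.sum(np.abs(patch - filt)))

rng = np.random.default_rng(0)
patch = rng.standard_normal((3, 3, 16))   # hypothetical 3x3x16 input patch
filt = rng.standard_normal((3, 3, 16))    # hypothetical filter of the same shape
print(conv_response(patch, filt), adder_response(patch, filt))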

155 citations


Journal ArticleDOI
12 Aug 2020
TL;DR: A comprehensive survey and a comparative evaluation of recently developed approximate arithmetic circuits under different design constraints, synthesized and characterized under optimizations for performance and area.
Abstract: Approximate computing has emerged as a new paradigm for high-performance and energy-efficient design of circuits and systems. For the many approximate arithmetic circuits proposed, it has become critical to understand a design or approximation technique for a specific application to improve performance and energy efficiency with a minimal loss in accuracy. This article aims to provide a comprehensive survey and a comparative evaluation of recently developed approximate arithmetic circuits under different design constraints. Specifically, approximate adders, multipliers, and dividers are synthesized and characterized under optimizations for performance and area. The error and circuit characteristics are then generalized for different classes of designs. The applications of these circuits in image processing and deep neural networks indicate that the circuits with lower error rates or error biases perform better in simple computations, such as the sum of products, whereas more complex accumulative computations that involve multiple matrix multiplications and convolutions are vulnerable to single-sided errors that lead to a large error bias in the computed result. Such complex computations are more sensitive to errors in addition than those in multiplication, so a larger approximation can be tolerated in multipliers than in adders. The use of approximate arithmetic circuits can improve the quality of image processing and deep learning in addition to the benefits in performance and power consumption for these applications.
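The single-sided error bias discussed above can be seen in a simple, well-known approximate adder; the sketch below (an illustrative lower-part-OR style adder, not any specific circuit from the survey) replaces the k least-significant bits of the addition with a bitwise OR, so the result never exceeds the exact sum and the error is one-sided.

def lower_or_add(a, b, k=4):
    # approximate the k LSBs with OR and drop their carry into the upper part
    mask = (1 << k) - 1
    lower = (a | b) & mask
    upper = ((a >> k) + (b >> k)) << k
    return upper | lower

errors = [(a + b) - lower_or_add(a, b) for a in range(256) for b in range(256)]
print(min(errors), max(errors), sum(errors) / len(errors))  # errors are all >= 0: a one-sided, biased error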

143 citations


Journal ArticleDOI
Shin Nishio1, Yulu Pan1, Takahiko Satoh1, Hideharu Amano1, Rodney Van Meter1 
TL;DR: In this paper, the authors focus on the fact that the error rates of individual qubits are not equal, with a goal of maximizing the success probability of real-world subroutines such as an adder circuit.
Abstract: NISQ (Noisy, Intermediate-Scale Quantum) computing requires error mitigation to achieve meaningful computation. Our compilation tool development focuses on the fact that the error rates of individual qubits are not equal, with a goal of maximizing the success probability of real-world subroutines such as an adder circuit. We begin by establishing a metric for choosing among possible paths and circuit alternatives for executing gates between variables placed far apart within the processor, and test our approach on two IBM 20-qubit systems named Tokyo and Poughkeepsie. We find that a single-number metric describing the fidelity of individual gates is a useful but imperfect guide. Our compiler uses this subsystem and maps complete circuits onto the machine using a beam search-based heuristic that will scale as processor and program sizes grow. To evaluate the whole compilation process, we compiled and executed adder circuits, then calculated the Kullback–Leibler divergence (KL-divergence, a measure of the distance between two probability distributions). For a circuit within the capabilities of the hardware, our compilation increases estimated success probability and reduces KL-divergence relative to an error-oblivious placement.
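The KL-divergence figure used in the evaluation above can be computed directly from the ideal and measured output distributions of a compiled circuit; the sketch below uses made-up example counts, not data from the paper.

import math

def kl_divergence(p, q, eps=1e-12):
    # D_KL(P || Q) = sum_x P(x) * log(P(x) / Q(x)); eps guards against empty bins in Q
    return sum(px * math.log(px / max(q.get(x, 0.0), eps)) for x, px in p.items() if px > 0)

ideal = {"011": 1.0}                                              # noiseless adder outcome
measured = {"011": 0.82, "010": 0.07, "111": 0.06, "001": 0.05}   # hypothetical measured frequencies
print(kl_divergence(ideal, measured))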

80 citations


Journal ArticleDOI
TL;DR: In this article, the authors examine Quantum-dot cellular automata (QCA), one of the most promising emerging paradigms proposed as a substitute for current MOSFET technology.
Abstract: Quantum-dot cellular automata (QCA) is one of the most promising emerging paradigms offered for substitution of ongoing MOSFET technology. In order to qualify the QCA technology, all the previously...

78 citations


Journal ArticleDOI
TL;DR: Based on the simulation results, it can be stated that the proposed hybrid FA circuit is an attractive alternative in the data path design of modern high-speed Central Processing Units.
Abstract: A novel design of a hybrid Full Adder (FA) using Pass Transistors (PTs), Transmission Gates (TGs) and Conventional Complementary Metal Oxide Semiconductor (CCMOS) logic is presented. Performance analysis of the circuit has been conducted using the Cadence toolset. For comparative analysis, the performance parameters have been compared with twenty existing FA circuits. The proposed FA has also been extended up to a word length of 64 bits in order to test its scalability. Only the proposed FA and five of the existing designs are able to operate without buffers in the intermediate stages when extended to 64 bits. According to the simulation results, the proposed design demonstrates notable performance in power consumption and delay, which accounts for its low power-delay product. Based on the simulation results, it can be stated that the proposed hybrid FA circuit is an attractive alternative in the data path design of modern high-speed Central Processing Units.

67 citations


Journal ArticleDOI
TL;DR: The proposed ternary full adder has a significant improvement in the power-delay product (PDP) over previous designs and is applicable to both unbalanced (0, 1, 2) and balanced (−1, 0, 1) ternary logic.
Abstract: We propose a logic synthesis methodology with a novel low-power circuit structure for ternary logic. The proposed methodology synthesizes a ternary function as a ternary logic gate using carbon nanotube field-effect transistors (CNTFETs). The circuit structure uses the body effect to mitigate the excessive power consumption for the third logic value. Energy-efficient ternary logic circuits are designed with a combination of synthesized low-power ternary logic gates. The proposed methodology is applicable to both unbalanced (0, 1, 2) and balanced (−1, 0, 1) ternary logic. To verify the improvement in energy efficiency, we have designed various ternary arithmetic logic circuits using the proposed methodology. The proposed ternary full adder has a significant improvement in the power-delay product (PDP) over previous designs. Ternary benchmark circuits have been designed to show that complex ternary functions can be implemented as more efficient circuits with the proposed methodology.
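At the arithmetic level, the function a ternary full adder realizes is simple; the sketch below states it for the unbalanced digit set (0, 1, 2), purely to make the specification concrete (the paper's contribution is the CNTFET gate-level synthesis, not this behavioral model).

def ternary_full_adder(a, b, cin):
    # a, b are trits in {0, 1, 2}; carries are binary in ternary addition
    total = a + b + cin
    return total % 3, total // 3   # (sum trit, carry out)

for a in range(3):
    for b in range(3):
        for cin in range(2):
            s, c = ternary_full_adder(a, b, cin)
            assert 3 * c + s == a + b + cin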

64 citations


Journal ArticleDOI
TL;DR: In this research, an optical half-adder using two-dimensional photonic crystals was designed and simulated; it has a larger power difference between the high and low logic modes, which reduces errors in detecting these two values at the output.
Abstract: The use of optical devices for high-speed data transmission has been considered for some time, and structures that can be used in optical integrated circuits are therefore very important. Photonic crystals have been used as basic structures in the design of optical devices, and especially logic devices. Given the ability of these structures to implement logic gates and circuits, they are expected to serve as base structures in the design of optical integrated circuits. In this research, an optical half-adder using two-dimensional photonic crystals was designed and simulated. One of the features of this circuit is the larger power difference between the high and low logic modes, which reduces errors in detecting these two values at the output. The circuit also provides the functionality of the optical XOR and AND gates. In addition, it has a small structure that makes it suitable for use in optical integrated circuits.

63 citations


Journal ArticleDOI
TL;DR: The effectiveness of the proposed approximate adder is compared with state-of-the-art approximate adders using a cost function based on the energy, delay, area, and output quality and results indicate an average of 50% reduction in terms of the cost function compared to other approximateAdders.
Abstract: In this brief, a low energy consumption block-based carry speculative approximate adder is proposed. Its structure is based on partitioning the adder into some non-overlapped summation blocks whose structures may be selected from both the carry propagate and parallel-prefix adders. Here, the carry output of each block is speculated based on the input operands of the block itself and those of the next block. In this adder, the length of the carry chain is reduced to two blocks (worst case), where in most cases only one block is employed to calculate the carry output leading to a lower average delay. In addition, to increase the accuracy and reduce the output error rate, an error detection and recovery mechanism is proposed. The effectiveness of the proposed approximate adder is compared with state-of-the-art approximate adders using a cost function based on the energy, delay, area, and output quality. The results indicate an average of 50% reduction in terms of the cost function compared to other approximate adders.
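A generic behavioral sketch of block-based carry speculation (block size and speculation rule chosen for illustration, not the authors' exact scheme): the operands are split into k-bit blocks and each block's carry-in is guessed from the neighbouring block's operands only, so the carry chain never spans the full word.

def speculative_block_add(a, b, k=4, width=16):
    mask = (1 << k) - 1
    result = 0
    for lo in range(0, width, k):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        if lo == 0:
            cin = 0
        else:
            # speculate the carry from the previous block's operands alone
            cin = (((a >> (lo - k)) & mask) + ((b >> (lo - k)) & mask)) >> k
        result |= ((a_blk + b_blk + cin) & mask) << lo
    return result

print(hex(0x3FFF + 0x0001), hex(speculative_block_add(0x3FFF, 0x0001)))  # long carry chains can be mispredicted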

58 citations


Journal ArticleDOI
TL;DR: In this article, an ultra-fast all-optical half adder based on nonlinear ring resonators is proposed, which is an appropriate candidate for photonic integrated circuits used in the next generation of alloptical CPUs.
Abstract: Half adders and half subtractors are the basic building blocks of the arithmetic logic unit used in every optical central processing unit (CPU) to provide computational operators. In this paper, we aim to design an ultrafast all-optical half adder based on nonlinear ring resonators. The proposed structure consists of concurrent designs of the AND and XOR logic gates inside a rod-based photonic crystal microstructure. Linear dielectric rods made of silicon and nonlinear dielectric rods composed of doped glass are used to design the nonlinear ring resonators as the fundamental blocks of the half adder. We demonstrate that as the intensity of the incoming light increases, the nonlinear Kerr effect appears and the total refractive index increases. This diverts the direction of light propagation to the desired nonlinear ring resonator depending on the signal wavelength, the radius of the rods and the lattice constant. Finally, after several resonances, the light is coupled to the output. Our numerical simulations using a two-dimensional finite-difference time-domain method reveal that, depending on the light intensity, the maximum and minimum transmissions of the half adder are 100% and 96%, respectively. The calculations also show that the delay of the designed half adder is 3.6 ps. Owing to its small area of 249.75 µm², the proposed half adder is an appropriate candidate for photonic integrated circuits used in the next generation of all-optical CPUs.

Journal ArticleDOI
TL;DR: The proposed approximate adders (AAs) and Approximate Dadda Multipliers (ADMs) are synthesized with the Cadence Register-Transfer Level (RTL) compiler, and their design metrics are compared across three different technology nodes.

Posted Content
Dehua Song1, Yunhe Wang1, Hanting Chen1, Chang Xu2, Chunjing Xu1, Dacheng Tao2 
TL;DR: This paper thoroughly analyzes the relationship between the adder operation and the identity mapping, inserts shortcuts to enhance the performance of SR models using adder networks, and develops a learnable power activation for adjusting the feature distribution and refining details.
Abstract: This paper studies the single image super-resolution problem using adder neural networks (AdderNet). Compared with convolutional neural networks, AdderNet uses additions to calculate the output features, thus avoiding the massive energy consumption of conventional multiplications. However, it is very hard to directly transfer the existing success of AdderNet on large-scale image classification to the image super-resolution task because of the different calculation paradigm. Specifically, the adder operation cannot easily learn the identity mapping, which is essential for image processing tasks. In addition, the functionality of high-pass filters cannot be ensured by AdderNet. To this end, we thoroughly analyze the relationship between the adder operation and the identity mapping and insert shortcuts to enhance the performance of SR models using adder networks. We then develop a learnable power activation for adjusting the feature distribution and refining details. Experiments conducted on several benchmark models and datasets demonstrate that our image super-resolution models using AdderNet can achieve comparable performance and visual quality to their CNN baselines with about a 2× reduction in energy consumption.

Journal ArticleDOI
TL;DR: A novel three-input XOR gate based on a cell-interaction design is presented; it can be used as a multifunctional gate by fixing one of the structure's inputs, which allows two-input XOR or XNOR gates to be easily implemented.

Journal ArticleDOI
TL;DR: A new architecture for a digital full-adder is presented, which is up to 41% faster than existing IMPLY-based serial designs while requiring up to 78% less area (memristors) compared to the existing parallel design.
Abstract: Passive implementation of memristors has led to several innovative works in the field of electronics. Despite being primarily a candidate for memory applications, memristors have proven to be beneficial in several other circuits and applications as well. One of the use cases is the implementation of digital circuits such as adders. Among several logic implementations using memristors, IMPLY logic is one of the promising candidates. In this brief, we present a new architecture for a digital full-adder, which is up to 41% faster than existing IMPLY-based serial designs while requiring up to 78% less area (memristors) compared to the existing parallel design.
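The IMPLY primitive itself is easy to state at the logic level: p IMPLY q equals NOT p OR q, and together with a FALSE (reset) operation it is functionally complete. The sketch below shows the standard two-step NAND construction, only to make the logic family concrete; the paper's contribution is the memristive adder architecture built from such steps.

def imply(p, q):
    # material implication: p IMPLY q = NOT p OR q
    return (1 - p) | q

def nand_via_imply(a, b):
    work = 0                 # work memristor reset to 0 (the FALSE operation)
    t = imply(a, work)       # t = NOT a
    return imply(b, t)       # NOT b OR NOT a = NAND(a, b)

for a in (0, 1):
    for b in (0, 1):
        assert nand_via_imply(a, b) == 1 - (a & b)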

Journal ArticleDOI
TL;DR: This paper presents QCA-based combinational circuit designs, such as a half adder and a full adder, built from only one uniform layer of cells using a novel XOR gate.
Abstract: Quantum-dot Cellular Automata (QCA) is a new technology for designing digital circuits at the nanoscale. This technology utilizes quantum dots rather than diodes and transistors. QCA supplies a new computation platform in which binary data can be represented by polarized cells, defined by the electron configurations inside the cell. This paper presents QCA-based combinational circuit designs, such as a half adder and a full adder, built from only one uniform layer of cells. The proposed design is accomplished using a novel XOR gate. The proposed XOR gate has a 50% speed improvement and a 35% reduction in the number of cells needed over the best reported XOR. The results from the QCADesigner software show that the proposed designs have less complexity and lower power consumption than previous designs.

Journal ArticleDOI
TL;DR: This work uses a physics-based compact model to study an innovative smart IMPLY (SIMPLY) logic scheme which exploits the peripheral circuitry embedded in ordinary IMPLy architectures to solve the mentioned reliability issues, drastically reducing the energy consumption and setting clear design strategies.
Abstract: Low-power smart devices are becoming pervasive in our world. Thus, relevant research efforts are directed at the development of innovative low-power computing solutions that enable in-memory computation of logic operations, thereby avoiding the von Neumann bottleneck, i.e., the known showstopper of traditional computing architectures. Emerging non-volatile memory technologies, in particular Resistive Random Access Memories (RRAMs), have been shown to be particularly suitable for implementing logic-in-memory (LIM) circuits based on material implication logic (IMPLY). However, RRAM device non-idealities, logic-state degradation, and a narrow design space limit the adoption of this logic scheme. In this work, we use a physics-based compact model to study an innovative smart IMPLY (SIMPLY) logic scheme which exploits the peripheral circuitry embedded in ordinary IMPLY architectures to solve the mentioned reliability issues, drastically reducing the energy consumption and setting clear design strategies. We then use SIMPLY to implement a 1-bit full adder and compare the results with other LIM solutions proposed in the literature.

Journal ArticleDOI
TL;DR: It is demonstrated that reconfigurable constant coefficient multipliers (RCCMs) offer a better alternative for saving the silicon area than utilizing low-precision arithmetic for deep-learning applications on field-programmable gate arrays (FPGAs).
Abstract: Low-precision arithmetic operations to accelerate deep-learning applications on field-programmable gate arrays (FPGAs) have been studied extensively, because they offer the potential to save silicon area or increase throughput. However, these benefits come at the cost of a decrease in accuracy. In this article, we demonstrate that reconfigurable constant coefficient multipliers (RCCMs) offer a better alternative for saving silicon area than utilizing low-precision arithmetic. RCCMs multiply input values by a restricted choice of coefficients using only adders, subtractors, bit shifts, and multiplexers (MUXes), meaning that they can be heavily optimized for FPGAs. We propose a family of RCCMs tailored to FPGA logic elements to ensure their efficient utilization. To minimize information loss from quantization, we then develop novel training techniques that map the possible coefficient representations of the RCCMs to neural network weight parameter distributions. This enables the usage of the RCCMs in hardware while maintaining high accuracy. We demonstrate the benefits of these techniques using AlexNet, ResNet-18, and ResNet-50 networks. The resulting implementations achieve up to 50% resource savings over traditional 8-bit quantized networks, translating to significant speedups and power savings. Our RCCM with the lowest resource requirements exceeds 6-bit fixed-point accuracy, while all other RCCM implementations achieve at least similar accuracy to an 8-bit uniformly quantized design, with significant resource savings.
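The shift-and-add idea behind a constant coefficient multiplier can be sketched in a few lines; the coefficient set below is purely illustrative (the paper's RCCMs support a restricted, FPGA-tailored coefficient set selected through MUXes).

def multiply_by_constant(x, coeff):
    # each supported coefficient maps to a fixed shift/add/subtract pattern
    if coeff == 6:
        return (x << 2) + (x << 1)    # 6x = 4x + 2x
    if coeff == 7:
        return (x << 3) - x           # 7x = 8x - x
    if coeff == 10:
        return (x << 3) + (x << 1)    # 10x = 8x + 2x
    raise ValueError("coefficient not in the restricted set")

for x in range(-8, 8):
    for c in (6, 7, 10):
        assert multiply_by_constant(x, c) == x * c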

Journal ArticleDOI
TL;DR: A full-swing, high-speed hybrid Full Adder cell based on the Gate Diffusion Input technique and Conventional Complementary Metal-Oxide-Semiconductor (CCMOS) logic is proposed; it achieves the best performance parameters in large cascaded circuits.

Journal ArticleDOI
TL;DR: An efficient single-layer serial-parallel multiplier (SPM) in Quantum-dot Cellular Automata (QCA) is presented, built around a bit-serial adder that uses a fully utilized majority gate (MV) and a modified E-shaped exclusive-OR (E-XOR) gate.
Abstract: This brief presents an efficient single-layer serial-parallel multiplier (SPM) in Quantum-dot Cellular Automata (QCA). We have designed a bit-serial adder (BSA) using a fully utilized majority gate (MV), and a modified E-shaped exclusive-OR (E-XOR) gate. The cell-interactive properties of the QCA cell have been utilized to realize the proposed E-XOR gate. This new gate leads the proposed SPM to achieve a reduction in cell count and area by 30% and 19%, 29% and 24%, 30% and 22%, 32% and 39%, and 36% and 46% for 4-, 8-, 16-, 32-, and 64-bit multipliers, respectively. All proposed circuits have been simulated and verified by using QCADesigner with a coherence vector simulation engine. In addition, the average switching and leakage energy dissipation are estimated using QCAPro tool.
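A bit-serial adder processes one operand bit pair per clock while a single stored carry is updated, which is why only one adder cell is needed; the behavioral sketch below illustrates that scheduling (the QCA majority-gate realization in the paper is, of course, quite different).

def bit_serial_add(a_bits, b_bits):
    # operand bits arrive least-significant first; one carry register is reused every cycle
    carry, out = 0, []
    for ai, bi in zip(a_bits, b_bits):
        out.append(ai ^ bi ^ carry)
        carry = (ai & bi) | (carry & (ai ^ bi))
    return out + [carry]

# 6 + 3 = 9, streamed LSB-first over four cycles
print(bit_serial_add([0, 1, 1, 0], [1, 1, 0, 0]))   # -> [1, 0, 0, 1, 0]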

Journal ArticleDOI
TL;DR: This work presents a semi-serial IMPLY-based adder and proposes an IMPLY-based multiplier, which is shown to be more than 5× better than other works based on a figure of merit that gives equal weight to the number of steps and the required die area.
Abstract: Memristors are among emerging technologies with many promising features, which makes them suitable not only for storage purposes but also for computation. In this work, focusing on in-memory computation, we first present our semi-serial IMPLY-based adder and perform an extensive analysis of its merits. In addition to providing a favorable balance between the number of steps and the number of memristors, a key property of the presented adder is its compactness compared to state-of-the-art adders. Next, using our semi-serial adder, we propose an IMPLY-based multiplier. We show that the proposed multiplier is more than 5× better than other works based on the figure of merit which gives equal weight to the number of steps (i.e., speed) and the required die area. Additionally, we provide deeper insight into IMPLY-based arithmetic units, their properties, design characteristics, and advantages or disadvantages compared to one another by proposing new figures of merit and performing comprehensive comparative analyses. This facilitates the process of designing, or selecting, suitable units for design engineers and researchers in the field.

Journal ArticleDOI
TL;DR: A novel architecture based on multiple-parallel-branch with folding (MPBF) technique is proposed, which parallelizes the branches and reuses the multiplier and adder in each folded branch so that the tradeoff between throughput and the usage of the hardware resources is balanced.
Abstract: Multichannel active noise control (MCANC) is widely recognized as an effective and efficient solution for acoustic noise and vibration cancellation, such as in high-dimensional ventilation ducts, open windows, and mechanical structures. The feedforward multichannel filtered-x least mean square (FFMCFxLMS) algorithm is commonly used to dynamically adjust the transfer function of the multichannel controllers for different noise environments. The computational load incurred by the FFMCFxLMS algorithm, however, increases exponentially with increasing channel count, thus requiring high-end field-programmable gate array (FPGA) processors. Nevertheless, such processors still need specific configurations to cope with soaring computing loads as the channel count increases. To achieve a high-efficiency implementation of the FFMCFxLMS algorithm with floating-point arithmetic, a novel architecture based on the multiple-parallel-branch with folding (MPBF) technique is proposed. This architecture parallelizes the branches and reuses the multiplier and adder in each folded branch so that the tradeoff between throughput and the usage of hardware resources is balanced. The proposed architecture is validated in an experimental setup that implements the FFMCFxLMS algorithm for an MCANC system with 24 reference sensors, 24 secondary sources, and 24 error sensors, at sampling and throughput rates of 25 kHz and 260 Mb/s, respectively.

Journal ArticleDOI
TL;DR: This paper shows that the ultra-dense co-integration of FeFETs and nFETs (28nm HKMG) with shared active area does not alter the FeFET's switching behavior, nor does it affect the baseline CMOS.
Abstract: Due to their CMOS compatibility, hafnium oxide based ferroelectric field-effect transistors (FeFETs) have gained remarkable attention recently, not only in the context of nonvolatile memory applications but also as an auspicious candidate for novel combined memory and logic applications. In addition to bringing nonvolatility into existing logic circuits (Memory-in-Logic), FeFETs promise to guide the way to compact Logic-in-Memory solutions, where logic computations are performed in memory arrays or array-like structures. To increase the area-efficiency of such circuits, a dense integration of FeFETs and standard FETs is essential. In this paper, we show that the ultra-dense co-integration of FeFETs and nFETs (28nm HKMG) with shared active area does not alter the FeFET's switching behavior, nor does it affect the baseline CMOS. Based on this, we propose the integration of a FeFET-based, 2-input look-up table (memory) directly into a 4-to-1 multiplexer (logic), which is utilized directly in a 2TNOR memory array or as a stand-alone circuit. The latter dramatically reduces the transistor count by at least 33% compared to similar FeFET-based circuits. By storing the values of the look-up table in a nonvolatile manner, no energy is consumed during standby mode, which enables normally-off computing. To take another step towards novel Logic-in-Memory designs, we experimentally demonstrate a very compact in-array 2T half adder and simulate an array-like 14T full adder, which exploit the advantages of the array arrangement: an easy write procedure and a very compact, robust design. The proposed circuits exhibit energy efficiency in the (sub-)fJ range and operating speeds of 1 GHz.
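The Logic-in-Memory idea of storing a 2-input look-up table and reading it out through a 4-to-1 multiplexer can be sketched behaviorally as follows (a plain software model, not the FeFET circuit): the two logic inputs drive the MUX select lines, so the same structure computes any 2-input Boolean function depending on the stored values.

def lut2(stored, a, b):
    # stored = (f(0,0), f(0,1), f(1,0), f(1,1)); the inputs act as the MUX select
    return stored[(a << 1) | b]

XOR_TABLE = (0, 1, 1, 0)
AND_TABLE = (0, 0, 0, 1)
for a in (0, 1):
    for b in (0, 1):
        assert lut2(XOR_TABLE, a, b) == a ^ b
        assert lut2(AND_TABLE, a, b) == a & b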

Posted Content
TL;DR: This paper proposes the first solution that applies different quantization schemes to different rows of the weight matrix, together with an FPGA-centric mixed-scheme quantization (MSQ) that ensembles the proposed SP2 and fixed-point schemes and can maintain, or even increase, accuracy due to better matching with weight distributions.
Abstract: Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to the huge model size and computation amount, model compression is a critical step in deploying DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model compression approach that is complementary to weight pruning. Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes to different rows of the weight matrix. It is motivated by (1) the distributions of the weights in different rows are not the same; and (2) the potential for better utilization of heterogeneous FPGA hardware resources. To achieve that, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2) suitable for Gaussian-like weight distributions, in which the multiplication arithmetic can be replaced with a logic shifter and adder, thereby enabling highly efficient implementations with the FPGA LUT resources. In contrast, the existing fixed-point quantization is suitable for uniform-like weight distributions and can be implemented efficiently by DSP. Then, to fully exploit the resources, we propose an FPGA-centric mixed-scheme quantization (MSQ) with an ensemble of the proposed SP2 and the fixed-point schemes. Combining the two schemes can maintain, or even increase, accuracy due to better matching with weight distributions.
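A small sketch of the SP2 idea (assumed exponent range and search, not the paper's exact quantizer): each weight is approximated by a signed sum of two powers of two, so multiplying an activation by it reduces to two power-of-two scalings (shifts in hardware) and one add.

import itertools

def sp2_quantize(w, exponents=range(-4, 3)):
    # choose the sign and exponent pair whose value is closest to w
    best = min(((s, i, j) for s in (1, -1)
                for i, j in itertools.combinations_with_replacement(exponents, 2)),
               key=lambda t: abs(w - t[0] * (2.0 ** t[1] + 2.0 ** t[2])))
    s, i, j = best
    return s, i, j, s * (2.0 ** i + 2.0 ** j)

def sp2_multiply(x, s, i, j):
    # multiplier-free product: two power-of-two scalings plus one addition
    return s * (x * 2.0 ** i + x * 2.0 ** j)

s, i, j, wq = sp2_quantize(0.3)
print(wq, sp2_multiply(1.7, s, i, j), 1.7 * wq)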

Proceedings ArticleDOI
09 Mar 2020
TL;DR: A photonic nonvolatile memory (NVM)-based accelerator, LightBulb, is proposed to process binarized CNNs with high-frequency photonic XNOR gates and popcount units; it adopts photonic racetrack memory to serve as input/output registers to achieve a high operating frequency.
Abstract: Although Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art inference accuracy in various intelligent applications, each CNN inference involves millions of expensive floating-point multiply-accumulate (MAC) operations. To process CNN inferences energy-efficiently, prior work proposes an electro-optical accelerator to process power-of-2 quantized CNNs by electro-optical ripple-carry adders and optical binary shifters. The electro-optical accelerator also uses SRAM registers to store intermediate data. However, electro-optical ripple-carry adders and SRAMs seriously limit the operating frequency and inference throughput of the electro-optical accelerator, due to the long critical path of the adder and the long access latency of SRAMs. In this paper, we propose a photonic nonvolatile memory (NVM)-based accelerator, LightBulb, to process binarized CNNs by high-frequency photonic XNOR gates and popcount units. LightBulb also adopts photonic racetrack memory to serve as input/output registers to achieve a high operating frequency. Compared to prior electro-optical accelerators, on average, LightBulb improves the CNN inference throughput by 17× ~ 173× and the inference throughput per Watt by 175× ~ 660×.
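The XNOR/popcount arithmetic that such binarized-CNN accelerators implement can be written down in a few lines; the sketch below is a generic software model (bit packing and sizes are illustrative), where weights and activations in {-1, +1} are packed as bits and a dot product becomes an XNOR followed by a population count.

def binary_dot(a_bits, w_bits, n):
    # bit value 1 encodes +1 and bit value 0 encodes -1
    matches = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")  # XNOR then popcount
    return 2 * matches - n

# 4-element example: a = [+1, -1, +1, +1] -> 0b1011, w = [+1, +1, -1, +1] -> 0b1101
print(binary_dot(0b1011, 0b1101, 4))   # (+1)(+1) + (-1)(+1) + (+1)(-1) + (+1)(+1) = 0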

Journal ArticleDOI
TL;DR: This work aims to demonstrate the viability of RRAM in the design of ternary logic systems and shows a very small variation in power consumption and energy consumption with variation in process parameters, temperature, output load, supply voltage and operating frequency.
Abstract: In this paper, the design of ternary logic gates (standard ternary inverter, ternary NAND, ternary NOR) based on carbon nanotube field-effect transistors (CNTFETs) and resistive random access memory (RRAM) is proposed. Ternary logic has emerged as a very promising alternative to existing binary logic systems owing to its energy efficiency, operating speed, information density and reduced circuit overheads such as interconnects and chip area. The proposed design employs active-load RRAM and CNTFETs instead of large resistors to implement ternary logic gates. The proposed ternary logic gates are then utilised to carry out basic arithmetic functions and are extendable to implement additional complex functions. The proposed ternary gates show significant advantages in terms of component count, chip area, power consumption, energy consumption and dense fabrication. The results demonstrate the advantage of the proposed models with a 50% reduction in transistor count for the STI, TNAND and TNOR logic gates. For the THA and THS arithmetic modules, a 65.11% reduction in transistor count is observed, while for the TM design a reduction of around 38% is observed. In this work, we aim to demonstrate the viability of RRAM in the design of ternary logic systems; thus the focus is mainly on obtaining the proper functionality of the proposed design. The proposed logic gates also show a very small variation in power consumption and energy consumption with variations in process parameters, temperature, output load, supply voltage and operating frequency. For simulations, the HSPICE tool is used to verify the proposed designs. The ternary half adder, ternary half subtractor and ternary multiplier circuits are then implemented utilising the proposed gates and validated through simulations.

Journal ArticleDOI
TL;DR: The approach shows that the lower-part-OR and error-tolerant adder I approximate adders, as well as truncation to zero, deliver better compression-power trade-offs, with substantial differences from the static analysis.
Abstract: A cross-layer design space exploration (DSE) method based on a proposed co-simulation technique is presented herein. The proposed method is demonstrated by evaluating the impacts on both coding efficiency and power dissipation of applying distinct approximate logic operators in a sum of absolute differences (SAD) kernel that accelerates an H.265/HEVC (high-efficiency video coding) encoder. The proposed method simulates the gate-level circuit dynamically inside the application, with realistic results of the impact of the adder-tree approximate logic implementation on both quality and encoder bit-rate results. A comprehensive DSE is shown herein, with 13 types from 6 classes of approximate adders in the SAD accelerator hardware blocks. Over 3,000 logic variants of gate-level approximations were developed. Actual video sequences are co-simulated as inputs to the x265 software encoder to dynamically capture the video motion-estimation (ME) behavior in the presence of logic approximations. While prior art only estimates the impact of approximate logic on power, area, and quality for static designs with statistical assumptions that are agnostic to the actual data-dependent behavior of the algorithm in the application, our method accurately explores the trade-off between power dissipation and coding efficiency dynamically over the entire HEVC encoding. Our approach shows that the lower-part-OR and error-tolerant adder I approximate adders, as well as truncation to zero, deliver better compression-power trade-offs, with substantial differences from the static analysis.
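For concreteness, the sketch below shows the SAD kernel being approximated in such studies, once with exact additions and once with a lower-part-OR style approximate add (block values and the choice of k are illustrative, not taken from the paper).

def sad_exact(block_a, block_b):
    return sum(abs(pa - pb) for pa, pb in zip(block_a, block_b))

def sad_lower_or(block_a, block_b, k=2):
    # accumulate with an approximate add whose k LSBs are OR-ed instead of added
    def approx_add(x, y):
        mask = (1 << k) - 1
        return (((x >> k) + (y >> k)) << k) | ((x | y) & mask)
    acc = 0
    for pa, pb in zip(block_a, block_b):
        acc = approx_add(acc, abs(pa - pb))
    return acc

a = [12, 200, 34, 90]
b = [10, 198, 40, 95]
print(sad_exact(a, b), sad_lower_or(a, b))   # the approximate accumulation under-estimates the exact SAD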

Journal ArticleDOI
TL;DR: This work focuses on a novel design of an adder/subtractor-based incrementer/decrementer using quantum-dot cellular automata (QCA) technology, which shows an improvement in area usage and latency compared to its existing counterpart.

Journal ArticleDOI
TL;DR: Results show that the improved Booth multiplier-based (radix-4) FIR filter leads to the smallest power and area, and that the proposed multiplier architecture helps to minimize the number of steps in multiplication and also decreases the propagation delay in digital circuits.
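Radix-4 (modified) Booth recoding, the idea behind the multiplier mentioned above, rewrites the multiplier operand into digits from {-2, -1, 0, 1, 2}, roughly halving the number of partial products; the sketch below is a behavioral illustration for non-negative operands.

def booth_radix4_digits(m, width):
    # scan overlapping 3-bit groups (b_{2i+1}, b_{2i}, b_{2i-1}), with b_{-1} = 0
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    digits, prev = [], 0
    for i in range(0, width, 2):
        group = (((m >> i) & 0b11) << 1) | prev
        prev = (m >> (i + 1)) & 1
        digits.append(table[group])
    return digits

def booth_multiply(x, m, width=8):
    # one shifted (and possibly negated) partial product per recoded digit
    return sum(d * x * (4 ** i) for i, d in enumerate(booth_radix4_digits(m, width)))

assert booth_multiply(13, 27) == 13 * 27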

Journal ArticleDOI
07 Jul 2020
TL;DR: A new processing element design is provided as an alternative solution for the hardware implementation of CNN accelerators; it reduces hardware costs by 24.5% and achieves a power efficiency of 61.64 GOP/s/W, outperforming previous designs.
Abstract: Convolutional Neural Networks (CNNs) have attained high accuracy and have been widely employed in image recognition tasks. In recent times, modern deep learning-based applications are evolving, which poses a challenge for the research and development of hardware implementations. Therefore, hardware optimization for efficient CNN accelerator design remains a challenging task. A key component of the accelerator design is the processing element (PE) that implements the convolution operation. To reduce the amount of hardware resources and power consumption, this article provides a new processing element design as an alternative solution for hardware implementation. A modified Booth encoding (MBE) multiplier and Wallace-tree-based adders are proposed to replace bulky MAC units and the typical adder tree, respectively. The proposed CNN accelerator design is tested on a Zynq-706 FPGA board and achieves a throughput of 87.03 GOP/s for the Tiny-YOLO-v2 architecture. The proposed design reduces hardware costs by 24.5% while achieving a power efficiency of 61.64 GOP/s/W, outperforming previous designs.