Showing papers on "Arithmetic logic unit" published in 2021

Journal Article•DOI•

64-GHz Datapath Demonstration for Bit-Parallel SFQ Microprocessors Based on a Gate-Level-Pipeline Structure

Ryota Kashima¹, Ikki Nagaoka¹, Masamitsu Tanaka¹, Taro Yamashita¹, Akira Fujimaki¹

23 Feb 2021-IEEE Transactions on Applied Superconductivity

TL;DR: In this article, an 8-bit wide, bit-parallel datapath composed of an arithmetic logic unit and register files for high-throughput oriented SFQ microprocessors based on a gate-level-pipeline structure was demonstrated.

Abstract: We successfully demonstrated an 8-bit-wide, bit-parallel datapath composed of an arithmetic logic unit and register files for high-throughput oriented SFQ microprocessors based on a gate-level-pipeline structure. Achieving high-speed operation in the bit-parallel datapath is difficult because of feedback paths. We used concurrent-flow clocking and counter-flow clocking in combination to solve the timing problem at the feedback path in the datapath, and we optimized the number of JJs and pipeline stages in the register file for solving the timing issue. We designed the datapath with the cell library for the AIST 10 kA/cm$^2$ Advanced Process. The total number of pipeline stages, Josephson junctions, and circuit area of the designed datapath were 52, 18448, and 3.81 mm × 4.05 mm, respectively. We obtained a relatively wide bias margin of the designed datapath at the target clock frequency of 50 GHz, and it operated up to 64 GHz in on-chip high-speed testing.

26 citations

Journal Article•DOI•

Novel design and simulation of reversible ALU in quantum dot cellular automata

Behrouz Safaiezadeh¹, Ebrahim Mahdipour¹, Majid Haghparast¹, Samira Sayedsalehi¹, Mehdi Hosseinzadeh²

Islamic Azad University¹, Duy Tan University²

02 Jun 2021-The Journal of Supercomputing

TL;DR: This paper presents a QCA technology-based reversible ALU unit using basic reversible blocks and a novel reversible block namely BS1 Block, which performs logic and arithmetic operations in the proposed scheme.

Abstract: Quantum dot cellular automata (QCA) technology is considered one of the most suitable replacements for reducing the problems of CMOS-based digital circuit design at the nanoscale, owing to its tiny size, low latency, and very low power consumption. One of the main components of a microprocessor is the arithmetic logic unit (ALU); in other words, it acts as the heart of the microprocessor. This paper presents a QCA-based reversible ALU built from basic reversible blocks and a novel reversible block named the BS1 block. The proposed block performs the logic and arithmetic operations in the proposed scheme. The simulations of the proposed design are carried out in QCADesigner. According to the simulation results, the proposed structure achieves improvements of 35%, 27%, and 30% in quantum cost, number of cells, and occupied area, respectively, compared with previous work.
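
As a rough illustration of what "reversible" means for the basic blocks mentioned in this abstract, the sketch below checks that two textbook reversible gates, Toffoli and Fredkin, map distinct inputs to distinct outputs. The BS1 block is not specified in the abstract, so it is not modeled here; all function names are illustrative assumptions.

```python
from itertools import product

def toffoli(a, b, c):
    """Toffoli (CCNOT): flips c only when both controls a and b are 1."""
    return a, b, c ^ (a & b)

def fredkin(a, b, c):
    """Fredkin (CSWAP): swaps b and c only when the control a is 1."""
    return (a, c, b) if a else (a, b, c)

def is_reversible(gate, width=3):
    """A gate is logically reversible if no two inputs share an output."""
    inputs = list(product((0, 1), repeat=width))
    outputs = {gate(*bits) for bits in inputs}
    return len(outputs) == len(inputs)

def and_via_toffoli(a, b):
    """With the target preset to 0, the Toffoli target output is a AND b."""
    return toffoli(a, b, 0)[2]

if __name__ == "__main__":
    print("Toffoli reversible:", is_reversible(toffoli))
    print("Fredkin reversible:", is_reversible(fredkin))
```

A plain AND gate fails the same check because several inputs collapse onto one output, which is why reversible designs carry extra "garbage" outputs.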

21 citations

Journal Article•DOI•

A RISC-V Post Quantum Cryptography Instruction Set Extension for Number Theoretic Transform to Speed-Up CRYSTALS Algorithms

Pietro Nannipieri¹, Stefano Di Matteo¹, Luca Zulberti¹, Francesco Albicocchi¹, Sergio Saponara¹, Luca Fanucci¹

University of Pisa¹

08 Nov 2021-IEEE Access

TL;DR: In this paper, the authors investigate a solution trading-off performance and complexity to execute the lattice-based algorithms CRYSTALS-Kyber and -Dilithium.

Abstract: In recent years, public-key cryptography has become a fundamental component of digital infrastructures. Such a scenario has to face a new and increasing threat, represented by quantum computers. It is well known that quantum computers in the next years will be able to run algorithms capable of breaking the security of currently widespread cryptographic schemes used for public-key cryptography. Post-quantum cryptography aims to define and execute algorithms on classical computer architectures, able to withstand attacks from quantum computers. The National Institute of Standards and Technology is currently running a selection process to define one or more quantum-resistant public-key algorithms and lattice-based cryptographic constructions are considered one of the leading candidates. However, such algorithms require non-negligible computational resources to be executed. One viable solution is to accelerate them totally or partially in hardware, to alleviate the workload of the main processing unit. In this paper, we investigate a solution trading-off performance and complexity to execute the lattice-based algorithms CRYSTALS-Kyber and -Dilithium: we introduce a dedicated Post-Quantum Arithmetic Logic Unit, embedded directly in the pipeline of a RISC-V processor. This results in an almost negligible area overhead with a large impact on the algorithms speed-up and a consistent reduction in the energy required per single operation.
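
To make concrete the kind of arithmetic a Number Theoretic Transform involves, here is a minimal sketch in Python. It uses the Kyber modulus q = 3329 and the root of unity 17, but it is a naive O(n²) cyclic transform on 8 coefficients, not the optimized negacyclic NTT the schemes use, and it says nothing about how the paper's instruction set extension is built. The function names and the toy size are assumptions made for illustration.

```python
Q = 3329      # prime modulus used by CRYSTALS-Kyber
ZETA = 17     # a primitive 256th root of unity modulo Q

def butterfly(a, b, twiddle, q=Q):
    """Cooley-Tukey butterfly: the modular multiply-add/sub an NTT repeats."""
    t = (twiddle * b) % q
    return (a + t) % q, (a - t) % q

def ntt(coeffs, omega, q=Q):
    """Naive O(n^2) cyclic NTT: evaluate the polynomial at powers of omega."""
    n = len(coeffs)
    return [sum(c * pow(omega, i * j, q) for j, c in enumerate(coeffs)) % q
            for i in range(n)]

def intt(values, omega, q=Q):
    """Inverse: same sum with omega^-1, scaled by n^-1 (needs Python 3.8+)."""
    n = len(values)
    scaled = ntt(values, pow(omega, -1, q), q)
    n_inv = pow(n, -1, q)
    return [(v * n_inv) % q for v in scaled]

def cyclic_multiply(a, b, omega, q=Q):
    """Multiply modulo x^n - 1 via pointwise products in the NTT domain."""
    fa, fb = ntt(a, omega, q), ntt(b, omega, q)
    return intt([(x * y) % q for x, y in zip(fa, fb)], omega, q)

N = 8                               # toy size chosen for readability
OMEGA = pow(ZETA, 256 // N, Q)      # primitive 8th root of unity mod Q

if __name__ == "__main__":
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [5, 6, 7, 0, 0, 0, 0, 0]
    print(cyclic_multiply(a, b, OMEGA))
```

The `butterfly` step is the modular multiply-add/subtract pair that a fast NTT repeats many times, which is presumably why it is a natural target for a dedicated arithmetic unit.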

14 citations

Journal Article•DOI•

Development of Superconductor Advanced Integrated Circuit Design Flow Using Synopsys Tools

Amol Inamdar, Jushya Ravi, Sukanya Sagarika Meher, Stephen Miller, M. Eren Celik, A. Erik Lehmann, S.C. Lo¹, Aaron Barker¹, Stephen Whiteley¹, Nisha Johnson¹, Ron Duncan¹, Sidd Devalapalli¹, Neel Gopalan¹, Timur V. Fillipov, Deepnarayan Gupta

Synopsys¹

09 Feb 2021-IEEE Transactions on Applied Superconductivity

TL;DR: Using the 64-Bit arithmetic logic unit and a Pseudo Random Bit Sequence generator as reference circuits, the use of Synopsys tools for superconductor IC design is demonstrated including spice circuit simulations, plotting waveforms, margins analysis, Monte-Carlo simulations, HDL simulations with timing back-annotation, and IC validation including design rule checker and layout-versus-schematic checker.

Abstract: HYPRES developed an advanced design flow and design infrastructure for single-flux-quantum (SFQ) superconductor integrated circuits using standard CMOS-based EDA tools along with internally developed tools and has been successfully using this flow for the past several years. The design infrastructure includes the process design kit, advanced simulation methodology, and IC verification rule decks. The superconductor hierarchical circuit analyzer developed by HYPRES serves as the bedrock of our simulation methodology, facilitating circuit analysis and debugging, including extraction of circuit parameter margins, analysis of Monte-Carlo simulations with process corners, and automated timing characterization. Using this proven design flow and infrastructure as a knowledge source, we have collaborated with Synopsys to enhance their tools for a full native tool-enabled design flow and infrastructure, which represents a significant expansion in design capabilities and capacity for superconducting electronics. Using the 64-bit arithmetic logic unit and a pseudo-random bit sequence (PRBS) generator as reference circuits, we demonstrate the use of Synopsys tools for superconductor IC design, including SPICE circuit simulations, plotting waveforms, margin analysis, Monte-Carlo simulations, HDL simulations with timing back-annotation, and IC validation including a design rule checker and a layout-versus-schematic checker.

13 citations

Journal Article•DOI•

A High-Performance Multimem SHA-256 Accelerator for Society 5.0

Thi Hong Tran¹, Hoai Luan Pham¹, Yasuhiko Nakashima¹

Nara Institute of Science and Technology¹

02 Mar 2021-IEEE Access

TL;DR: In this paper, a multimem SHA-256 accelerator was proposed to reduce the critical path delay and significantly increase the processing rate of the SHA algorithm at the system-on-chip (SoC) level.

Abstract: The development of a low-cost, high-performance secure hash algorithm (SHA)-256 accelerator has recently received extensive interest because SHA-256 is important in widespread applications, such as cryptocurrencies, data security, data integrity, and digital signatures. Unfortunately, most current research has focused on the performance of the SHA-256 accelerator itself rather than on the system level, in which the data transfer between the external memory and the accelerator occupies a large fraction of the time. In this paper, we address this problem with a novel SHA-256 architecture named the multimem SHA-256 accelerator that achieves high performance at the system-on-chip (SoC) level. Notably, our accelerator employs three novel techniques, the pipelined arithmetic logic unit (ALU), multimem processing element (PE), and shift buffer in shift buffer out (SBi-SBo), to reduce the critical path delay and significantly increase the processing rate. Experiments on a field-programmable gate array (FPGA) and an application-specific integrated circuit (ASIC) show that the proposed accelerator achieves a significantly better processing rate and hardware efficiency than previous works. The accelerator accuracy is verified on a real hardware platform (FPGA ZCU102). The accelerator is synthesized and laid out in 180 nm complementary metal oxide semiconductor (CMOS) technology with a chip size of $8.5\,\mathrm{mm} \times 8.5\,\mathrm{mm}$, consumes 1.86 W, and provides a maximum processing rate of 40.96 Gbps at 80 MHz and 1.8 V. With Xilinx 16 nm FinFET FPGA technology, the accelerator processing rate is as high as 284 Gbps.
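
The ASIC figures quoted in this abstract can be sanity-checked with a short calculation. The sketch below divides the stated processing rate by the stated clock frequency; the comparison with the 512-bit SHA-256 message block is an observation added here, not a claim from the abstract about how the pipeline is organized.

```python
SHA256_BLOCK_BITS = 512  # SHA-256 consumes the message in 512-bit blocks

def bits_per_cycle(rate_bps, clock_hz):
    """Average number of message bits processed per clock cycle."""
    return rate_bps / clock_hz

asic_rate_bps = 40_960_000_000   # 40.96 Gbps, from the abstract
asic_clock_hz = 80_000_000       # 80 MHz, from the abstract

asic_bits = bits_per_cycle(asic_rate_bps, asic_clock_hz)
blocks_per_cycle = asic_bits / SHA256_BLOCK_BITS

if __name__ == "__main__":
    print(asic_bits, blocks_per_cycle)  # 512.0 1.0
```

The same check cannot be repeated for the 284 Gbps FPGA figure, because the abstract does not give the FPGA clock frequency.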

11 citations

Journal Article•DOI•

Micro-ring resonator based all-optical Arithmetic and Logical Unit

Nathrao B. Jadhav¹, Rajas Bhagat², Sanika Paranjpe², Saurabh Dahitule², Siddhi Madke², Savitri Jadhav¹

Massachusetts Institute of Technology¹, Maharashtra Institute of Technology²

01 Oct 2021-Optik

TL;DR: An all-optical ALU exclusively using Micro-Ring Resonator (MRR) to perform addition and comparison is proposed, which makes the proposed design novel and easy to integrate with VLSIO.

9 citations

Book Chapter•DOI•

Design of 1-Bit FinFET Sum Circuit for Computational Applications

Chandrashekar Pittala, Vallabhuni Vijay

25 Feb 2021

TL;DR: In this paper, a sum module is designed to give high driving capability to the next adder circuits in multi-bit addition; the circuit used in the sum module must also consume little power.

Abstract: In past years, applications in digital signal processing, image processing, microprocessors, and logical operations have been executed using the arithmetic logic unit (ALU), taking advantage of the features of the FinFET. FinFET technology is booming in the integrated circuit and chip manufacturing industries. The 1-bit FinFET full adder is divided into three modules: the XOR module, the sum module, and the carry module. In this paper, a sum module is designed. The main requirement of the 1-bit sum module is to provide high driving capability to the next adder circuits in multi-bit addition. The circuit used in the sum module must also consume little power. The proposed full adder circuit is designed using Cadence Virtuoso tools and 20 nm FinFET technology. The proposed circuit design consumes low power and has less delay and reduced chip area. Other performance characteristics, such as time delay and power consumption, are analyzed by comparison with standard circuits.

7 citations

Journal Article•DOI•

Ternary Arithmetic Logic Unit Design Utilizing Carbon Nanotube Field Effect Transistor (CNTFET) and Resistive Random Access Memory (RRAM).

Furqan Zahoor¹, Fawnizu Azmadi Hussin¹, Farooq Ahmad Khanday², M. R. Ahmad¹, Illani Mohd Nawi¹

Universiti Teknologi Petronas¹, University of Kashmir²

21 Oct 2021-Micromachines

TL;DR: The simulation results indicate that the proposed CNTFET-RRAM integration enables the compact circuit realization with good robustness, and due to the addition of RRAM as circuit element, the proposed ALU has the advantage of non-volatility.

Abstract: Due to the difficulties associated with scaling of silicon transistors, various technologies beyond binary logic processing are actively being investigated. Ternary logic circuit implementation with carbon nanotube field effect transistors (CNTFETs) and resistive random access memory (RRAM) integration is considered a possible technology option. CNTFETs are currently preferred for implementing ternary circuits due to their desirable multiple threshold voltage and geometry-dependent properties, whereas the RRAM is used due to its multilevel cell capability, which enables storage of multiple resistance states within a single cell. This article presents the 2-trit arithmetic logic unit (ALU) design using CNTFETs and RRAM as the design elements. The proposed ALU incorporates a transmission gate block, a function select block, and various ternary function processing modules. The ALU design optimization is achieved by introducing a controlled ternary adder–subtractor module instead of separate adder and subtractor circuits. The simulations are analyzed and validated using Synopsys HSPICE simulation software with standard 32 nm CNTFET technology under different operating conditions (supply voltages) to test the robustness of the designs. The simulation results indicate that the proposed CNTFET-RRAM integration enables compact circuit realization with good robustness. Moreover, due to the addition of RRAM as a circuit element, the proposed ALU has the advantage of non-volatility.
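
The idea of a single controlled adder–subtractor working on 2-trit operands can be sketched behaviorally as follows. The abstract does not state the digit encoding or the internal structure of the module, so this sketch assumes unbalanced ternary digits (0, 1, 2) and a simple ripple carry/borrow; every name in it is hypothetical.

```python
def to_trits(n, width=2):
    """Unsigned integer -> list of trits (0, 1, 2), least significant first."""
    trits = []
    for _ in range(width):
        n, t = divmod(n, 3)
        trits.append(t)
    return trits

def from_trits(trits):
    """List of trits (least significant first) -> unsigned integer."""
    return sum(t * 3 ** i for i, t in enumerate(trits))

def add_sub(a, b, subtract=False):
    """Ripple adder-subtractor selected by one control input.

    Returns (result_trits, carry_or_borrow_out).
    """
    out, c = [], 0
    for x, y in zip(a, b):
        if subtract:
            d = x - y - c          # c acts as the incoming borrow
            c = 1 if d < 0 else 0
            out.append(d % 3)
        else:
            s = x + y + c          # c acts as the incoming carry
            c = s // 3
            out.append(s % 3)
    return out, c

if __name__ == "__main__":
    print(add_sub(to_trits(7), to_trits(5)))                 # 7 + 5
    print(add_sub(to_trits(4), to_trits(6), subtract=True))  # 4 - 6
```

Two trits cover the range 0 to 8, so a carry-out of 1 stands for +9 and a borrow-out of 1 stands for -9.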

6 citations

Journal Article•DOI•

Design and testing of a reversible ALU by quantum cells automata electro-spin technology

Rupsa Roy¹, Swarup Sarkar¹, Sourav Dhar¹

Sikkim Manipal University¹

29 Apr 2021-The Journal of Supercomputing

TL;DR: The design of a novel multilayer portable, dynamic, fault-tolerant, power-efficient, thermally stable reversible ALU is proposed which is explored through QCA-ES and area density, delay, fault tolerance and thermal stability are investigated.

Abstract: The arithmetic logic unit (ALU), a core component of a processor, is one of the thrust areas of current research. At present, the ALU is designed using transistor-based CMOS techniques, and its individual components are placed in different layers. The current design is affected by the limitations of Moore’s law and by design complexity. ‘Quantum cellular automata electro-spin (QCA-ES)’ technology is now widely accepted as an alternative to CMOS for minimizing these problems. In this research paper, a novel multilayer, portable, dynamic, fault-tolerant, power-efficient, thermally stable reversible ALU is proposed and explored through QCA-ES. All the arithmetic and logical components of the ALU are placed separately in different layers. Area density, delay, fault tolerance, and thermal stability are investigated. A specific type of reversible gate (a modified 3:3 ‘TSG’ gate) is used in the proposed design with QCA technology to obtain an optimized ALU with low occupied area, complexity, delay, and power dissipation. The investigation of a fault-free design and the change in the saturated output amplitude level with increasing temperature in the proposed device are also discussed. This work presents not only the thermal stability (up to a temperature of 6 K) but also an investigation of the cell complexity of the 100% fault-free (against multiple cell omission, cell displacement, cell orientation change, and extra cell deposition) multilayer nano-device. ‘QCA-Designer’ software is used to design and lay out the proposed components in the quantum field and to determine the occupied area, delay, and complexity of the proposed design. ‘QCA-Pro’ software is used to obtain the dissipated power.

6 citations

Journal Article•DOI•

Z-domain mathematical modeling and performance analysis of ripple ring resonator (RRR) with design of all-optical arithmetic logic unit (ALU)

Kuldeep Singh¹, Kuldeep Singh², Sanjoy Mandal¹

Indian Institutes of Technology¹, Galgotia's College of Engineering and Technology²

01 Apr 2021-Optik

TL;DR: A ripple ring resonator (RRR) structure is proposed and analyzed by coupling two and three optical microring resonators (OMRRs); it gives promising results and has integration compatibility with complementary metal–oxide semiconductor (CMOS) photonic technology.

5 citations

Journal Article•DOI•

Robust Circuit and System Design for General-Purpose Computational Resistive Memories

Felipe Pinto, Ioannis Vourkas

01 May 2021-Electronics

TL;DR: This work built upon the 1T1R crossbar topology and adopted a logic design style in which all computations are equivalent to modified memory read operations for higher reliability, performed either in a word-wise or bit-wise manner, owing to an enhanced peripheral circuitry.

Abstract: Resistive switching devices (memristors) constitute a promising device technology that has emerged for the development of future energy-efficient general-purpose computational memories. Research has been done both at device and circuit level for the realization of primitive logic operations with memristors. Likewise, important efforts are placed on the development of logic synthesis algorithms for resistive RAM (ReRAM)-based computing. However, system-level design of computational memories has not been given significant consideration, and developing arithmetic logic unit (ALU) functionality entirely using ReRAM-based word-wise arithmetic operations remains a challenging task. In this context, we present our results in circuit- and system-level design, towards implementing a ReRAM-based general-purpose computational memory with ALU functionality. We built upon the 1T1R crossbar topology and adopted a logic design style in which all computations are equivalent to modified memory read operations for higher reliability, performed either in a word-wise or bit-wise manner, owing to an enhanced peripheral circuitry. Moreover, we present the concept of a segmented ReRAM architecture with functional and topological features that benefit flexibility of data movement and improve latency of multi-level (sequential) in-memory computations. Robust system functionality is validated via LTspice circuit simulations for an n-bit word-wise binary adder, showing promising performance features compared to other state-of-the-art implementations.

Journal Article•DOI•

Designing digital circuits using 3D nanomagnetic logic architectures

Bandan Kumar Bhoi¹, Nirupma Pathak, Santosh Kumar, Neeraj Kumar Misra²

Veer Surendra Sai University of Technology¹, Techno India²

01 Jun 2021-Journal of Computational Electronics

TL;DR: In this paper, a 3D Ex-OR, parity generator, parity checker, multiplexer, and arithmetic logic unit (ALU) functionality are synthesized using pNML technology.

Abstract: The approach to designing digital circuits using three-dimensional (3D) perpendicular nanomagnetic logic (pNML) is thoroughly investigated. Nanomagnetic logic (NML) technology eventually optimizes the circuit performance in comparison with conventional metal–oxide–semiconductor (MOS) technology, which suffers from the hot carrier, velocity saturation, and short-channel effects, which may considerably degrade device performance. In contrast, nanomagnetic logic is immune to radiation; it behaves as nonvolatile memory and shows zero leakage current, as required for use in high-speed and low-cost nanoelectronics applications. In this paper, novel and organized designs, e.g., for 3D Ex-OR, parity generator, parity checker, multiplexer, and arithmetic logic unit (ALU) functionality, are synthesized using pNML technology. Previous designs are not compact in terms of delay, layer count, or bounded area. To overcome this, new designs for the mentioned functionalities are proposed based on pNML with smaller area and lower latency compared with previous circuits.

Journal Article•DOI•

RISC-V3: A RISC-V Compatible CPU With a Data Path Based on Redundant Number Systems

Marc Reichenbach¹, Johannes Knodtel¹, Sebastian Rachuj¹, Dietmar Fey¹

University of Erlangen-Nuremberg¹

01 Jan 2021-IEEE Access

TL;DR: RISC-V3 is a CPU architecture compatible with the RISC-V instruction set that uses a redundant number system (RNS) representation internally to speed up instruction execution and therefore increase system performance; RISC-V suits RNS because it has no flags register, which would be expensive to calculate when using an RNS.

Abstract: Redundant number systems (RNS) are a well-known technique to speed up arithmetic circuits. However, in a complete CPU, arithmetic circuits using RNS have only been included at the subcircuit level, e.g., inside the Arithmetic Logic Unit (ALU) to implement division. Still, extending this approach to create a CPU with a complete data path based on RNS can be beneficial for speeding up data processing, because it avoids conversions in the ALU between RNS and binary number representations. Therefore, in this paper we present a new CPU architecture called RISC-V3, which is compatible with the RISC-V instruction set but uses an RNS number representation internally to speed up instruction execution and therefore increase system performance. RISC-V is very suitable for RNS because it does not have a flags register, which is expensive to calculate when using an RNS. To present reliable performance numbers, arithmetic circuits using RNS were realized in different semiconductor technologies. Moreover, an instruction set simulator was used to estimate system performance for a benchmark suite (Embench). Our results show that we are up to 81% faster with the RISC-V3 architecture compared to a binary one, depending on the executed benchmark and CMOS technology.
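
Why a redundant representation speeds up addition, and why flags become expensive, can be shown with carry-save form, one well-known redundant number system. The abstract does not say which redundant representation RISC-V3 uses, so this is only an illustrative sketch; the 32-bit width and the function names are assumptions.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit datapath

def csa(x, y, z):
    """3:2 compressor: each output bit depends only on the same or the
    neighboring input bit position, so no carry ripples across the word."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s & MASK, c & MASK

def cs_add(acc, operand):
    """Add a binary operand to a carry-save accumulator (sum, carry)."""
    s, c = acc
    return csa(s, c, operand)

def to_binary(acc):
    """Conversion back to binary: the only step with carry propagation."""
    s, c = acc
    return (s + c) & MASK

if __name__ == "__main__":
    acc = (0, 0)
    for value in (1, MASK):        # 1 + (-1) in 32-bit two's complement
        acc = cs_add(acc, value)
    # Neither word is zero, yet the value represented is zero.
    print([hex(w) for w in acc], to_binary(acc))
```

In the usage example the accumulator holds two nonzero words that together represent zero, so a zero flag could not be read off without a full carry-propagating conversion, which is consistent with the abstract's remark about flags.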

Proceedings Article•DOI•

Implementation of High Performance 4-Bit ALU using Dual Mode Pass Transistor Logic

Anurag Chauhan¹, Kirandeep Kaur Saini¹, Nitin Rajput¹, Rushil Domah¹

Delhi Technological University¹

25 Jun 2021

TL;DR: A four bit arithmetic logic unit which is energy efficient and temperature invariant implemented using the dual mode pass transistor logic using Cadence® Virtuoso® Schematic Editor is presented.

Abstract: In this paper, we present a four-bit arithmetic logic unit which is energy efficient and temperature invariant, implemented using dual mode pass transistor logic. The basic logic gates such as NOR and NAND are designed using both CMOS logic and dual mode pass transistor logic and are used in the proposed design. Simulations demonstrated that DMPL can reduce the computed worst-case delay by 42.39% and 39.13% for NOR and NAND gates, respectively, in dynamic mode and the average power dissipation by 67.96% and 24.09% for NOR and NAND gates, respectively, in static mode. In the implemented arithmetic logic unit, we observe reductions in worst-case delay and average power dissipation of 62.67% and 28.28%, respectively. The proposed logic was implemented in 90 nm bulk technology using Cadence® Virtuoso® Schematic Editor.

Proceedings Article•DOI•

Area Efficient Multilayer Arithmetic Logic Unit Implementation in Quantum-dot Cellular Automata

K. Swetha, K. Lokesh Krishna, J.V.S. Sowmya, D. Srinivasulu Reddy, G. Pravallika, G. Anil Kumar

04 Feb 2021

TL;DR: In this article, a QCA constructed full adder logic circuit based on five input majority gates is designed and simulated, and a 2:1 multiplexer is implemented in multilayer 1-bit ALU to reduce the overall cell count.

Abstract: Quantum-dot Cellular Automata (QCA) is a promising nanoscale computing paradigm for designing arithmetic digital circuits of any size and can be considered a suitable alternative to the digital Complementary Metal Oxide Semiconductor process. QCA also offers a promising way to improve results in numerous computational applications, with high compaction density and the ability to carry out computations at ultra-high switching speeds. The key component of the Central Processing Unit is the Arithmetic Logic Unit (ALU), which executes arithmetic and logical operations, so the design of full adder circuits and low-order multiplexer circuits of different sizes is important. In this work, a QCA full adder logic circuit based on a five-input majority gate is designed and simulated. A QCA multilayer 1-bit ALU structure that can implement both logical and arithmetic operations is also designed. To perform arithmetic operations, a novel 2:1 multiplexer with a reduced cell count is proposed; this 2:1 multiplexer is used in the multilayer 1-bit ALU to reduce the overall cell count. The proposed full adder and other universal gates therefore use fewer cells, which in turn leads to smaller circuit size and lower power dissipation than previous designs. The proposed architecture is simulated and verified using the QCAD tool. The simulation results show that the proposed work achieves superior performance in terms of circuit area and cell count.
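
A full adder built around a five-input majority gate can be sketched at the logic level as below. This uses a commonly cited construction (carry from a three-input majority, sum from a five-input majority fed with the inverted carry twice); the abstract does not give the paper's exact wiring or its multiplexer layout, so both functions here are assumptions for illustration only.

```python
def maj3(a, b, c):
    """Three-input majority gate, the basic QCA logic primitive."""
    return int(a + b + c >= 2)

def maj5(a, b, c, d, e):
    """Five-input majority gate."""
    return int(a + b + c + d + e >= 3)

def full_adder(a, b, cin):
    """Carry = M3(a, b, cin); sum = M5(a, b, cin, NOT carry, NOT carry)."""
    cout = maj3(a, b, cin)
    ncout = 1 - cout
    s = maj5(a, b, cin, ncout, ncout)
    return s, cout

def mux2(sel, d0, d1):
    """2:1 multiplexer from majority gates: AND = M3(x, y, 0), OR = M3(x, y, 1)."""
    return maj3(maj3(d0, 1 - sel, 0), maj3(d1, sel, 0), 1)

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, "->", full_adder(a, b, cin))
```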

Journal Article•DOI•

Designing Bi-directional Counters Using Quantum-dot Cellular Automata Nanotechnology

Zaman Amirzadeh, Mohammad Gholami¹

University of Mazandaran¹

01 Apr 2021-International Journal of Engineering

TL;DR: The proposed structure for T-latch has a lower number of cells, occupied area and lower power consumption than existing methods, and up-down circuits are designed for the first time in QCA technology.

Abstract: One of the major problems in designing highly compact integrated circuits is the power consumption of the circuits. Therefore, several technologies have been introduced to overcome the problems facing MOSFET technology. One of these technologies is Quantum-Dot Cellular Automata (QCA), which has several advantages. In this paper, we focus on computational logic gates based on the T-latch circuit. The T-latch is the basis of many circuits in the arithmetic logic unit (ALU). The proposed T-latch structure has fewer cells, a smaller occupied area, and lower power consumption than existing designs. Compared with the previous best designs, the proposed T-latch reduces cross-sectional area by 6.45% and power consumption by 44.49%. Also in this paper, a T-latch with a reset terminal and a T-latch with both set and reset terminals are designed for the first time. In addition, using the proposed T-latch, a 3-bit bidirectional up-down counter (consisting of 204 quantum cells, with a cross-sectional area of 0.26 µm² and a delay of 5.25 clock cycles), a 3-bit up-down counter with a reset pin, and a 3-bit up-down counter with set and reset terminals are built. The proposed up-down circuits are designed for the first time in QCA technology. All designs and simulations are carried out in QCADesigner software.
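
The behavior of a 3-bit up-down counter built from toggle elements with set and reset inputs can be modeled in a few lines. This is a purely behavioral sketch: it treats each T-latch as an ideal clocked toggle, assumes reset takes priority over set, and uses the standard synchronous counter toggle equations, none of which is taken from the paper's QCA layout.

```python
class TLatch:
    """Idealized toggle element: flips its state when t is 1 at a clock tick."""

    def __init__(self):
        self.q = 0

    def clock(self, t, set_=0, reset=0):
        if reset:            # assumption: reset wins over set
            self.q = 0
        elif set_:
            self.q = 1
        elif t:
            self.q ^= 1
        return self.q

class UpDownCounter3:
    """3-bit synchronous up-down counter built from three TLatch objects."""

    def __init__(self):
        self.bits = [TLatch() for _ in range(3)]

    @property
    def value(self):
        return sum(b.q << i for i, b in enumerate(self.bits))

    def clock(self, up=1, set_=0, reset=0):
        q0, q1, _ = (b.q for b in self.bits)
        # Toggle inputs are derived from the state before the clock tick.
        if up:
            t = (1, q0, q0 & q1)
        else:
            t = (1, 1 - q0, (1 - q0) & (1 - q1))
        for latch, ti in zip(self.bits, t):
            latch.clock(ti, set_, reset)
        return self.value

if __name__ == "__main__":
    counter = UpDownCounter3()
    print([counter.clock(up=1) for _ in range(8)])
    print([counter.clock(up=0) for _ in range(8)])
```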

Posted Content•

A Primer for Neural Arithmetic Logic Modules.

Bhumika Mistry¹, Katayoun Farrahi¹, Jonathon S. Hare¹

University of Southampton¹

23 Jan 2021-arXiv: Neural and Evolutionary Computing

TL;DR: This paper surveys Neural Arithmetic Logic Modules, small neural networks that aim for systematic generalisation in learning arithmetic operations, starting from the Neural Arithmetic Logic Unit (NALU), analysing the design choices of recent units, and highlighting inconsistencies in a fundamental experiment that prevent direct comparison across papers.

Abstract: Neural Arithmetic Logic Modules have become a growing area of interest, though remain a niche field. These units are small neural networks which aim to achieve systematic generalisation in learning arithmetic operations such as {+, -, *, /} while also being interpretive in their weights. This paper is the first to discuss the current state of progress of this field, explaining key works, starting with the Neural Arithmetic Logic Unit (NALU). Focusing on the shortcomings of NALU, we provide an in-depth analysis to reason about design choices of recent units. A cross-comparison between units is made on experiment setups and findings, where we highlight inconsistencies in a fundamental experiment causing the inability to directly compare across papers. We finish by providing a novel discussion of existing applications for NALU and research directions requiring further exploration.
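
For readers unfamiliar with the NALU, the forward pass of a single-output cell is sketched below in plain Python, following the formulation commonly given in the literature: weights constrained by tanh and sigmoid, an additive path, a multiplicative path computed in log space, and a learned gate mixing the two. The abstract itself gives no equations, so the parameter names and the hand-picked weights in the usage example are assumptions; no training is shown.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nalu_forward(x, w_hat, m_hat, g, eps=1e-7):
    """Forward pass of a single-output NALU cell.

    x: input values; w_hat, m_hat: raw weight parameters; g: gate weights.
    """
    # Effective weights are pushed toward {-1, 0, 1} by construction.
    w = [math.tanh(wh) * sigmoid(mh) for wh, mh in zip(w_hat, m_hat)]
    # Additive path: weighted sum (addition and subtraction).
    add_path = sum(wi * xi for wi, xi in zip(w, x))
    # Multiplicative path: the same weights applied in log space.
    mul_path = math.exp(sum(wi * math.log(abs(xi) + eps) for wi, xi in zip(w, x)))
    # A gate in (0, 1) mixes the two paths.
    gate = sigmoid(sum(gi * xi for gi, xi in zip(g, x)))
    return gate * add_path + (1.0 - gate) * mul_path

if __name__ == "__main__":
    x = [3.0, 4.0]
    saturated = [20.0, 20.0]   # drives both effective weights close to 1
    print(nalu_forward(x, saturated, saturated, g=[10.0, 10.0]))    # close to 3 + 4
    print(nalu_forward(x, saturated, saturated, g=[-10.0, -10.0]))  # close to 3 * 4
```

Because the multiplicative path takes the logarithm of |x|, it loses the sign of its inputs, which is one of the known limitations of the original unit.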

Book Chapter•DOI•

PDP Analysis of CNTFET Full Adders for Single and Multiple Threshold Voltages

M. Elangovan¹, R. Ranjith¹, S. Devika¹

Government College¹

01 Jan 2021

TL;DR: Different types of full adders are implemented using CNTFET and their power delay product (PDP) is analysed for single and multiple threshold voltages of CNTFET, and the low and high PDP of full adders are identified.

Abstract: The adder is a basic building block of the arithmetic logic unit (ALU). Designing an optimized adder circuit inherently paves the way to an optimized ALU design. The implementation of metal–oxide–semiconductor field-effect transistor (MOSFET)-based very large-scale integration (VLSI) circuits in the nanoscale range has reached saturation. This is because the MOSFET faces significant issues during nanoscale fabrication, such as higher leakage current and strong dependence on PVT variation. The carbon nanotube field-effect transistor (CNTFET) can overcome the demerits of the MOSFET, and it supports low-power, delay-optimized VLSI circuit design. In this paper, different types of full adders are implemented using CNTFET and their power delay product (PDP) is analysed for single and multiple threshold voltages of CNTFET. From the simulation, the low and high PDP of full adders are identified. The PDP of full adders is optimized by varying the threshold voltage of CNTFET. The simulation is carried out using the HSPICE simulation tool. The Stanford University 32-nm-CNTFET model is used for the simulation.
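
The power delay product used as the figure of merit here is simply average power multiplied by propagation delay, giving an energy per operation. The sketch below ranks three adders by PDP; the adder names and all numeric values are hypothetical placeholders, not results from the chapter.

```python
def pdp(avg_power_w, delay_s):
    """Power delay product in joules: average power times propagation delay."""
    return avg_power_w * delay_s

# Hypothetical (power in W, delay in s) pairs, for illustration only.
candidates = {
    "adder_A": (1.2e-6, 40e-12),
    "adder_B": (0.8e-6, 75e-12),
    "adder_C": (2.0e-6, 20e-12),
}

# Lowest PDP first.
ranked = sorted(candidates, key=lambda name: pdp(*candidates[name]))

if __name__ == "__main__":
    for name in ranked:
        print(name, pdp(*candidates[name]))
```

In this made-up data the adder with the highest power still has the lowest PDP because its delay is much shorter, which is why power and delay are compared jointly rather than separately.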

Book Chapter•DOI•

Fault-Tolerant Implementation of Quantum Arithmetic and Logical Unit (QALU) Using Clifford+T-Group

Laxmidhar Biswal¹, Chandan Bandyopadhyay², Sudip Ghosh¹, Hafizur Rahaman¹

VLSI Technology¹, Indian Institute of Engineering Science and Technology, Shibpur²

01 Jan 2021

TL;DR: In this article, the design of an important QIP module, i.e., the arithmetic logic unit (ALU), is shown, and the entire design is built on top of the quantum Clifford+T-group.

Abstract: The quest for efficient quantum circuits aims to achieve quantum supremacy in theory as well as in practice. The foremost obstacle is protecting the coherence time of extremely fragile quantum states from inherent noise. To address this issue, a Quantum Error Correction Code (QECC) with a fault-tolerant quantum circuit is most desirable. Aiming to contribute toward designing an efficient Quantum Information Processor (QIP), in this work we show the design of an important QIP module, i.e., the Arithmetic Logic Unit (ALU). The entire design is built on top of the quantum Clifford+T-group. In the design phase, we first formulate a 1-bit design and then integrate multiple smaller modules to obtain a generalized representation of the ALU. To ensure improved features in this component, the design is made fault-tolerant, circuit optimization rules are applied to minimize the design metrics, and parallelism in the high-latency T-gate is ensured. To check the functional correctness of the proposed design, several logical operations have been successfully tested on it.

Energy Efficient Arithmetic Logic Unit (ALU) Based On Dynamic Voltage and Frequency Scaling (DVFS)

Shahrul Izwan Kamal Akhball, Hasliza Hassan

19 May 2021

TL;DR: In this paper, a design of an 8-bit arithmetic logic unit (ALU) with eight different operations is presented; the ALU is one of the most crucial parts of a digital computer and is designed to compute all the arithmetic and logic operations, including decoding operations.

Abstract: This project presents a design of an 8-bit Arithmetic Logic Unit (ALU) with eight different operations. The ALU is one of the most crucial parts of a digital computer and is designed to compute all the arithmetic and logic operations, including decoding operations, that need to be done for almost any data processed by the central processing unit (CPU). In digital circuit applications, important attributes to consider include maximizing speed and minimizing power consumption. Higher power consumption results in more heat dissipation and higher cooling cost, and makes the system more prone to failures and malfunctions. However, while low power consumption reduces heat-related problems, it also reduces the performance of the system. Therefore, this project covers the design of an 8-bit ALU with eight operations using Intel Quartus Prime Development Suite, the determination of optimized voltage and frequency using Maplesoft Maple, the verification of the functionality and performance of the ALU implementing the DVFS technique, and the optimization of the ALU's performance in terms of frequency, timing, and power by maximizing power saving. The project focuses on three conditions: high performance, optimized, and low performance tests. The integration of all sub-modules creates an efficient and effective solution for ALU design. The graph generated from Maple confirms the DVFS technique. The DVFS technique improved timing performance by 25% over the conventional technique.
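
The voltage and frequency trade-off behind DVFS can be illustrated with the standard CMOS dynamic power approximation, P ≈ α·C·V²·f. The following is a minimal sketch only: the abstract does not state the power model or the operating points used, so the function names, the effective capacitance, and the three voltage/frequency pairs below are hypothetical values chosen for illustration.

```python
def dynamic_power(c_eff, v, f, activity=1.0):
    """Approximate CMOS dynamic power: activity * C * V^2 * f."""
    return activity * c_eff * v * v * f


# Hypothetical operating points (voltage in V, frequency in Hz).
# These are illustrative assumptions, not values from the project.
OPERATING_POINTS = {
    "high_performance": (1.2, 100e6),
    "optimized": (1.0, 75e6),
    "low_performance": (0.8, 50e6),
}


def relative_power(mode, baseline="high_performance", c_eff=1e-9):
    """Dynamic power of `mode` as a fraction of the baseline mode."""
    v, f = OPERATING_POINTS[mode]
    vb, fb = OPERATING_POINTS[baseline]
    return dynamic_power(c_eff, v, f) / dynamic_power(c_eff, vb, fb)
```

Because voltage enters quadratically, lowering voltage and frequency together reduces dynamic power faster than lowering frequency alone, which is the usual motivation for scaling both.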

Patent•

Direct memory access (DMA) controller, device and method using a write control module for reorganization of storage addresses in a shared local address space

Durand Yves¹, Bernard Christian•Institutions (1)

Commissariat à l'énergie atomique et aux énergies alternatives¹

02 Feb 2021

TL;DR: In this patent, a direct memory access controller, configured to be used in a computing node of a system on chip (SoC), includes: (1) an input buffer for receiving packets of data coming from an input/output interface of the computing node; (2) a write control module for controlling writing of data extracted from each packet to a local memory of the node shared by at least one processor other than the direct memory access controller.

Abstract: A direct memory access controller, configured to be used in a computing node of a system on chip (SoC), includes: (1) an input buffer for receiving packets of data coming from an input/output interface of the computing node; (2) a write control module for controlling writing of data extracted from each packet to a local memory of the computing node shared by at least one processor other than the direct memory access controller; and (3) an arithmetic logic unit for executing microprograms. The write control module is configured to control the execution by the arithmetic logic unit of at least one microprogram including instruction lines for arithmetic and/or logical calculation concerning only storage addresses for storing the data received by the input buffer for a reorganization of the data in the shared local memory. Optionally, at least one microprogram may be stored in a register, and at least two operating modes (e.g., restart mode and pause mode) of the at least one microprogram stored in the register may be configurable. Exemplary microprograms can (1) provide image processing parameters including sizes of columns of image blocks, (2) provide image processing parameters including numbers of successive pieces of data to be processed which are to be written to successive addresses in the shared local memory, and (3) utilize a sequential write mode and/or an absolute-offset write mode. Microprograms may be selected based on an identifier included in a header of each packet.

Patent•

Power reduction in processor pipeline by detecting zeros

Bshara Nafea¹, Diamant Ron, Huang Randy Renfu, Saidi Ali Ghassan•Institutions (1)

Amazon.com¹

26 Jan 2021

TL;DR: This patent describes power reduction in a computer processor based on detecting whether data destined for input to an arithmetic logic unit (ALU) has a particular value; the data is written to a register prior to performing an arithmetic or logical operation using the data as an operand.

Abstract: Techniques are described for power reduction in a computer processor based on detection of whether data destined for input to an arithmetic logic unit (ALU) has a particular value. The data is written to a register prior to performing an arithmetic or logical operation using the data as an operand. Depending on a timing of when the data is supplied to the register, the determination is made before or after the data is written to the register, and a memory associated with the register is updated with a result of the determination. Contents of the memory are used to make a decision whether to allow the ALU to perform the arithmetic or logical operation. The memory can be implemented as a non-architectural register.
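
The gating idea in this abstract can be sketched as a toy software model. This is a rough illustration under stated assumptions, not the patented circuit: the class name, the choice of zero as the "particular value" (suggested by the title), the choice of multiplication as the gated operation, and the activation counter are all hypothetical.

```python
class ZeroGatedALU:
    """Toy model: a side memory holds one 'is zero' flag per register."""

    def __init__(self, num_regs=4):
        self.regs = [0] * num_regs
        # Stands in for the non-architectural memory described in the abstract.
        self.zero_flags = [True] * num_regs
        # Counts how often the (modeled) ALU was actually activated.
        self.alu_activations = 0

    def write(self, idx, value):
        """Write a register and record whether the value is zero."""
        self.regs[idx] = value
        self.zero_flags[idx] = (value == 0)

    def multiply(self, a, b):
        """Multiply two registers, skipping the ALU when a flag says zero."""
        if self.zero_flags[a] or self.zero_flags[b]:
            # Result is known from the flags alone; the ALU stays idle.
            return 0
        self.alu_activations += 1
        return self.regs[a] * self.regs[b]
```

In this model the decision to use the ALU is made from the flag memory only, so the operand values never need to be re-examined at execute time.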

Patent•

Container measurement system

Kyu Shingun, Hoso Yukihiro, Fujiwara Sho

01 Apr 2021

TL;DR: A depth map acquisition unit provided in a work machine that loads a container acquires a depth map of the container, from which an arithmetic logic unit calculates the three-dimensional position of a flat section constituting the container.

Abstract: This container measurement system comprises a depth map acquisition unit and an arithmetic logic unit. The depth map acquisition unit is provided in a work machine that loads a container and is capable of acquiring a depth map of the container. The arithmetic logic unit processes the depth map of the container acquired by the depth map acquisition unit. The arithmetic logic unit calculates, on the basis of the container depth map, the three-dimensional position of a flat section constituting the container. The arithmetic logic unit calculates, on the basis of the three-dimensional position of the flat section, three-dimensional information containing the three-dimensional position and the three-dimensional shape of the container.

Proceedings Article•DOI•

A Quasi-Synchronous Sampling Energy Metering Chip with a Dedicated Dual-Core DSP

Xiao Xiaohui¹, Nianxiong Tan¹, Cao Jie, Du Zhaosheng•Institutions (1)

Zhejiang University¹

12 Mar 2021

TL;DR: In this paper, a dual-core digital signal processor (DSP) with a primary core and a secondary core sharing the same data memory and arithmetic logic unit (ALU) was designed to enable harmonic analysis in smart meters.

Abstract: To enable harmonic analysis in smart meters, we design an energy metering chip that supports the quasi-synchronous sampling. The quasi-synchronous sampling is realized by the Newton polynomial interpolation algorithm. To implement the overall metering algorithm, we design a dedicated dual-core digital signal processor (DSP) with a primary core and a secondary core sharing the same data memory and arithmetic logic unit (ALU). The DSP also has a special data memory architecture with a dynamic shifting and virtual address mechanism, which simplifies the control in the DSP program. The quasi-synchronous sampling technique can achieve an accuracy of 0.3% in harmonic metering, meeting the requirement of single-phase smart meters.
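
Newton polynomial interpolation, which the abstract names as the basis of the quasi-synchronous sampling, can be sketched with divided differences. The abstract does not say how the chip orders, windows, or quantizes its samples, so the function names and the floating-point arithmetic below are assumptions made for illustration.

```python
def newton_coefficients(xs, ys):
    """Return divided-difference coefficients for the Newton form."""
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):
        # Update in place from the end so lower-order entries stay valid.
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef


def newton_eval(xs, coef, x):
    """Evaluate the Newton polynomial at x using nested multiplication."""
    result = coef[-1]
    for i in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[i]) + coef[i]
    return result
```

For example, given samples taken at times `xs` with values `ys`, `newton_eval(xs, newton_coefficients(xs, ys), t)` estimates the signal at an off-grid time `t`, which is how interpolation can realign samples to a signal period that is not an integer multiple of the sampling interval.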

Journal Article•DOI•

Design of Reversible Arithmetic Logic Unit with Built-In Testability

Katherine Harrison¹, Ahmet Börütecene¹, Jonas Löwgren¹, Desirée Enlund¹, Rasmus Ringdahl¹, Vangelis Angelakis¹•Institutions (1)

Linköping University¹

01 Sep 2021-IEEE Technology and Society Magazine

TL;DR: The challenge of how cities can be designed and developed in an inclusive and sustainable direction is monumental; smart city technologies offer a promising solution, but their impact will be significantly reduced without long-term, widespread adoption by citizens.

Abstract: The challenge of how cities can be designed and developed in an inclusive and sustainable direction is monumental. Smart city technologies currently offer the most promising solution for long-term sustainability, but the impact of such solutions will be significantly reduced without long-term, widespread adoption by citizens.

Journal Article•DOI•

High-Speed FPGA Implementation of SIKE Based on an Ultra-Low-Latency Modular Multiplier

Jing Tian¹, Bo Wu¹, Zhongfeng Wang¹•Institutions (1)

Nanjing University¹

29 Jul 2021-IEEE Transactions on Circuits and Systems I-regular Papers

TL;DR: In this paper, an extremely low-latency modular multiplier is devised based on a modified algorithm by fully parallelizing and highly optimizing the small-size multipliers and the reduction submodules.

Abstract: The supersingular isogeny key encapsulation (SIKE) protocol, as one of the post-quantum protocol candidates, is widely regarded as the best alternative for curve-based cryptography. However, the long latency, caused by the serial large-degree isogeny computation which is dominated by modular multiplications, has made it less competitive than most popular post-quantum candidates. In this paper, we propose a high-speed and low-latency architecture for our recently presented optimized SIKE algorithm. Firstly, we design a new field arithmetic logic unit (FALU) with many algorithmic transformations and architectural optimizations. Especially, for the FALU, an extremely low-latency modular multiplier is devised based on a modified algorithm by fully parallelizing and highly optimizing the small-size multipliers and the reduction submodules. Secondly, we develop a compact control logic and update the instructions based on the benchmark provided in the newest SIKE library, fitting well with our design. Thirdly, an efficient memory access method is proposed by scheduling the input and output of the arithmetic logic unit (ALU) in two identical RAMs, which can significantly reduce the latency. Finally, we code the proposed architectures using the Verilog language and integrate them into the SIKE library. The implementation results on a Xilinx Virtex-7 FPGA show that for SIKEp751, our design only costs 9.3 ms with a frequency of 155.8 MHz, about $2\times$ faster than the state-of-the-art, and achieves the best area efficiency among existing works. Particularly, the modular multiplier merely needs 16 clock cycles, reducing the delay by nearly one order of magnitude with a small factor of increase in hardware resource.

Book Chapter•DOI•

Evaluation of Double Precision Dual-Rail Asynchronous IEEE 754 Intermediate Product Shifter

Sudhakar Jyothula, K. Sushma

05 Mar 2021

TL;DR: In this paper, an intermediate product (IP) shifter, which shifts the operand to the right by 1 bit, is designed and implemented using CMOS logic and two clock-less techniques: Multi-Threshold Null Convention Logic (MTNCL) and the proposed Multi-Threshold Dual-Spacer Dual-Rail Delay-Insensitive Logic (MTD3L).

Abstract: A floating point multiplier (FPM) is one of the building blocks for various applications such as the arithmetic logic unit (ALU), digital signal processor (DSP), and applications with a wide computational dynamic range. The most widely used standard for FPM number representation is Institute of Electrical and Electronics Engineers (IEEE) 754, which divides a number into three fields: sign, exponent, and mantissa. The operation of an FPM consists of three stages: pre-normalization, multiplication, and post-normalization. Normalization is a critically important step in any floating point computation. Thus, this paper deals with the post-normalization process of 32-bit and 64-bit FPMs using an intermediate product (IP) shifter design, which shifts the operand to the right by 1 bit. For single precision and double precision FPMs, we require 47-bit and 105-bit IP shifters built from 2:1 multiplexers. The IP shifter is designed and implemented using various approaches: CMOS logic and the clock-less techniques Multi-Threshold Null Convention Logic (MTNCL) and the proposed Multi-Threshold Dual-Spacer Dual-Rail Delay-Insensitive Logic (MTD3L). The IP shifter is designed at gate level using Mentor Graphics EDA tools with 130 nm technology, and the proposed technique is compared with existing approaches in terms of power dissipation, delay, and power-delay product (PDP).
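
The post-normalization step can be illustrated in software. This is a minimal sketch of the general technique, not the authors' gate-level shifter: it assumes each input significand carries its leading (hidden) 1 bit, so for single precision the product of two 24-bit significands occupies 47 or 48 bits. The function name and parameters are hypothetical, and rounding of the bit shifted out is deliberately omitted.

```python
def post_normalize(product, exponent, sig_bits=24):
    """Normalize the product of two sig_bits-wide significands.

    Each input significand is assumed to have its leading bit set,
    so the product occupies 2*sig_bits - 1 or 2*sig_bits bits.
    """
    top_bit = 2 * sig_bits - 1
    if (product >> top_bit) & 1:
        # Product reached the top bit: shift right by one position
        # and increment the exponent to compensate.
        return product >> 1, exponent + 1
    # Already normalized; pass through unchanged.
    return product, exponent
```

The conditional one-position right shift is what a row of 2:1 multiplexers implements in hardware: each output bit selects either its own input bit or the next higher one.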

Patent•

Processor memory access

Maalej Khaled, Nguyen Trung Dung, Schmitt Julien, Bernard Pierre-Emmanuel

02 Apr 2021

TL;DR: In this article, an IT device consisting of a plurality of ALUs (9), a set of registers (11), a memory (13), and a control unit (5) controlling the ALUs is described.

Abstract: The invention relates to an IT device comprising: a plurality of ALUs (9); a set of registers (11); a memory (13); a memory interface between the registers (11) and the memory (13); a control unit (5) controlling the ALUs (9), generating: at least one cycle i including both the implementation of at least one first calculation by an arithmetic logic unit (9) and the downloading of a first data set (AA4_7; BB4_7) from the memory (13) to at least one register (11); and at least one cycle iI, subsequent to the at least one cycle i, including the implementation of a second calculation by an arithmetic logic unit (9), for which second calculation part (A4; B4) at least of the first data set (AA4_7; BB4_7) forms at least one operand.

Patent•

Arithmetic logic unit with normal and accelerated performance modes using differing numbers of computational circuits

Debabrata Mohapatra¹, Perry Wang, Xiang Zou, Kim Sang Kyun, Deepak A. Mathaikutty, Gautham N. Chinya•Institutions (1)

Intel¹

18 May 2021

TL;DR: In this paper, a processor includes a front end including circuitry to decode a first instruction to set a performance register for an execution unit and a second instruction, and an allocator with circuitry to assign the second instruction to the execution unit.

Abstract: A processor includes a front end including circuitry to decode a first instruction to set a performance register for an execution unit and a second instruction, and an allocator including circuitry to assign the second instruction to the execution unit to execute the second instruction. The execution unit includes circuitry to select between a normal computation and an accelerated computation based on a mode field of the performance register, perform the selected computation, and select between a normal result associated with the normal computation and an accelerated result associated with the accelerated computation based on the mode field.

Patent•

Static shared memory access with one piece of input data to be reused for successive execution of one instruction in a reconfigurable parallel processor

Li Yuan, Zhu Jianbin

23 Mar 2021

TL;DR: In this patent, a processor may comprise a plurality of processing elements (PEs) that each may comprise an arithmetic logic unit (ALU), a data buffer associated with the ALU, and an indicator associated with the data buffer to indicate whether a piece of data inside the data buffer is to be reused for repeated execution of a same instruction as a pipeline stage.

Abstract: Processors, systems and methods are provided for thread level parallel processing. A processor may comprise a plurality of processing elements (PEs) that each may comprise an arithmetic logic unit (ALU), a data buffer associated with the ALU, and an indicator associated with the data buffer to indicate whether a piece of data inside the data buffer is to be reused for repeated execution of a same instruction as a pipeline stage.
