Showing papers on "Arithmetic logic unit published in 2018"

Open Access

Proceedings Article•

Andrew Trask¹, Felix Hill², Scott Reed², Jack W. Rae², Chris Dyer³, Phil Blunsom²•Institutions (3)

University of Oxford¹, Google², Carnegie Mellon University³

03 Dec 2018

TL;DR: In this article, a neural arithmetic logic unit (NALU) is proposed to learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images.

Abstract: Neural networks can learn to represent and manipulate numerical information, but they seldom generalize well outside of the range of numerical values encountered during training. To encourage more systematic numerical extrapolation, we propose an architecture that represents numerical quantities as linear activations which are manipulated using primitive arithmetic operators, controlled by learned gates. We call this module a neural arithmetic logic unit (NALU), by analogy to the arithmetic logic unit in traditional processors. Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images. In contrast to conventional architectures, we obtain substantially better generalization both inside and outside of the range of numerical values encountered during training, often extrapolating orders of magnitude beyond trained numerical ranges.
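
The gated combination of arithmetic paths described in this abstract can be sketched as a forward pass. This is a minimal illustration, not the authors' code: it assumes the commonly cited NALU formulation (weights `tanh(W_hat) * sigmoid(M_hat)`, an additive path, a log-space multiplicative path, and a sigmoid gate), shares one weight matrix between both paths, and uses hypothetical function and parameter names throughout.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def nac_weights(w_hat, m_hat):
    # Effective weights tanh(W_hat) * sigmoid(M_hat) are pushed toward -1, 0 or 1.
    return [[math.tanh(w) * sigmoid(m) for w, m in zip(w_row, m_row)]
            for w_row, m_row in zip(w_hat, m_hat)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def nalu_forward(x, w_hat, m_hat, g_weights, eps=1e-10):
    """One forward pass of a NALU-style cell (assumed formulation, no training)."""
    weights = nac_weights(w_hat, m_hat)
    add_path = matvec(weights, x)                      # addition / subtraction
    log_x = [math.log(abs(xi) + eps) for xi in x]
    mul_path = [math.exp(v) for v in matvec(weights, log_x)]  # multiply / divide
    gate = [sigmoid(v) for v in matvec(g_weights, x)]  # learned gate in [0, 1]
    return [g * a + (1.0 - g) * m
            for g, a, m in zip(gate, add_path, mul_path)]
```

With saturated parameters the cell behaves like an exact operator: a gate near 1 selects the sum of the inputs, and a gate near 0 selects their product.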

140 citations

Posted Content•

Neural Arithmetic Logic Units

Andrew Trask¹, Felix Hill², Scott Reed², Jack W. Rae², Chris Dyer³, Phil Blunsom²•Institutions (3)

University of Oxford¹, Google², Carnegie Mellon University³

01 Aug 2018-arXiv: Neural and Evolutionary Computing

TL;DR: Experiments show that NALU-enhanced neural networks can learn to track time, perform arithmetic over images of numbers, translate numerical language into real-valued scalars, execute computer code, and count objects in images.

74 citations

Proceedings Article•DOI•

Overview and Comparative Performance Analysis of Various Full Adder Cells in 90 nm Technology

Mehedi Hasan¹, Md. Jobayer Hossein¹, Uttam Kumar Saha¹, Shahariar Tarif¹•Institutions (1)

North South University¹

01 Dec 2018

TL;DR: An overview and comparative analysis of different sorts of full adder cells is presented and it was observed that Complementary Pass Logic adder cell had the least delay and 14-Transistor adder cell displayed the worst delay.

Abstract: Full adders, considered the major component in the arithmetic logic unit of digital signal processing chips and microprocessors, have gained much interest among researchers over the years. As a result, various adder cells have been developed by employing various logic styles. Therefore, it is necessary to have a complete overview of the various sorts of full adders to figure out their performance level. This paper presents an overview and comparative analysis of different sorts of full adder cells. Analysis and simulation on Complementary Pass Logic, Complementary CMOS, Transmission Gate, Transmission Function, 24 Transistor, 14 Transistor, 10 Transistor, Swing Restored Complementary Pass Logic, Hybrid Pass Static CMOS, Double Pass Transistor and some Hybrid full adder cells have been conducted using Cadence Virtuoso Tools in 90nm technology. It was observed that Complementary Pass Logic adder cell had the least delay and 14-Transistor adder cell displayed the worst delay. Hybrid Pass Static CMOS full adder had the least power consumption whereas 24 Transistor full adder consumed highest power. Highest power-delay product was displayed by Double Pass Transistor Logic adder cell.

27 citations

Journal Article•DOI•

Design of Improved Arithmetic Logic Unit in Quantum-Dot Cellular Automata

Saeed Rasouli Heikalabad¹, Mahya Rahimpour Gadim¹•Institutions (1)

Islamic Azad University¹

08 Mar 2018-International Journal of Theoretical Physics

TL;DR: Evaluation results show that the proposed design of improved single-bit arithmetic logic unit in quantum dot cellular automata has best performance in terms of area, complexity and delay compared to the previous designs.

Abstract: Quantum-dot cellular automata (QCA) can replace CMOS technology to overcome its limitations. An arithmetic logic unit (ALU) is a basic structure of any computing device. In this paper, the design of an improved single-bit arithmetic logic unit in quantum-dot cellular automata is presented. The proposed structure for the ALU has AND, OR, XOR and ADD operations. A unique 2:1 multiplexer, an ultra-efficient two-input XOR and a low-complexity full adder are used in the proposed structure. An extended design of this structure is also provided for a two-bit ALU. The proposed ALU structure is simulated with QCADesigner and the simulation result is evaluated. Evaluation results show that the proposed design has the best performance in terms of area, complexity and delay compared to the previous designs.
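
The functional behavior of a single-bit ALU built from a full adder, an XOR and 2:1 multiplexers, as listed in this abstract, can be sketched at the logic level. This is a behavioral illustration only: the opcode encoding (0 = AND, 1 = OR, 2 = XOR, 3 = ADD) and the multiplexer tree are assumptions, and nothing here models the paper's QCA cell layout or clocking.

```python
def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for three input bits."""
    total = a ^ b ^ carry_in
    carry_out = (a & b) | (carry_in & (a ^ b))
    return total, carry_out


def mux2(select, d0, d1):
    """2:1 multiplexer: passes d0 when select is 0, d1 when select is 1."""
    return d1 if select else d0


def alu_1bit(a, b, carry_in, op):
    """Hypothetical opcode map: 0 = AND, 1 = OR, 2 = XOR, 3 = ADD."""
    total, carry_out = full_adder(a, b, carry_in)
    logic_pair = mux2(op & 1, a & b, a | b)       # AND or OR
    arith_pair = mux2(op & 1, a ^ b, total)       # XOR or ADD
    result = mux2((op >> 1) & 1, logic_pair, arith_pair)
    carry = carry_out if op == 3 else 0           # carry is only meaningful for ADD
    return result, carry
```

Chaining two such cells through the carry output gives a two-bit version of the same behavioral model.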

26 citations

Journal Article•DOI•

Logic Design of a 16-bit Bit-Slice Arithmetic Logic Unit for 32-/64-bit RSFQ Microprocessors

Guang-Ming Tang¹, Pei-Yao Qu¹, Xiaochun Ye¹, Dongrui Fan¹•Institutions (1)

Chinese Academy of Sciences¹

31 Jan 2018-IEEE Transactions on Applied Superconductivity

TL;DR: A 16-bit bit-slice arithmetic logic unit (ALU) is proposed for 32-/64-bit rapid single-flux-quantum microprocessors based on a Ladner–Fischer adder, which can be used for any 16n-bit processing.

Abstract: A 16-bit bit-slice arithmetic logic unit (ALU) is proposed for 32-/64-bit rapid single-flux-quantum microprocessors. It is based on a Ladner-Fischer adder. The ALU covers all of the ALU operations of the MIPS32 instruction set. Each of the two 64-bit operands is divided into four slices of 16 bits each. The ALU uses synchronous concurrent-flow clocking and consists of 11 pipeline stages. The proposed ALU can be used for any 16n-bit processing.
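
The idea of processing a wide operand as a sequence of 16-bit slices can be sketched for addition. This is an arithmetic illustration only, with hypothetical names: it reuses one 16-bit adder and passes the carry from slice to slice, and it does not model the Ladner-Fischer adder, the 11 pipeline stages, or the RSFQ clocking scheme.

```python
SLICE_BITS = 16
SLICE_MASK = (1 << SLICE_BITS) - 1


def add_slice(a, b, carry_in):
    """Add two 16-bit slices plus a carry; return (16-bit sum, carry_out)."""
    total = a + b + carry_in
    return total & SLICE_MASK, total >> SLICE_BITS


def bit_slice_add(x, y, width=64):
    """Add two width-bit operands slice by slice (width is a multiple of 16)."""
    result, carry = 0, 0
    for shift in range(0, width, SLICE_BITS):
        a = (x >> shift) & SLICE_MASK
        b = (y >> shift) & SLICE_MASK
        slice_sum, carry = add_slice(a, b, carry)
        result |= slice_sum << shift
    return result, carry
```

With `width=64` the loop runs four times, matching the four slices per operand mentioned in the abstract; `width=32` uses two.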

23 citations

Journal Article•DOI•

Modified Binary Multiplier Architecture to Achieve Reduced Latency and Hardware Utilization

Geetam Singh Tomar, Marcus L. George¹•Institutions (1)

University of the West Indies¹

01 Feb 2018-Wireless Personal Communications

TL;DR: The results of simulation indicate that the latency of the proposed novel binary multiplier systems (8-bit, 16-bit and 24-bit) is significantly shorter than existing implementations.

Abstract: Arithmetic Logic Units (ALUs) are very important components of the processor, which perform various arithmetic and logical operations such as multiplication, division, addition, subtraction, cubing, squaring, etc. Of all these operations, multiplication is the most elementary and most frequently used operation in ALUs. The operation of multiplication also forms the basis of many other complex arithmetic operations such as cubing, squaring, convolution, etc. This paper presents the modified novel multi-precision binary multiplier architecture to achieve a reduced latency/delay and area/hardware utilization along with existing implementations of binary multiplication. This system will function as the second stage of a novel multi-precision binary multiplier system. The system was implemented using Xilinx 14.2 ISE and simulated with ISIM, which was available from Xilinx 14.2 ISE. The results of simulation indicate that the latency of the proposed novel binary multiplier systems (8-bit, 16-bit and 24-bit) is significantly shorter than existing implementations.

16 citations

Journal Article•DOI•

Design of Programmable Analog Calculation Unit by Implementing Support Vector Regression for Approximate Computing

Renyuan Zhang¹, Noriyuki Uetake¹, Takashi Nakada¹, Yasuhiko Nakashima¹•Institutions (1)

Nara Institute of Science and Technology¹

01 Nov 2018-IEEE Micro

TL;DR: The performances over energy, flexibility, and hardware efficiency of the proposed ACU are superior to a basic four-bit digital arithmetic logic unit and look-up table based architectures.

Abstract: In this work, we design a programmable analog calculation unit (ACU) for approximately computing arbitrary functions with two operands. By implementing an efficient scheme of support vector regression, the target functions are retrieved by very large scale integrated circuits in one clock cycle with only 600 transistors. A set of dynamically tunable analog circuits are designed for generating various features of Gaussian kernel functions. By mixing these kernel functions, any specific complex function is computed by the regression. The ACU is designed and simulated in a standard CMOS technology for proof-of-concept. From the circuit simulation results, the proposed ACU calculates all the target functions with the average error less than 1.7%. The performances over energy, flexibility, and hardware efficiency of the proposed ACU are superior to a basic four-bit digital arithmetic logic unit and look-up table based architectures. The robustness against temperature and process variations is also presented with acceptable fluctuations on calculating results. To conveniently integrate the proposed ACUs into ordinary digital systems, we also design the compact memory circuits, which offer dual-mode (analog and binary) data storage/access.

13 citations

Journal Article•DOI•

QCA Gray Code Converter Circuits Using LTEx Methodology

Chiradeep Mukherjee¹, Saradindu Panda², Asish Kumar Mukhopadhyay³, Bansibadan Maji¹•Institutions (3)

National Institute of Technology, Durgapur¹, Narula Institute of Technology², American Hotel & Lodging Educational Institute³

20 Apr 2018-International Journal of Theoretical Physics

TL;DR: The novel formulations exploiting the operability of the LTEx module have been proposed to instantiate area-delay efficient B2G and G2B Converters which can be exclusively used in Gray Code Addressing schemes.

Abstract: The Quantum-dot Cellular Automata (QCA) is the prominent paradigm of nanotechnology considered to continue the computation at deep sub-micron regime. The QCA realizations of several multilevel circuits of the arithmetic logic unit have been introduced in recent years. However, as high fan-in Binary to Gray (B2G) and Gray to Binary (G2B) Converters exist in the processor based architecture, no attention has been paid towards the QCA instantiation of the Gray Code Converters which are anticipated to be used in 8-bit, 16-bit, 32-bit or even more bit addressable machines of Gray Code Addressing schemes. In this work the two-input Layered T module is presented to exploit the operation of an Exclusive-OR Gate (namely LTEx module) as an elemental block. The “defect-tolerant analysis” of the two-input LTEx module has been carried out to establish the scalability and reproducibility of the LTEx module in complex circuits. The novel formulations exploiting the operability of the LTEx module have been proposed to instantiate area-delay efficient B2G and G2B Converters which can be exclusively used in Gray Code Addressing schemes. Moreover, this work formulates the QCA design metrics such as O-Cost, Effective area, Delay and Cost α for the n-bit converter layouts.
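
The B2G and G2B conversions that these circuits implement are built entirely from exclusive-OR operations, which is why an XOR module serves as the elemental block. A minimal software sketch of the standard conversions follows; the function names are illustrative, and the code says nothing about the LTEx layout or its QCA cost metrics.

```python
def binary_to_gray(n):
    """B2G: each Gray bit is the XOR of two adjacent binary bits."""
    return n ^ (n >> 1)


def gray_to_binary(g):
    """G2B: each binary bit is the XOR of all Gray bits at or above it."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n
```

Consecutive integers map to Gray codes that differ in exactly one bit, which is the property Gray code addressing relies on.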

12 citations

Proceedings Article•DOI•

Performance Analysis of Parallel Prefix Adder for Datapath VLSI Design

K. C. Shilpa¹, M. Shwetha¹, B.C. Geetha¹, D. M. Lohitha¹, Navya¹, N.V. Pramod¹•Institutions (1)

Dr. Ambedkar Institute of Technology¹

20 Apr 2018

TL;DR: This paper proposes four types of parallel prefix adder (PPA), namely the Sklansky adder, Kogge-Stone adder, Brent-Kung adder and Ladner-Fischer adder, which are suited for binary addition with wide words.

Abstract: All modern processors, including microprocessors and digital signal processors, contain an Arithmetic Logic Unit (ALU). The computing efficiency of these modern processors mainly depends on the efficiency of the ALU. An adder is the basic building block of an ALU, which performs arithmetic as well as logic operations. Existing adders such as the half adder, full adder, ripple carry adder, carry skip adder and carry lookahead adder cannot meet the expected optimization goals, so this paper proposes four types of parallel prefix adder (PPA): the Sklansky adder, Kogge-Stone adder, Brent-Kung adder and Ladner-Fischer adder. Parallel prefix adders are a kind of adder that uses a prefix operation to perform efficient addition. These adders are suited for binary addition with wide words. The parallel prefix adders are derived from the carry lookahead adder. The performance analysis of the PPAs considers area, delay and power consumption, and simulations are carried out for an 8-bit input data width.
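
The prefix operation on generate and propagate signals that these adders share can be sketched for the Kogge-Stone variant. This is a software model under stated assumptions: bit vectors are held in Python integers, the default width of 8 matches the data width named in the abstract, and the function name is hypothetical. It illustrates the carry computation only, not area, delay or power.

```python
def kogge_stone_add(a, b, width=8):
    """Add two width-bit values with a Kogge-Stone prefix network.

    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    propagate = a ^ b          # bit i passes an incoming carry along
    generate = a & b           # bit i creates a carry on its own
    half_sum = propagate
    distance = 1
    while distance < width:    # log2(width) prefix levels
        # Combine each position with the group ending `distance` bits below it.
        generate |= propagate & (generate << distance)
        propagate &= propagate << distance
        generate &= mask
        propagate &= mask
        distance <<= 1
    carries = (generate << 1) & mask   # carry into bit i comes from group [0, i-1]
    carry_out = (generate >> (width - 1)) & 1
    return half_sum ^ carries, carry_out
```

The other three adders named in the abstract compute the same group generate signals with different trade-offs between logic depth, fan-out and wiring.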

11 citations

Journal Article•DOI•

BCB Evaluation of High-Performance and Low-Leakage Three-Independent-Gate Field-Effect Transistors

Jorge Romero-Gonzalez¹, Pierre-Emmanuel Gaillardon¹•Institutions (1)

University of Utah¹

09 Apr 2018-IEEE Journal on Exploratory Solid-State Computational Devices and Circuits

TL;DR: The TIGFET technology has been benchmarked against several beyond-CMOS devices and it is shown that the standby energy of the 32-bit adder decreased by two orders of magnitude compared with CMOS HP and by at least one order of magnitude compared with CMOS low-voltage.

Abstract: Three-independent-gate field-effect transistors (TIGFETs) are a promising next-generation device technology. Their controllable-polarity capability allows for superior design of arithmetic and sequential logic gates. In this paper, the TIGFET technology has been benchmarked against several beyond-CMOS devices. The benchmarking techniques followed a similar approach used by the Nanoelectronic Research Initiative Group. The performance of the 32-bit adder and the 32-bit arithmetic logic unit (ALU) was investigated using the advanced 15-nm technology node. The TIGFET devices were shown to achieve the best energy-delay product (EDP) compared with all other beyond-CMOS devices for the 32-bit adder and competitive EDP for the 32-bit ALU. In particular, TIGFETs have 3.83 times and 1.54 times lower EDP than CMOS high-performance (HP) for the 32-bit adder and the 32-bit ALU, respectively. In addition, TIGFETs were shown to have a similar throughput for the 32-bit ALU compared with CMOS HP. Finally, due to TIGFETs’ ultralow leakage current and unique circuit designs, our results show that the standby energy of the 32-bit adder decreased by two orders of magnitude compared with CMOS HP and a decrease of at least one order of magnitude compared with CMOS low-voltage.

11 citations

Journal Article•DOI•

Resource Utilization Optimization with Design Alternatives in FPGA based Arithmetic Logic Unit Architectures

Rakhi Nangia¹, Neeraj Kr. Shukla¹•Institutions (1)

ITM University, Gurgaon, Haryana¹

01 Jan 2018-Procedia Computer Science

TL;DR: The idea is resource sharing and functionality sharing technique to design an ALU that leads to a significant saving of resources and a significant reduction in hardware requirement.

Journal Article•DOI•

Complementary Logic Implementation for Antiferromagnet Field-Effect Transistors

Chenyun Pan¹, Azad Naeemi²•Institutions (2)

University of Kansas¹, Georgia Institute of Technology²

30 Oct 2018-IEEE Journal on Exploratory Solid-State Computational Devices and Circuits

TL;DR: In this paper, a compact and complementary logic implementation is proposed for antiferromagnet field-effect transistor (AFMFET) devices that enables a complete set of Boolean operations based on complementary logic as well as majority-gate logic.

Abstract: In this paper, a compact and complementary logic implementation is proposed for antiferromagnet field-effect transistor (AFMFET) devices. The implementation enables a complete set of Boolean operations based on complementary logic as well as majority-gate logic. The impacts of several key device-level design parameters are investigated, such as the channel resistance and critical switching voltage, and their optimal values that minimize the overall energy-delay product (EDP) of a 32-bit arithmetic logic unit are quantified. In addition, it is shown that one can potentially take advantage of the large domain size of some AFM materials such as chromium and build a compact majority-gate-based logic. The potential performance benefits of the majority-gate-based logic are also quantified. Compared to the conventional CMOS logic circuit, the one with AFMFET devices using majority gates can potentially achieve $10\times$ improvement in terms of the EDP.

Proceedings Article•DOI•

Design of 8 bit Reconfigurable ALU Using Quantum Dot Cellular Automata

K. Pandiammal¹, D. Meganathan²•Institutions (2)

Jerusalem College of Engineering, Chennai¹, Madras Institute of Technology²

01 Oct 2018

TL;DR: The 8-bit QCA-based Reconfigurable Arithmetic Logic Unit (ALU) is proposed using clock zone based crossover (CZBCO) to perform four arithmetic and logical operations such as binary addition, logical AND, OR and EXOR.

Abstract: Quantum-dot cellular automata (QCA) is a new computational paradigm to design digital circuits at nano-scale. The 8-bit QCA-based Reconfigurable Arithmetic Logic Unit (ALU) is proposed using clock zone based crossover (CZBCO). The ALU is designed to perform four arithmetic and logical operations: binary addition, logical AND, OR and EXOR. The MGs used in the EXOR operation are reused for the logical AND and OR operations, which saves two MGs in the 1-bit ALU design. The proposed 1-bit ALU reduces energy dissipation by 54.5% and minimizes the QCA cells utilization by 43.5% when compared to existing works. Therefore the proposed 8-bit reconfigurable ALU saves 16 MGs, nullifies design complexity and is implemented on a single layer using CZBCO. Hence, the proposed reconfigurable ALU achieves less cell count and power dissipation compared to the existing work.

Proceedings Article•DOI•

Arithmetic Logic Unit Using Diode Free Adiabatic Logic and Selection Unit for Adiabatic Logic Family

Vasudev Grover¹, Vishwas Gosain¹, Neeta Pandey¹, Kirti Gupta²•Institutions (2)

Delhi Technological University¹, Bharati Vidyapeeth's College of Engineering²

01 Feb 2018

TL;DR: A 1-bit ALU circuit has been designed in diode free adiabatic logic methodology which has much superior performance than CMOS circuits considering power dissipation and power delay product.

Abstract: Reduction in energy dissipation is an active area of research. Systems which consume power require the deployment of expensive cooling systems. In this paper a 1-bit ALU circuit has been designed in diode free adiabatic logic methodology which has much superior performance than CMOS circuits considering power dissipation and power delay product. An apt selection unit for adiabatic logic circuits has also been proposed. Two types of multiplexer circuits are designed, first by DFAL method, and second using transmission gates. The performance of DFAL based circuits connected to both types of multiplexers is observed. The performance of the circuits is verified through simulations in Tanner EDA simulator using 180nm CMOS TSMC parameters, as these are the achievable parameters in India for fabrication. Higher energy efficiency is achieved in the proposed circuit.

Proceedings Article•DOI•

Division circuit using reversible logic gates

Ismail Gassoumi, Lamjed Touil, Bouraoui Ouni

22 Mar 2018

TL;DR: The proposed design of division block is based on reversible gates with reduction of garbage outputs, constant inputs, quantum cost and hardware complexity and demonstrates that the proposed solution have less performance and significantly better scalability than the current designs.

Abstract: In the recent years, reversible approach is becoming widely used in many domains, such as quantum computing, optical computing and ultra-low power VLSI circuit. Division has its application in the design of reversible Arithmetic Logic Unit (ALU). In this paper, we have exhibited a novel design of division sequential circuit using reversible logic gates. The proposed design of division block is based on reversible gates with reduction of garbage outputs, constant inputs, quantum cost and hardware complexity. The comparative results demonstrate that the proposed solution have less performance and significantly better scalability than the current designs.

Proceedings Article•DOI•

Comparison of Various Adders and their VLSI Implementation

Shubham Sarkar¹, Sujan Sarkar¹, Jishan Mehedi¹•Institutions (1)

Jalpaiguri Government Engineering College¹

01 Jan 2018

TL;DR: This paper presents a comparative study of various parallel adders and proposes a hybrid adder, with the aim of maximizing efficiency in terms of propagation delay, area on chip and power consumption.

Abstract: It is widely accepted that the main processing unit of any device capable of carrying out computations is the Central Processing Unit (CPU), that one of the most fundamental and integral parts of the CPU is the Arithmetic Logic Unit (ALU), and that adders are the primary and indispensable component of the ALU. The ALU is primarily responsible for carrying out logical operations, arithmetic operations, etc. Adders are also very important in Digital Signal Processing (DSP) for filter design. Nowadays it has become very important to speed up all devices, to make them more power efficient since large amounts of power cannot be stored, and to make them small in size for mobility. As adders are the main part of all these, it is of prime importance that we modify the adder in order to obtain maximum efficiency with regard to propagation delay, area on chip and power consumption. Various adders have been invented so far, each specializing in a particular work platform and efficient in its own way. This paper presents a comparative study of various parallel adders and proposes a hybrid adder. All the results for the adders were obtained in the Xilinx 14.7 ISE environment, with the designs coded in Verilog HDL. Specific graphs and tables of values are given for propagation delay, area and number of transistors required, for a better comparison.

Proceedings Article•DOI•

Low power 4-Bit Arithmetic Logic Unit using Full-Swing GDI technique

Mahmoud Aymen Ahmed¹, M. A. Abdelghany²•Institutions (2)

Sohag University¹, Minia University²

01 Feb 2018

TL;DR: This paper provides a design of 4-Bit Arithmetic Logic Unit (ALU) using Full-Swing GDI Technique, which is considered an effective method for low power digital design while reducing the area of the circuit compared to other logic styles.

Abstract: Power dissipation and area of the circuit are the main issues in the electronics industry. This paper provides a design of a 4-Bit Arithmetic Logic Unit (ALU) using the Full-Swing GDI Technique, which is considered an effective method for low power digital design while reducing the area of the circuit compared to other logic styles. The proposed ALU design consists of a 2×1 Multiplexer, a 4×1 Multiplexer and a low power Full Adder cell to realize the arithmetic and logic operations. The simulation was carried out in Cadence Virtuoso using a 65nm TSMC process. The results show that the proposed design consumes less power and uses fewer transistors, while achieving full swing operation compared to previous work.

Journal Article•DOI•

Reversible Realization of N-bit Arithmetic Circuit for Low Power Loss ALU Applications

Vandana Shukla¹, O. P. Singh¹, G. R. Mishra¹, R. K. Tiwari²•Institutions (2)

Guru Gobind Singh Indraprastha University¹, Dr. Ram Manohar Lohia Avadh University²

01 Jan 2018-Procedia Computer Science

TL;DR: A novel approach to design n-bit arithmetic circuit with reversible design approach is proposed and is compared with existing designs on the basis of some selected performance parameters to show that proposed design is the most optimized approach for reversible realization of arithmetic circuit for low power Arithmetic logic unit (ALU) circuit applications.

Proceedings Article•DOI•

Modified CSA-CIA for Reducing Propagation Delay

Shubham Sarkar¹, Sujan Sarkar¹, Jishan Mehedi¹•Institutions (1)

Jalpaiguri Government Engineering College¹

01 Jan 2018

TL;DR: This paper presents a comparative study of the Carry Save Adder (CSA) and Carry Increment Adder (CIA) and proposes a hybrid adder circuit to reduce the propagation delay associated with the adder.

Abstract: An adder is a fundamental component of various Very Large-Scale Integration (VLSI) circuits like the Central Processing Unit (CPU), Arithmetic Logic Unit (ALU), Memory Access Unit (MAU) etc. A number of operations can be achieved by adders, such as addition, subtraction, multiplication, division, exponentiation etc. The basic circuit of the adder is designed using logic gates. The demand for high-performance VLSI systems is increasing rapidly for use in small and portable devices. The speed of operation depends upon the delay of the adder, as it happens to be one of the most fundamental components of all the computing units and is a very important parameter for high performance. There has been a great deal of research on reducing the delay associated with the adder. In this paper, we have done a comparative study of the Carry Save Adder (CSA) and Carry Increment Adder (CIA) and proposed a hybrid adder circuit to decrease the delay associated with the adder to an optimum level. The CIA has favorable performance regarding propagation delay, and the CSA also happens to have good performance in higher bit operations. A simulation study has been carried out for comparison; the coding has been done using the Verilog hardware description language (HDL) and the simulation has been realized with the help of the Xilinx ISE 14.7 environment. The result shows the effectiveness of the proposed hybrid circuit for propagation delay improvement.
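
For readers unfamiliar with the carry-save technique named in this abstract, the sketch below shows the textbook idea: three operands are reduced to a sum word and a carry word with no carry propagation between bit positions, and only the last stage needs a carry-propagating adder. The function names are illustrative, and the code does not model the carry increment adder or the hybrid circuit proposed in the paper.

```python
def carry_save_step(x, y, z):
    """Reduce three operands to (partial_sum, carry) without carry propagation."""
    partial_sum = x ^ y ^ z                        # per-bit sum, carries ignored
    carry = ((x & y) | (y & z) | (x & z)) << 1     # per-bit majority, moved up one place
    return partial_sum, carry


def carry_save_add(operands):
    """Sum non-negative integers with repeated 3:2 reduction, then one final add."""
    values = list(operands)
    while len(values) > 2:
        partial_sum, carry = carry_save_step(*values[:3])
        values = [partial_sum, carry] + values[3:]
    return sum(values)   # stands in for the final carry-propagate adder
```

Because each reduction step has constant depth regardless of word length, the delay of the whole structure is dominated by the final adder stage.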

Journal Article•DOI•

A Dataflow Processor as the Basis of a Tiled Polymorphic Computing Architecture with Fine-Grain Instruction Migration

David Hentrich¹, Erdal Oruklu¹, Jafar Saniie¹•Institutions (1)

Illinois Institute of Technology¹

01 Oct 2018-IEEE Transactions on Parallel and Distributed Systems

TL;DR: A dataflow processor is presented that is to be used as the basis of a tiled polymorphic computing architecture with a new scheme that allows a program's instructions to be migrated before and during runtime in a fine-grained manner across the collection of processors.

Abstract: A dataflow processor is presented that is to be used as the basis of a tiled polymorphic computing architecture. The key contribution is a new scheme that allows a program's instructions to be migrated before and during runtime in a fine-grained manner across the collection of processors. The primary reason to perform this migration is to execute programs faster. Other reasons to perform instruction migration are to prioritize computational resources and to achieve thermal balancing. The act of performing instruction migration across a collection of processors is logically equivalent to rearranging the computer architecture under the program (i.e., polymorphic computing). Additionally, a new dataflow instruction set which enables the migration is presented. This instruction set is built upon the concept of a single RISC-like dataflow instruction that can atomically execute, make decisions, and independently route results. Furthermore, the novel concept of an operation cell is presented. An operation cell holds a single instruction and its data. It also contains logic to independently determine when an instruction is executed and when to forward data to other operation cells in the collection of processors. In addition, the internal architecture of this processor is presented. This includes the arithmetic logic unit (ALU) that is used to execute the instructions and a series of buses that allow data movement to occur in parallel to instruction execution. Finally, an inter-processor migration strategy is defined.

Proceedings Article•DOI•

Design and Analysis of Low-Power 16-bit Parallel-Prefix Adiabatic Adders

Nagesh N. Nazare, R. J. Nayana, Pradeep S. Bhat, B. S. Premananda

01 May 2018

TL;DR: This work provides a comparison in terms of area, power and latency between Adiabatic Ripple-Carry adder (RCA), Kogge-Stone Adder (KSA), and Han-Carlson Adder and their Static CMOS counterparts.

Abstract: Adders are the elementary components of all the general-purpose microprocessors and signal processing units which include filters, MAC, Arithmetic Logic Unit, … The efficient design of an adder determines the overall performance efficiency of the system. The Parallel Prefix Algorithm is one of the proficient ways of implementing an adder. The increase in the number of portable devices has increased the need for low power design techniques. Adiabatic Logic is one of the promising techniques to recover and recycle the power back to the source. This work provides a comparison in terms of area, power and latency between Adiabatic Ripple-Carry adder (RCA), Kogge-Stone Adder (KSA) and Han-Carlson Adder (HCA) and their Static CMOS counterparts. CMOS 180 nm technology is used for schematic entry. Functional verification is performed using Cadence Virtuoso - Spectre Simulator. Analysis of the adders shows that Adiabatic KSA has the least latency, whereas Adiabatic HCA provides a good trade-off between latency and power dissipation.

Proceedings Article•DOI•

Balancing resiliency and energy efficiency of functional units in ultra-low power systems

Mohammad Saber Golanbari¹, Anteneh Gebregiorgis¹, Elyas Moradi¹, Saman Kiamehr¹, Mehdi B. Tahoori¹•Institutions (1)

Karlsruhe Institute of Technology¹

22 Jan 2018

TL;DR: This paper proposes to partition a functional unit such as Arithmetic Logic Unit (ALU) into multiple smaller and faster functional units and power-gate them whenever they are not used for long time in order to guarantee resilient and energy-efficient system operation.

Abstract: For applications with stringent power budget, such as ultra low power systems and Internet of the things (IoT), power and energy are the most important constraints. It is shown that when the supply voltage is close to the threshold voltage of transistor, known as near threshold computing (NTC), the energy consumption is at its minimum range. However, by reducing the supply voltage not only the circuit performance decreases significantly, but aggravates various reliability mechanisms. Moreover, the performance variation due to process and runtime variation increases exponentially which makes the traditional margining, to address variability, very inefficient. In this paper, we address energy-efficient countermeasures to combat reliability challenges at NTC in order to guarantee resilient and energy-efficient system operation. We propose to partition a functional unit such as Arithmetic Logic Unit (ALU) into multiple smaller and faster functional units and power-gate them whenever they are not used for long time. Simulation results show that by applying the proposed method the energy efficiency of an ALU can be improved by up to 43.4% at NTC with multiple fold reliability improvements due to timing failures.

Journal Article•DOI•

A novel three-input approximate XOR gate design based on quantum-dot cellular automata

Negin Maroufi¹, Davoud Bahrepour¹•Institutions (1)

Islamic Azad University¹

01 Jun 2018-Journal of Computational Electronics

TL;DR: The present study is the first to introduce a novel three-input approximate XOR gate, utilized in designing approximate 4-2 compressors for reducing circuit complexity, which can be achieved by the minimum number of the gates and area-efficient designs.

Abstract: Quantum-dot cellular automata (QCA) are one of the most promising emerging nanoelectronic paradigms for designing computers and very large-scale integration circuits. Many applications can tolerate the errors and imprecision of digital systems; thus, approximate computing is widely used in such cases. Exclusive OR (XOR) gates are among the major gates on which other gates can be built. A three-input XOR gate is considered a basic gate in designing compressors, which are among the most important parts of the arithmetic logic unit. Several studies have focused on the accuracy of the XOR gate in QCA technology. However, using approximation in designing XOR gates remains a significant concern in this regard. To the best of our knowledge, the present study is the first to introduce a novel three-input approximate XOR gate. This gate is used in designing approximate 4-2 compressors to reduce circuit complexity, which can be achieved with a minimum number of gates and area-efficient designs.
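
For context on why a three-input XOR is a basic gate in compressors, the following is a small Python sketch of a conventional exact 4-2 compressor built from two full adders. It is not the approximate gate or compressor proposed in the paper, whose logic is not given in this abstract; the names `xor3`, `maj3` and `compressor_4_2` are illustrative assumptions.

```python
from itertools import product


def xor3(a: int, b: int, c: int) -> int:
    """Three-input XOR: the sum output of a full adder."""
    return a ^ b ^ c


def maj3(a: int, b: int, c: int) -> int:
    """Three-input majority: the carry output of a full adder."""
    return (a & b) | (b & c) | (a & c)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int):
    """Exact 4-2 compressor as two cascaded full adders.

    Satisfies x1 + x2 + x3 + x4 + cin == total + 2 * (carry + cout).
    """
    s1 = xor3(x1, x2, x3)        # first full adder, sum
    cout = maj3(x1, x2, x3)      # first full adder, carry
    total = xor3(s1, x4, cin)    # second full adder, sum
    carry = maj3(s1, x4, cin)    # second full adder, carry
    return total, carry, cout


def is_exact() -> bool:
    """Check the arithmetic identity over all 32 input combinations."""
    return all(
        sum(bits) == t + 2 * (c + co)
        for bits in product((0, 1), repeat=5)
        for t, c, co in [compressor_4_2(*bits)]
    )


if __name__ == "__main__":
    print(is_exact())
```

An approximate design would replace one or both `xor3` stages with a cheaper gate that is wrong for some input combinations, trading accuracy for area.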

Journal Article•DOI•

Design and Comparison of Wallace Multiplier Based on Symmetric Stacking and High Speed counters

B.L. Chandrika, Santhosh N S, Amaresha S K

05 Jun 2018

TL;DR: This paper compares column compression techniques for Wallace multipliers, in which reducing more bits at a time lowers the critical path delay and the latency of the multiplier.

Abstract: Low-latency, efficient addition of multiple operands is an essential operation in any computational unit. The latency, power efficiency and area of multiplier circuits are of critical importance to the performance of processors. Multiplier circuits are an essential part of an arithmetic logic unit, or of a digital signal processor system for performing filtering and convolution. The binary multiplication of integers or fixed-point numbers results in partial products that must be added to produce the final product. The addition of these partial products dominates the latency and power consumption of the multiplier. To combine the partial products efficiently, column compression is commonly used. Many methods have previously been presented to optimize the performance of the partial product summation. To achieve higher efficiency, more bits need to be reduced at a time, and column compression techniques can be used for this. They reduce the critical path delay and thus the latency of the multiplier. When a higher compression unit is used, the energy of the multiplier is also reduced. In this paper, the column compression techniques are compared. Previously presented stacking circuits show an improvement over the algorithmic Wallace multiplier, which uses generic equations for the column compression units. In stacking units, the 7:3 and 6:3 counters are derived from a basic 3-bit stacking circuit, which reduces the usage of XOR gates and thus the critical delay. The algorithmic units, by contrast, use generic equations based on the generate and propagate functions of an adder.
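
The column compression described in this abstract can be sketched behaviourally in Python. The model below reduces partial-product columns with counters of up to 7 inputs (a 7:3 counter outputs the 3-bit count of its inputs) until every column holds at most two bits, then performs a final two-operand addition. It models counters as population counts and says nothing about the stacking circuit itself; the function name and the grouping policy are assumptions for illustration.

```python
def multiply_by_column_compression(a: int, b: int, width: int = 8) -> int:
    """Unsigned multiply using counter-based column compression."""
    n_cols = 2 * width
    cols = [[] for _ in range(n_cols)]

    # Partial products: bit (a_i AND b_j) has weight 2**(i + j).
    for i in range(width):
        for j in range(width):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Compress until each column has at most two bits.
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in range(n_cols)]
        for w, col in enumerate(cols):
            pos = 0
            while len(col) - pos >= 3:
                group = col[pos:pos + 7]          # 3 to 7 bits per counter
                pos += len(group)
                count = sum(group)                # counter output value
                n_out = len(group).bit_length()   # 2 bits for 3:2, 3 for 4..7
                for k in range(n_out):
                    if w + k < n_cols:            # higher bits are always 0
                        nxt[w + k].append((count >> k) & 1)
            nxt[w].extend(col[pos:])              # pass leftover bits through
        cols = nxt

    # Final carry-propagate addition of the two remaining rows.
    row0 = sum(col[0] << w for w, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << w for w, col in enumerate(cols) if len(col) > 1)
    return row0 + row1


if __name__ == "__main__":
    print(multiply_by_column_compression(13, 11, width=4))
```

Wider counters remove more bits per stage, so fewer stages sit on the critical path before the final adder.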

Proceedings Article•DOI•

Implementation of 32-Bit Arithmetic Logic Unit on Xilinx using VHDL

Subramanya G. Nayak¹•Institutions (1)

Manipal Institute of Technology¹

09 Oct 2018

TL;DR: This work proposes a technique to design and implement, on Xilinx ISE using VHDL, a 32-bit ALU, a digital circuit that performs arithmetic and logical operations.

Abstract: At present, there is a massive need to develop appropriate data communication interfaces for real-time embedded systems. Field Programmable Gate Arrays (FPGAs) provide various resources that can be programmed to construct an effective embedded unit. The FPGA configuration is generally specified using a hardware description language (HDL). VHDL (VHSIC hardware description language) is a hardware description language used in electronic design automation to describe digital and mixed-signal structures such as field programmable gate arrays (FPGAs) and integrated circuits. This work proposes a technique to design and implement, on Xilinx ISE using VHDL, a 32-bit ALU, a digital circuit that performs arithmetic and logical operations.

Book Chapter•DOI•

Implementation of the standard floating point MAC using IEEE 754 floating point adder

R. Prakash Rao¹, N. Dhanunjaya Rao¹, K. Naveen¹, P. Ramya¹•Institutions (1)

Maturi Venkata Subba Rao Engineering College¹

01 Feb 2018

TL;DR: To improve the performance of the traditional fixed point MAC, this work implemented the standard floating point MAC using IEEE 754 floating point adder, which can be used to design all floating point DSP processors through the standard MAC.

Abstract: There are two types of processors. The first category is arithmetic processors and the second category is signal processing processors. In general, precision is lower for arithmetic processors, whereas for signal processing processors precision is high because the signal has to be reconstructed at the receiving end. In arithmetic processors the basic building block is the Arithmetic Logic Unit (ALU). In signal processing processors the basic building block is the Multiplier Accumulator Content (MAC). The standard floating point MAC requires more precision than the traditional fixed point MAC because the former is designed with IEEE 754 floating point arithmetic and the latter is designed with fixed-point arithmetic, i.e., the precision is limited to integers. In general, if the precision is higher, the accuracy is higher. Hence, to improve the performance of the traditional fixed point MAC, in this work we implemented the standard floating point MAC using an IEEE 754 floating point adder. This can be used to design all floating point DSP processors through the standard floating point MAC.

Proceedings Article•DOI•

Design and Implementation of BIST logic for ALU on FPGA

Jamuna S, Dinesha P, Kp Shashikala, Kishore Kumar K

01 Dec 2018

TL;DR: An attempt is made to illustrate reconfigurable Built-in self-test (BIST) logic which detects faults across the ALU block mapped on the FPGA.

Abstract: Field programmable gate arrays (FPGAs) are reconfigurable logic devices that are widely used in many applications such as system prototyping, complex computing systems, automotive electronics and mobile devices. FPGAs have become very popular because of features such as high logic capacity, reconfigurability and a regular structure with low area cost. However, the increase in density and complexity has also resulted in a higher probability of faults. Fault detection is necessary to ensure that the system gives the expected output. The arithmetic logic unit is an essential subsystem block found in almost all digital computing systems. Since the main computational tasks are performed by the ALU, it has to be reliable. In this paper an attempt is made to illustrate reconfigurable Built-in self-test (BIST) logic which detects faults across the ALU block mapped on the FPGA. BIST is a fault detection technique that allows a circuit to test itself. The proposed work aims at detecting stuck-at faults occurring in the internal blocks of the ALU. Test patterns are generated through an LFSR and outputs are analyzed using a MISR block. The design is also verified using the Questasim simulator and has been implemented using the XILINX Vivado IDE. The number of faults detected is 80 out of 98 with code coverage of 94.2%, and the total power consumed by the proposed design is 1.06 watts.
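
The LFSR-plus-MISR flow described in this abstract can be sketched in Python as follows. This is a software illustration under stated assumptions, not the paper's FPGA design: the 8-bit width, the feedback polynomial x^8 + x^4 + x^3 + x^2 + 1, the seed, the adder used as circuit under test and the injected stuck-at-0 fault are all hypothetical choices.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def lfsr_step(state: int) -> int:
    """One step of an 8-bit Fibonacci LFSR (x^8 + x^4 + x^3 + x^2 + 1)."""
    fb = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return ((state >> 1) | (fb << (WIDTH - 1))) & MASK


def lfsr_patterns(seed: int, count: int) -> list:
    """Generate `count` pseudo-random test patterns from a nonzero seed."""
    state = seed & MASK
    patterns = []
    for _ in range(count):
        patterns.append(state)
        state = lfsr_step(state)
    return patterns


def misr_signature(responses, seed: int = 0) -> int:
    """Compact a response stream into one signature (shift, then XOR in)."""
    sig = seed & MASK
    for r in responses:
        sig = lfsr_step(sig) ^ (r & MASK)
    return sig


def alu_add(a: int, b: int) -> int:
    """Fault-free circuit under test: an 8-bit adder."""
    return (a + b) & MASK


def faulty_alu_add(a: int, b: int) -> int:
    """Same adder with a stuck-at-0 fault injected on output bit 0."""
    return alu_add(a, b) & (MASK & ~1)


def run_bist(alu, seed: int = 0x5A, count: int = 255) -> int:
    """Drive the ALU with LFSR patterns and return the MISR signature."""
    pats = lfsr_patterns(seed, count + 1)
    responses = [alu(pats[i], pats[i + 1]) for i in range(count)]
    return misr_signature(responses)


if __name__ == "__main__":
    golden = run_bist(alu_add)
    observed = run_bist(faulty_alu_add)
    print("fault detected" if observed != golden else "signature matches")
```

A fault is flagged when the observed signature differs from the golden one; because the signature is a compressed summary, some multi-bit error patterns can alias to the golden value and go undetected.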

Proceedings Article•DOI•

Low power GDI ALU design with mixed logic adder functionality

Jean Abou Rahal¹, Bassel Maamari¹, Basma Hajri¹, Rouwaida Kanj¹, Mohammad M. Mansour¹, Ali Chehab¹•Institutions (1)

American University of Beirut¹

04 Jun 2018

TL;DR: A low power Arithmetic Logic Unit (ALU) in 90nm technology using optimized Gate Diffusion Input (GDI) design style is presented, by combining the adder and logic functionality in a modified ten transistor cell, which is compact compared to other designs.

Abstract: In this paper, we present a low-power Arithmetic Logic Unit (ALU) in 90 nm technology using an optimized Gate Diffusion Input (GDI) design style. Inspired by the compactness of the basic ten-transistor adder cell, we redesign the architecture of the ALU by combining the adder and logic functionality in a modified ten-transistor cell. The design is compact compared to other designs. For low-power applications, we add a transmission gate to isolate the front-end logic section from the back-end adder section of the same cell. This restricts power activity to the logic section when the ALU performs logic operations, leading to more than 2X power savings compared to an implementation without the transmission gate. All this comes with less than a 9% increase in delay at 1.5 V for an 8-bit ALU implementation, and around a 50% reduction in transistor count compared to a traditional GDI ALU implementation with separate logic and arithmetic functionality and enable signals. The design is simulated and verified using a 90 nm TSMC design kit.

Journal Article•DOI•

Estimation and Analysis of Novel Dynamic Body Biased TSPC Design Technique

Preeti Verma¹, V. S. Pandey¹, Ajay K. Sharma², Arti Noor³•Institutions (3)

National Institute of Technology Delhi¹, Punjab Technical University², Centre for Development of Advanced Computing³

01 Dec 2018

TL;DR: Comprehensive simulation in Cadence using 90 nm technology shows that the proposed design outperforms conventional and other previously reported dynamic circuit design techniques in all aspects of circuit performance.

Abstract: This paper presents an energy-efficient, high-speed true single-phase clock (TSPC) dynamic circuit design technique that uses a novel body-biasing tuner. The threshold voltage is controlled dynamically by the body bias tuner so that circuit performance is enhanced in terms of power, delay, temperature, voltage, noise and corner variations. Power consumption and delay are computed and analysed over a wide range of temperatures, and a 40.78–95.5% saving in power-delay product is obtained. The effects of bias voltage variation and process corners are quantified to assess the effectiveness of the proposed design, which is found to perform consistently compared with other techniques. Bouncing noise analysis is then carried out to evaluate noise in the circuit. Power-delay product, transistor count and clock phase are compared with several previously reported designs. Comprehensive simulation in Cadence using 90 nm technology shows that the proposed design outperforms conventional and other previously reported dynamic circuit design techniques in all aspects of circuit performance. Further, an arithmetic logic unit for measurement using sensors is implemented as an extension of the proposed dynamic circuit design technique.

Patent•

Online alarm device of control loop power supply and verification method thereof

Zhai Shimin, Bai Xiaobo, Zhou Zhejun, Li Wei, Yuan Yikun, Deng Hanqiu, Guan Yunquan, Wang Qi, Guo Peibin

02 Nov 2018

TL;DR: An online alarm device for a control loop power supply and a verification method thereof are presented, belonging to the field of reactor safety instrument control systems.

Abstract: The invention belongs to the field of reactor safety instrument control systems, and particularly relates to an online alarm device of a control loop power supply and a verification method thereof. The online alarm device comprises a driving control module power supply, an overcurrent protecting fuse, a voltage stabilizing diode, an arithmetic logic unit and an online alarm unit. The control module power supply, the overcurrent protecting fuse and the voltage stabilizing diode are successively connected in series. The overcurrent protecting fuse further comprises an overcurrent protecting fuse input end and an overcurrent protecting fuse output end. The arithmetic logic unit is provided with an arithmetic logic unit output end and an arithmetic logic unit input end. Two lines are led out of the overcurrent protecting fuse input end and the overcurrent protecting fuse output end, respectively, for connecting with the arithmetic logic unit input end. The arithmetic logic unit output end is serially connected with the online alarm unit.
