
Showing papers on "Arithmetic logic unit" published in 2020


Journal ArticleDOI
TL;DR: An electronic-photonic computing architecture for a wavelength division multiplexing-based electronic-photonic arithmetic logic unit, which disentangles the exponential relationship between power and clock rate, leading to an enhancement in computation speed and power efficiency as compared to state-of-the-art transistor-based circuits.
Abstract: The past two decades have witnessed the stagnation of the clock speed of microprocessors followed by the recent faltering of Moore’s law as nanofabrication technology approaches its unavoidable physical limit. Vigorous efforts from various research areas have been made to develop power-efficient and ultrafast computing machines in this post-Moore’s law era. With its unique capacity to integrate complex electro-optic circuits on a single chip, integrated photonics has revolutionized the interconnects and has shown its striking potential in optical computing. Here, we propose an electronic-photonic computing architecture for a wavelength division multiplexing-based electronic-photonic arithmetic logic unit, which disentangles the exponential relationship between power and clock rate, leading to an enhancement in computation speed and power efficiency as compared to the state-of-the-art transistors-based circuits. We experimentally demonstrate its practicality by implementing a 4-bit arithmetic logic unit consisting of 8 high-speed microdisk modulators and operating at 20 GHz. This approach paves the way to future power-saving and high-speed electronic-photonic computing circuits. Integrated photonics allows integration of complex optical circuits on a single chip. Here, the authors propose a wavelength division multiplexing based electronic-photonic arithmetic logic unit for computing at high speeds and with improved power consumption to help with the physical limits of Moore’s law.

95 citations


Journal ArticleDOI
TL;DR: An efficient fault-tolerant 3-input majority gate with ten simple and rotated cells whose output signal strength is very high and which is 100% and 90% tolerant against single-cell omission and extra-cell deposition defects is proposed.
Abstract: Inverter and majority gates are considered as two important primitive gates for designing logical circuits in the quantum-dot cellular automata (QCA) technology. Up to now, many QCA layouts have been introduced for three-input majority gates, most of which are not robust against the QCA defects and so they are prone to faults. In this paper, we propose an efficient fault-tolerant 3-input majority gate with ten simple and rotated cells whose output signal strength is very high (± 9.93e−001). The fault tolerance of the proposed structure is investigated against cell omission, extra-cell deposition, and displacement defects. The results show that the proposed structure is 100% and 90% tolerant against single-cell omission and extra-cell deposition defects. Moreover, the error probability of the proposed gate under cell omission and extra-cell deposition defects is investigated through analytical modeling. Using the proposed fault-tolerant structure, two basic circuits including a fault-tolerant QCA full-adder and a fault-tolerant 2:1 QCA multiplexer are introduced. Finally, using the proposed circuits, a fault-tolerant one-bit arithmetic logic unit with four mathematical and logical operations is designed and implemented. To verify the proposed three-input majority gate, some physical proofs are provided. The results of simulations by QCADesigner 2.0.3 show that the proposed circuits work well. The power analysis of the proposed structure is performed using a QCAPro tool. The comparison results show that the proposed circuits are much better than the previous designs.
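The building blocks named in this abstract can be illustrated at the logic level. The following is a minimal sketch, not the authors' cell layout: it models the 3-input majority gate as a Boolean function and composes it with an inverter into a full adder and a 2:1 multiplexer, using a common textbook construction. All function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """3-input majority vote: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def inv(a: int) -> int:
    """Inverter on a single bit."""
    return a ^ 1


def full_adder(a: int, b: int, cin: int) -> tuple:
    """Full adder built from three majority gates and two inverters."""
    cout = majority(a, b, cin)
    s = majority(majority(a, b, inv(cin)), inv(cout), cin)
    return s, cout


def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: a majority gate with one input fixed to 0 acts as
    AND, and with one input fixed to 1 acts as OR."""
    pick_d0 = majority(d0, inv(sel), 0)  # d0 AND (NOT sel)
    pick_d1 = majority(d1, sel, 0)       # d1 AND sel
    return majority(pick_d0, pick_d1, 1)  # OR of the two branches
```

Fixing one majority input to a constant is what lets a single gate type serve as AND, OR and carry logic, which is why a defect-tolerant majority gate carries over to the adder and multiplexer.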

44 citations


Journal ArticleDOI
TL;DR: A novel fault-tolerant 5-input majority gate using 28 simple and rotated cells in Quantum-dot Cellular Automata (QCA) technology is proposed and the functionality of the proposed structure is confirmed by physical proofs.

28 citations


Journal ArticleDOI
TL;DR: A novel logic-in-memory (LIM) architecture of magnetic arithmetic logic unit (P-MALU) based on hybrid STT-MTJ/CMOS circuits is proposed which is superior to the other two ALU designs in terms of power dissipation, delay and device count.
Abstract: One of the major concerns for CMOS technology is the increase in power dissipation as the technology node scales down to the deep-submicron region. The magnetic tunnel junction (MTJ) based on the spin-transfer torque (STT) switching mechanism is recognized as one of the most promising spintronic devices for the post-CMOS era due to its non-volatility, high speed, high endurance, CMOS compatibility and, above all, low power dissipation, which can offer solutions to the problems posed by existing CMOS technology. We have proposed a novel logic-in-memory (LIM) architecture of magnetic arithmetic logic unit (P-MALU) based on hybrid STT-MTJ/CMOS circuits. Simulation results reveal a significant reduction in the total power dissipation and transistor count of the arithmetic unit: 28.44% and 29.16% compared to the double pass transistor logic based clocked CMOS ALU design (DPTL-C2MOS-ALU), and 58.87% and 45.16% compared to the modified magnetic arithmetic logic unit (M-MALU), respectively. For the logical unit, the reduction in average power dissipation is 37.61% and 52.55%, with 47.22% and 42.42% fewer transistors than the DPTL-C2MOS-ALU and M-MALU designs, respectively. Monte Carlo (MC) simulation is then performed by incorporating process and mismatch variations for CMOS and extracted parameters of the MTJ, to study the behavior of the DPTL-C2MOS-ALU, M-MALU and P-MALU designs in terms of power dissipation. All the simulation results reveal that the P-MALU is superior to the other two ALU designs in terms of power dissipation, delay and device count. Further, the P-MALU circuit is extended to 4-bit arithmetic operations. Electrical simulations are performed to verify the functionality of the design for higher bit operations, which demonstrates the feasibility of the proposed design in VLSI circuits.

23 citations


Journal ArticleDOI
TL;DR: This article presents the low-power ternary arithmetic logic unit (ALU) design in carbon nanotube field-effect transistor (CNFET) technology and shows that the proposed processing modules outperform their counterparts in terms of power consumption, energy consumption and device count.
Abstract: This article presents the low-power ternary arithmetic logic unit (ALU) design in carbon nanotube field-effect transistor (CNFET) technology. The CNFET's unique characteristic of geometry-dependent threshold voltage is employed in the multi-valued logic design. The ternary logic benefit of reduced circuit overhead is exploited by embedding multiple modules within a block. The existence of symmetric literals among various single shift and dual shift operators in addition and subtraction operations results in the optimized realization of adder/subtractor modules. The proposed design is based on the notion of multiplexing either arithmetic, logical or miscellaneous operations, depending upon the status of input selection trits. The results obtained by the Synopsys HSPICE simulator with the Stanford 32 nm CNFET technology illustrate that the proposed processing modules outperform their counterparts in terms of power consumption, energy consumption and device count. The proposed methodology leads to savings in power consumption and energy consumption (PDP) of 62% and 58%, respectively, on the benchmark circuit of the ALU [full adder/subtractor (FAS)]. Furthermore, for the 2-trit multiplier design, the enhanced performance at the architecture and circuit level is achieved through the optimized designs of various adder and multiplier circuits.

18 citations


Proceedings ArticleDOI
15 Jun 2020
TL;DR: Implementing the GDI technique in designing the ALU results in low power consumption and a much lower transistor count, which reduce chip area and power consumption, two of the most important parameters in digital VLSI design.
Abstract: In this paper, the design of an 8-bit Arithmetic Logic Unit (ALU) using the Gate Diffusion Input (GDI) technique is proposed. Implementing the GDI technique in the ALU design lowers power consumption and requires far fewer transistors, which reduces both chip area and power consumption, two of the most important parameters in digital VLSI design. In this design, a 3T XOR is used in the full adder. Moreover, a novel 1-to-8 demultiplexer circuit is used. A considerable number of research papers were studied and various logic families compared before finally designing an 8-bit ALU that can perform 8 different operations. The design is validated using the schematic editor DHCH 3.5, and the simulations have been carried out using Xilinx ISE 14.7.

9 citations


Journal ArticleDOI
TL;DR: This work presents the design of a reduced instruction set computing (RISC) microprocessor in a four-stage pipeline architecture using Active-HDL to verify the design by simulating a case study, with the code presented in the Appendix.
Abstract: This work presents the design of a reduced instruction set computing (RISC) microprocessor in a four-stage pipeline architecture. The nuts and bolts of each block including timing diagrams are elaborated. To be more specific, a hardware solution to pipeline hazards is proposed and verified, i.e. provided the current result of the arithmetic logic unit (ALU) is employed by the next instruction, the FOWARDING_EN goes high and commands the ALU to select the forwarded result as the updated input, rather than the stale value read from the random-access memory (RAM), thus obviating the need to wait for another two cycles until the expected data has been written back. For the design with data forwarding, two limitations were discovered and tackled successfully by a software solution. Active-HDL was employed to verify the design by simulating a case study, with the code presented in the Appendix.
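The forwarding mechanism described in this abstract can be sketched as a behavioural model. This is a simplified illustration rather than the authors' HDL: the two-step write-back delay, the register names and the `run` helper are assumptions, and `forwarding_en` stands in for the FOWARDING_EN signal named in the abstract.

```python
def select_operand(ram_value: int, alu_result: int, forwarding_en: bool) -> int:
    """ALU input multiplexer: take the forwarded result when enabled,
    otherwise the (possibly stale) value read from RAM."""
    return alu_result if forwarding_en else ram_value


def run(program, ram, forwarding=True, writeback_delay=2):
    """Execute a list of (dst, src_a, src_b) additions.

    Results reach RAM only `writeback_delay` steps after issue (an assumed
    figure), so a dependent instruction issued earlier reads a stale value
    unless the previous ALU result is forwarded.
    """
    ram = dict(ram)
    pending = []                      # (commit_step, dst, value)
    prev_dst, prev_result = None, 0
    for step, (dst, src_a, src_b) in enumerate(program):
        # Commit every write-back whose delay has elapsed.
        for entry in [p for p in pending if p[0] <= step]:
            ram[entry[1]] = entry[2]
            pending.remove(entry)
        operands = []
        for src in (src_a, src_b):
            forwarding_en = forwarding and src == prev_dst
            operands.append(select_operand(ram[src], prev_result, forwarding_en))
        prev_dst, prev_result = dst, operands[0] + operands[1]
        pending.append((step + writeback_delay, dst, prev_result))
    for _, dst, value in pending:     # drain the remaining write-backs
        ram[dst] = value
    return ram


registers = {"r1": 0, "r2": 2, "r3": 3, "r4": 0, "r5": 10}
program = [("r1", "r2", "r3"),   # r1 = r2 + r3
           ("r4", "r1", "r5")]   # r4 = r1 + r5, depends on the previous result
with_forwarding = run(program, registers, forwarding=True)
without_forwarding = run(program, registers, forwarding=False)
```

In this model the second instruction computes `r4` from the fresh `r1` only when forwarding is enabled; without it, the stale RAM value is used, which is the hazard the hardware solution removes.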

9 citations


Posted Content
TL;DR: An in-depth analysis reveals practical shortcomings by design, such as the inability to multiply or divide negative input values or training stability issues for deeper networks, and proposes an improved model architecture that solves stability issues and outperforms the original NALU model in terms of arithmetic precision and convergence.
Abstract: Neural networks have to capture mathematical relationships in order to learn various tasks. They approximate these relations implicitly and therefore often do not generalize well. The recently proposed Neural Arithmetic Logic Unit (NALU) is a novel neural architecture which is able to explicitly represent the mathematical relationships by the units of the network to learn operations such as summation, subtraction or multiplication. Although NALUs have been shown to perform well on various downstream tasks, an in-depth analysis reveals practical shortcomings by design, such as the inability to multiply or divide negative input values or training stability issues for deeper networks. We address these issues and propose an improved model architecture. We evaluate our model empirically in various settings from learning basic arithmetic operations to more complex functions. Our experiments indicate that our model solves stability issues and outperforms the original NALU model in terms of arithmetic precision and convergence.
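The sign limitation mentioned in this abstract follows from how the original NALU is usually formulated. The sketch below is a single-output forward pass in pure Python under that common formulation (weights `tanh(w_hat) * sigmoid(m_hat)`, a log-space multiplicative path, and a sigmoid gate); it is not the improved architecture the abstract proposes, and the function and parameter names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def nalu_cell(x, w_hat, m_hat, g_w, eps=1e-7):
    """Forward pass of one NALU output unit.

    x      : list of input values
    w_hat,
    m_hat  : unconstrained parameters; tanh * sigmoid pushes the effective
             weights toward -1, 0 or 1
    g_w    : gate weights choosing between the additive and the
             multiplicative path
    """
    w = [math.tanh(wh) * sigmoid(mh) for wh, mh in zip(w_hat, m_hat)]
    add_path = sum(wi * xi for wi, xi in zip(w, x))
    # Multiplication is done as addition in log space; abs() discards the
    # sign of every input, so negative products cannot be represented.
    mul_path = math.exp(sum(wi * math.log(abs(xi) + eps) for wi, xi in zip(w, x)))
    gate = sigmoid(sum(gi * xi for gi, xi in zip(g_w, x)))
    return gate * add_path + (1.0 - gate) * mul_path


saturated = [20.0, 20.0]                                     # effective weights close to 1
total = nalu_cell([2.0, 3.0], saturated, saturated, [20.0, 20.0])       # gate near 1: addition
product = nalu_cell([2.0, 3.0], saturated, saturated, [-20.0, -20.0])   # gate near 0: multiplication
lost_sign = nalu_cell([-2.0, 3.0], saturated, saturated, [-20.0, -20.0])
```

With these hand-picked parameters, `total` is close to 5 and `product` close to 6, while `lost_sign` also comes out close to 6 rather than -6, which illustrates the negative-input shortcoming.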

8 citations


Journal ArticleDOI
29 Sep 2020
TL;DR: This article analyzes the Neural Arithmetic Logic Unit (NALU), which is able to explicitly represent mathematical relationships by the units of the network to learn operations such as summation, subtraction or multiplication, and proposes an improved model architecture.
Abstract: Neural networks have to capture mathematical relationships in order to learn various tasks. They approximate these relations implicitly and therefore often do not generalize well. The recently proposed Neural Arithmetic Logic Unit (NALU) is a novel neural architecture which is able to explicitly represent the mathematical relationships by the units of the network to learn operations such as summation, subtraction or multiplication. Although NALUs have been shown to perform well on various downstream tasks, an in-depth analysis reveals practical shortcomings by design, such as the inability to multiply or divide negative input values or training stability issues for deeper networks. We address these issues and propose an improved model architecture. We evaluate our model empirically in various settings from learning basic arithmetic operations to more complex functions. Our experiments indicate that our model solves stability issues and outperforms the original NALU model in terms of arithmetic precision and convergence.

7 citations


Proceedings ArticleDOI
12 Mar 2020
TL;DR: Simulation of QCA design is more reliable when compared to Verilog Code Design and QCAD, a free computer-aided design software application for 2-Dimensional design and drafting, shows results of multiplexers and demultiplexers.
Abstract: An advancement over Verilog for the design and development of digital ICs is the innovation of QCA (Quantum Cellular Automata) design, which represents ideal cell automation using quantum dots. QCA design uses a transistor-less computation technology which can produce high-density nanoscale circuits. The main focus of this paper is to simulate, design and compare digital circuits using QCA Designer and Verilog code, more specifically multiplexers and demultiplexers. The simulation results of these two digital circuits using QCA Designer and Verilog code are shown separately along with a performance comparison table. The results appear to be the same. However, when other parameters such as time, space, speed and sensitivity are considered, QCA design proves more reliable than Verilog code design. QCAD is a free computer-aided design software application for 2-dimensional design and drafting. Multiplexers are used in telephone networking, computer memory, transmission from the computer system of a satellite, and many more fields. Demultiplexers are used in communication systems, the ALU (Arithmetic Logic Unit) and many other fields.

5 citations


Journal ArticleDOI
01 Dec 2020
TL;DR: This manuscript is an attempt to design an all-optical reversible arithmetic logic unit that will not only simplify processing by eliminating optoelectronic conversion, but will also minimize the problem of heat dissipation.
Abstract: This manuscript is an attempt to design an all-optical reversible arithmetic logic unit. Reversibility will not only help in reducing the errors at the receiving end, but will also increase the multitasking of optical processors by reducing heat dissipation. At the receiving end of an optical network, data is converted from optical to electrical form for processing. This kind of processing dissipates a huge amount of energy in the form of heat. This problem can be solved by developing an optical processing unit, and this manuscript proposes such a solution. Processors of this kind will not only simplify processing by eliminating optoelectronic conversion, but will also minimize the problem of heat dissipation. The average quality factor of the proposed model is 59.98 dB, while the average extinction ratio is 21.26 dB. The optical cost of the proposed model is 4.

Book ChapterDOI
01 Jan 2020
TL;DR: Analytical comparison of full adders has been presented on the basis of power, delay and PDP and results show that for input to output carry FA-Tung has the highest speed and the lowest PDP.
Abstract: In this paper, analytical comparison of full adders has been presented on the basis of power, delay and PDP. All simulations are performed using SPICE in 32 nm CMOS technology. Full adder is the basic block of an arithmetic logic unit (ALU) so the power consumption and delay of an ALU are reduced by optimizing full adder. Simulation results show that for input to output carry FA-Tung has the highest speed and the lowest PDP. While for input to output sum, FA-Goel has the highest speed and the lowest PDP. FA-Conventional has the highest power consumption while FA-Tung has the lowest power consumption. For multi-bit adders, FA-Tung has the best performance.

Proceedings ArticleDOI
06 Jul 2020
TL;DR: A proof-of-concept BNN with 8x8 processing elements (PEs) is implemented by FPGA for performing six calculation units (CU) in parallel and all calculations are retrieved with the inaccuracy less than 3.1%.
Abstract: An elastic neural network is implemented by FPGA for constructing the multi-grained reconfigurable accelerator (MGRA). On the basis of a novel bisection neural network (BNN) topology, the entire network on hardware is efficiently partitioned into arbitrary pieces with diamond-like shape (seen as "DiaNet") which perform regressions for retrieving arbitrary approximate calculations in parallel. By organizing massive DiaNets, the entire network is reconfigurable in fine-grained (functions of each DiaNet), mid-grained (DiaNet features), and coarse-grained (organization of DiaNets) without redundancy. In this work, a proof-of-concept BNN with 8x8 processing elements (PEs) is implemented by FPGA for performing six calculation units (CU) in parallel. Over various approximate computing tasks with one, two, and three operands, all calculations are retrieved with the inaccuracy less than 3.1%. The maximum hardware utilization of a single CU is reduced to 1.7%, 17.9%, and 7.6% of general arithmetic logic unit (ALU), approximate computing units powered by domain-specific architecture (DSA) and neural network, respectively.

Proceedings ArticleDOI
01 Feb 2020
TL;DR: The proposed 17T full adder and its application in implementing an efficient Arithmetic Logic Unit (ALU) design have been proposed and will be able to significantly reduce the power requirements of a digital processor.
Abstract: In this paper, a novel 17T full adder and its application in implementing an efficient Arithmetic Logic Unit (ALU) design have been proposed. The proposed design will be able to significantly reduce the power requirements of a digital processor. Further, it also minimizes the delay, and the design is efficient in terms of power-delay product. The ALU is one of the important entities of a digital processor. In a digital processor, an ALU, as the name suggests, performs logical and arithmetic operations. Thus, increasing the speed of operation while reducing the power requirements can cause a cumulative increase in the throughput of the digital system. Further, the proposed 17T full adder uses only multiplexing logic to produce the Sum and the Carry-Out signals, with similar Sum and Carry-Out signal path delays. Moreover, the signal propagation delay from input to output of the proposed adder has been found to be 83.8% to 89.9% less than that of existing hybrid full adder designs such as HFA-22T, HFA-20T and HFA-19T, as well as full adders such as 10T and 11T. Further, a 71.5% to 74.3% saving in power requirements has been observed. Thus, the proposed 8-bit ALU has also been able to perform almost 52% better than the existing design in terms of the overall power-delay product. We have simulated the designs as well as evaluated the results using the Cadence Virtuoso EDA tool v15.0 in 45 nm process technology. Performance analyses were done with respect to power, delay, and power-delay product.

Journal ArticleDOI
TL;DR: The main aim of this paper is to propose a new design of reversible ALU and enhance the number of operations in it; the design demonstrates an increase in functionality with a 56% reduction in gates, 17% reduction in garbage lines, 92% reduction in ancillary lines and 53% reduction in quantum cost.
Abstract: Energy loss is a big challenge in digital logic design, primarily due to the impending end of Moore’s Law. An increase in power dissipation affects not only portability but also the overall life span of a device. Many applications cannot afford this loss. Therefore, future computing will rely on reversible logic for the implementation of power-efficient and compact circuits. The arithmetic and logic unit (ALU) is a fundamental component of all processors, and designing it with reversible logic is tedious. Various ALU designs using reversible logic gates exist in the literature, but the operations performed by them are limited. The main aim of this paper is to propose a new design of reversible ALU and enhance the number of operations in it. This paper critically analyzes the proposed ALU against existing designs and demonstrates an increase in functionality with a 56% reduction in gates, 17% reduction in garbage lines, 92% reduction in ancillary lines and 53% reduction in quantum cost. The proposed ALU design is coded in Verilog HDL, synthesized and simulated using the EDA (Electronic Design Automation) tool Xilinx ISE Design Suite 14.2. The RCViewer+ tool has been used to validate the quantum cost of the proposed design.
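The terms "garbage lines" and "ancillary lines" used in this abstract can be made concrete with two standard reversible gates. The sketch below is a generic illustration, not the ALU proposed in the paper: it models the Toffoli and Fredkin gates as bit-tuple functions and checks reversibility by confirming that every input pattern maps to a distinct output pattern. The helper names are hypothetical.

```python
from itertools import product


def toffoli(a: int, b: int, c: int) -> tuple:
    """Controlled-controlled-NOT: flips c only when a and b are both 1."""
    return a, b, c ^ (a & b)


def fredkin(ctrl: int, x: int, y: int) -> tuple:
    """Controlled swap: exchanges x and y only when ctrl is 1."""
    return (ctrl, y, x) if ctrl else (ctrl, x, y)


def is_reversible(gate, n_inputs: int = 3) -> bool:
    """A gate is reversible when its truth table is a bijection."""
    outputs = {gate(*bits) for bits in product((0, 1), repeat=n_inputs)}
    return len(outputs) == 2 ** n_inputs


def reversible_and(a: int, b: int) -> int:
    """AND computed with a Toffoli gate: the third line is an ancillary
    input fixed to 0, and the first two outputs are garbage lines that
    exist only to keep the mapping reversible."""
    _garbage_a, _garbage_b, result = toffoli(a, b, 0)
    return result
```

Each ancillary input and garbage output adds a line that carries no useful result, which is why reductions in both are reported alongside gate count and quantum cost.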

Proceedings ArticleDOI
04 Mar 2020
TL;DR: Three multiplier designs are proposed and compared with existing multiplier designs; the proposed design is found to be better than the existing design in terms of delay and area and can be used for exact multiplier applications.
Abstract: Fast multipliers play a significant role in digital signal processing (DSP) and Arithmetic Logic Unit (ALU) systems. Delay and area are cardinal factors that limit the performance of a VLSI circuit design. The paper focuses on new approaches to the Dadda multiplier using novel compressor designs. Two novel 4-2 compressors and modified higher-order compressors are introduced. Three multiplier designs are proposed and compared with existing multiplier designs. The proposed design is found to be better than the existing design in terms of delay and area, and can be used for exact multiplier applications. The designs are simulated using the Xilinx ISE tool.
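For readers unfamiliar with the component, the behaviour an exact 4-2 compressor must satisfy can be sketched as follows. This is the conventional construction from two cascaded full adders, not either of the novel compressors the paper introduces; the function names are hypothetical.

```python
def full_adder(a: int, b: int, c: int) -> tuple:
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple:
    """Exact 4-2 compressor from two cascaded full adders.

    Reduces four partial-product bits plus a carry-in of the same weight
    to one sum bit and two carry bits of the next higher weight, so that
    x1 + x2 + x3 + x4 + cin == total + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)
    total, carry = full_adder(s1, x4, cin)
    return total, carry, cout
```

Because `cout` depends only on `x1`, `x2` and `x3`, it can feed the neighbouring column's `cin` without creating a ripple chain across the row, which is what makes the cell useful in Dadda-style reduction trees.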

Journal ArticleDOI
TL;DR: A logic-level simulation of the proposed 8-bit bit-parallel RSFQ microprocessor shows correct operation with a target frequency of 16.7 GHz.
Abstract: With Moore's law approaching its physical limits, low-temperature computing technology is ushering in unprecedented development opportunities. Rapid single-flux-quantum (RSFQ) circuit technology is currently the most mature superconducting integrated circuit technology. Based on the current fabrication process, we propose an 8-bit bit-parallel RSFQ microprocessor. The proposed microprocessor processes 8-bit data each clock cycle. Ten different instructions are executed. The microprocessor mainly consists of an on-chip instruction memory, two data registers, an instruction decoder, an 8-bit bit-parallel arithmetic logic unit, and a program counter. The microprocessor contains 7702 JJs (based on the Open Dataset of CONNECT Cell Library for AIST ADP2) without considering splitters, Josephson transmission lines, and passive transmission lines. We perform a logic-level simulation of the proposed microprocessor. The simulation results show correct operation with a target frequency of 16.7 GHz.

Book ChapterDOI
01 Jan 2020
TL;DR: By using in-memory computing technique, the well-known von-Neumann bottleneck will be mitigated as well as energy efficiency is enhanced.
Abstract: This paper presents an In-Memory Computation (IMC) architecture using Full Swing Gate Diffusion Input (FS-GDI) in a single-ended disturb-free 6T SRAM. Not only are basic boolean functions (AND, NAND, OR, NOR, XOR2, XOR3, XNOR2) fully realized, a Ripple-Carry Adder (RCA) is also realized such that IMC is feasible without ALU (Arithmetic Logic Unit) or CPU. FS-GDI reserves the benefits of the original GDI, and further resolves the reduced voltage swing issue, but it leads to speed degradation and large static power. Therefore, by using in-memory computing technique, the well-known von-Neumann bottleneck will be mitigated as well as energy efficiency is enhanced.

Posted Content
Akhil Singh, Arnav Shantnu Dublish, Shreyasi, Aniruddha Naik, V. Nithya
TL;DR: In this paper, the concept of layering is used to give input to the proposed arithmetic logic unit (ALU), thus reducing the occupation area and the cell count of the proposed ALU.
Abstract: Quantum Dot Cellular Automata (QCA) is a modern paradigm that encodes binary information, i.e. 0 and 1, inside a cell instead of traditional current switches. Information is expressed by a QCA cell’s charge configuration. Current does not flow in these cells. This innovative concept provides a potential alternative for transistor-less computation at the nanoscale. An ALU (Arithmetic Logic Unit) carries out logical and arithmetic operations based on the input operands. The concept of layering is used to give input to the proposed ALU, thus reducing the occupation area and the cell count. It consists of a half adder and a NAND gate followed by two 2x1 multiplexers. The proposed designs of the half adder and NAND gate (used for the AND operation) use less occupation area, fewer cells and lower energy dissipation than the existing designs.

Proceedings ArticleDOI
28 Jul 2020
TL;DR: A modular design of Arithmetic Logic Unit based on memristor and CMOS logic which will lead to significant decrease in overall implementation area and significant reduction in transistor count is presented.
Abstract: This paper presents a modular design of Arithmetic Logic Unit based on memristor and CMOS logic. The proposal has been implemented using ThrEshold Adaptive Memristor model (TEAM). The functional verification is done using simulations on Cadence Virtuoso tool. There is significant reduction in transistor count using the proposed approach which will lead to significant decrease in overall implementation area. Comparison of this approach helps us to reduce the chip area as well as power utilization and it is reliable as well.

Proceedings ArticleDOI
01 Feb 2020
TL;DR: A trade-off among different parameters like Area acquired, Power used, Quality & Performance (PDP), Energy efficiencies (EDP) of fourteen different full adder circuit is calculated and analyzed at 90nm CMOS technology on DSCH 3.8 software.
Abstract: A full adder is a basic building block and one of the most used circuits; it is cascaded in long chains, in different structures and topologies, to perform operations and calculations in computer architecture, i.e. in the Arithmetic Logic Unit (ALU). In this paper, a trade-off among different parameters such as area, power, quality and performance (PDP), and energy efficiency (EDP) of fourteen different full adder circuits is calculated and analyzed in 90 nm CMOS technology with DSCH 3.8 software and the MICROWIND 3.8 simulator, by laying out these circuits at a constant temperature of 27°C, constant delay and constant voltage to keep the analysis simple.

Journal ArticleDOI
TL;DR: The proposed methods significantly improve the circuit timing, and at the same time considerably limit leakage energy, by employing a combination of cross-layer techniques based on circuit redesign and code replacement techniques.
Abstract: Modern electronic devices are an indispensable part of our everyday life. A major enabler for such integration is the exponential increase of the computation capabilities as well as the drastic improvement in the energy efficiency over the last 50 years, commonly known as Moore’s law. In this regard, the demand for energy-efficient digital circuits, especially for application domains such as the Internet of Things (IoT), has faced an enormous growth. Since the power consumption of a circuit highly depends on the supply voltage, aggressive supply voltage scaling to the near-threshold voltage region, also known as Near-Threshold Computing (NTC), is an effective way of increasing the energy efficiency of a circuit by an order of magnitude. However, NTC comes with specific challenges with respect to performance and reliability, which mandates new sets of design techniques to fully harness its potential. While techniques merely focused at one abstraction level, in particular circuit-level design, can have limited benefits, cross-layer approaches result in far better optimizations. This paper presents instruction multi-cycling and functional unit partitioning methods to improve energy efficiency and resiliency of functional units. The proposed methods significantly improve the circuit timing, and at the same time considerably limit leakage energy, by employing a combination of cross-layer techniques based on circuit redesign and code replacement techniques. Simulation results show that the proposed methods improve performance and energy efficiency of an Arithmetic Logic Unit by 19% and 43%, respectively. Furthermore, the improved performance of the optimized circuits can be traded to improving the reliability.

Book ChapterDOI
01 Jan 2020
TL;DR: In this chapter, the reversible arithmetic logic unit (ALU) and its implementation in QCA framework is examined and the complexity of different reversible and non-reversible ALU structures is inspected with comparative analysis.
Abstract: In this chapter, we examined the reversible arithmetic logic unit (ALU) and its implementation in QCA framework. ALU is one of the fundamental components as it defines the performance of any processing systems. This chapter is structured in four sections. First section discusses different ALU structures in QCA. In Sect. 7.2, we analyze and validate one of the reversible ALU designs in QCA framework. Section 7.3 inspects the complexity of different reversible and non-reversible ALU structures with comparative analysis. Section 7.4 presents the summary of the chapter.

Patent
14 Jan 2020
TL;DR: In this paper, a low complexity optimization solver for path smoothing with constraint variation is presented, which includes an L1 controller configured to receive a raw data series z to be smoothed, where L1 represents a formulation based on L1 norm cost, receive weights w0, w1, w2, and w3 to control smoothness of an output path, and formulate an L1 trend filtering problem.
Abstract: An apparatus and method of low complexity optimization solver for path smoothing with constraint variation are herein disclosed. According to one embodiment, an apparatus includes an L1 controller configured to receive a raw data series z to be smoothed, where L1 represents a formulation based on L1 norm cost, receives weights w0, w1, w2, and w3 to control smoothness of an output path, and formulate an L1 trend filtering problem; an L1 central processing unit (CPU) connected to the L1 controller and configured to transform the L1 trend filtering problem to a primal-dual linear programming (LP) optimization problem pair; and an L1 arithmetic logic unit (ALU) connected to the L1 CPU and configured to solve a primal problem of the primal-dual problem pair with an extended full tableau simplex method.

Proceedings ArticleDOI
05 Jun 2020
TL;DR: An Arithmetic & Logic Unit (ALU) that is both ultra-low power and high speed is implemented in 45 nm technology, with the design described in Verilog HDL in a digital environment so as to optimize it.
Abstract: Device scaling has been trending due to its sleek and small dimensions yet its energy efficiency limits itself. With the increasing demand for speed, portability and miniaturization of current gadgets, the power consumption of these products has become a major design factor. Especially for mobile devices, the power consumption drives the battery life-time, the generated heat and the required heat dispersion measures. Therefore, this demands a reduction in the power dissipation of digital circuits. Processors have become increasingly complex and power hungry. There are many applications of microprocessors with limited resources, where energy efficiency becomes a critical requirement. Power dissipation depends on the CMOS fabrication technology, operating frequency, but most of all on the switching per clock cycle within the digital circuit. Power consumption due to device parasites is another major design issue. This paper implements an Arithmetic & Logic Unit (ALU) that is both Ultra-Low Power and High Speed. The design was implemented in 45nm technology. The proposed ALU was implemented in Digital Environment with Verilog HDL so as to optimize the design.

Proceedings ArticleDOI
07 Oct 2020
TL;DR: A modified structure for an 8-bit arithmetic logic unit with a modified Booth multiplier is presented in this work, and the modified Booth encoding method reduces the delay, thereby improving the speed of the overall device.
Abstract: Nowadays most progressive networks are organised through Boolean implementation. Boolean implementation helps reduce heat dissipation, allowing almost energy-free computing, resulting in enhanced device sizes as well as enabling efficient evaluation of faults. A modified structure for an 8-bit arithmetic logic unit with a modified Booth multiplier is presented in this work. The 16-bit logic is built by cascading 1-bit arithmetic logic units. The essential modules of a 1-bit ALU are the module of power and the module of addition. This ALU arrangement has a reduced gate count and transistor count. The arithmetic logic unit in this paper is implemented using a modified Booth multiplier. The modified Booth encoding method reduces the delay, thereby improving the speed of the overall device.
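
Modified Booth encoding is commonly understood as radix-4 Booth recoding, which halves the number of partial products by scanning the multiplier in overlapping 3-bit groups. The following is a minimal behavioural sketch of that general technique, assuming 8-bit two's-complement operands; it is not the circuit from the paper, and the function names are illustrative.

```python
def booth_radix4_digits(multiplier, bits=8):
    """Recode a two's-complement multiplier into radix-4 Booth digits in {-2, ..., 2}."""
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    y = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    padded = y << 1                      # append the implicit bit y[-1] = 0
    digits = []
    for i in range(0, bits, 2):
        group = (padded >> i) & 0b111    # bits y[i+1], y[i], y[i-1]
        hi, mid, lo = (group >> 2) & 1, (group >> 1) & 1, group & 1
        digits.append(-2 * hi + mid + lo)
    return digits


def booth_multiply(a, b, bits=8):
    """Multiply a by the signed `bits`-bit value b using bits/2 partial products."""
    return sum(d * a * 4 ** j for j, d in enumerate(booth_radix4_digits(b, bits)))


digits = booth_radix4_digits(-3)   # least significant digit first
product = booth_multiply(7, -3)
```

An 8-bit multiplier yields four digits, so four partial products are summed instead of eight; each digit selects 0, ±a, or ±2a, all of which are cheap to form in hardware with shifts and negation.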

Journal ArticleDOI
TL;DR: This paper designed a 4-bit Arithmetic and Logic Unit (ALU) using Single Electron Transistor (SET), a new type of switching nanodevice that uses controlled single-electron tunneling to amplify the current.
Abstract: The demand for low power dissipation and increasing speed elicits numerous research efforts in the field of nano-CMOS technology. The Arithmetic Logic Unit is the core of any central processing unit. In this paper, we designed a 4-bit Arithmetic and Logic Unit (ALU) using the Single Electron Transistor (SET). The SET is a new type of switching nanodevice that uses controlled single-electron tunneling to amplify the current. It is highly scalable and has ultra-low power consumption compared to conventional semiconductor devices. Reversible logic gates designed using SETs are used to perform 4-bit arithmetic operations. We modelled a symmetric single-gate SET operating at room temperature using Verilog-A code. The design is carried out in the Cadence simulation environment. The 4-bit SET-based ALU design exhibits a power consumption of 0.52 nW and a delay of 350 ps.

Journal Article
TL;DR: Agarwal et al. investigated garbage-free reversible central processing unit computing systems down to physical gate-level implementation, and proposed designs of an adder, subtractor, multiplexer and encoder as work towards a reversible circuit for general circuits.
Abstract: Reversible computing spans computational models that are both forward and backward deterministic. These models have applications in program inversion and bidirectional computing, and are also interesting as a study of theoretical properties. A reversible computation thus does not have to use energy, though this is impossible to avoid in practice, due to the way computers are built. It is, however, not always obvious how to implement reversible computing systems. The restriction to avoid information loss imposes new design criteria that need to be incorporated into the design; criteria that do not follow directly from conventional models. In this paper, we investigate garbage-free reversible central processing unit computing systems down to physical gate-level implementation. Arithmetic operations are a basis for many computing systems, so the proposed designs of an adder, subtractor, multiplexer and encoder, and the work towards a reversible circuit for general circuits, are important new circuits. All designs are implemented in Xilinx software and simulated with a VHDL test bench.
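
To make "forward and backward deterministic" concrete, the sketch below builds a 1-bit full adder from two well-known reversible gates, the CNOT (Feynman) gate and the Toffoli gate. This is a generic textbook construction assumed for illustration, not the design from the paper; the state layout (a, b, cin, ancilla) and the function names are hypothetical.

```python
from itertools import product


def cnot(state, control, target):
    """Feynman gate: flip `target` when `control` is 1."""
    s = list(state)
    s[target] ^= s[control]
    return tuple(s)


def toffoli(state, c1, c2, target):
    """Toffoli gate: flip `target` when both controls are 1."""
    s = list(state)
    s[target] ^= s[c1] & s[c2]
    return tuple(s)


def reversible_full_adder(a, b, cin, anc=0):
    """Map (a, b, cin, anc) to (a, b, sum, carry) when anc starts at 0."""
    s = (a, b, cin, anc)
    s = toffoli(s, 0, 1, 3)  # anc ^= a AND b
    s = cnot(s, 0, 1)        # wire 1 now holds a XOR b
    s = toffoli(s, 1, 2, 3)  # anc ^= (a XOR b) AND cin, giving the carry-out
    s = cnot(s, 1, 2)        # wire 2 now holds a XOR b XOR cin, the sum
    s = cnot(s, 0, 1)        # restore b so no intermediate value is left behind
    return s


def is_bijective():
    """A reversible circuit must map the 16 input states to 16 distinct outputs."""
    outputs = {reversible_full_adder(*bits) for bits in product((0, 1), repeat=4)}
    return len(outputs) == 16
```

Because every gate is its own inverse, applying the same gates in reverse order recovers the inputs, which is the property that conventional adders built from AND and OR gates lack.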

Posted Content
TL;DR: In this article, the authors propose to group the bits into 4-bit blocks that are operated on concurrently and create block-skewed datapath units for 32-bit operation.
Abstract: Single flux quantum (SFQ) circuits are an attractive beyond-CMOS technology because they promise two orders of magnitude lower power at clock frequencies exceeding 25 GHz. However, every SFQ gate is clocked, creating very deep gate-level pipelines that are difficult to keep full, particularly for sequences that include data-dependent operations. This paper proposes to increase the throughput of SFQ pipelines by re-designing the datapath to accept and operate on least-significant bits (LSBs) clock cycles earlier than more significant bits. This skewed datapath approach reduces the latency of the LSB side, which can be fed back earlier for use in subsequent data-dependent operations, increasing their throughput. In particular, we propose to group the bits into 4-bit blocks that are operated on concurrently and create block-skewed datapath units for 32-bit operation. This skewed approach allows a subsequent data-dependent operation to start evaluating as soon as the first 4-bit block completes. Using this general approach, we develop a block-skewed MIPS-compatible 32-bit ALU. Our gate-level Verilog design improves the throughput of 32-bit data-dependent operations by 2x and 1.5x compared to previously proposed 4-bit bit-slice and 32-bit Ladner-Fischer ALUs, respectively.
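
The data dependence that block skewing exploits can be sketched in software: a 32-bit addition is processed as eight 4-bit blocks, least significant first, so a dependent addition can consume block 0 of the previous result before the higher blocks exist. The code below is only a functional model under that assumption, using lazy generators to stand in for the skew; it does not model SFQ timing or the paper's circuit, and all names are illustrative.

```python
BLOCK_BITS = 4
WORD_BITS = 32
N_BLOCKS = WORD_BITS // BLOCK_BITS
BLOCK_MASK = (1 << BLOCK_BITS) - 1
WORD_MASK = (1 << WORD_BITS) - 1


def split_blocks(x):
    """Split a 32-bit word into 4-bit blocks, least significant block first."""
    return [(x >> (BLOCK_BITS * k)) & BLOCK_MASK for k in range(N_BLOCKS)]


def join_blocks(blocks):
    """Reassemble a word from an iterable of blocks, least significant first."""
    return sum(b << (BLOCK_BITS * k) for k, b in enumerate(blocks))


def skewed_add_stream(a_blocks, b_blocks):
    """Yield sum blocks LSB-first; each block needs only the carry from the one below."""
    carry = 0
    for a_blk, b_blk in zip(a_blocks, b_blocks):
        total = a_blk + b_blk + carry
        carry = total >> BLOCK_BITS
        yield total & BLOCK_MASK  # the final carry-out is dropped (wraps modulo 2**32)


# Two data-dependent additions chained block by block: (x + y) + z.
x, y, z = 0x0000FFFF, 0x00000001, 0x12345678
first = skewed_add_stream(split_blocks(x), split_blocks(y))
second = skewed_add_stream(first, split_blocks(z))
result = join_blocks(second)
```

Since `first` is a generator, block 0 of `second` is produced after only block 0 of `first` has been computed, which mirrors how a skewed datapath lets the dependent operation start early.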

Patent
20 Feb 2020
TL;DR: A processor has first, second and third ALUs, where the first ALU has an input and an output on a first side, while the input and output on the first side of the second ALU are in a rotated orientation relative to the input and the output on the first side of the first ALU.
Abstract: A processor has first, second and third ALUs. The first ALU has on a first side an input and an output. The second ALU has a first side facing the first side of the first ALU, an input and an output on the first side of the second ALU and being in a rotated orientation relative to the input and the output of the first side of the first ALU, and an output on a second side of the second ALU. The third ALU has a first side facing the second side of the second ALU, and an input and an output on the first side of the third ALU. The input of the first side of the first ALU is logically directly connected to the output of the first side of the second ALU.