Showing papers on "Adder" published in 2021


Journal ArticleDOI
02 Jul 2021-Science
TL;DR: In this article, a monolithic optical micro-lithographic process was proposed to directly micropattern a set of elastic electronic materials by sequential ultraviolet light-triggered solubility modulation.
Abstract: Polymeric electronic materials have enabled soft and stretchable electronics. However, the lack of a universal micro/nanofabrication method for skin-like and elastic circuits results in low device density and limited parallel signal recording and processing ability relative to silicon-based devices. We present a monolithic optical microlithographic process that directly micropatterns a set of elastic electronic materials by sequential ultraviolet light-triggered solubility modulation. We fabricated transistors with channel lengths of 2 micrometers at a density of 42,000 transistors per square centimeter. We fabricated elastic circuits including an XOR gate and a half adder, both of which are essential components for an arithmetic logic unit. Our process offers a route to realize wafer-level fabrication of complex, high-density, and multilayered elastic circuits with performance rivaling that of their rigid counterparts.

102 citations


Journal ArticleDOI
TL;DR: This article proposes an improved logarithmic multiplier (ILM) that, unlike existing designs, rounds both inputs to their nearest powers of two by using a proposed nearest-one detector (NOD) circuit.
Abstract: Multiplication is the most resource-hungry operation in neural networks (NNs). Logarithmic multipliers (LMs) simplify multiplication to shift and addition operations and thus reduce the energy consumption. Since implementing the logarithm in a compact circuit often introduces approximation, some accuracy loss is inevitable in LMs. However, this inaccuracy accords with the inherent error tolerance of NNs and their associated applications. This article proposes an improved logarithmic multiplier (ILM) that, unlike existing designs, rounds both inputs to their nearest powers of two by using a proposed nearest-one detector (NOD) circuit. Considering that the output of the NOD uses a one-hot representation, some entries in the truth table of a conventional adder cannot occur. Hence, a compact adder is designed for the reduced truth table. The 8×8 ILM achieves up to 17.48 percent saving in power consumption compared to a recent LM in the literature while being almost 8 percent more accurate. Moreover, the evaluation of the ILM for two benchmark NN workloads shows up to 21.85 percent reduction in energy consumption compared to the NNs implemented with other LMs. Interestingly, using the ILM increases the classification accuracy of the considered NNs by up to 1.4 percent compared to a NN implementation that uses exact multipliers.
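The rounding idea in this abstract can be illustrated with a small software model. The sketch below is an assumption-laden reading, not the paper's NOD circuit or compact adder: it rounds each operand to its nearest power of two, keeps the signed residue, and drops the residue-times-residue term so that only shifts and additions remain. The function names and the tie-breaking rule (ties round up) are hypothetical.

```python
def nearest_power_of_two_exponent(x: int) -> int:
    """Return k such that 2**k is the power of two nearest to x (ties round up)."""
    if x <= 0:
        raise ValueError("x must be positive")
    k = x.bit_length() - 1  # floor(log2(x))
    # Round up when x is at least as close to 2**(k+1) as to 2**k.
    if x - (1 << k) >= (1 << (k + 1)) - x:
        k += 1
    return k


def approx_log_multiply(a: int, b: int) -> int:
    """Approximate a*b for non-negative integers using shifts and adds only.

    Writes a = 2**ka + qa and b = 2**kb + qb (qa, qb may be negative), then
    keeps 2**(ka+kb) + qa*2**kb + qb*2**ka and drops the small qa*qb term.
    """
    if a == 0 or b == 0:
        return 0
    ka = nearest_power_of_two_exponent(a)
    kb = nearest_power_of_two_exponent(b)
    qa = a - (1 << ka)  # signed residue of a
    qb = b - (1 << kb)  # signed residue of b
    return (1 << (ka + kb)) + (qa << kb) + (qb << ka)


if __name__ == "__main__":
    for a, b in [(8, 16), (7, 9), (100, 200)]:
        print(a, b, a * b, approx_log_multiply(a, b))
```

In this model the error is exactly the dropped product of the two residues, which stays small because rounding to the nearest power of two keeps each residue at most half of that power.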

53 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed designs of approximate adders and multipliers based on majority logic (ML), which utilize approximate compressors and a reduction circuitry with so-called complement bits.
Abstract: As a new paradigm for nanoscale technologies, approximate computing deals with error tolerance in the computational process to improve performance and reduce power consumption. Majority logic (ML) is applicable to many emerging nanotechnologies; its basic building block (the 3-input majority voter, MV) has been extensively used for digital circuit design. In this paper, designs of approximate adders and multipliers based on ML are proposed; the proposed multipliers utilize approximate compressors and a reduction circuitry with so-called complement bits. An influence factor is defined and analyzed to assess the importance of different complement bits depending on the size of the multiplier; a scheme for selection of the complement bits is also presented. The proposed designs are evaluated using hardware metrics (such as delay and gate complexity) as well as error metrics. Compared with other ML-based designs found in the technical literature, the proposed designs are found to offer superior performance. Case studies of error-resilient applications are also presented to show the validity of the proposed designs.
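As background for the majority-logic building block mentioned above, the following sketch shows a well-known exact 1-bit full adder built from three 3-input majority voters and inverters. It is not one of the approximate designs proposed in the paper; the function names are illustrative.

```python
def majority(a: int, b: int, c: int) -> int:
    """3-input majority voter: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def ml_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Exact full adder from three majority voters plus two inverters.

    cout = M(a, b, cin)
    sum  = M(NOT cout, cin, M(a, b, NOT cin))
    """
    cout = majority(a, b, cin)
    s = majority(1 - cout, cin, majority(a, b, 1 - cin))
    return s, cout


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for cin in (0, 1):
                print(a, b, cin, ml_full_adder(a, b, cin))
```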

44 citations


Journal ArticleDOI
TL;DR: This work presents generic area-optimized, low-latency accurate and approximate softcore multiplier architectures that exploit the underlying architectural features of FPGAs, i.e., lookup table (LUT) structures and fast-carry chains, to reduce the overall critical path delay (CPD) and resource utilization of multipliers.
Abstract: Multiplication is one of the widely used arithmetic operations in a variety of applications, such as image/video processing and machine learning. FPGA vendors provide high-performance multipliers in the form of DSP blocks. These multipliers are not only limited in number and have fixed locations on FPGAs but can also create additional routing delays and may prove inefficient for smaller bit-width multiplications. Therefore, FPGA vendors additionally provide optimized soft IP cores for multiplication. However, in this work, we advocate that these soft multiplier IP cores for FPGAs still need better designs to provide high performance and resource efficiency. Towards this, we present generic area-optimized, low-latency accurate and approximate softcore multiplier architectures, which exploit the underlying architectural features of FPGAs, i.e., look-up table (LUT) structures and fast carry chains, to reduce the overall critical path delay and resource utilization of multipliers. Compared to Xilinx multiplier LogiCORE IP, our proposed unsigned and signed accurate architecture provides up to 25% and 53% reduction in LUT utilization, respectively, for different sizes of multipliers. Moreover, with our unsigned approximate multiplier architectures, a reduction of up to 51% in the critical path delay can be achieved with an insignificant loss in output accuracy when compared with the LogiCORE IP. For illustration, we have deployed the proposed multiplier architecture in accelerators used in image and video applications, and evaluated them for area and performance gains. Our library of accurate and approximate multipliers is open-source and available online at https://cfaed.tu-dresden.de/pd-downloads to fuel further research and development in this area, facilitate reproducible research, and thereby enable a new research direction for the FPGA community.

43 citations


Journal ArticleDOI
TL;DR: A design approach for ternary combinational logic circuits while using CNTFETs and RRAM is presented and the proposed designs show a significant reduction in the transistor count, decreased cell area, and lower power consumption.
Abstract: The capability of multiple valued logic (MVL) circuits to achieve higher storage density when compared to that of existing binary circuits is highly impressive. Recently, MVL circuits have attracted significant attention for the design of digital systems. Carbon nanotube field effect transistors (CNTFETs) have shown great promise for design of MVL based circuits, due to the fact that the scalable threshold voltage of CNTFETs can be utilized easily for the multiple voltage designs. In addition, resistive random access memory (RRAM) is also a feasible option for the design of MVL circuits, owing to its multilevel cell capability that enables the storage of multiple resistance states within a single cell. In this manuscript, a design approach for ternary combinational logic circuits using CNTFETs and RRAM is presented. The designs of ternary half adder, ternary half subtractor, ternary full adder, and ternary full subtractor are evaluated using Synopsys HSPICE simulation software with standard 32 nm CNTFET technology under different operating conditions, including different supply voltages, output load variation, and different operating temperatures. Finally, the proposed designs are compared with the state-of-the-art ternary designs. Based on the obtained simulation results, the proposed designs show a significant reduction in the transistor count, decreased cell area, and lower power consumption. In addition, due to the participation of RRAM, the proposed designs have advantages in terms of non-volatility.

35 citations


Proceedings ArticleDOI
Dehua Song1, Yunhe Wang1, Hanting Chen1, Chang Xu2, Chunjing Xu1, Dacheng Tao2 
20 Jun 2021
TL;DR: This paper uses adder neural networks (AdderNets), which compute output features with additions, to avoid the massive energy consumption of conventional multiplications in image super-resolution.
Abstract: This paper studies the single image super-resolution problem using adder neural networks (AdderNets). Compared with convolutional neural networks, AdderNets utilize additions to calculate the output features and thus avoid the massive energy consumption of conventional multiplications. However, it is very hard to directly inherit the existing success of AdderNets on large-scale image classification to the image super-resolution task due to the different calculation paradigm. Specifically, the adder operation cannot easily learn the identity mapping, which is essential for image processing tasks. In addition, the functionality of high-pass filters cannot be ensured by AdderNets. To this end, we thoroughly analyze the relationship between an adder operation and the identity mapping and insert shortcuts to enhance the performance of SR models using adder networks. Then, we develop a learnable power activation for adjusting the feature distribution and refining details. Experiments conducted on several benchmark models and datasets demonstrate that our image super-resolution models using AdderNets can achieve comparable performance and visual quality to that of their CNN baselines with an about 2.5× reduction in energy consumption. The codes are available at: https://github.com/huawei-noah/AdderNet.

34 citations


Posted ContentDOI
TL;DR: The wafer-scale fabrication processes are guided by ML combined with grid searching to co-optimize device performance, including mobility, threshold voltage and subthreshold swing, and experimentally validate the application potential of ML-assisted fabrication optimization for beyond-silicon electronic materials.
Abstract: Triggered by the pioneering research on graphene, the family of two-dimensional layered materials (2DLMs) has been investigated for more than a decade, and appealing functionalities have been demonstrated. However, there are still challenges inhibiting high-quality growth and circuit-level integration, and results from previous studies are still far from complying with industrial standards. Here, we overcome these challenges by utilizing machine-learning (ML) algorithms to evaluate key process parameters that impact the electrical characteristics of MoS2 top-gated field-effect transistors (FETs). The wafer-scale fabrication processes are then guided by ML combined with grid searching to co-optimize device performance, including mobility, threshold voltage and subthreshold swing. A 62-level SPICE modeling was implemented for MoS2 FETs and further used to construct functional digital, analog, and photodetection circuits. Finally, we present wafer-scale test FET arrays and a 4-bit full adder employing industry-standard design flows and processes. Taken together, these results experimentally validate the application potential of ML-assisted fabrication optimization for beyond-silicon electronic materials. Here, the authors demonstrate the application of machine learning to optimize the device fabrication process for wafer-scale 2D semiconductors, and eventually fabricate digital, analog, and optoelectrical circuits.

30 citations


Proceedings ArticleDOI
01 Feb 2021
TL;DR: This paper proposes an FPGA-centric mixed scheme quantization (MSQ) that combines the proposed sum-of-power-of-2 (SP2) scheme with the fixed-point scheme.
Abstract: Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to the huge model size and computation amount, model compression is a critical step to deploy DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model compression approach that is complementary to weight pruning. Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes for different rows of the weight matrix. It is motivated by (1) the fact that the weight distributions in different rows are not the same; and (2) the potential of achieving better utilization of heterogeneous FPGA hardware resources. To achieve that, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2) suitable for Gaussian-like weight distribution, in which the multiplication arithmetic can be replaced with logic shifter and adder, thereby enabling highly efficient implementations with the FPGA LUT resources. In contrast, the existing fixed-point quantization is suitable for Uniform-like weight distribution and can be implemented efficiently by DSP. Then to fully explore the resources, we propose an FPGA-centric mixed scheme quantization (MSQ) with an ensemble of the proposed SP2 and the fixed-point schemes. Combining the two schemes can maintain, or even increase accuracy due to better matching with weight distributions. For the FPGA implementations, we develop a parameterized architecture with heterogeneous Generalized Matrix Multiplication (GEMM) cores: one using LUTs for computations with SP2 quantized weights and the other utilizing DSPs for fixed-point quantized weights.
Given the partition ratio among the two schemes based on resource characterization, the MSQ quantization training algorithm derives an optimally quantized model for the FPGA implementation. We evaluate our FPGA-centric quantization framework across multiple application domains. With optimal SP2/fixed-point ratios on two FPGA devices, i.e., Zynq XC7Z020 and XC7Z045, we achieve performance improvement of 2.1× to 4.1× compared to solely exploiting DSPs for all multiplication operations. In addition, the CNN implementations with the proposed MSQ scheme can achieve higher accuracy and comparable hardware utilization efficiency compared to the state-of-the-art designs.
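To make the shift-and-add claim concrete, here is a minimal sketch of multiplying by a weight that is a signed sum of two powers of two. The exact SP2 level set and bit allocation used in the paper are not given in the abstract, so the form w = sign * (2**-p1 + 2**-p2), the brute-force quantizer, and all names below are assumptions for illustration.

```python
def quantize_sp2(w: float, max_shift: int = 3) -> tuple[int, int, int]:
    """Pick (sign, p1, p2) so that sign * (2**-p1 + 2**-p2) is closest to w.

    Brute-force search over 1 <= p1 <= p2 <= max_shift (hypothetical level set).
    """
    sign = -1 if w < 0 else 1
    candidates = [(p1, p2)
                  for p1 in range(1, max_shift + 1)
                  for p2 in range(p1, max_shift + 1)]
    p1, p2 = min(candidates,
                 key=lambda p: abs(abs(w) - (2.0 ** -p[0] + 2.0 ** -p[1])))
    return sign, p1, p2


def sp2_multiply(x: int, sign: int, p1: int, p2: int, frac_bits: int) -> int:
    """Return x * w * 2**frac_bits using two shifts and one addition.

    Requires frac_bits >= p2 so that both shift amounts are non-negative.
    """
    if frac_bits < max(p1, p2):
        raise ValueError("frac_bits must be at least max(p1, p2)")
    return sign * ((x << (frac_bits - p1)) + (x << (frac_bits - p2)))


if __name__ == "__main__":
    sign, p1, p2 = quantize_sp2(0.75)
    # 5 * 0.75 in fixed point with 3 fractional bits.
    print(sign, p1, p2, sp2_multiply(5, sign, p1, p2, frac_bits=3))
```

The result is kept in fixed point (scaled by 2**frac_bits) so that only left shifts are needed and no precision is lost.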

30 citations


Journal ArticleDOI
TL;DR: The proposed hybrid FA based on the XOR-XNOR module can be a reliable and superior alternative to existing FAs and showed superior performance in the 32-bit operation.

29 citations


Journal ArticleDOI
TL;DR: This paper discusses insights leading to the design of more efficient PhC-based all-optical adders for next-generation ultra-fast optical processors.

25 citations


Journal ArticleDOI
TL;DR: A method to implement a majority gate in a transistor-accessed ReRAM array during the READ operation, which forms a functionally complete Boolean logic, capable of implementing any digital logic.
Abstract: To overcome the “von Neumann bottleneck,” methods to compute in memory are being researched in many emerging memory technologies, including resistive RAMs (ReRAMs). Majority logic is efficient for synthesizing arithmetic circuits when compared to NAND/NOR/IMPLY logic. In this work, we propose a method to implement a majority gate in a transistor-accessed ReRAM array during the READ operation. Together with NOT gate, which is also implemented in memory, the proposed gate forms a functionally complete Boolean logic, capable of implementing any digital logic. Computing is simplified to a sequence of READ and WRITE operations and does not require any major modifications to the peripheral circuitry of the array. While many methods have been proposed recently to implement the Boolean logic in memory, the latency of in-memory adders implemented as a sequence of such Boolean operations is exorbitant ($O(n)$). Parallel-prefix (PP) adders use prefix computation to accelerate addition in conventional CMOS-based adders. By exploiting the parallel-friendly nature of the proposed majority gate and the regular structure of the memory array, it is demonstrated how PP adders can be implemented in memory in $O(\log n)$ latency. The proposed in-memory addition technique incurs a latency of $4\cdot\log(n)+6$ for $n$-bit addition and is energy-efficient due to the absence of sneak currents in 1Transistor–1Resistor configuration.
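The logarithmic-depth idea behind parallel-prefix addition can be sketched in software. The example below models a Kogge-Stone prefix network in its conventional AND/OR generate/propagate form; it is not the paper's majority-gate mapping onto a ReRAM array, and the returned level count is only the number of prefix stages, not the READ/WRITE latency quoted in the abstract.

```python
def kogge_stone_add(a: int, b: int, n: int) -> tuple[int, int]:
    """Add two n-bit unsigned integers with a Kogge-Stone prefix network.

    Returns (sum including carry-out, number of prefix levels).
    """
    # Bitwise generate and propagate signals.
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    G, P = g[:], p[:]
    levels = 0
    d = 1
    while d < n:
        G_next, P_next = G[:], P[:]
        for i in range(d, n):
            # Combine the group ending at i with the group ending at i - d.
            G_next[i] = G[i] | (P[i] & G[i - d])
            P_next[i] = P[i] & P[i - d]
        G, P = G_next, P_next
        d *= 2
        levels += 1

    # After the prefix stages, G[i] is the carry out of bit position i.
    total = 0
    for i in range(n):
        carry_in = G[i - 1] if i > 0 else 0
        total |= (p[i] ^ carry_in) << i
    total |= G[n - 1] << n
    return total, levels


if __name__ == "__main__":
    print(kogge_stone_add(200, 100, 8))
    print(kogge_stone_add(0xFFFFFFFF, 1, 32))
```

Each stage doubles the span of bits whose carry is resolved, which is why the stage count grows as the base-2 logarithm of the word length.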

Book ChapterDOI
01 Jan 2021
TL;DR: In this paper, a Wallace tree 8 × 8 multiplier architecture is proposed that optimizes area and delay; 2-bit and 3-bit adders are used in the 8-bit multiplier.
Abstract: In VLSI, hardware architecture requires the multiplier unit as one of the important parts for arithmetic operation. A multiplier is a major component in many hardware architectures, so various experts are focusing their research on multiplier design to achieve compact area, low delay, and low power. Numerous case studies have been done for many architectures, in which increased speed and low area are achieved through a reduction of partial products. One of the best methods is the Wallace tree multiplier (WTM). In this research article, a Wallace tree 8 × 8 multiplier architecture is proposed, and it produces optimized area and delay. Our work targets the design and implementation of a Wallace tree 8 × 8 multiplier using the VHDL language. To limit the number of partial products, 2-bit and 3-bit adders are used in the 8-bit multiplier. In this work, the 8 × 8 Wallace tree multiplier is examined and simulated in the XILINX Integrated Software Environment tool. In this 8-bit Wallace tree multiplier circuit, our primary objectives are to reduce the area of the multiplier circuit and to speed up multiplication.
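The partial-product reduction described above can be modelled in a few lines. This is a behavioural sketch of a generic column-wise Wallace-style reduction, assuming that the abstract's "2-bit and 3-bit adders" refer to half adders and full adders; it is not the chapter's VHDL design, and the function name is illustrative.

```python
def wallace_multiply(a: int, b: int, n: int = 8) -> int:
    """Multiply two n-bit unsigned integers via Wallace-style reduction."""
    # cols[w] holds the partial-product bits of weight 2**w.
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Reduce until every column holds at most two bits.
    while any(len(col) > 2 for col in cols):
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:
                x, y, z = col[k:k + 3]
                k += 3
                nxt[w].append(x ^ y ^ z)                          # full-adder sum
                nxt[w + 1].append((x & y) | (y & z) | (x & z))    # full-adder carry
            if len(col) - k == 2:
                x, y = col[k:k + 2]
                nxt[w].append(x ^ y)       # half-adder sum
                nxt[w + 1].append(x & y)   # half-adder carry
            else:
                nxt[w].extend(col[k:])     # zero or one leftover bit passes through
        cols = nxt

    # Final carry-propagate addition of the two remaining rows.
    row0 = sum(col[0] << w for w, col in enumerate(cols) if len(col) >= 1)
    row1 = sum(col[1] << w for w, col in enumerate(cols) if len(col) == 2)
    return row0 + row1


if __name__ == "__main__":
    print(wallace_multiply(13, 11))
    print(wallace_multiply(255, 255))
```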

Posted ContentDOI
TL;DR: In this article, the authors proposed a new all-optical full-adder design based on nonlinear X-shaped photonic crystal (PhC) resonators for high-speed data processing systems.
Abstract: This paper proposes a new all-optical full-adder design based on nonlinear X-shaped photonic crystal (PhC) resonators. The PhC-based full-adder consists of three input ports, two X-shaped PhC resonators (X-PCRs), and two output ports. Dielectric rods made of silicon and nonlinear rods composed of doped glass are used to design the X-PCRs. The well-known plane-wave expansion and finite-difference time-domain methods are applied to study and analyze the photonic band structure and light propagation inside the PhC, respectively. Our numerical results demonstrate that when the incoming light intensity increases, the nonlinear Kerr effect appears and controls the direction of light propagation inside the structure. The maximum time delay and footprint of the proposed full-adder are about 2.5 ps and 663 μm², making it an appropriate adder for high-speed data processing systems.

Journal ArticleDOI
TL;DR: Quantum-dot Cellular Automata is an evolving post-CMOS paradigm that can be used for designing nanoscale circuits and digital circuits are implemented in QCA using majority logic.
Abstract: Quantum-dot Cellular Automata is an evolving post-CMOS paradigm that can be used for designing nanoscale circuits. Digital circuits are implemented in QCA using majority logic. Adder and subtractor...

Journal ArticleDOI
01 Jun 2021
TL;DR: The main finding of this research is that the single-bit performance parameters of FA cells should not be considered as the main basis for performance comparison and any FA cell should be analyzed in a multi-bit structure to determine its practical effectiveness.
Abstract: Full Adder (FA) circuits are integral components in the design of Arithmetic Logic Units (ALUs) of modern computing systems. Recently, there have been massive research interests in this area due to the growing need for low-power and high-performance computing systems. Researchers have proposed a variety of FA cells with diverse design techniques, each having its pros and cons. As a result, a systematic method for performance comparison of FA cells using a common simulation platform has become necessary. In this work, we present an extensive study of FA cells. We have compared the performance of thirty-three (33) existing 1-bit FA cells. The drive powers of these FA cells have been compared by applying a variety of load conditions. In addition, the 1-bit FA cells have been extended to 32-bit structures to test their scalability and to investigate their performance in wide-word structures. We have determined that twenty-one (21) of the thirty-three (33) FA cells cannot operate in a 32-bit structure, even though some of them exhibit excellent performance as a 1-bit cell. The main finding of this research is that the single-bit performance parameters of FA cells should not be considered as the main basis for performance comparison. Any FA cell should be analyzed in a multi-bit structure to determine its practical effectiveness.

Proceedings ArticleDOI
05 Dec 2021
TL;DR: In this paper, a grid-based state-action representation and an RL environment for constructing legal prefix circuits are designed and RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines with up to 16.0% and 30.2% lower area.
Abstract: In this work, we present a reinforcement learning (RL) based approach to designing parallel prefix circuits such as adders or priority encoders that are fundamental to high-performance digital design. Unlike prior methods, our approach designs solutions tabula rasa purely through learning with synthesis in the loop. We design a grid-based state-action representation and an RL environment for constructing legal prefix circuits. Deep Convolutional RL agents trained on this environment produce prefix adder circuits that Pareto-dominate existing baselines with up to 16.0% and 30.2% lower area for the same delay in the 32b and 64b settings respectively. We observe that agents trained with open-source synthesis tools and cell library can design adder circuits that achieve lower area and delay than commercial tool adders in an industrial cell library.

Journal ArticleDOI
TL;DR: A memristor neural network circuit is designed that can recognize and sequence four characters simultaneously, which may provide a reference for the development of new brain-like systems.
Abstract: The Hopfield neural network has been widely used in image recognition because of its associative memory behavior. In this paper, a memristor neural network circuit is designed, which can recognize and sequence four characters simultaneously. It mainly includes three modules, namely a character recognition module, a signal processing module and a sequence module. The character recognition module consists of four individual character recognition units, corresponding to the recognition of four character images (W, H, A, T). The character recognition module includes a calculation submodule and an iteration submodule. After the operation of the calculation submodule and the iteration submodule, the four character images disturbed by noise can be identified simultaneously. The signal processing module is used to simplify the output signals of the character recognition module by four adder units. The sequence module ensures that the stable state eventually converges to the word (WHAT). The synapse weight circuit given in this paper can obtain different weights, so as to realize the function of associative memory. The iterative process circuit of the Hopfield neural network is also designed to further demonstrate the iterative process. The neural network circuit composed of memristors may be smaller, which may provide a reference for the development of new brain-like systems.

Journal ArticleDOI
Moojune Song1, Min Gyu Park1, San Ko1, Sung Kyu Jang1, Minkyu Je1, Kab-Jin Kim1 
TL;DR: In this paper, a skyrmion-based logic device is presented, which takes advantage of skyrmion annihilation (SA) and increases the efficiency of logic operation.
Abstract: Skyrmion-based devices are an attractive candidate for nonvolatile memory and low-power computation. In a real device, however, skyrmions easily annihilate at device edges, which hampers device applications. Here, we present a novel skyrmion-based logic device, which takes advantage of the skyrmion annihilation (SA) and increases the efficiency of logic operation. An SA half adder (HA) is implemented in a ferromagnet/heavy metal nanotrack by introducing a geometric notch that annihilates the skyrmion. In addition, a full adder and an $n$-bit SA ripple-carry adder are demonstrated by directly cascading the SA HAs. The prototype of a 32-bit SA ripple-carry adder consumes energy as low as 0.62 pJ per operation, which is only 18% of the previously proposed skyrmion adder. Our SA logic gate can, therefore, be a promising candidate for the beyond-CMOS logic device.
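The cascading of half adders into a full adder and then into a ripple-carry adder can be shown with a purely logical model. The sketch below uses the textbook construction (two half adders plus an OR of their carries); how the skyrmion device merges the two carries is not stated in the abstract, so that step and all function names are assumptions.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) of two bits."""
    return a ^ b, a & b


def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder from two cascaded half adders.

    The two half-adder carries can never both be 1, so OR merges them safely.
    """
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, cin)
    return s, c1 | c2


def ripple_carry_add(a: int, b: int, n: int = 32) -> tuple[int, int]:
    """Add two n-bit unsigned integers; return (n-bit sum, carry-out)."""
    carry = 0
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


if __name__ == "__main__":
    print(ripple_carry_add(123456, 654321))
    print(ripple_carry_add(0xFFFFFFFF, 1))
```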

Journal ArticleDOI
Ji-Hoon Kim1, Juhyoung Lee1, Jinsu Lee1, Jaehoon Heo1, Joo-Young Kim1 
TL;DR: Z-PIM adopts bit-serial arithmetic that performs a multiplication bit-by-bit through multiple cycles to reduce the complexity of the operation in a single cycle and to provide flexibility in bit-precision.
Abstract: We present an energy-efficient processing-in-memory (PIM) architecture named Z-PIM that supports both sparsity handling and fully variable bit-precision in weight data for energy-efficient deep neural networks. Z-PIM adopts the bit-serial arithmetic that performs a multiplication bit-by-bit through multiple cycles to reduce the complexity of the operation in a single cycle and to provide flexibility in bit-precision. To this end, it employs a zero-skipping convolution SRAM, which performs in-memory AND operations based on custom 8T-SRAM cells and channel-wise accumulations, and a diagonal accumulation SRAM that performs bit- and spatial-wise accumulation on the channel-wise accumulation results using diagonal logic and adders to produce the final convolution outputs. We propose the hierarchical bitline structure for energy-efficient weight bit pre-charging and computational readout by reducing the parasitic capacitances of the bitlines. Its charge reuse scheme reduces the switching rate by 95.42% for the convolution layers of VGG-16 model. In addition, Z-PIM’s channel-wise data mapping enables sparsity handling by skip-reading the input channels with zero weight. Its read-operation pipelining enabled by a read-sequence scheduling improves the throughput by 66.1%. The Z-PIM chip is fabricated in a 65-nm CMOS process on a 7.568-mm² die, while it consumes average 5.294-mW power at 1.0-V voltage and 200-MHz frequency. It achieves 0.31–49.12-TOPS/W energy efficiency for convolution operations as the weight sparsity and bit-precision vary from 0.1 to 0.9 and 1 to 16 bit, respectively. For the figure of merit considering input bit-width, weight bit-width, and energy efficiency, the Z-PIM shows more than 2.1 times improvement over the state-of-the-art PIM implementations.
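A rough software analogue of bit-serial multiplication with zero skipping is sketched below. It assumes unsigned weights and processes one weight bit per simulated cycle; the AND of an input with a weight bit stands in for the in-memory AND operation, and skipping zero weights stands in for skip-reading. It does not model the SRAM arrays, the diagonal accumulation, or the chip's actual scheduling, and the function name is hypothetical.

```python
def bit_serial_dot(inputs: list[int], weights: list[int], weight_bits: int) -> int:
    """Dot product computed one weight bit per cycle (unsigned weights assumed)."""
    # Zero skipping: channels whose weight is zero are never read.
    active = [(x, w) for x, w in zip(inputs, weights) if w != 0]

    acc = 0
    for bit in range(weight_bits):            # one cycle per weight bit
        partial = 0
        for x, w in active:
            if (w >> bit) & 1:                # AND of the input with this weight bit
                partial += x                  # channel-wise accumulation
        acc += partial << bit                 # weight the partial sum by 2**bit
    return acc


if __name__ == "__main__":
    print(bit_serial_dot([3, 5, 7], [2, 0, 5], weight_bits=4))
```

Because the loop runs once per weight bit, lowering `weight_bits` directly shortens the computation, which is the flexibility in bit-precision the abstract refers to.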

Journal ArticleDOI
TL;DR: In this paper, a 2D-PC with a hexagonal nanoring resonator (NRR), a coupling rod, and several waveguides was designed and simulated, where the mechanism of the interference effect in the PC was used to simplify and minimize the structure.
Abstract: Given the special place of hybrid logic circuits such as all-optical full adders in next-generation digital systems, a new kind of these structures using two-dimensional (2D) photonic crystals (2D-PC) is designed and simulated herein. The proposed structure is made of a hexagonal nanoring resonator (NRR), a coupling rod, and several waveguides. In this all-optical full adder, the mechanism of the interference effect in the PCs is used to simplify and minimize the structure. To make the structure flexible, the radius of the dielectric rod in the whole structure and the NRR are considered based on a lattice constant of 0.2a and 0.04a, respectively. The structure is operated at a wavelength of 1550 nm, considering the value of the power entering the waveguides and that exiting the Carry and Sum ports. To analyze the all-optical full adder, the plane-wave expansion method and finite-difference time-domain method are applied respectively to calculate the bandgap diagram and obtain the transmission and propagation of the optical field. In the proposed structure, the contrast ratio at the Carry and Sum ports is investigated in a unique and novel way, yielding values of 10.68 and 9.03 dB, respectively. In addition, the maximum and minimum response time for the Carry and Sum are obtained as 1.6 and 0.75 ps, respectively. The total footprint of the structure is about 183 µm². Due to its ultracompact size, low power consumption, fast response time, and simple structure, this all-optical full adder is suitable for use in low-power optical integrated circuits.

Journal ArticleDOI
TL;DR: In this brief, hybrid partial product-based building blocks are proposed by considering the probability distribution of the input operands and an efficient hardware implementation of approximate 4×4 multipliers is achieved, while maintaining the required accuracy.
Abstract: In this brief, hybrid partial product-based building blocks are proposed by considering the probability distribution of the input operands. An efficient hardware implementation of approximate 4×4 multipliers is achieved while maintaining a required accuracy. Moreover, high-performance approximate NOR-based half adder (NxHA) and full adder (NxFA) cells are proposed for use in a 4×4 multiplier. Three different strategies (Ax8-1/2/3) are further proposed and analyzed for utilizing the 4×4 multipliers when designing larger multipliers. Ax8-2 provides the best trade-off among the designs with a moderate MRED. A reduction of 30% and 17% in the MRED is achieved compared to previous best energy-optimized and MRED-optimized designs. Among the designs with higher MREDs, Ax8-3 exhibits the smallest MRED and PDP. Moreover, it shows an improvement of 7% to 28% in delay compared to existing approximate recursive designs. As a case study, image multiplication is evaluated; a high peak signal-to-noise ratio (PSNR) with a value close to 50 dB is obtained for the proposed multiplier designs.
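The MRED figure used above is the mean relative error distance, i.e. the average of |approximate - exact| / exact over the tested inputs. The sketch below computes it exhaustively for a 4×4 multiplier, assuming a uniform input distribution and skipping zero operands (where the exact product is zero). The stand-in approximate multiplier `drop_lsb_product` is purely hypothetical and is not one of the paper's NxHA/NxFA-based designs.

```python
from typing import Callable


def mred(approx_mul: Callable[[int, int], int], bits: int = 4) -> float:
    """Mean relative error distance over all non-zero operand pairs."""
    total = 0.0
    count = 0
    for a in range(1, 1 << bits):
        for b in range(1, 1 << bits):
            exact = a * b
            total += abs(approx_mul(a, b) - exact) / exact
            count += 1
    return total / count


def drop_lsb_product(a: int, b: int) -> int:
    """Hypothetical approximate multiplier that omits the a0*b0 partial product."""
    return a * b - ((a & 1) & (b & 1))


if __name__ == "__main__":
    print(mred(lambda a, b: a * b))   # exact multiplier
    print(mred(drop_lsb_product))     # stand-in approximate multiplier
```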

Journal ArticleDOI
TL;DR: In this article, the authors proposed a stateful crossbar-compatible XOR operation that requires only one cycle for its completion, which is two times faster than the current minimum required time for performing XOR (which is two cycles) using other atomic operations in comparable memristive stateful logic families.
Abstract: With the fast approach of the end of silicon scaling and existing problems, such as the von Neumann bottleneck, alternative computing paradigms are in demand. In-memory computation (IMC) is one of the most promising solutions, and memristive technology is one of the best platforms for that purpose. Many logic families have been proposed to enable memristive IMC, among which stateful logic family stands out due to its minimal power consumption and simplicity. In this work, to complement existing works, we propose the first stateful crossbar-compatible XOR atomic logic operation that requires only one cycle for its completion, which is two times faster than the current minimum required time for performing XOR (which is two cycles) using other atomic operations in comparable memristive stateful logic families. We show that, in an example case of an adder, by taking advantage of the proposed single-cycle in-memristor XOR (SIXOR), up to $4.5\times$ speedup can be achieved compared to other SoA stateful adders. The gained speed-up scales up in more complex systems and calculations that use XOR.

Journal ArticleDOI
TL;DR: To achieve high performance, Multiplexer Based Approximate Full Adders (MBAFA) are proposed in the inaccurate part of the HPETA design, which exhibits high speed, area efficiency, low power consumption, a lower Area-Delay Product (ADP), and a 56.32% lower Power-Delay Product (PDP) than the existing conventional CSLA, SAET-CSLA, ETCSLa, HSETA, and HSSSA designs.
Abstract: In this paper, we proposed High Performance Error Tolerant Adders (HPETA) which have an efficient design and quality metrics for inexact computing applications. To achieve high performance, Multipl...

Journal ArticleDOI
TL;DR: The basic structure for a half-adder circuit is proposed by introducing a nonlinear Kerr material into Mach-Zehnder interferometers (MZIs), using MIM plasmonic waveguide-based MZIs within a footprint of 85 μm.

Journal ArticleDOI
TL;DR: In this article, a new semiconductor optical amplifier (SOA)-based module for multi-valued logic units using the cross-polarization modulation effect is proposed and analyzed.
Abstract: In this communication, a new semiconductor optical amplifier (SOA)-based module for multi-valued logic units using the cross-polarization modulation effect is proposed and analyzed. The design is simple and compact, consisting of only three SOAs and a few passive optical elements. SOAs have very low switching power (< 1 mW), and are very small (< 1 mm) and integrable into modern optical integrated circuits. Being multifunctional, the design is versatile; it can function as a demultiplexer, comparator, half adder, half subtractor, and as basic (OR, AND), universal (NOR, NAND), XOR, and XNOR logic gates. This design follows a tree architecture, operates at very high speed (~100 Gbit/s), and provides a good Q factor (30 dB or more). The corresponding bit error rate (BER) is very low (~$10^{-24}$). In this work, a relative eye opening as large as 90.4% is calculated. The variations in Q and BER with noise and control power are also investigated.


Journal ArticleDOI
TL;DR: An alternative approach for the streamlined physical design of quantum-dot cellular automata (QCA) full-adder circuits in which the placement of input cells and wire crossing congestion are substantially reduced.
Abstract: Nowadays, arithmetic computing is an important subject in computer architecture, in which the one-bit full-adder gate plays a significant role. Thus, an efficient design of the full-adder component can benefit the overall efficiency of the entire system. In this paper, a novel method for the design and simulation of a combined majority gate toward realization of the one-bit full-adder gate is proposed. We examine an alternative approach for the streamlined physical design of quantum-dot cellular automata (QCA) full-adder circuits in which the placement of input cells and wire crossing congestion are substantially reduced. The proposed method has outstanding characteristics such as low complexity, reduced area consumption, simplified physical design, and an ultra-high-speed one-bit full adder. Based on simulation results, the proposed design provides a 33.33% reduction in area and a 20.00% improvement in complexity, as well as a 10.49% reduction in power consumption at 1 Ek.
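For readers unfamiliar with majority-gate logic, the sketch below shows the well-known textbook construction of a one-bit full adder from three majority gates and two inverters. It is not the combined majority gate proposed in the abstract, and it models logic only, not QCA cell layout; the function names are assumptions for this example.

```python
def maj(a: int, b: int, c: int) -> int:
    """Three-input majority gate: 1 when at least two inputs are 1."""
    return (a & b) | (b & c) | (a & c)


def inv(a: int) -> int:
    """One-bit inverter."""
    return a ^ 1


def majority_full_adder(a: int, b: int, cin: int):
    """Textbook majority-gate full adder:
    cout = MAJ(a, b, cin)
    sum  = MAJ(NOT cout, cin, MAJ(a, b, NOT cin))"""
    cout = maj(a, b, cin)
    s = maj(inv(cout), cin, maj(a, b, inv(cin)))
    return s, cout
```

The carry output is a single majority gate, which is one reason majority logic is a natural fit for adder design in QCA.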

Journal ArticleDOI
TL;DR: A low-power, area-efficient full adder cell with approximate outputs is designed for image processing, an error-resilient application in which the approximate outputs are acceptable according to image quality metrics.

Proceedings ArticleDOI
22 Jun 2021
TL;DR: In this paper, the authors proposed a reliable and efficient approximate multiplier design, that uses optimized lower part constant OR adder (OLOCA) design and hardware optimized approximate adder with normal error distribution (HOAANED) separately as two variants.
Abstract: Approximate computing has garnered much-needed attention in the design community owing to its high power savings and quick generation of results. As a design technique, approximate computing continues to offer design advantages of the kind that technology scaling has recently ceased to deliver. It is mostly applied to arithmetic designs, which has resulted in significant research interest. The paper proposes a reliable and efficient approximate multiplier design that uses an optimized lower part constant OR adder (OLOCA) design and a hardware optimized approximate adder with normal error distribution (HOAANED) separately as two variants. The two approximate multipliers derived from the OLOCA and HOAANED adders were found to be highly power and footprint efficient, and in addition offer a performance improvement over other approximate multipliers. The error characteristics of the proposed multiplier designs were evaluated and compared with those of existing approximate multiplier designs. The proposed and existing multiplier designs were synthesized using 45 nm CMOS technology and the results were analyzed. The proposed approximate multipliers were further explored for a Canny edge detection application, and results for different standard images were found to be highly acceptable, with 99.9% of the output matching that of the exact multiplier design.
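The general idea behind lower-part OR adders can be sketched as follows. This is the generic lower-part OR adder scheme, not the OLOCA or HOAANED designs named in the abstract, whose details are not given here; the function name and the parameter `k` (number of approximated low bits) are assumptions for this example.

```python
def lower_part_or_add(a: int, b: int, k: int) -> int:
    """Approximate addition of two unsigned integers.

    The k least significant bits are combined with a bitwise OR
    instead of a true addition. The upper bits are added exactly,
    with a carry-in taken from the AND of the two most significant
    bits of the lower parts."""
    if k == 0:
        return a + b  # no approximated part: exact addition
    mask = (1 << k) - 1
    low = (a & mask) | (b & mask)
    carry_in = (a >> (k - 1)) & (b >> (k - 1)) & 1
    high = (a >> k) + (b >> k) + carry_in
    return (high << k) | low
```

In this sketch the absolute error never exceeds 2^(k-1), so `k` trades accuracy against the length of the exact carry chain.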