
Showing papers by "Fabrizio Lombardi published in 2019"


Journal ArticleDOI
TL;DR: A novel approximate multiplier with low power consumption and a short critical path is proposed for high-performance DSP applications; it leverages a newly designed approximate adder that limits carry propagation to the nearest neighbors for fast partial product accumulation.
Abstract: Approximate circuits have been considered for applications that can tolerate some loss of accuracy with improved performance and/or energy efficiency. Multipliers are key arithmetic circuits in many of these applications including digital signal processing (DSP). In this paper, a novel approximate multiplier with low power consumption and a short critical path is proposed for high-performance DSP applications. This multiplier leverages a newly designed approximate adder that limits its carry propagation to the nearest neighbors for fast partial product accumulation. Different levels of accuracy can be achieved by using either OR gates or the proposed approximate adder in a configurable error recovery circuit. The approximate multipliers using these two error reduction strategies are referred to as AM1 and AM2, respectively. Both AM1 and AM2 have a low mean error distance, i.e., most of the errors are not significant in magnitude. Compared with a Wallace multiplier optimized for speed, an $8\times 8$ AM1 using four most significant bits for error reduction shows a 60% reduction in delay (when optimized for delay) and a 42% reduction in power dissipation (when optimized for area). In a $16\times 16$ design, half of the least significant partial products are truncated for AM1 and AM2, which are thus denoted as TAM1 and TAM2, respectively. Compared with the Wallace multiplier, TAM1 and TAM2 save from 50% to 66% in power, when optimized for area. Compared with existing approximate multipliers, AM1, AM2, TAM1, and TAM2 show significant advantages in accuracy with a low power-delay product. AM2 has better accuracy than AM1 but with a longer delay and higher power consumption. Image processing applications, including image sharpening and smoothing, are considered to show the quality of the approximate multipliers in error-tolerant applications. By utilizing an appropriate error recovery scheme, the proposed approximate multipliers achieve similar processing accuracy as exact multipliers, but with significant improvements in power.
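As an illustration of the carry-limiting idea, the behavioural Python sketch below adds two operands while letting each bit see only the carry generated by its immediate neighbor; the function name, bit width, and exhaustive error sweep are illustrative assumptions, not the paper's circuit or benchmarks.

```python
def nearest_neighbor_add(a: int, b: int, width: int = 8) -> int:
    """Carry propagation limited to the nearest neighbor: bit i receives
    only the carry *generated* at bit i-1, so no long carry chain forms."""
    result = 0
    for i in range(width + 1):            # +1 keeps the carry-out bit
        ai, bi = (a >> i) & 1, (b >> i) & 1
        cin = ((a >> (i - 1)) & 1) & ((b >> (i - 1)) & 1) if i else 0
        result |= (ai ^ bi ^ cin) << i
    return result

# Errors occur only when a carry would need to travel two or more positions,
# which is why the mean error distance stays low.
errs = [abs((a + b) - nearest_neighbor_add(a, b))
        for a in range(256) for b in range(256)]
print("mean error distance:", sum(errs) / len(errs))
```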

71 citations


Journal ArticleDOI
TL;DR: The proposed approximate RB multiplier designs are compared with previous approximate Booth multipliers; the results show that the approximate RB multipliers are better than approximate NB Booth multipliers, especially when the word size is large.
Abstract: As technology scaling is reaching its limits, new approaches have been proposed for computational efficiency. Approximate computing is a promising technique for high-performance and low-power circuits used in error-tolerant applications. Among approximate circuits, approximate arithmetic designs have attracted significant research interest. In this paper, the design of approximate redundant binary (RB) multipliers is studied. Two approximate Booth encoders and two RB 4:2 compressors based on RB (full and half) adders are proposed for the RB multipliers. The approximate design of the RB-Normal Binary (NB) converter in the RB multiplier is also studied by considering the error characteristics of both the approximate Booth encoders and the RB compressors. Both approximate and exact regular partial product arrays are used in the approximate RB multipliers to meet different accuracy requirements. Error analysis and hardware simulation results are provided. The proposed approximate RB multipliers are compared with previous approximate Booth multipliers; the results show that the approximate RB multipliers are better than approximate NB Booth multipliers, especially when the word size is large. Case studies of error-resilient applications are also presented to show the validity of the proposed designs.
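For readers unfamiliar with the RB number system, the sketch below shows why the RB-NB converter is essentially one subtraction; the digit-vector representation is a common convention, not the paper's hardware.

```python
def rb_to_nb(digits):
    """RB-NB conversion sketch: a redundant binary number with digits in
    {-1, 0, 1} (LSB first) equals its positive-digit word minus its
    negative-digit word, i.e. a single normal-binary subtraction."""
    plus = sum(1 << i for i, d in enumerate(digits) if d == 1)
    minus = sum(1 << i for i, d in enumerate(digits) if d == -1)
    return plus - minus

# Example: the RB digit vector [1, -1, 0, 1] encodes 1 - 2 + 8 = 7.
assert rb_to_nb([1, -1, 0, 1]) == 7
```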

59 citations


Journal ArticleDOI
TL;DR: This paper presents different approximate designs for computing the FFT, where the tradeoff between accuracy and performance is achieved by adjusting the word length in each computational stage; two algorithms for word length modification under a specific error margin are proposed.
Abstract: This paper presents different approximate designs for computing the FFT. The tradeoff between accuracy and performance is achieved by adjusting the word length in each computational stage. Two algorithms for word length modification under a specific error margin are proposed. The first algorithm targets an approximate FFT for an area-limited design compared to the conventional fixed design; the second algorithm targets performance, so it achieves a higher operating frequency. Both of the proposed algorithms show that an efficient balance between hardware utilization and performance is possible at stage-level. The proposed approximate FFT designs are implemented on FPGA; experimental results show that hardware utilization using the first approximate algorithm is reduced by nearly 40% or more. The second algorithm increases the performance of the designs by over 20%. Fine granularity design is also investigated, where the FPGA resources for a 256-point FFT computation can be further reduced by nearly 10% compared to a coarse design. Finally, the proposed approximate designs are applied to a feature extraction module in an isolated word recognition system; the numbers of LUTs and FFs for the Mel frequency cepstrum coefficients (MFCC) extraction module are decreased by up to 47.2% and 39.0%, respectively, with a power reduction of up to 27.0% at a loss in accuracy of less than 2%.
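The stage-level word-length idea can be prototyped in a few lines of Python; the greedy search below is one plausible reading of such an algorithm (function names, bit bounds, and the error margin are assumptions), not a reproduction of the paper's two algorithms.

```python
import numpy as np

def quantize(x, frac_bits):
    """Fixed-point truncation of a complex array to 'frac_bits' fractional bits."""
    s = 2.0 ** frac_bits
    return np.floor(x.real * s) / s + 1j * np.floor(x.imag * s) / s

def staged_fft(x, stage_bits):
    """Radix-2 DIT FFT whose data are re-quantized after every butterfly
    stage to the word length assigned to that stage."""
    n = len(x)
    stages = int(np.log2(n))
    rev = [int(format(i, f"0{stages}b")[::-1], 2) for i in range(n)]
    a = np.asarray(x, dtype=complex)[rev]
    for s in range(stages):
        half = 1 << s
        w = np.exp(-2j * np.pi * np.arange(half) / (2 * half))
        for start in range(0, n, 2 * half):
            u = a[start:start + half].copy()
            t = w * a[start + half:start + 2 * half]
            a[start:start + half] = u + t
            a[start + half:start + 2 * half] = u - t
        a = quantize(a, stage_bits[s])    # stage-level word-length control
    return a

def shorten_stages(x, max_bits=16, min_bits=4, margin=1e-2):
    """Greedy search: shrink each stage's word length while the worst-case
    relative error against the exact FFT stays within the margin."""
    ref = np.fft.fft(x)
    bits = [max_bits] * int(np.log2(len(x)))
    for s in range(len(bits)):
        while bits[s] > min_bits:
            bits[s] -= 1
            err = np.max(np.abs(staged_fft(x, bits) - ref)) / np.max(np.abs(ref))
            if err > margin:
                bits[s] += 1              # undo the step that broke the margin
                break
    return bits

x = np.cos(2 * np.pi * 5 * np.arange(64) / 64)
print(shorten_stages(x))                  # shorter words where the budget allows
```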

42 citations


Journal ArticleDOI
TL;DR: The proposed XRRO and XRBR PUFs are very efficient designs with good uniqueness and reliability, and require only 12.5% of the hardware resources of previous bistable ring PUFs and reconfigurable RO PUFs, respectively, to generate a 1-bit response.
Abstract: With the rapid development of the Internet of Things (IoT), security has attracted considerable interest. Conventional security solutions that have been proposed for the Internet based on classical cryptography cannot be applied to IoT nodes as they are typically resource-constrained. A physical unclonable function (PUF) is a hardware-based security primitive and can be used to generate a key online or uniquely identify an integrated circuit (IC) by extracting its internal random differences using so-called challenge-response pairs (CRPs). It is regarded as a promising low-cost solution for IoT security. A logic reconfigurable PUF (RPUF) is highly efficient in terms of hardware cost. This article first presents a new classification for RPUFs, namely circuit-based RPUF (C-RPUF) and algorithm-based RPUF (A-RPUF); two Exclusive OR (XOR)-based RPUF circuits (an XOR-based reconfigurable bistable ring PUF (XRBR PUF) and an XOR-based reconfigurable ring oscillator PUF (XRRO PUF)) are proposed. Both the XRBR and XRRO PUFs are implemented on Xilinx Spartan-6 field-programmable gate arrays (FPGAs). The implementation results are compared with previous PUF designs and show good uniqueness and reliability. Compared to conventional PUF designs, the most significant advantage of the proposed designs is that they are highly efficient in terms of hardware cost. Moreover, the XRRO PUF is the most efficient design when compared with previous RPUFs. Also, both the proposed XRRO and XRBR PUFs require only 12.5% of the hardware resources of previous bistable ring PUFs and reconfigurable RO PUFs, respectively, to generate a 1-bit response. This confirms that the proposed XRBR and XRRO PUFs are very efficient designs with good uniqueness and reliability.
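The toy model below captures the generic behaviour of RO-based PUFs that underlies designs like the XRRO: random per-stage process variation plus a challenge-selected ring configuration yields a device-unique response bit. All parameters (stage count, variation spread) are illustrative assumptions, not the paper's circuit.

```python
import random

def make_ro_stages(n_pairs=64, seed=7):
    """Toy RO-PUF model: every oscillator stage gets a random
    process-variation delay (mean 1.0, assumed 2% spread)."""
    rng = random.Random(seed)
    return [(rng.gauss(1.0, 0.02), rng.gauss(1.0, 0.02)) for _ in range(n_pairs)]

def response_bit(stages, challenge):
    """Each challenge bit reconfigures which stage of a pair joins ring A or
    ring B; the response compares the two accumulated ring delays."""
    da = sum(pair[c] for pair, c in zip(stages, challenge))
    db = sum(pair[1 - c] for pair, c in zip(stages, challenge))
    return int(da < db)       # the faster ring oscillates at a higher frequency

chip = make_ro_stages()                       # one simulated device
rng = random.Random(1)
challenge = [rng.randrange(2) for _ in range(64)]
print(response_bit(chip, challenge))          # one bit of the CRP response
```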

41 citations


Journal ArticleDOI
TL;DR: Initial tests show that attacking the proposed FF-APUF design requires more effort from the adversary than a conventional APUF design, and the empirical min-entropy of the FF-APUF design across different devices is shown to be more than twice that of the conventional APUF design.
Abstract: The PUF is a physical security primitive that permits the extraction of intrinsic digital identifiers from electronic devices. Due to its low-cost nature, the PUF is a promising candidate for providing security in lightweight IoT devices. The Arbiter PUF (APUF) has been widely studied in the technical literature. However, it often suffers from disadvantages such as poor uniqueness and reliability, particularly when implemented on FPGAs due to features such as physical layout restrictions. To address these problems, a new design known as the FF-APUF has been proposed; it offers a compact architecture combined with good uniqueness and reliability, and it is suitable for FPGA implementation. Many PUF designs have been shown to be vulnerable to machine learning (ML)-based modeling attacks. In this paper, it is initially shown that the FF-APUF design requires more effort from the adversary to attack than a conventional APUF design. A comprehensive analysis of the experimental results for the FF-APUF design is also presented. An improved APUF design with a balanced arbiter and an FF-APUF design are proposed and implemented on the Xilinx Artix-7 FPGA at 28 nm technology. The experimental min-entropy of the FF-APUF design across different devices is more than twice that of a conventional APUF design.
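The standard additive-delay model below is what makes a conventional APUF learnable: the response is linear in a challenge-derived parity feature vector, so a modeling attack reduces to fitting a hyperplane. Feed-forward loops of the FF-APUF kind break this linearity, which is consistent with the harder attacks reported here. Weights and sizes are illustrative.

```python
import numpy as np

def apuf_response(w, challenge):
    """Additive-delay model of an n-stage arbiter PUF: the response is the
    sign of w . phi, where phi is the parity feature vector of the challenge.
    Because this is linear, fitting w from CRPs is a simple ML problem."""
    n = len(challenge)
    phi = np.ones(n + 1)
    for i in range(n - 1, -1, -1):
        phi[i] = phi[i + 1] * (1 - 2 * challenge[i])
    return int(w @ phi > 0)

rng = np.random.default_rng(0)
w = rng.normal(size=65)              # device-specific stage delay differences
c = rng.integers(0, 2, size=64)      # a 64-bit challenge
print(apuf_response(w, c))
```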

40 citations


Journal ArticleDOI
TL;DR: The radix-8 Booth algorithm is used to reduce the number of partial products in the DA architecture, although no multiplication is explicitly performed, and the proposed design achieves 45%–61% lower EPO compared with the DLMS design.
Abstract: In this paper, a fixed-point finite impulse response adaptive filter is proposed using approximate distributed arithmetic (DA) circuits. In this design, the radix-8 Booth algorithm is used to reduce the number of partial products in the DA architecture, although no multiplication is explicitly performed. In addition, the partial products are approximately generated by truncating the input data with an error compensation. To further reduce hardware costs, an approximate Wallace tree is considered for the accumulation of partial products. As a result, the delay, area, and power consumption of the proposed design are significantly reduced. The application of system identification using a 48-tap bandpass filter and a 103-tap high-pass filter shows that the approximate design achieves an accuracy similar to its accurate counterpart. Compared with the state-of-the-art adaptive filter using bit-level pruning in the adder tree (referred to as the delayed least mean square (DLMS) design), it has a lower steady-state mean squared error and a smaller normalized misalignment. Synthesis results show that the proposed design attains on average a 55% reduction in energy per operation (EPO) and a $3.2\times $ higher throughput per area compared with an accurate design. Moreover, the proposed design achieves 45%–61% lower EPO compared with the DLMS design. A saccadic system using the proposed approximate adaptive filter-based cerebellar model achieves a similar retinal slip as using an accurate filter. These results are promising for the large-scale integration of approximate circuits into high-performance and energy-efficient systems for error-resilient applications.
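Radix-8 Booth recoding, the partial product reduction used here, can be checked numerically; the sketch below (with an assumed 16-bit width) shows a two's complement multiplier collapsing to about width/3 signed digits.

```python
def booth_radix8_digits(m: int, width: int = 16):
    """Radix-8 Booth recoding: each overlapping 4-bit group of a two's
    complement multiplier yields one digit in {-4,...,4}, so a 16-bit
    operand produces 6 signed digits instead of 16 partial products."""
    def bit(j):
        if j < 0:
            return 0
        return (m >> min(j, width - 1)) & 1   # sign-extend past the MSB
    return [-4 * bit(i + 2) + 2 * bit(i + 1) + bit(i) + bit(i - 1)
            for i in range(0, width, 3)]

m = -23
digits = booth_radix8_digits(m)
assert sum(d * 8 ** k for k, d in enumerate(digits)) == m   # exact recoding
print(digits)
```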

38 citations


Journal ArticleDOI
TL;DR: In this SC-RNN, a hybrid structure is developed by utilizing SC designs and binary circuits to improve the hardware efficiency without significant loss of accuracy; the design achieves a higher noise tolerance compared to binary implementations.
Abstract: Recurrent neural networks (RNNs) are widely used to solve a large class of recognition problems, including prediction, machine translation, and speech recognition. The hardware implementation of RNNs is, however, challenging due to the high area and energy consumption of these networks. Recently, stochastic computing (SC) has been considered for implementing neural networks and reducing the hardware consumption. In this paper, we propose an energy-efficient and noise-tolerant long short-term memory-based RNN using SC. In this SC-RNN, a hybrid structure is developed by utilizing SC designs and binary circuits to improve the hardware efficiency without significant loss of accuracy. The area and energy consumption of the proposed design are between 1.6%–2.3% and 6.5%–11.2%, respectively, of a 32-bit floating-point (FP) implementation. The SC-RNN requires significantly smaller area and lower energy consumption in most cases compared to an 8-bit fixed point implementation. The proposed design achieves a higher noise tolerance compared to binary implementations. The inference accuracy is from 10% to 13% higher than an FP design when the noise level is high in the computation process.
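The appeal of SC for such datapaths is that, in unipolar coding, a multiplier is a single AND gate; the toy below illustrates this principle (stream length and seed are arbitrary), not the paper's LSTM architecture.

```python
import random

def to_stream(p, n, rng):
    """Unipolar SC encoding: a length-n bitstream whose mean approximates p."""
    return [int(rng.random() < p) for _ in range(n)]

def sc_multiply(p, q, n=4096, seed=0):
    """In unipolar SC, a single AND gate multiplies two independent streams;
    this hardware cheapness is what hybrid SC designs exploit."""
    rng = random.Random(seed)
    a, b = to_stream(p, n, rng), to_stream(q, n, rng)
    return sum(x & y for x, y in zip(a, b)) / n

print(sc_multiply(0.5, 0.6))   # close to 0.30, with stochastic noise
```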

23 citations


Journal ArticleDOI
TL;DR: The proposed designs outperform other approximate designs in image processing applications including change detection (for the divider), envelope detection ( for the SQR circuit) and image reconstruction (for both designs).
Abstract: In this paper, an adaptive approximation approach is proposed for the design of a divider and a square root (SQR) circuit. In this design, the division/SQR is computed by using a reduced-width divider/SQR circuit and a shifter by adaptively pruning some insignificant input bits. Specifically, for a $2n/n$ division, $2k$ and $k$ ($k …
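From the abstract's description, the mechanism can be sketched as follows for division; the value of k and the test inputs are hypothetical, and the paper's circuit-level details (rounding, error compensation) are not modeled.

```python
def approx_divide(a: int, b: int, k: int = 8) -> int:
    """Adaptive-pruning sketch for a 2n/n division: keep the 2k MSBs of the
    dividend and the k MSBs of the divisor (pruning insignificant bits via
    shifts), divide at reduced width, then shift the quotient back."""
    sa = max(a.bit_length() - 2 * k, 0)
    sb = max(b.bit_length() - k, 0)
    q = (a >> sa) // (b >> sb)            # reduced-width division
    shift = sa - sb
    return q << shift if shift >= 0 else q >> -shift

print(approx_divide(1_000_000, 37), 1_000_000 // 37)   # 27024 vs 27027
```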

20 citations


Book ChapterDOI
01 Jan 2019
TL;DR: In this chapter, a classification is presented for the current designs of approximate arithmetic circuits, including adders, multipliers, and dividers; a comparative evaluation of their error and circuit characteristics is performed to understand the features of the various designs.
Abstract: Arithmetic circuits are important computing modules in a processor. They play a key role in the performance and the energy consumption of many image processing applications. In this chapter, a classification is presented for the current designs of approximate arithmetic circuits including adders, multipliers, and dividers. To understand the features of various designs, a comparative evaluation of their error and circuit characteristics is performed. The accuracy of approximate arithmetic circuits is evaluated by carrying out Monte Carlo simulations. The circuit measurements are assessed by synthesizing approximate designs in an STM CMOS 28 nm process. The simulation and synthesis results show the trade-offs of approximate arithmetic circuits between accuracy and hardware efficiency.
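The error metrics used in such evaluations are easy to reproduce; the snippet below runs a Monte Carlo estimate of MED, NMED, MRED, and error rate on a stand-in truncated adder (the design under test, sample count, and width are assumptions, not the chapter's benchmark set).

```python
import random

def truncated_add(a, b, t=4):
    """Stand-in approximate design: the t least significant bits are dropped."""
    mask = ~((1 << t) - 1)
    return (a & mask) + (b & mask)

def monte_carlo_metrics(width=16, samples=100_000, seed=1):
    rng = random.Random(seed)
    ed_sum = red_sum = errors = 0
    for _ in range(samples):
        a, b = rng.getrandbits(width), rng.getrandbits(width)
        exact, approx = a + b, truncated_add(a, b)
        ed = abs(exact - approx)            # error distance of this sample
        ed_sum += ed
        red_sum += ed / exact if exact else 0
        errors += ed != 0
    med = ed_sum / samples
    return {"MED": med,
            "NMED": med / (2 ** (width + 1) - 2),   # normalize by max output
            "MRED": red_sum / samples,
            "ER": errors / samples}

print(monte_carlo_metrics())
```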

19 citations


Journal ArticleDOI
TL;DR: A deterministic low-complexity approximate DCT technique that accurately configures the size of the transform matrix according to the number of retained coefficients in the zigzag scanning process; it shows superior performance compared to previous ADCT techniques under different metrics.
Abstract: The approximate (multiplier-less) two-dimensional discrete cosine transform (DCT) is a widely adopted technique for image/video compression. This paper proposes a deterministic low-complexity approximate DCT technique that accurately configures the size of the transform matrix ( ${T}$ ) according to the number of retained coefficients in the zigzag scanning process. This is achieved by establishing the relationship between the number of retained coefficients and the number of rows of the ${T}$ matrix. The proposed technique, referred to as the zigzag low-complexity approximate DCT (ZLCADCT), when compared with the approximate DCT (ADCT), decreases the number of addition operations and the energy consumption while retaining the PSNR of the compressed image. In addition, the ZLCADCT eliminates the zigzag scanning process used in the ADCT. Moreover, to characterize the deterministic operation of the ZLCADCT, a detailed mathematical model is provided. A hardware platform based on FPGAs is then utilized to experimentally assess and compare the proposed technique; being modular, deterministic, low-latency, and scalable, the proposed technique can be implemented upon any change in the number of retained coefficients by realizing only a partial reconfiguration of the FPGA resources for the additional required hardware. The extensive simulation and experimental results show superior performance compared to previous ADCT techniques under different metrics.
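The row-count relationship the ZLCADCT exploits can be made concrete: the sketch below derives, for an assumed 8x8 block, how many rows of T are touched by the first retained zigzag coefficients. It is an illustration of the stated relationship, not the paper's model.

```python
def zigzag(n=8):
    """(row, col) positions of an n x n block in JPEG zigzag scan order."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda p: (p[0] + p[1],
                                 p[0] if (p[0] + p[1]) % 2 else -p[0]))

def rows_of_T_needed(retained, n=8):
    """Keeping only the first 'retained' zigzag coefficients bounds the
    highest row index of the transform matrix T that is ever used."""
    return 1 + max(r for r, _ in zigzag(n)[:retained])

print(rows_of_T_needed(10))   # the first 10 coefficients touch only 4 rows of T
```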

17 citations


Proceedings ArticleDOI
13 May 2019
TL;DR: Approximate adders and multipliers are evaluated and compared for a better understanding of their characteristics when the implementations are optimized for performance or power.
Abstract: Taking advantage of the error resilience in many applications as well as the perceptual limitations of humans, numerous approximate arithmetic circuits have been proposed that trade off accuracy for higher speed or lower power in emerging applications that exploit approximate computing. However, characterizing the various approximate designs for a specific application under certain performance constraints becomes a new challenge. In this paper, approximate adders and multipliers are evaluated and compared for a better understanding of their characteristics when the implementations are optimized for performance or power. Although simple truncation can effectively reduce the hardware of an arithmetic circuit, it is shown that some other designs perform better in speed, power and power-delay product. For instance, many approximate adders have a higher performance than a truncated adder. A truncated multiplier is faster but consumes a higher power than most approximate designs for achieving a similar mean error magnitude. The logarithmic multipliers are very fast and power-efficient at a lower accuracy. Approximate multipliers can also be generated by an automated process to be very efficient while ensuring a sufficiently high accuracy.
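As one concrete member of the logarithmic family discussed above, Mitchell's classic multiplier is sketched below; it is a textbook design used here for illustration, not necessarily one of the specific circuits the paper evaluates.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Mitchell's logarithmic multiplier: approximate log2 of each operand
    piecewise-linearly, add the two logs, then take a piecewise-linear
    antilog. No partial products are generated at all."""
    if a == 0 or b == 0:
        return 0
    ka, kb = a.bit_length() - 1, b.bit_length() - 1
    xa = (a - (1 << ka)) / (1 << ka)      # mantissa fraction in [0, 1)
    xb = (b - (1 << kb)) / (1 << kb)
    s = ka + kb + xa + xb                 # approximate log2 of the product
    k = int(s)
    return int((1 << k) * (1 + s - k))    # piecewise-linear antilog

print(mitchell_multiply(100, 200), 100 * 200)   # 18432 vs 20000 (~8% low)
```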

Journal ArticleDOI
TL;DR: First, it is shown that the properties of signed integer multiplication (two's complement format) can be used to make RPR more efficient, and its principles are extended to the MAC operation by proposing RPR implementations that improve the error correction capabilities with a limited impact on circuit overhead.
Abstract: Multiply and Accumulate (MAC) is one of the most common operations in modern computing systems. It is used, for example, in matrix multiplication and in new computational environments such as those executed on neural networks for deep machine learning. MAC is also used in critical systems that must operate reliably, such as object recognition for vehicles. Therefore, MAC implementations must be able to cope with errors that may be caused, for example, by radiation. A common scheme to deal with soft errors in arithmetic circuits is the use of Reduced Precision Redundancy (RPR). RPR, instead of replicating the entire circuit, uses reduced precision copies, which significantly reduces the overhead while still being able to correct the largest errors. This paper considers the implementation of RPR Multiply and Accumulate circuits. First, it is shown that the properties of signed integer multiplication (two's complement format) can be used to make RPR more efficient. Then its principles are extended to the MAC operation by proposing RPR implementations that improve the error correction capabilities with a limited impact on circuit overhead. The proposed schemes have been implemented and tested. The results show that they can significantly reduce the Mean Square Error (MSE) at the output when the circuit is affected by a soft error, and the implementation overhead of the proposed schemes is extremely low.
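The core RPR decision can be sketched compactly: compare the full-precision result against a reduced-precision copy and fall back to the copy when they diverge beyond the truncation bound. The widths and injected error below are illustrative assumptions; the paper's MAC-specific optimizations are not modeled.

```python
def rp_copy(x, width=16, rp_bits=8):
    """Reduced-precision copy: the (width - rp_bits) LSBs are truncated."""
    t = width - rp_bits
    return (x >> t) << t

def rpr_output(fp_result, rp_result, width=16, rp_bits=8):
    """RPR decision: if the full-precision unit strays from the RP copy by
    more than the truncation bound, a soft error is assumed to have hit the
    FP unit and the RP value is output instead."""
    bound = (1 << (width - rp_bits)) - 1  # largest value the dropped LSBs can hold
    return fp_result if abs(fp_result - rp_result) <= bound else rp_result

exact = 41_813
corrupted = exact ^ (1 << 14)                    # injected single-bit soft error
print(rpr_output(corrupted, rp_copy(exact)))     # falls back to the RP copy
```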

Journal ArticleDOI
TL;DR: In this article, the authors proposed the design, fabrication, experimental implementation, and equivalent circuit model of a magneto-electric (ME) sensing device for determining the magnitude and direction of low-frequency ac magnetic fields.
Abstract: We propose the design, fabrication, experimental implementation, and equivalent circuit model of a magneto-electric (ME) sensing device for determining the magnitude and direction of low-frequency ac magnetic fields. The device consists of two capacitive ME thin-film sensors fabricated by means of the laser ablation technique inside a pulsed laser deposition (PLD) chamber. The films are grown on a 500-nm thick indium tin oxide layer on $1.5\times 1.5$ cm$^2$ silicon substrates using two deposition methods, namely, single target (ST) and multiple target (MT). The proposed setup is used to detect ac magnetic fields generated from a solenoid coil located at a distance of 10 mm. The experimental results demonstrate that the proposed magnetic field sensing approach using the MT method can establish the magnitude and direction of external magnetic fields with detection errors below 8%. Based on the experimental observations, we also establish a mathematical expression describing the direct ME coupling effect observed in M-type strontium hexaferrite thin films. Subsequently, we develop an electrical circuit model that can accurately predict device behavior using circuit simulations. The differential output voltages at different field strengths are predicted for both ST and MT films by simulating this equivalent circuit using the LTspice software from Linear Technology. The simulation results are in good agreement with the experimental data both qualitatively and quantitatively, and the average difference between the two ranges from 5% to 19% in the worst case.

Journal ArticleDOI
TL;DR: Results show that OBP reduces the encoding and error detection circuitry complexity and delay, while TBP additionally reduces the number of parity bits for some configurations; therefore, OBP and TBP can be efficient alternatives for detection of limited magnitude errors in MLC memories that use a binary encoding of levels to bits.
Abstract: Emerging memory technologies rely on Multilevel Cells (MLC) to achieve high density; the use of multiple levels per cell allows storage of multiple bits, but it also reduces the margins and makes it error prone. Error control codes (including error correction and detection codes) can be used to protect MLC memories from errors; however, most existing coding schemes have been designed for traditional binary memories (so storing a single bit). In MLC memories, errors cause a change from a level to an adjacent level or to the next one (depending on the employed technology), so they are often referred to as limited magnitude errors. For a binary coding of levels to bits, these limited magnitude errors can corrupt several bits making traditional coding schemes inefficient. In this paper, error detection of MLC memories is considered when a binary encoding of levels to bits is used and two new schemes are proposed: One-Bit Parity (OBP) and Two-Bit Parity (TBP). The first scheme targets errors of magnitude-1 for detection using a single parity bit that checks only one bit per cell. The second scheme detects both magnitude-1 and -2 errors using only two parity bits. Both schemes are compared to existing alternatives, namely Gray coding combined with a single parity bit (GP) for OBP and Interleaved Parity (IP) for TBP. The results show that OBP reduces the encoding and error detection circuitry complexity and delay, while TBP additionally reduces the number of parity bits for some configurations. Therefore, OBP and TBP can be efficient alternatives for detection of limited magnitude errors in MLC memories that use a binary encoding of levels to bits.
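The OBP idea for magnitude-1 errors follows from a simple fact: with a binary level-to-bits mapping, a level change of ±1 always flips the cell's LSB. The sketch below (cell values are arbitrary) demonstrates detection with a single parity bit; it is an illustration of the stated principle, not the paper's circuitry.

```python
def obp_parity(levels):
    """OBP sketch: with a binary level-to-bits mapping, a magnitude-1 level
    change always flips a cell's LSB, so one parity bit computed over the
    LSB of every cell detects any single magnitude-1 error."""
    p = 0
    for level in levels:
        p ^= level & 1          # only one bit per cell is checked
    return p

cells = [3, 1, 0, 2]            # stored levels of four 2-bit cells
stored_parity = obp_parity(cells)
cells[2] += 1                   # magnitude-1 error in one cell
assert obp_parity(cells) != stored_parity   # the mismatch flags the error
```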

Proceedings ArticleDOI
17 Jul 2019
TL;DR: A new metric referred to as the influence factor is defined and used to assess the approximate radix-4 Booth algorithm for different sizes of multipliers, and it is shown that the proposed designs offer superior performance.
Abstract: Approximate computing at the nanoscale provides a new approach for low power design for error-tolerant applications. Many emerging nanotechnologies are based on majority logic (ML), and therefore the 3-input majority gate has been used as the basic building block in digital circuit design. In this paper, we consider the design of approximate radix-4 Booth multipliers based on ML. In particular, an approximate partial product encoder and an approximate correction term encoder are proposed. A new metric referred to as the influence factor is defined and used to assess the approximate radix-4 Booth algorithm for different sizes of multipliers. The proposed designs are evaluated using hardware metrics as well as error metrics. It is shown that they offer superior performance. Image processing is also presented as a case study of error-tolerant applications to show the validity of the proposed designs.
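For context, the snippet below shows the 3-input majority gate and the standard exact full adder built from three of them plus two inverters, the kind of ML building block the approximate Booth designs start from; it is not the paper's approximate encoder.

```python
def maj(a, b, c):
    """3-input majority gate, the primitive of many emerging nanotechnologies."""
    return (a & b) | (b & c) | (a & c)

def ml_full_adder(a, b, cin):
    """Standard exact full adder in majority logic: three majority gates
    plus two inverters."""
    cout = maj(a, b, cin)
    s = maj(1 - cout, maj(a, b, 1 - cin), cin)
    return s, cout

# Exhaustive check against the Boolean full adder.
for v in range(8):
    a, b, cin = (v >> 2) & 1, (v >> 1) & 1, v & 1
    assert ml_full_adder(a, b, cin) == ((a + b + cin) & 1, (a + b + cin) >> 1)
```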

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Results show that the proposed architecture for 8-bit matrix multiplication with an approximation factor $\alpha=7$ has lower power consumption than existing inexact designs found in the technical literature with comparable NMED.
Abstract: Matrix multiplication (MM) is a basic operation for many Digital Signal Processing applications. A Systolic Array (SA) is often considered one of the most favorable architectures to achieve high performance for matrix multiplication. In this paper, the design exploration for an approximate SA is pursued; three design schemes are proposed by introducing approximation in multiple sub-modules. An approximation factor $\alpha$ is introduced; it is related to the inexact columns in the SA to explore the accuracy-efficiency trade-off present in the proposed designs. In the evaluation, an 8-bit input operand matrix multiplication is considered; the Synopsys Design Compiler at the 45nm technology node is used to establish hardware-related metrics. The Error Rate (ER), Normalized Mean Error Distance (NMED) and Mean Relative Error Distance (MRED) are used as figures of merit for error analysis. Results show that the proposed architecture for 8-bit matrix multiplication with an approximation factor $\alpha=7$ has lower power consumption than existing inexact designs found in the technical literature with comparable NMED. In addition, a power-delay product vs NMED analysis shows that the proposed designs have a lower PDP and are thus applicable to low-power applications. The practicality of the proposed architecture is established by computing the Discrete Cosine Transform.
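A behavioural (not cycle-accurate) reading of the column-wise approximation is sketched below: PEs producing the first alpha output columns use an inexact multiplier. The truncated multiplier is a stand-in assumption; the paper's sub-module approximations differ.

```python
def exact_mult(x, y):
    return x * y

def trunc_mult(x, y, t=4):
    """Stand-in inexact PE multiplier: operands lose their t LSBs."""
    return ((x >> t) * (y >> t)) << (2 * t)

def approx_systolic_mm(A, B, alpha):
    """Behavioural model: PEs producing the first 'alpha' output columns use
    the inexact multiplier; the remaining columns stay exact."""
    n = len(A)
    return [[sum((trunc_mult if j < alpha else exact_mult)(A[i][k], B[k][j])
                 for k in range(n))
             for j in range(n)] for i in range(n)]

A = [[(3 * i + j + 1) * 17 for j in range(4)] for i in range(4)]
B = [[(2 * i + j + 1) * 23 for j in range(4)] for i in range(4)]
print(approx_systolic_mm(A, B, alpha=2))   # left columns approximate, right exact
```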

Journal ArticleDOI
TL;DR: A novel N-modular Reduced Precision Redundancy (NRPR) scheme is proposed with a simple comparison-based approach, and a probabilistic analysis is pursued to determine the conditions under which RPR data is provided as output; it is shown that this probability is very small.
Abstract: Information is an integral part of the correct and reliable operation of today's computing systems. Data either stored or provided as input to computation processing modules must be tolerant to many externally and internally induced destructive phenomena, such as soft errors and faults, often of a transient nature but also in large numbers, thus causing catastrophic system failures. Together with error tolerance, reliable operation must be provided by reducing the large overheads often encountered at system-level when employing redundancy. While information-based techniques can also be used in some of these schemes, the complexity and limited capabilities for implementing high order correction functions for decoding limit their application due to poor performance; therefore, N Modular Redundancy (NMR) is often employed. In NMR the correct output is given by majority voting among the N input copies of data. Reduced Precision Redundancy (RPR) has been advocated to reduce the redundancy, mostly for the case of N = 3; in a 3RPR scheme, one full precision (FP) input is needed while two inputs require reduced precision (RP) (usually by truncating some of the least significant bits (LSBs) in the input data). However, its decision logic is more complex than a 3MR scheme. This paper proposes a novel NRPR scheme with a simple comparison-based approach; the realistic case of N = 5 is considered as an example to explain the proposed scheme in detail; different arrangements for the redundancy (with three or four FP data copies) are considered. In addition to the design of the decision circuit, a probabilistic analysis is also pursued to determine the conditions by which RPR data is provided as output; it is shown that this probability is very small. Different applications of the proposed NRPR system are presented; in these applications, data is used either as memory output and/or for computing the discrete cosine transform. In both cases, the proposed 5RPR scheme shows considerable advantages in terms of redundancy management and reliable image processing.
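A comparison-based decision of the kind described can be sketched for the three-FP-copy arrangement; the selection logic, widths, and example values below are illustrative assumptions rather than the paper's decision circuit.

```python
def rpr5_output(fp_copies, rp_copies, lsb_bits=8):
    """Comparison-based 5RPR sketch (three FP plus two RP copies): any two
    agreeing FP copies win; only when all FP copies pairwise disagree (a
    very low-probability multi-error event) is an RP copy given as output."""
    for i in range(len(fp_copies)):
        for j in range(i + 1, len(fp_copies)):
            if fp_copies[i] == fp_copies[j]:
                return fp_copies[i]
    return rp_copies[0] << lsb_bits        # re-align the truncated RP copy

value = 0xBEEF
fp = [value, value ^ (1 << 12), value]     # one FP copy hit by a soft error
rp = [value >> 8, value >> 8]
print(hex(rpr5_output(fp, rp)))            # 0xbeef: the FP majority prevails
```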

Journal ArticleDOI
TL;DR: The peak signal-to-noise ratio (PSNR) results for the addition of two images show that the inexact full adder achieves a higher output image quality than the exact circuit when the frequency is scaled up.
Abstract: This study presents frequency upscaling as a technique for developing error-resilient arithmetic designs in approximate computing, whereby the input signal frequency of the circuit is upscaled beyond its largest operating value, generating errors in the arithmetic operation while speeding up the computational throughput. This study initially presents the mathematical modelling of frequency upscaling for both exact and inexact full adders. An exhaustive simulation and evaluation of 4- and 8-bit subtraction, followed by the addition of two images and an approximate discrete cosine transform (DCT), is pursued using exact and inexact circuits when subjected to the proposed technique. The results estimated using the proposed model show good agreement with the simulation results. The normalised mean error distance of subtraction using an inexact circuit is close to the exact value for different technology nodes. The peak signal-to-noise ratio (PSNR) results for the addition of two images show that the inexact full adder achieves a higher output image quality than the exact circuit when the frequency is scaled up. Also, in an approximate DCT, the input frequency of an inexact full adder can be scaled up significantly higher than an exact full adder without a significant decrease in PSNR value.
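One way to model frequency upscaling in software is to cap how far a carry can ripple before the outputs are latched; the sketch below uses that abstraction (the 'settled' depth stands in for the ratio of clock period to critical path), not the paper's transistor-level timing model.

```python
def upscaled_add(a, b, settled, width=8):
    """Behavioural sketch of frequency upscaling on a ripple-carry adder: the
    shortened clock period lets each carry ripple through at most 'settled'
    stages before the outputs are latched; longer chains are read as 0."""
    out = 0
    for i in range(width + 1):
        carry = 0
        for j in range(max(0, i - settled), i):   # only the last stages settle
            aj, bj = (a >> j) & 1, (b >> j) & 1
            carry = (aj & bj) | ((aj ^ bj) & carry)
        out |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ carry) << i
    return out

# A 3-stage carry chain fails when only 2 stages settle, but not at 3.
print(upscaled_add(0b0111, 0b0001, settled=2))   # 0 (timing error)
print(upscaled_add(0b0111, 0b0001, settled=3))   # 8 (correct)
```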

Journal ArticleDOI
TL;DR: The application of the proposed MLGs to design fast decoders for one-step ML decodable (OS-MLD) codes is presented; the results show that the proposedMLGs are very efficient circuits for this coding application.
Abstract: The majority logic (ML) gate (MLG) is required in fast decoder implementations to protect memories from transient soft errors. In this paper, a novel MLG design is proposed; it consists of a pMOS pull-up network, an nMOS pull-down network, and an inverter. The proposed design is applicable to an arbitrary number of inputs $\gamma $ (operating as a mirror circuit when $\gamma $ is odd). The proposed designs require only a small number of transistors; when simulated, they offer improved metrics such as reductions in delay, area, and power dissipation compared with existing designs found in the technical literature. When the combined power-delay-area product (PDAP) is considered, the advantages of the proposed designs are pronounced. The application of the proposed MLGs to design fast decoders for one-step ML decodable (OS-MLD) codes is also presented; the results show that the proposed MLGs are very efficient circuits for this coding application.
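The $\gamma$-input MLG and its decoding role can each be captured in a couple of lines; the orthogonal check sums below are illustrative inputs, not a specific code from the paper.

```python
def mlg(inputs):
    """Gamma-input majority logic gate (gamma odd): outputs 1 iff more than
    half of the inputs are 1."""
    return int(sum(inputs) > len(inputs) // 2)

def osmld_correct_bit(received_bit, orthogonal_checks):
    """One-step ML decoding: flip a received bit when the majority of the
    parity-check sums orthogonal on it signal an error."""
    return received_bit ^ mlg(orthogonal_checks)

print(osmld_correct_bit(0, [1, 1, 0, 1, 0]))   # 3 of 5 checks fail -> corrected to 1
```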

Journal ArticleDOI
TL;DR: A new method to construct Double Error Correction (DEC) OS-MLD codes is presented; it provides codes that require fewer parity check bits than existing codes such as Orthogonal Latin Square (OLS) codes. The generalization of the proposed scheme to codes with larger error correction capabilities is also discussed.
Abstract: Error Correction Codes (ECCs) are commonly used to protect memories against soft errors with an impact on memory area and delay. For large memories, the area overhead is mostly due to the additional cells needed to store the parity check bits. In terms of delay, the overhead is mostly needed to detect and correct errors when the data is read from the memory. Most ECCs that can correct more than one error have a complex decoding process and so are limited in high speed memory applications. One exception is One Step Majority Logic Decodable (OS-MLD) codes, for which decoding can be done in parallel at high speed. Unfortunately, only a few OS-MLD codes exist, providing a limited choice in terms of block sizes, error correction capabilities, and code rate. Therefore, there is considerable interest in novel constructions of OS-MLD codes to provide additional choices for protecting memories. In this paper, a new method to construct Double Error Correction (DEC) OS-MLD codes is presented. This method is based on the use of parity check matrices in which two bits have at most two parity check equations in common; the proposed method provides codes that require a smaller number of parity check bits than existing codes like Orthogonal Latin Square (OLS) codes. The drawback of the proposed Two Bit Overlap (TBO) codes is that they require slightly more complex decoding than OLS codes. Therefore, they provide an intermediate solution between OLS and non-OS-MLD codes in terms of decoding delay and number of parity check bits. The proposed TBO codes have been implemented for some block sizes and compared to both OLS and BCH codes to illustrate the trade-off in delay and memory overhead. Finally, this paper discusses the generalization of the proposed scheme to codes with larger error correction capabilities.
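The structural property behind the Two Bit Overlap construction is easy to verify for a given parity check matrix; the checker below uses a small toy matrix for illustration, not a real DEC OS-MLD code from the paper.

```python
from itertools import combinations

def max_column_overlap(H):
    """Largest number of parity checks shared by any two columns of H; the
    proposed construction keeps this at two (OLS codes keep it at one)."""
    cols = list(zip(*H))
    return max(sum(x & y for x, y in zip(u, v))
               for u, v in combinations(cols, 2))

H = [[1, 1, 0, 0],      # a 4 x 4 toy matrix, not a real DEC OS-MLD code
     [1, 1, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 1]]
print(max_column_overlap(H))   # 2
```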

Proceedings ArticleDOI
01 Oct 2019
TL;DR: Several Spotty codes that can correct double bit errors in 4-bit cells are designed and evaluated; they require fewer parity bits than existing Spotty codes or symbol-based codes such as Hong-Patel codes, which reduces the size of the memory while keeping encoding and decoding complexity similar to existing alternative codes.
Abstract: Non-volatile emerging Multilevel Cell (MLC) memories (such as magneto electric, magnetic resistive, memristor-based and phase change memories) are attractive to increase density. A key advantage of these memories is that they can store several bits per cell by using different levels. This however reduces the margins against noise and other effects and can lead to larger error rates. Errors in MLC memories are usually limited to magnitude-2 levels, and thus corrupt one or two bits per cell when using a Gray mapping from levels to bits. This enables the use of codes that can correct those error patterns in a memory cell instead of codes that correct all possible patterns in the cell, thus reducing complexity and cost. In this paper, the case of a 64 data bit memory built using memory cells that can store four bits and suffer up to double bit errors per cell is considered. Several (72, 64) Spotty codes that can correct double bit errors in 4-bit cells are designed and evaluated. The new codes require fewer parity bits than existing Spotty codes or symbol-based codes such as Hong-Patel codes. Therefore, they reduce the size of the memory while having encoding and decoding complexity similar to existing alternative codes.
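The limited-magnitude error model motivating these codes can be verified directly: under a Gray level-to-bits mapping, magnitude-1 and magnitude-2 level errors flip at most one and two bits, respectively, of a 4-bit cell. The snippet below demonstrates this; it illustrates the error model, not the code construction.

```python
def gray(n):
    """Binary-reflected Gray code of level n."""
    return n ^ (n >> 1)

def bits_flipped(level, magnitude):
    """Number of cell bits corrupted when a stored level moves by
    'magnitude' adjacent levels under a Gray level-to-bits mapping."""
    return bin(gray(level) ^ gray(level + magnitude)).count("1")

print([bits_flipped(lv, 1) for lv in range(15)])   # always 1 bit
print([bits_flipped(lv, 2) for lv in range(14)])   # at most 2 bits
```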

Journal ArticleDOI
TL;DR: This paper presents a new write latency reduction scheme for a Phase Change Memory (PCM) made of Multi-Level Cells (MLCs) that improves over an existing scheme found in the technical literature and known as CABS.
Abstract: This paper presents a new write latency reduction scheme for a Phase Change Memory (PCM) made of Multi-Level Cells (MLCs). This scheme improves over an existing scheme found in the technical literature and known as CABS. The proposed scheme is based on the utilization of a new coding arrangement for the selection of candidate codewords. The code relies on the two-step feature found in the write operation of a MLC PCM and avoids the symbol that incurs the largest latency at a higher rate than CABS. A detailed simulation-based evaluation and comparison are also pursued; the proposed scheme accomplishes improvements in write latency (for parallel writing) as well as coding rate (16/17 for the proposed scheme versus 16/18 for CABS for 16 symbols or a 32-bit word). As the proposed scheme utilizes novel selection criteria for the candidates, the design of the required circuitry (encoder and decoder) has also been changed with respect to CABS; in terms of hardware, the areas of the encoder and decoder for the proposed scheme are reduced by 73 and 56 percent, respectively, compared with CABS.
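The general flavour of such latency-aware coding (choose, per word, a representation with fewer worst-latency symbols and record the choice in a flag) can be sketched as below; the symbol values, transform, and resulting rate are hypothetical and do not reproduce the paper's 16/17-rate code.

```python
WORST = 0b11                       # hypothetical highest-latency MLC symbol

def count_worst(word, n_sym=16):
    return sum(((word >> (2 * i)) & 0b11) == WORST for i in range(n_sym))

def encode(word, n_sym=16):
    """Pick whichever of the data word or a cheap symbol-wise transform of it
    contains fewer worst-case symbols, and record the choice in a flag bit."""
    alt = word ^ int("01" * n_sym, 2)  # XOR moves every 0b11 symbol off WORST
    return (alt, 1) if count_worst(alt, n_sym) < count_worst(word, n_sym) else (word, 0)

def decode(word, flag, n_sym=16):
    return word ^ int("01" * n_sym, 2) if flag else word

data = 0xFFFF_0000                 # eight worst-latency symbols in the top half
coded, flag = encode(data)
assert decode(coded, flag) == data
print(count_worst(data), "->", count_worst(coded))   # 8 -> 0
```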