Author

Peipei Yin

Other affiliations: Nanjing University
Bio: Peipei Yin is an academic researcher from Nanjing University of Aeronautics and Astronautics. The author has contributed to research in topics: Multiplication & Multiplier (economics). The author has an h-index of 4 and has co-authored 7 publications receiving 60 citations. Previous affiliations of Peipei Yin include Nanjing University.

Papers
Journal ArticleDOI
TL;DR: The proposed approximate RB multiplier designs are compared with previous approximate Booth multipliers; the results show that the approximate RB multipliers are better than approximate NB Booth multipliers, especially when the word size is large.
Abstract: As technology scaling is reaching its limits, new approaches have been proposed for computational efficiency. Approximate computing is a promising technique for high performance and low power circuits as used in error-tolerant applications. Among approximate circuits, approximate arithmetic designs have attracted significant research interest. In this paper, the design of approximate redundant binary (RB) multipliers is studied. Two approximate Booth encoders and two RB 4:2 compressors based on RB (full and half) adders are proposed for the RB multipliers. The approximate design of the RB-Normal Binary (NB) converter in the RB multiplier is also studied by considering the error characteristics of both the approximate Booth encoders and the RB compressors. Both approximate and exact regular partial product arrays are used in the approximate RB multipliers to meet different accuracy requirements. Error analysis and hardware simulation results are provided. The proposed approximate RB multipliers are compared with previous approximate Booth multipliers; the results show that the approximate RB multipliers are better than approximate NB Booth multipliers, especially when the word size is large. Case studies of error-resilient applications are also presented to show the validity of the proposed designs.

59 citations

Journal ArticleDOI
25 Jun 2020
TL;DR: In this article, dynamic range approximate LMs (DR-ALMs) for machine learning applications are proposed; they use Mitchell’s approximation and a dynamic range operand truncation scheme and the results show that the power-delay product and the mean relative error distance of the best proposed DR-ALM are decreased.
Abstract: Approximate computing provides an emerging approach to design high performance and low power arithmetic circuits. The logarithmic multiplier (LM) converts multiplication into addition and has inherent approximate characteristics. In this paper, dynamic range approximate LMs (DR-ALMs) for machine learning applications are proposed; they use Mitchell's approximation and a dynamic range operand truncation scheme. The worst case errors for the proposed DR-ALMs are analyzed. The accuracy and the hardware overhead of these designs are provided to select the best approximate scheme according to different metrics. The proposed DR-ALMs are compared with the conventional LM with exact operands and previous approximate multipliers; the results show that the power-delay product (PDP) of the best proposed DR-ALM (DR-ALM-6) is decreased by up to 54.07%, with the MRED decreasing by 21.30%, compared with the 16-bit conventional design. Case studies for three applications show the viability of the proposed DR-ALMs. Compared with the exact multiplier and its conventional counterpart, the back-propagation classifier with DR-ALMs with a truncation length larger than 4 has a similar classification result; the K-means clustering with all DR-ALMs has a similar clustering result; and the handwritten digit recognition with DR-ALM-5 or DR-ALM-6 for LeNet-5 retains a similar or even slightly higher recognition rate.

29 citations
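
As an illustration of the Mitchell approximation that the abstract above builds on, the following is a minimal Python sketch of a plain Mitchell logarithmic multiplier for unsigned integers. It writes each operand as 2^k (1 + x), approximates log2 as k + x, adds the two logarithms, and converts back. The function name `mitchell_multiply` is a hypothetical choice for this sketch, and the paper's dynamic range operand truncation scheme is not reproduced here.

```python
def mitchell_multiply(a: int, b: int) -> int:
    """Approximate a * b for unsigned integers using Mitchell's method.

    Hypothetical helper for illustration only; it models the arithmetic,
    not any particular hardware design.
    """
    if a == 0 or b == 0:
        return 0
    # Characteristic: position of the leading one bit of each operand.
    k1 = a.bit_length() - 1
    k2 = b.bit_length() - 1
    # Fraction numerators: x1 = f1 / 2**k1 and x2 = f2 / 2**k2.
    f1 = a - (1 << k1)
    f2 = b - (1 << k2)
    # (x1 + x2) scaled by 2**(k1 + k2), so everything stays an integer.
    frac_sum = (f1 << k2) + (f2 << k1)
    scale = 1 << (k1 + k2)
    if frac_sum < scale:
        # x1 + x2 < 1: product ~ 2**(k1 + k2) * (1 + x1 + x2)
        return scale + frac_sum
    # x1 + x2 >= 1: product ~ 2**(k1 + k2 + 1) * (x1 + x2)
    return 2 * frac_sum


if __name__ == "__main__":
    for a, b in [(3, 3), (5, 6), (4, 8)]:
        print(a, b, a * b, mitchell_multiply(a, b))
```

The result never exceeds the exact product, and the relative error stays at or below 1/9 (about 11.1%), which is reached at 3 × 3.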

Proceedings ArticleDOI
01 Jul 2016
TL;DR: It is shown that the proposed approximate single precision FP multiplier design reduces power consumption, area and delay by up to 61%, 55%, and 49% respectively compared with its exact counterpart while incurring a moderate error.
Abstract: Approximate/inexact computing has become an attractive approach for designing high performance and low power arithmetic circuits. Floating-point (FP) arithmetic is required in many applications, such as digital signal processing and machine learning. Different approximate FP multipliers are proposed in this paper; the accuracy and the circuit requirements of these designs are assessed to select the best approximate scheme according to different metrics. It is shown that the proposed approximate single precision FP multiplier design reduces power consumption, area and delay by up to 61%, 55%, and 49% respectively compared with its exact counterpart while incurring a moderate error. Moreover, this paper shows that the so-called IFPM24-15 multiplier is the most efficient design in terms of PDP and NMED compared with previous inexact FP multipliers. High dynamic range (HDR) images are processed using the proposed approximate FP multipliers to show the validity of the approximate design.

19 citations

Journal ArticleDOI
TL;DR: It is shown that the proposed approximate FLP multiplier designs further reduce delay, area, power consumption and power-delay product (PDP) while incurring about half of the normalized mean error distance (NMED) compared with the previous designs.
Abstract: Approximate/inexact computing has become an attractive approach for designing high performance and low power arithmetic circuits. Floating-point (FLP) arithmetic is required in many applications, such as digital signal processing, image processing and machine learning. Approximate FLP multipliers with variable accuracy are proposed in this paper; the accuracy and the circuit requirements of these designs are analyzed and assessed according to different metrics. It is shown that the proposed approximate FLP multiplier designs further reduce delay, area, power consumption and power-delay product (PDP) while incurring about half of the normalized mean error distance (NMED) compared with the previous designs. The proposed IFLPM24–15 is the most efficient design when considering both PDP and NMED. Case studies with three error-tolerant applications show the validity of the proposed approximate designs.

9 citations
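
The abstracts on this page compare designs using error metrics such as NMED and MRED. The sketch below shows how these two metrics are commonly computed from paired exact and approximate outputs. The function names and the `max_output` parameter are assumptions made for this example; the papers may normalize or sample differently.

```python
from typing import Sequence


def mred(exact: Sequence[float], approx: Sequence[float]) -> float:
    """Mean relative error distance: mean of |approx - exact| / |exact|.

    Pairs whose exact value is zero are skipped, since the relative
    error is undefined there (an assumption of this sketch).
    """
    terms = [abs(a - e) / abs(e) for e, a in zip(exact, approx) if e != 0]
    return sum(terms) / len(terms)


def nmed(exact: Sequence[float], approx: Sequence[float], max_output: float) -> float:
    """Normalized mean error distance.

    The mean of |approx - exact| divided by the maximum output of the
    exact design, e.g. (2**n - 1)**2 for an unsigned n-bit multiplier.
    """
    mean_ed = sum(abs(a - e) for e, a in zip(exact, approx)) / len(exact)
    return mean_ed / max_output


if __name__ == "__main__":
    exact_out = [10, 20]
    approx_out = [9, 22]
    print(mred(exact_out, approx_out))
    print(nmed(exact_out, approx_out, max_output=20))
```

NMED allows designs of different word sizes to be compared on one scale, while MRED weights errors on small products more heavily.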

Proceedings ArticleDOI
01 Jul 2021
TL;DR: In this paper, four Radix-4 Booth multipliers with different approximation levels are proposed to reduce the hardware complexity of the complex multiplication unit of the Fast Fourier Transform (FFT).
Abstract: The Fast Fourier Transform (FFT) is an efficient algorithm to calculate the Discrete Fourier Transform (DFT), which is often employed in Digital Signal Processing (DSP) and communication. In FFT, complex multiplication and addition units in the butterfly module consume most of the hardware resources. Compared to the addition operation, multiplication is more complicated. In this paper, the multiplier in the complex multiplication unit of the FFT is approximated. Four Radix-4 Booth multipliers with different approximation levels are proposed to reduce the hardware complexity. The pipeline FFT and the parallel FFT based on the proposed approximate multipliers are implemented and extensively evaluated. Compared with the state-of-the-art FFT designs, the number of LUTs is reduced by up to 20.3% and 29.1% for pipeline and parallel FFTs, respectively. The power is reduced by up to 69.9% for the pipeline FFT, and the delay is reduced by up to 45.7%. Moreover, the PSNR is reduced by less than 1 dB in both the pipeline FFT and the parallel FFT. Experimental results show that the overall performance of the proposed designs is better than that of FFT designs using other approximate multipliers.

8 citations
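
For readers unfamiliar with the Radix-4 Booth multipliers named in the abstract above, the following is a minimal sketch of exact radix-4 Booth recoding in Python. It shows only the exact baseline that such designs start from, not the four approximate variants proposed in the paper. The function names are hypothetical, and the word size `n` is assumed to be even.

```python
def booth_radix4_digits(b: int, n: int) -> list[int]:
    """Recode an n-bit two's complement integer into n/2 digits in {-2..2}.

    Digit i is -2*b[2i+1] + b[2i] + b[2i-1], with b[-1] taken as 0.
    """
    if n % 2 != 0:
        raise ValueError("n must be even in this sketch")
    pattern = b & ((1 << n) - 1)  # two's complement bit pattern of b

    def bit(i: int) -> int:
        return 0 if i < 0 else (pattern >> i) & 1

    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(n // 2)]


def booth_multiply(a: int, b: int, n: int) -> int:
    """Exact product: one partial product per digit, weighted by 4**i."""
    return sum(d * a * 4 ** i
               for i, d in enumerate(booth_radix4_digits(b, n)))


if __name__ == "__main__":
    print(booth_radix4_digits(-3, 8))
    print(booth_multiply(7, -3, 8))
```

Recoding halves the number of partial products relative to a bit-by-bit array, which is why approximation effort in Booth multipliers typically targets the encoder and the partial product accumulation.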


Cited by
Journal ArticleDOI
TL;DR: This study highlights that there is no unique winning approximate compressor topology since the best solution depends on the required precision, on the signedness of the multiplier and on the considered error metric.
Abstract: Approximate multipliers attract a large interest in the scientific literature, which proposes several circuits built with approximate 4-2 compressors. Due to the large number of proposed solutions, the designer who wishes to use an approximate 4-2 compressor is faced with the problem of selecting the right topology. In this paper, we present a comprehensive survey and comparison of approximate 4-2 compressors previously proposed in the literature. We also present a novel approximate compressor, so that a total of twelve different approximate 4-2 compressors are analyzed. The investigated circuits are employed to design $8\times 8$ and $16\times 16$ multipliers, implemented in 28nm CMOS technology. For each operand size we analyze two multiplier configurations, with different levels of approximation, both signed and unsigned. Our study highlights that there is no unique winning approximate compressor topology since the best solution depends on the required precision, on the signedness of the multiplier and on the considered error metric.

126 citations
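
As background for the 4-2 compressors surveyed in the abstract above, here is a small Python sketch of the exact 4-2 compressor built from two cascaded full adders. It is only the exact reference that approximate topologies simplify; none of the twelve approximate circuits from the survey is reproduced, and the function names are chosen for this example.

```python
from itertools import product


def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def compressor_4_2(x1: int, x2: int, x3: int, x4: int, cin: int) -> tuple[int, int, int]:
    """Exact 4-2 compressor: returns (sum, carry, cout).

    Invariant: x1 + x2 + x3 + x4 + cin == sum + 2 * (carry + cout).
    """
    s1, cout = full_adder(x1, x2, x3)      # first stage produces cout
    total, carry = full_adder(s1, x4, cin)  # second stage absorbs cin
    return total, carry, cout


if __name__ == "__main__":
    # Exhaustively check the arithmetic invariant over all 32 input patterns.
    for bits in product((0, 1), repeat=5):
        s, c, co = compressor_4_2(*bits)
        assert sum(bits) == s + 2 * (c + co)
    print("exact 4-2 compressor verified")
```

Because `cout` depends only on the first three inputs, a row of these cells has no carry ripple along the row; approximate variants usually drop `cin` and `cout` or simplify the logic at the cost of errors on some input patterns.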

Proceedings ArticleDOI
18 Jun 2017
TL;DR: This paper proposes a novel approximate floating point multiplier, called CFPU, which significantly reduces energy and improves performance of multiplication at the expense of accuracy, and shows that it can outperform a standard FPU when at least 4% of multiplications are performed in approximate mode.
Abstract: Many applications, such as machine learning and data sensing, are statistical in nature and can tolerate some level of inaccuracy in their computation. Approximate computation is a viable method to save energy and increase performance by trading accuracy for energy. There are a number of proposed approximate solutions; however, they are limited to a small range of applications because they cannot control the error rate of their output. In this paper, we propose a novel approximate floating point multiplier, called CFPU, which significantly reduces energy and improves performance of multiplication at the expense of accuracy. Our design approximately models multiplication by replacing the most costly step of the operation with a lower energy alternative. In order to tune the level of approximation, CFPU dynamically identifies the inputs which will produce the largest approximation error and processes them in precise CFPU mode. We show that our CFPU can outperform a standard FPU when at least 4% of multiplications are performed in approximate mode. In our tested applications this percentage of multiplications is substantially higher, leading to significant energy savings. Our experimental evaluation on an AMD Southern Island GPU shows that replacing traditional FPUs with the proposed CFPU results in 77% energy savings and 3.5× energy-delay product improvement over eight general OpenCL applications while providing acceptable quality of service. In addition, for the same level of accuracy, the CFPU provides 2.4× energy-delay product improvement compared to state-of-the-art approximate multipliers.

70 citations

Journal ArticleDOI
18 Aug 2020
TL;DR: This work reviews methods devised to synthesize approximate circuits, given their exact functionality and an approximability threshold, and summarizes strategies for evaluating the error that circuit simplification can induce on the output, which guides synthesis techniques in choosing the circuit transformations that lead to a given amount of induced error.
Abstract: Approximate computing is an emerging paradigm that, by relaxing the requirement for full accuracy, offers benefits in terms of design area and power consumption. This paradigm is particularly attractive in applications where the underlying computation has inherent resilience to small errors. Such applications are abundant in many domains, including machine learning, computer vision, and signal processing. In circuit design, a major challenge is the capability to synthesize the approximate circuits automatically without manually relying on the expertise of designers. In this work, we review methods devised to synthesize approximate circuits, given their exact functionality and an approximability threshold. We summarize strategies for evaluating the error that circuit simplification can induce on the output, which guides synthesis techniques in choosing the circuit transformations that lead to the largest benefit for a given amount of induced error. We then review circuit simplification methods that operate at the gate or Boolean level, including those that leverage classical Boolean synthesis techniques to realize the approximations. We also summarize strategies that take high-level descriptions, such as C or behavioral Verilog, and synthesize approximate circuits from these descriptions.

43 citations

Journal ArticleDOI
TL;DR: This paper presents different approximate designs for computing the FFT, where the tradeoff between accuracy and performance is achieved by adjusting the word length in each computational stage; two algorithms for word length modification under a specific error margin are proposed.
Abstract: This paper presents different approximate designs for computing the FFT. The tradeoff between accuracy and performance is achieved by adjusting the word length in each computational stage. Two algorithms for word length modification under a specific error margin are proposed. The first algorithm targets an approximate FFT for an area-limited design compared to the conventional fixed design; the second algorithm targets performance so it achieves a higher operating frequency. Both of the proposed algorithms show that an efficient balance between hardware utilization and performance is possible at stage-level. The proposed approximate FFT designs are implemented on FPGA; experimental results show that hardware utilization using the first approximate algorithm is reduced by nearly 40% or more. The second algorithm increases performance of the designs by over 20%. Fine granularity design is also investigated, where the FPGA resources for a 256-point FFT computation can be further reduced by nearly 10% compared to a coarse design. Finally, the proposed approximate designs are applied to a feature extraction module in an isolated word recognition system; the numbers of LUTs and FFs for the Mel frequency cepstrum coefficients (MFCC) extraction module are decreased by up to 47.2% and 39.0%, respectively, with a power reduction of up to 27.0% at a loss in accuracy of less than 2%.

42 citations

Journal ArticleDOI
TL;DR: This paper presents approximate multipliers which are efficiently deployed on Field Programmable Gate Arrays (FPGAs) by using newly proposed approximate logic compressors at different levels of accuracy and are expected to be suitable for high-performance and low-power error-resilient applications.
Abstract: This paper presents approximate multipliers which are efficiently deployed on Field Programmable Gate Arrays (FPGAs) by using newly proposed approximate logic compressors at different levels of accuracy. Our approximate multiplier designs offer higher gains in power-delay-area product (PDAP) than those of the state-of-the-art works at comparable accuracies. Furthermore, in terms of delay, occupied area, and dynamic power dissipation, our designs are much better than the Lookup Table based multiplier Intellectual Properties that are available on an FPGA. In particular, our proposed 8-, 16-, and 32-bit multipliers can deliver PDAP gains of up to 7.1×, 8.3×, and 5.0×, respectively. The effectiveness and applicability of our designs are also demonstrated by image processing applications such as image multiplication and sharpening. The experiments show that for image sharpening, our 8 × 8 multipliers can deliver a good peak signal-to-noise ratio (PSNR) of 46.81 dB, a structural similarity index metric (SSIM) of 0.9989, and a dynamic power saving of up to 36.7% with regard to the exact multiplier. For image multiplication, approximate 16 × 16 multipliers can offer a high PSNR of 80.25 dB, an SSIM of 1.0, and a dynamic power saving of up to 58.15%. From these demonstrations, the proposed multipliers are expected to be suitable for high-performance and low-power error-resilient applications.

36 citations