scispace - formally typeset
Search or ask a question
Book

Vlsi Digital Signal Processing Systems: Design And Implementation

01 Jan 2007-
TL;DR: This book discusses Digital Signal Processing Systems, Pipelining and Parallel Processing, Synchronous, Wave, and Asynchronous Pipelines, and Bit-Level Arithmetic Architectures.
Abstract: Introduction to Digital Signal Processing Systems. Iteration Bound. Pipelining and Parallel Processing. Retiming. Unfolding. Folding. Systolic Architecture Design. Fast Convolution. Algorithmic Strength Reduction in Filters and Transforms. Pipelined and Parallel Recursive and Adaptive Filters. Scaling and Roundoff Noise. Digital Lattice Filter Structures. Bit-Level Arithmetic Architectures. Redundant Arithmetic. Numerical Strength Reduction. Synchronous, Wave, and Asynchronous Pipelines. Low-Power Design. Programmable Digital Signal Processors. Appendices. Index.

Content maybe subject to copyright    Report

Citations
More filters
Journal ArticleDOI
TL;DR: This paper proposes logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy, and demonstrates the utility of these approximate adders in two digital signal processing architectures with specific quality constraints.
Abstract: Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs. Therefore, we do not need to produce exactly correct numerical outputs. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units and evaluate them to demonstrate the efficacy of our approach. We also derive simple mathematical models for error and power consumption of these approximate adders. Furthermore, we demonstrate the utility of these approximate adders in two digital signal processing architectures (discrete cosine transform and finite impulse response filter) with specific quality constraints. Simulation results indicate up to 69% power savings using the proposed approximate adders, when compared to existing implementations using accurate adders.

637 citations

Journal ArticleDOI
20 Mar 2020
TL;DR: This article reviews the mainstream compression approaches such as compact model, tensor decomposition, data quantization, and network sparsification, and answers the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures.
Abstract: Domain-specific hardware is becoming a promising topic in the backdrop of improvement slow down for general-purpose processors due to the foreseeable end of Moore’s Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain witnessing successful applications in a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs is achieved by paying the cost of hungry memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the DNN compression concept was naturally proposed and widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up to pursue a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators for gaining extremely high performance. However, the amount of related works is incredibly huge and the reported approaches are quite divergent. This research chaos motivates us to provide a comprehensive survey on the recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, involving both the high-level algorithms and their applications in hardware design. In this article, we review the mainstream compression approaches such as compact model, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint-way use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures. In the end, we discuss several existing issues such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and give promising topics in this field and the possible challenges as well. This article attempts to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate various methods, and confidently get started in the right way.

499 citations


Cites background from "Vlsi Digital Signal Processing Syst..."

  • ...The above quantization technique looks like the function of analog-to-digital converters (ADCs) in signal processing systems [141]....

    [...]

Proceedings ArticleDOI
01 Aug 2011
TL;DR: This paper proposes logic complexity reduction as an alternative approach to take advantage of the relaxation of numerical accuracy, and demonstrates this concept by proposing various imprecise or approximate Full Adder cells with reduced complexity at the transistor level, and utilizing them to design approximate multi-bit adders.
Abstract: Low-power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, the final output is interpreted by human senses, which are not perfect. This fact obviates the need to produce exactly correct numerical outputs. Previous research in this context exploits error-resiliency primarily through voltage over-scaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate Full Adder (FA) cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units, and evaluate them to demonstrate the efficacy of our approach. Post-layout simulations indicate power savings of up to 60% and area savings of up to 37% with an insignificant loss in output quality, when compared to existing implementations.

386 citations


Cites methods from "Vlsi Digital Signal Processing Syst..."

  • ...One-dimensional integer DCT y(k) for an 8-point sequence x(i) is given by [16]...

    [...]

Journal ArticleDOI
TL;DR: In this article, the phase estimation methods are numerically modeled: the maximum a posteriori (MAP) phase estimate, decision directed estimate, power law-Wiener filter estimate and power law PLL estimate.
Abstract: The advent of digital signal processing (DSP) to optical coherent detection means that more phase estimation options are available, compared to the earlier generation where phase-locked loops (PLLs) were invariably deployed in synchronous coherent receivers. Several phase estimation methods are numerically modeled: the maximum a posteriori (MAP) phase estimate, decision directed estimate, power law-Wiener filter estimate and power law-PLL estimate. An asynchronous coherent detection case is also modeled. The phase estimates are evaluated with respect to their tolerance of finite laser linewidth and their suitability for implementation in a parallel digital processor. Laser phase noise causes transmission system performance to be degraded by excess bit errors and cycle slips. The optimal phase estimate is the MAP estimate, and it is also included as a baseline. The power law-Wiener filter phase estimate is found to perform only marginally worse than the MAP estimate. It must be recast using a look-ahead computation to be implemented in a parallel digital processor, and the impact is investigated of the increase in the number of computations required. Differential logical detection is often used to reduce the impact of cycle slip events, and the implications of this operation on the bit error rate are studied. It is found that by choosing the correct FEC scheme differential logical detection does not increase the Q-factor penalty.

289 citations


Cites background or methods from "Vlsi Digital Signal Processing Syst..."

  • ...For the case where , further computation savings are available by writing the sum in the numerator of (15) as a power-of-2 decomposition [27]....

    [...]

  • ...A resolution to this issue is found by recasting the algorithms using a look-ahead computation [27]....

    [...]

  • ...The amount of resources can be reduced by using incremental block processing [27]....

    [...]

  • ...There are methods of adapting algorithms for VLSI processors that leave the signal processing behavior unchanged [27], and one of these methods will be used....

    [...]

Journal ArticleDOI
TL;DR: An efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform that can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement by flipping conventional lifting structures.
Abstract: In this paper, an efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform. It can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement of the lifting-based discrete wavelet transform by flipping conventional lifting structures. The precision issues are also analyzed. By case studies of the JPEG2000 default lossy (9,7) filter, an integer (9,7) filter, and the (6,10) filter, the efficiency of the proposed flipping structure is demonstrated.

221 citations

References
More filters
Journal ArticleDOI
TL;DR: A general overview of VLSI array processors and a unified treatment from algorithm, architecture, and application perspectives is provided in this article, where a broad range of application domains including digital filtering, spectrum estimation, adaptive array processing, image/vision processing, and seismic and tomographic signal processing.
Abstract: High speed signal processing depends critically on parallel processor technology. In most applications, general-purpose parallel computers cannot offer satisfactory real-time processing speed due to severe system overhead. Therefore, for real-time digital signal processing (DSP) systems, special-purpose array processors have become the only appealing alternative. In designing or using such array Processors, most signal processing algorithms share the critical attributes of regularity, recursiveness, and local communication. These properties are effectively exploited in innovative systolic and wavefront array processors. These arrays maximize the strength of very large scale integration (VLSI) in terms of intensive and pipelined computing, and yet circumvent its main limitation on communication. The application domain of such array processors covers a very broad range, including digital filtering, spectrum estimation, adaptive array processing, image/vision processing, and seismic and tomographic signal processing, This article provides a general overview of VLSI array processors and a unified treatment from algorithm, architecture, and application perspectives.

1,633 citations

ReportDOI
11 Jan 1982
TL;DR: Detailed design of the Arithmetic Processor Unit (APU) chip has been completed and all cell types have been run through the design rule check (DRC) programs, corrected and verified.
Abstract: : Detail design of the Arithmetic Processor Unit (APU) chip has been completed. All cell types (100) have been run through the design rule check (DRC) programs, corrected and verified. DRC runs on the entire chip have been run and all corrections have been made. Fifteen out of eighteen of the chip DRC corrections have been verified. The metal, polysilicon and information data layers of the APU layout is shown. The attached drawings, titled 'VLSI Array Processor Arithmetic Processor Unit Chip Plan' is a detail drawing of the APU chip Plan. The functional level simulator of the APU has been built and verified using a set of APU diagnostic code. A gate level logic simulation of the APU has been built. The APU breadboard modules have been fabricated and check out has been initiated. The Array Processor Demonstration System (APDS) modules are in the wire-wrap process. The APDS and APU microcode assembler have been built and checked out. The linker and loader for the APDS have also been built.

46 citations

Book
30 Jun 2001
TL;DR: This paper presents a framework for Algorithmic and Architectural Transformations for Multiplication-Free Linear Transforms and some examples of how this framework has been applied to DSP implementation.
Abstract: List of Figures. List of Tables. Foreword. Acknowledgments. Preface. 1. Introduction. 2. Programmable DSP Based Implementation. 3. Implementation Using Hardware Multiplier(s) and Adder(s). 4. Distributed Arithmetic Based Implementation. 5. Multiplier-Less Implementation. 6. Implementation of Multiplication-Free Linear Transforms. 7. Residue Number System Based Implementation. 8. A Framework for Algorithmic and Architectural Transformations. 9. Summary. References. Topic Index. About the Authors. Index.

11 citations