VLSI Digital Signal Processing Systems: Design and Implementation

Book•

01 Jan 2007

TL;DR: This book discusses Digital Signal Processing Systems, Pipelining and Parallel Processing, Synchronous, Wave, and Asynchronous Pipelines, and Bit-Level Arithmetic Architectures.

Abstract: Introduction to Digital Signal Processing Systems. Iteration Bound. Pipelining and Parallel Processing. Retiming. Unfolding. Folding. Systolic Architecture Design. Fast Convolution. Algorithmic Strength Reduction in Filters and Transforms. Pipelined and Parallel Recursive and Adaptive Filters. Scaling and Roundoff Noise. Digital Lattice Filter Structures. Bit-Level Arithmetic Architectures. Redundant Arithmetic. Numerical Strength Reduction. Synchronous, Wave, and Asynchronous Pipelines. Low-Power Design. Programmable Digital Signal Processors. Appendices. Index.

Citations

Journal Article•DOI•

Low-Power Digital Signal Processing Using Approximate Adders

Vaibhav Kumar Gupta¹, Debabrata Mohapatra², Anand Raghunathan¹, Kaushik Roy¹

Purdue University¹, Intel²

01 Jan 2013-IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems

TL;DR: This paper proposes logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy, and demonstrates the utility of these approximate adders in two digital signal processing architectures with specific quality constraints.

Abstract: Low power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, human beings can gather useful information from slightly erroneous outputs. Therefore, we do not need to produce exactly correct numerical outputs. Previous research in this context exploits error resiliency primarily through voltage overscaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction at the transistor level as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate full adder cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units and evaluate them to demonstrate the efficacy of our approach. We also derive simple mathematical models for error and power consumption of these approximate adders. Furthermore, we demonstrate the utility of these approximate adders in two digital signal processing architectures (discrete cosine transform and finite impulse response filter) with specific quality constraints. Simulation results indicate up to 69% power savings using the proposed approximate adders, when compared to existing implementations using accurate adders.

637 citations

Journal Article•DOI•

Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey

Lei Deng¹, Guoqi Li¹, Song Han², Luping Shi¹, Yuan Xie³

Tsinghua University¹, Massachusetts Institute of Technology², University of California, Santa Barbara³

20 Mar 2020

TL;DR: This article reviews the mainstream compression approaches such as compact model, tensor decomposition, data quantization, and network sparsification, and answers the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures.

Abstract: Domain-specific hardware is becoming a promising topic in the backdrop of improvement slow down for general-purpose processors due to the foreseeable end of Moore’s Law. Machine learning, especially deep neural networks (DNNs), has become the most dazzling domain witnessing successful applications in a wide spectrum of artificial intelligence (AI) tasks. The incomparable accuracy of DNNs is achieved by paying the cost of hungry memory consumption and high computational complexity, which greatly impedes their deployment in embedded systems. Therefore, the DNN compression concept was naturally proposed and widely used for memory saving and compute acceleration. In the past few years, a tremendous number of compression techniques have sprung up to pursue a satisfactory tradeoff between processing efficiency and application accuracy. Recently, this wave has spread to the design of neural network accelerators for gaining extremely high performance. However, the amount of related works is incredibly huge and the reported approaches are quite divergent. This research chaos motivates us to provide a comprehensive survey on the recent advances toward the goal of efficient compression and execution of DNNs without significantly compromising accuracy, involving both the high-level algorithms and their applications in hardware design. In this article, we review the mainstream compression approaches such as compact model, tensor decomposition, data quantization, and network sparsification. We explain their compression principles, evaluation metrics, sensitivity analysis, and joint-way use. Then, we answer the question of how to leverage these methods in the design of neural network accelerators and present the state-of-the-art hardware architectures. In the end, we discuss several existing issues such as fair comparison, testing workloads, automatic compression, influence on security, and framework/hardware-level support, and give promising topics in this field and the possible challenges as well. This article attempts to enable readers to quickly build up a big picture of neural network compression and acceleration, clearly evaluate various methods, and confidently get started in the right way.

499 citations

Cites background from "Vlsi Digital Signal Processing Syst..."

...The above quantization technique looks like the function of analog-to-digital converters (ADCs) in signal processing systems [141]....

Proceedings Article•DOI•

IMPACT: imprecise adders for low-power approximate computing

Vaibhav Kumar Gupta¹, Debabrata Mohapatra¹, Sang Phill Park¹, Anand Raghunathan¹, Kaushik Roy¹

Purdue University¹

01 Aug 2011

TL;DR: This paper proposes logic complexity reduction as an alternative approach to take advantage of the relaxation of numerical accuracy, and demonstrates this concept by proposing various imprecise or approximate Full Adder cells with reduced complexity at the transistor level, and utilizing them to design approximate multi-bit adders.

Abstract: Low-power is an imperative requirement for portable multimedia devices employing various signal processing algorithms and architectures. In most multimedia applications, the final output is interpreted by human senses, which are not perfect. This fact obviates the need to produce exactly correct numerical outputs. Previous research in this context exploits error-resiliency primarily through voltage over-scaling, utilizing algorithmic and architectural techniques to mitigate the resulting errors. In this paper, we propose logic complexity reduction as an alternative approach to take advantage of the relaxation of numerical accuracy. We demonstrate this concept by proposing various imprecise or approximate Full Adder (FA) cells with reduced complexity at the transistor level, and utilize them to design approximate multi-bit adders. In addition to the inherent reduction in switched capacitance, our techniques result in significantly shorter critical paths, enabling voltage scaling. We design architectures for video and image compression algorithms using the proposed approximate arithmetic units, and evaluate them to demonstrate the efficacy of our approach. Post-layout simulations indicate power savings of up to 60% and area savings of up to 37% with an insignificant loss in output quality, when compared to existing implementations.

386 citations

Cites methods from "Vlsi Digital Signal Processing Syst..."

...One-dimensional integer DCT y(k) for an 8-point sequence x(i) is given by [16]...

Journal Article•DOI•

Phase Estimation Methods for Optical Coherent Detection Using Digital Signal Processing

M.G. Taylor

01 Apr 2009-Journal of Lightwave Technology

TL;DR: In this article, the phase estimation methods are numerically modeled: the maximum a posteriori (MAP) phase estimate, decision directed estimate, power law-Wiener filter estimate and power law PLL estimate.

Abstract: The advent of digital signal processing (DSP) to optical coherent detection means that more phase estimation options are available, compared to the earlier generation where phase-locked loops (PLLs) were invariably deployed in synchronous coherent receivers. Several phase estimation methods are numerically modeled: the maximum a posteriori (MAP) phase estimate, decision directed estimate, power law-Wiener filter estimate and power law-PLL estimate. An asynchronous coherent detection case is also modeled. The phase estimates are evaluated with respect to their tolerance of finite laser linewidth and their suitability for implementation in a parallel digital processor. Laser phase noise causes transmission system performance to be degraded by excess bit errors and cycle slips. The optimal phase estimate is the MAP estimate, and it is also included as a baseline. The power law-Wiener filter phase estimate is found to perform only marginally worse than the MAP estimate. It must be recast using a look-ahead computation to be implemented in a parallel digital processor, and the impact is investigated of the increase in the number of computations required. Differential logical detection is often used to reduce the impact of cycle slip events, and the implications of this operation on the bit error rate are studied. It is found that by choosing the correct FEC scheme differential logical detection does not increase the Q-factor penalty.

289 citations

Cites background or methods from "Vlsi Digital Signal Processing Syst..."

...For the case where , further computation savings are available by writing the sum in the numerator of (15) as a power-of-2 decomposition [27]....
...A resolution to this issue is found by recasting the algorithms using a look-ahead computation [27]....
...The amount of resources can be reduced by using incremental block processing [27]....
...There are methods of adapting algorithms for VLSI processors that leave the signal processing behavior unchanged [27], and one of these methods will be used....

Journal Article•DOI•

Flipping structure: an efficient VLSI architecture for lifting-based discrete wavelet transform

Chao-Tsung Huang¹, Po-Chih Tseng¹, Liang-Gee Chen¹

National Taiwan University¹

01 Apr 2004-IEEE Transactions on Signal Processing

TL;DR: An efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform that can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement by flipping conventional lifting structures.

Abstract: In this paper, an efficient very large scale integration (VLSI) architecture, called flipping structure, is proposed for the lifting-based discrete wavelet transform. It can provide a variety of hardware implementations to improve and possibly minimize the critical path as well as the memory requirement of the lifting-based discrete wavelet transform by flipping conventional lifting structures. The precision issues are also analyzed. By case studies of the JPEG2000 default lossy (9,7) filter, an integer (9,7) filter, and the (6,10) filter, the efficiency of the proposed flipping structure is demonstrated.

221 citations

References

Journal Article•DOI•

VLSI Array processors

Sun-Yuan Kung

01 Jan 1985-IEEE ASSP Magazine

TL;DR: This article provides a general overview of VLSI array processors and a unified treatment from algorithm, architecture, and application perspectives, covering a broad range of application domains including digital filtering, spectrum estimation, adaptive array processing, image/vision processing, and seismic and tomographic signal processing.

Abstract: High speed signal processing depends critically on parallel processor technology. In most applications, general-purpose parallel computers cannot offer satisfactory real-time processing speed due to severe system overhead. Therefore, for real-time digital signal processing (DSP) systems, special-purpose array processors have become the only appealing alternative. In designing or using such array processors, most signal processing algorithms share the critical attributes of regularity, recursiveness, and local communication. These properties are effectively exploited in innovative systolic and wavefront array processors. These arrays maximize the strength of very large scale integration (VLSI) in terms of intensive and pipelined computing, and yet circumvent its main limitation on communication. The application domain of such array processors covers a very broad range, including digital filtering, spectrum estimation, adaptive array processing, image/vision processing, and seismic and tomographic signal processing. This article provides a general overview of VLSI array processors and a unified treatment from algorithm, architecture, and application perspectives.

1,633 citations

Report•DOI•

VLSI Array Processor

Ed Greenwood

11 Jan 1982

TL;DR: Detailed design of the Arithmetic Processor Unit (APU) chip has been completed and all cell types have been run through the design rule check (DRC) programs, corrected and verified.

Abstract: Detail design of the Arithmetic Processor Unit (APU) chip has been completed. All cell types (100) have been run through the design rule check (DRC) programs, corrected and verified. DRC runs on the entire chip have been run and all corrections have been made. Fifteen out of eighteen of the chip DRC corrections have been verified. The metal, polysilicon and information data layers of the APU layout is shown. The attached drawings, titled 'VLSI Array Processor Arithmetic Processor Unit Chip Plan' is a detail drawing of the APU chip Plan. The functional level simulator of the APU has been built and verified using a set of APU diagnostic code. A gate level logic simulation of the APU has been built. The APU breadboard modules have been fabricated and check out has been initiated. The Array Processor Demonstration System (APDS) modules are in the wire-wrap process. The APDS and APU microcode assembler have been built and checked out. The linker and loader for the APDS have also been built.

46 citations

Book•

VLSI Synthesis of DSP Kernels: Algorithmic and Architectural Transformations

Mahesh Mehendale, Sunil D. Sherlekar

30 Jun 2001

TL;DR: This paper presents a framework for Algorithmic and Architectural Transformations for Multiplication-Free Linear Transforms and some examples of how this framework has been applied to DSP implementation.

Abstract: List of Figures. List of Tables. Foreword. Acknowledgments. Preface. 1. Introduction. 2. Programmable DSP Based Implementation. 3. Implementation Using Hardware Multiplier(s) and Adder(s). 4. Distributed Arithmetic Based Implementation. 5. Multiplier-Less Implementation. 6. Implementation of Multiplication-Free Linear Transforms. 7. Residue Number System Based Implementation. 8. A Framework for Algorithmic and Architectural Transformations. 9. Summary. References. Topic Index. About the Authors. Index.

11 citations