Author

M. Torkelson

Bio: M. Torkelson is an academic researcher from Lund University. The author has contributed to research in topics: Very-large-scale integration & Pipeline (computing). The author has an h-index of 5 and has co-authored 5 publications receiving 1002 citations.

Papers

Proceedings Article•DOI•

A new approach to pipeline FFT processor

Shousheng He¹, M. Torkelson¹•Institutions (1)

Lund University¹

15 Apr 1996

TL;DR: A new VLSI architecture for a real-time pipeline FFT processor is proposed, based on a radix-2² algorithm derived by integrating a twiddle factor decomposition technique in the divide-and-conquer approach; the algorithm has the same multiplicative complexity as the radix-4 algorithm but retains the butterfly structure of the radix-2 algorithm.

Abstract: A new VLSI architecture for a real-time pipeline FFT processor is proposed. A hardware-oriented radix-2² algorithm is derived by integrating a twiddle factor decomposition technique in the divide-and-conquer approach. The radix-2² algorithm has the same multiplicative complexity as the radix-4 algorithm, but retains the butterfly structure of the radix-2 algorithm. The single-path delay-feedback architecture is used to exploit the spatial regularity in the signal flow graph of the algorithm. For length-N DFT computation, the hardware requirement of the proposed architecture is minimal on both dominant components: log₄N − 1 complex multipliers and N − 1 complex data memory. The validity and efficiency of the architecture have been verified by simulation in the hardware description language VHDL.
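
As a rough illustration of the hardware counts quoted in this abstract, the sketch below evaluates log₄N − 1 complex multipliers and N − 1 complex memory words for a given transform length. The function name `r22sdf_resources` and the restriction to N being a power of 4 are assumptions made for this example, not details taken from the paper.

```python
def r22sdf_resources(n: int) -> tuple[int, int]:
    """Return (complex multipliers, complex memory words) for a length-n transform.

    Applies the counts quoted in the abstract: log4(n) - 1 multipliers and
    n - 1 memory words. Hypothetical helper, for illustration only.
    """
    if n < 4 or n & (n - 1):
        raise ValueError("n must be a power of two, at least 4")
    log2_n = n.bit_length() - 1  # exact log2 for a power of two
    if log2_n % 2:
        raise ValueError("n must be a power of 4 for the log4 formula")
    return log2_n // 2 - 1, n - 1


if __name__ == "__main__":
    for length in (16, 256, 1024):
        multipliers, memory_words = r22sdf_resources(length)
        print(f"N={length}: {multipliers} complex multipliers, {memory_words} memory words")
```

For N = 1024 the function returns 4 complex multipliers and 1023 memory words.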

410 citations

Proceedings Article•DOI•

Designing pipeline FFT processor for OFDM (de)modulation

Shousheng He¹, M. Torkelson•Institutions (1)

Lund University¹

29 Sep 1998

TL;DR: By exploiting the spatial regularity of the new algorithm, the requirements for both dominant elements in VLSI implementation, the memory size and the number of complex multipliers, have been minimized, and the area/power efficiency has been enhanced.

Abstract: The FFT processor is one of the key components in the implementation of wideband OFDM systems. Architectures with a structured pipeline have been used to meet the fast, real-time processing demand and low-power consumption requirement in a mobile environment. Architectures based on a new form of FFT, the radix-2ⁱ algorithm derived by cascade decomposition, are proposed. By exploiting the spatial regularity of the new algorithm, the requirements for both dominant elements in VLSI implementation, the memory size and the number of complex multipliers, have been minimized. Progressive wordlength adjustment has been introduced to optimize the total memory size for a given signal-to-quantization-noise ratio (SQNR) requirement in fixed-point processing. A new complex multiplier based on distributed arithmetic further enhances the area/power efficiency of the design. A single-chip processor for a 1K complex-point FFT transform is used to demonstrate the design issues under consideration.

322 citations

Proceedings Article•DOI•

Design and implementation of a 1024-point pipeline FFT processor

Shousheng He¹, M. Torkelson¹•Institutions (1)

Lund University¹

11 May 1998

TL;DR: By exploiting the spatial regularity of the new algorithm, minimal requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor.

Abstract: The design and implementation of a 1024-point pipeline FFT processor is presented. The architecture is based on a new form of FFT, the radix-2² algorithm. By exploiting the spatial regularity of the new algorithm, minimal requirement for both dominant components in VLSI implementation has been achieved: only 4 complex multipliers and 1024 complex-word data memory for the pipelined 1K FFT processor. The chip has been implemented in 0.5 μm CMOS technology and takes an area of 40 mm². With a 3.3 V power supply, it can compute 2ⁿ (n = 0, 1, ..., 10) complex-point forward and inverse FFTs in real time with up to 30 MHz sampling frequency. The SQNR is above 50 dB for white noise input.

243 citations

Proceedings Article•DOI•

FPGA implementation of FIR filters using pipelined bit-serial canonical signed digit multipliers

Shousheng He¹, M. Torkelson¹•Institutions (1)

Lund University¹

01 May 1994

TL;DR: A pipelinable bit-serial multiplier using Canonic Signed Digit (CSD) code to represent constant coefficients is introduced, and it is shown that the FPGA architecture is an ideal vehicle for bit-serial processing optimized in this way.

Abstract: A pipelinable bit-serial multiplier using Canonic Signed Digit (CSD) code to represent constant coefficients is introduced. A bit-serial module for a(x ± y)z⁻¹ type computation is further developed. Optimization over the discrete power-of-two coefficient space has been retargeted to this type of multiplier to generate coefficients with a minimized number of nonzero bits. This also makes it possible to confine the latency to be equivalent to the data wordlength without causing a large delay in partial product sum propagation. A single-chip FPGA implementation of a full 16-bit 31-tap Hilbert transformer is used as an example to demonstrate the application of the multiplier module with special consideration of FPGA architectures. It is shown that the FPGA architecture is an ideal vehicle for bit-serial processing optimized in this way.
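
To make the coefficient representation concrete, here is a minimal sketch of recoding an integer into canonic signed digit form, in which every digit is −1, 0, or +1 and no two adjacent digits are nonzero. It uses the standard non-adjacent-form recoding; the function name `to_csd` and the least-significant-first digit order are assumptions for this example, and the coefficient optimization described in the abstract is not reproduced.

```python
def to_csd(value: int) -> list[int]:
    """Recode an integer into CSD digits (-1, 0, +1), least significant first.

    Hypothetical helper using the standard non-adjacent-form recoding.
    """
    digits = []
    while value != 0:
        if value & 1:
            # Pick +1 when value % 4 == 1 and -1 when value % 4 == 3, so that
            # the remaining value is divisible by 4 and the next digit is 0.
            digit = 2 - (value & 3)
            value -= digit
        else:
            digit = 0
        digits.append(digit)
        value >>= 1
    return digits


def from_csd(digits: list[int]) -> int:
    """Rebuild the integer from least-significant-first CSD digits."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


if __name__ == "__main__":
    print(to_csd(7))  # [-1, 0, 0, 1], i.e. 8 - 1
```

The constant 7 needs three nonzero bits in plain binary (111) but only two nonzero digits in CSD, which is the kind of saving that reduces add/subtract operations in a constant-coefficient multiplier.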

38 citations

Proceedings Article•DOI•

A complex array multiplier using distributed arithmetic

Shousheng He¹, M. Torkelson¹•Institutions (1)

Lund University¹

05 May 1996

TL;DR: The design of an efficient array architecture for the multiplication of complex numbers applying distributed arithmetic is presented, and a VHDL module with generic parameters has been written and successfully simulated, which enables the complex multiplier module to be included in large designs with the required word lengths for both operands.

Abstract: The design of an efficient array architecture for the multiplication of complex numbers applying distributed arithmetic is presented. The complex multiplier takes an area just over that of two real multipliers and its speed is almost the same as a single real multiplier. The texture of the design is obtained by an in-depth examination of a real multiplier structure with data in the offset binary representation. Residue error compensation and the functional requirement of various boundary cells, such as negative weight addition, are discussed in detail. A VHDL module with generic parameters has been written and successfully simulated, which enables the complex multiplier module to be included in large designs with the required word lengths for both operands. A test chip has been implemented with a standard library in a 0.8 μm CMOS process and fabricated.

9 citations

Cited by

Journal Article•DOI•

Reconfigurable Computing for Digital Signal Processing: A Survey

Russell Tessier¹, Wayne Burleson¹•Institutions (1)

University of Massachusetts Amherst¹

01 May 2001

TL;DR: A survey of academic research and commercial development in reconfigurable computing for DSP systems over the past fifteen years is presented in this article, with a focus on the application domain of digital signal processing.

Abstract: Steady advances in VLSI technology and design tools have extensively expanded the application domain of digital signal processing over the past decade. While application-specific integrated circuits (ASICs) and programmable digital signal processors (PDSPs) remain the implementation mechanisms of choice for many DSP applications, increasingly new system implementations based on reconfigurable computing are being considered. These flexible platforms, which offer the functional efficiency of hardware and the programmability of software, are quickly maturing as the logic capacity of programmable devices follows Moore's Law and advanced automated design techniques become available. As initial reconfigurable technologies have emerged, new academic and commercial efforts have been initiated to support power optimization, cost reduction, and enhanced run-time performance. This paper presents a survey of academic research and commercial development in reconfigurable computing for DSP systems over the past fifteen years. This work is placed in the context of other available DSP implementation media including ASICs and PDSPs to fully document the range of design choices available to system engineers. It is shown that while contemporary reconfigurable computing can be applied to a variety of DSP applications including video, audio, speech, and control, much work remains to realize its full potential. While individual implementations of PDSP, ASIC, and reconfigurable resources each offer distinct advantages, it is likely that integrated combinations of these technologies will provide more complete solutions.

390 citations

Proceedings Article•DOI•

Designing pipeline FFT processor for OFDM (de)modulation

Shousheng He¹, M. Torkelson•Institutions (1)

Lund University¹

29 Sep 1998

322 citations

Journal Article•DOI•

A low-power, high-performance, 1024-point FFT processor

Bevan M. Baas¹•Institutions (1)

Stanford University¹

01 Mar 1999-IEEE Journal of Solid-state Circuits

TL;DR: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor, which has been fabricated in a standard 0.7 μm CMOS process and is fully functional on first-pass silicon.

Abstract: This paper presents an energy-efficient, single-chip, 1024-point fast Fourier transform (FFT) processor. The 460000-transistor design has been fabricated in a standard 0.7 μm (L_poly = 0.6 μm) CMOS process and is fully functional on first-pass silicon. At a supply voltage of 1.1 V, it calculates a 1024-point complex FFT in 330 μs while consuming 9.5 mW, resulting in an adjusted energy efficiency more than 16 times greater than the previously most efficient known FFT processor. At 3.3 V, it operates at 173 MHz, which is a clock rate 2.6 times greater than the previously fastest rate.
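
For a sense of scale, the 1.1 V figures in this abstract can be combined into a raw energy per transform (time multiplied by power). The sketch below does only that arithmetic; the function name is an assumption for this example, and the result is not the technology-adjusted efficiency metric the abstract refers to.

```python
def energy_per_transform_joules(time_s: float, power_w: float) -> float:
    """Raw energy for one transform: execution time multiplied by average power."""
    return time_s * power_w


# Figures quoted in the abstract for the 1.1 V operating point.
energy = energy_per_transform_joules(time_s=330e-6, power_w=9.5e-3)

if __name__ == "__main__":
    print(f"{energy * 1e6:.3f} microjoules per 1024-point FFT")
```

This works out to roughly 3.1 μJ per 1024-point transform at 1.1 V.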

319 citations

Proceedings Article•DOI•

Design and implementation of a 1024-point pipeline FFT processor

Shousheng He¹, M. Torkelson¹•Institutions (1)

Lund University¹

11 May 1998

243 citations

Journal Article•DOI•

A 1-GS/s FFT/IFFT processor for UWB applications

Yu-Wei Lin, Hsuan-Yu Liu, Chen-Yi Lee

25 Jul 2005-IEEE Journal of Solid-state Circuits

TL;DR: A novel 128-point FFT/IFFT processor for ultrawideband (UWB) systems is presented; the proposed pipelined FFT architecture, called mixed-radix multipath delay feedback (MRMDF), can provide a higher throughput rate by using the multidata-path scheme.

Abstract: In this paper, we present a novel 128-point FFT/IFFT processor for ultrawideband (UWB) systems. The proposed pipelined FFT architecture, called mixed-radix multipath delay feedback (MRMDF), can provide a higher throughput rate by using the multidata-path scheme. Furthermore, the hardware costs of memory and complex multipliers in MRMDF are only 38.9% and 44.8% of those in the known FFT processor by means of the delay feedback and the data scheduling approaches. The high-radix FFT algorithm is also realized in our processor to reduce the number of complex multiplications. A test chip for the UWB system has been designed and fabricated using a 0.18-μm single-poly and six-metal CMOS process with a core area of 1.76 × 1.76 mm², including an FFT/IFFT processor and a test module. The throughput rate of this fabricated FFT processor is up to 1 Gsample/s while it consumes 175 mW. Power dissipation is 77.6 mW when its throughput rate meets the UWB standard, in which the FFT throughput rate is 409.6 Msample/s.

220 citations
