Author

E Wold

Bio: E Wold is an academic researcher. The author has contributed to research in topics: Very-large-scale integration & Adder. The author has an h-index of 1 and has co-authored 1 publication receiving 4 citations.

Papers
Report
01 Nov 1982
TL;DR: Pipeline and parallel-pipeline organizations are developed and are shown to meet the constraints imposed by VLSI and a broad set of possible FFT organizations is discussed.
Abstract: The construction of Fast Fourier Transform (FFT) processors is discussed. Pipeline and parallel-pipeline organizations are developed and are shown to meet the constraints imposed by VLSI. Various circuit technologies for the construction of these processors are compared, and a description of a set of NMOS chips is given. A technique for reducing the latency of the adders internal to the chips is also presented. Finally, a broad set of possible FFT organizations is discussed.

4 citations


Cited by
Journal Article
D. A. Carlson, B. Sugla
TL;DR: Uniform, systolic constructions of limited width parallel prefix circuits are provided and shown to be asymptotically optimal. By associating the width of the circuit with the number of processors and the fan-out capabilities of the circuit with the interconnection structure of a multiprocessor, time- and processor-efficient algorithms may be developed.
Abstract: In this paper, we present lower and upper bounds on the size of limited width, bounded and unbounded fan-out parallel prefix circuits. The lower bounds on the sizes of such circuits are a function of the depth, width, and number of inputs. The size requirement of an $N$ input bounded fan-out parallel prefix circuit having limited width $W$ and extra depth $k$ (the difference between allowed and minimum possible depth) is shown to be $\Omega(N \log_2 W / 2^k + N)$ for $k \le \log_2 W$. This implies that insisting on minimum depth causes the circuit size to be nonlinear, while as little as $\log_2 \log_2 W$ of extra depth can possibly reduce the size to linear. Also, we show that there is a clear difference between the two cases of bounded and unbounded fan-out by proving the size of a limited width, unbounded fan-out parallel prefix circuit lies between a lower bound of $\Omega((2 + 2^{1-k}/3)N)$ and an upper bound of $O((2 + 2^{1-k})N)$. Uniform, systolic constructions of limited width parallel prefix circuits are provided here and shown to be asymptotically optimal. By associating the width of the circuit with the number of processors and the fan-out capabilities of the circuit with the interconnection structure of a multiprocessor, time- and processor-efficient algorithms may be developed.
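The remark about extra depth can be checked by direct substitution. This is only a sketch, and it assumes the bounded fan-out lower bound reads $\Omega(N \log_2 W / 2^k + N)$; the placement of the exponent is reconstructed from this listing's text and has not been checked against the paper itself.

```latex
% Assumed form of the bound, valid for 0 \le k \le \log_2 W and W \ge 2.
\begin{align*}
k = 0 &: \quad
  \Omega\!\left(\frac{N \log_2 W}{2^{0}} + N\right) = \Omega(N \log_2 W), \\
k = \log_2 \log_2 W &: \quad
  2^{k} = \log_2 W
  \;\Rightarrow\;
  \Omega\!\left(\frac{N \log_2 W}{\log_2 W} + N\right) = \Omega(N).
\end{align*}
```

Under that reading, minimum depth ($k = 0$) forces a size that grows faster than linearly in $N$ whenever $W$ grows with $N$, while at $k = \log_2 \log_2 W$ the lower bound no longer excludes a linear-size circuit. That is why the abstract says only that the extra depth "can possibly" reduce the size to linear.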

21 citations

Journal Article
TL;DR: This work shows how to compute the discrete Fourier transform at n points with an optimal speed-up as long as the memory is large enough and the control is shown to be simple and easily implementable in VLSI.

16 citations

Proceedings Article
02 Sep 1991
TL;DR: The error performance and simulation results of the Arithmetic Cube are examined and it is argued that the WFTA performs better, with respect to accuracy, than the Prime Factor Algorithm (PFA), if both are computed on the Cube.
Abstract: This paper examines the error performance and presents simulation results of the Arithmetic Cube. The Arithmetic Cube is a special purpose architecture for computing high speed convolution and the DFT. An error analysis is performed for convolution and the DFT, as computed on the Cube. An upper bound on the number of bits lost is derived. The Cube loses at most an extra two bits (four bits), while computing convolution (DFT), more than the number of bits lost if computed by the direct, limited precision convolution (DFT). A VHDL description of the Cube was written and simulations were run. Simulation results substantiate the derived upper bounds. A comparison of the Winograd Fourier transform algorithm (WFTA), computed by the Cube, and a rounded FFT shows that the Cube is at least as accurate as the rounded FFT. Contrary to previous results, it is argued that the WFTA performs better, with respect to accuracy, than the Prime Factor Algorithm (PFA), if both are computed on the Cube.

4 citations

Proceedings Article
04 Jun 1985
TL;DR: This paper presents a prototype chip, designed in 2 μm NMOS technology, for the generalized butterfly unit; the chip is a two-stage pipelined processor.
Abstract: Architectures that are based on the vector-radix 2-D FFT algorithm, and hence can avoid the matrix transpose problem, have been proposed. The unique feature of the proposed architectures is that the data can be driven into the arithmetic processors in a pipeline fashion. This paper presents a prototype chip, which has been designed in 2 μm NMOS technology, for the generalized butterfly unit. The chip is a two-stage pipelined processor. The design experience, timing information, and the chip features including four multipliers, one adder/subtracter and PLA controllers are presented.

1 citation