Journal Article

Optimal choice of intermediate latching to maximize throughput in VLSI circuits

Abstract
In many computational tasks, especially in signal processing, throughput matters more than latency (delay). When a special-purpose VLSI chip is designed for a particular signal processing task, such as FIR filtering, the maximum clock rate, and hence the throughput, is determined by the depth of the combinational logic between registers and by the time required to distribute and operate the clock. If the combinational logic is sufficiently deep (in bit-parallel circuits, for example), the throughput can be increased by inserting intermediate stages of clocked latches. This comes at the expense of increased area, and of added delay to operate and clock the intermediate registers. Roughly speaking, the strategy amounts to using more of the chip area to store information useful for pipelining. This paper investigates the optimal tradeoff between the degree of intermediate latching and cost, using the measure AP, where A is the chip area and P is the period (the reciprocal of throughput). We derive expressions for the time and area before and after intermediate latching, using the Mead-Conway model, for both on-chip and off-chip clock drivers. The results show that significant reductions in the AP product (the reciprocal of throughput per unit area) can be achieved by intermediate latching in many typical signal processing applications, for a wide range of circuit parameters. The array multiplier is used as an example.
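The tradeoff described in the abstract can be illustrated with a deliberately simplified toy model. This is a sketch under stated assumptions, not the paper's Mead-Conway expressions: it assumes the combinational delay divides evenly among the pipeline segments, that every register stage adds a fixed delay `latch_delay` (latch operation plus clocking), and that every intermediate latch stage adds a fixed area `latch_area`. All function names, parameter names, and numeric values are hypothetical.

```python
def ap_product(k, logic_delay, logic_area, latch_delay, latch_area):
    """AP product for k intermediate latch stages under the toy model.

    Assumptions (not from the paper): the logic splits into k + 1 equal
    segments, each register stage costs latch_delay per clock period,
    and each intermediate latch stage costs latch_area.
    """
    period = logic_delay / (k + 1) + latch_delay  # P: clock period
    area = logic_area + k * latch_area            # A: total chip area
    return area * period


def best_latch_count(logic_delay, logic_area, latch_delay, latch_area, max_k=64):
    """Return the k in [0, max_k] that minimizes the AP product."""
    return min(
        range(max_k + 1),
        key=lambda k: ap_product(k, logic_delay, logic_area, latch_delay, latch_area),
    )


# Hypothetical parameters in arbitrary units of time and area.
params = dict(logic_delay=40, logic_area=600, latch_delay=2, latch_area=100)
k_opt = best_latch_count(**params)
ap_unlatched = ap_product(0, **params)
ap_optimal = ap_product(k_opt, **params)
```

With these made-up numbers the search selects 9 intermediate latch stages, and the AP product falls from 25200 to 9000. Beyond that point, the area added by each further latch stage outweighs the shrinking gain in period, which is the qualitative behaviour the abstract describes.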

