Book

Computational Aspects of VLSI

01 Jan 1984
About: The book was published on 1984-01-01 and is currently open access. It has received 862 citations to date. It focuses on the topic of very-large-scale integration (VLSI).
Citations
Journal ArticleDOI
01 Jun 1991
TL;DR: This work proposes a systematic method for synthesizing optimal VLSI architectures using distributed arithmetic, compares distributed arithmetic with more conventional methods for inner-product computation, and shows how area, latency, and period may be traded off while maintaining constant error.
Abstract: Real-time signal processing requires fast computation of inner products. Distributed arithmetic is a method of inner product computation that uses table lookup and addition in place of multiplication. Distributed arithmetic has previously been shown to produce novel and seemingly efficient architectures for a variety of signal processing computations; however, the methods of design, analysis and comparison have been ad hoc. We propose a systematic method for synthesizing optimal VLSI architectures using distributed arithmetic. A partition of the inner product computation at the word and bit level produces a computation consisting of lookups and additions. We study two classes of algorithms to implement this computation, regular iterative algorithms and tree algorithms, each of which can be expressed in the form of a dependency graph. We use linear and nonlinear maps to assign computations to processors in space and time. Expressions are developed for the area, latency, period and arithmetic error for a particular partition and space/time map of the dependency graph. We use these expressions to formulate a constrained optimization problem over a large class of architectures. We compare distributed arithmetic with more conventional methods for inner product computation and show how area, latency and period may be traded off while maintaining constant error.
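
One way to see the table-lookup-and-addition scheme is as a short software sketch. The function below is a hypothetical illustration (names and structure are ours, not the paper's), assuming unsigned fixed-point inputs of nbits bits: it precomputes the 2^N possible coefficient sums once, then scans the inputs bit-serially, replacing the N multiplications of a direct inner product with one table read and a shift-add per bit position.

```python
# Hypothetical sketch of distributed arithmetic for y = sum_i c[i] * x[i].
# Assumes unsigned integer inputs of `nbits` bits; two's-complement inputs
# would additionally need a sign correction on the most significant bit pass.

def da_inner_product(coeffs, xs, nbits):
    n = len(coeffs)
    assert len(xs) == n

    # Lookup table with one entry per subset of coefficients:
    # table[addr] = sum of coeffs[i] over all bits i set in addr.
    table = [sum(coeffs[i] for i in range(n) if addr & (1 << i))
             for addr in range(1 << n)]

    # Bit-serial accumulation: at bit position b, address bit i is bit b of
    # x[i]; the table read replaces n multiplications.
    acc = 0
    for b in range(nbits):
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> b) & 1) << i
        acc += table[addr] << b   # shift-and-add, as a hardware accumulator would
    return acc


if __name__ == "__main__":
    c, x = [3, 5, 7], [2, 4, 6]
    assert da_inner_product(c, x, nbits=8) == sum(ci * xi for ci, xi in zip(c, x))
```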

28 citations


Cites background or methods from "Computational Aspects of VLSI"

  • ...We measure the complexity of a VLSI architecture in terms of area A, latency L, and period P. VLSI communication models [32], [33] have been used to analyze numerous computations....

  • ...VLSI Complexity Theory [32], [33] uses communication models for the asymptotic analysis of interconnect complexity....

  • ...Ullman [32] proves a number of fundamental theorems for the layout of tree structures in VLSI....

Journal Article
TL;DR: A general network of agents that can be built with Zeus is studied to determine how much discounting the agents can allow and how to control the coordination time.
Abstract: We study a general network of agents that can be built with Zeus [?]. Relationships between agents can be peer, slave, master, discounter, or no relation at all. There are four possible strategies: the cheapest agent is selected; preference is given to slaves first; cut-price discounting is applied based on utility; or the cheapest agent is chosen but preference is given to the cheapest slave. The cost of a task for the agent originating it is the cost of the resources used. The size of the initial endowment is determined so that there are never any lost tasks in the system. We also establish the influence of agent strategies on stability and fairness. We were able to determine how much discounting the agents can allow, and how to control the coordination time. The growth of the maximum communication load with respect to the number of agents is calculated for various topologies of networks of agents. A performance measure related to the speed of the network is also calculated.

28 citations

Proceedings Article
01 Jan 1989
TL;DR: The authors present a parallel merging algorithm that, on an exclusive-read exclusive-write (EREW) parallel random-access machine (PRAM) with k processors merges two sorted lists of total length n in O(n/k+log n) time and constant extra space per processor, and hence is time-space optimal for any value of k.
Abstract: The authors present a parallel merging algorithm that, on an exclusive-read exclusive-write (EREW) parallel random-access machine (PRAM) with k processors merges two sorted lists of total length n in O(n/k+log n) time and constant extra space per processor, and hence is time-space optimal for any value of k >
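
The O(n/k + log n) bound can be illustrated with a block-partitioned merge: each processor binary-searches for its split point (O(log n)) and then merges a contiguous block of roughly n/k output positions. The sketch below is a generic merge-path partition simulated sequentially in Python; it conveys the work bound only and is not the authors' constant-extra-space EREW construction.

```python
# Hypothetical sketch of a block-partitioned merge (merge-path co-ranking),
# simulated sequentially; each iteration of the loop in partitioned_merge is
# independent and could run on its own PRAM processor.
import heapq

def co_rank(d, a, b):
    """Smallest i such that the first d merged elements are exactly a[:i] and b[:d-i]."""
    lo, hi = max(0, d - len(b)), min(d, len(a))
    while lo < hi:
        i = (lo + hi) // 2            # candidate number of elements taken from a
        if a[i] < b[d - i - 1]:       # too few taken from a: move the split right
            lo = i + 1
        else:
            hi = i
    return lo

def partitioned_merge(a, b, k):
    """Merge two sorted lists using k blocks of about (len(a)+len(b))/k outputs each."""
    n = len(a) + len(b)
    cuts = [p * n // k for p in range(k + 1)]       # output positions per block
    splits = [co_rank(d, a, b) for d in cuts]       # one O(log n) search per block
    out = []
    for p in range(k):                              # independent O(n/k) merges
        i0, i1 = splits[p], splits[p + 1]
        j0, j1 = cuts[p] - i0, cuts[p + 1] - i1
        out.extend(heapq.merge(a[i0:i1], b[j0:j1]))
    return out

if __name__ == "__main__":
    a, b = list(range(0, 20, 2)), list(range(1, 15, 2))
    assert partitioned_merge(a, b, k=4) == sorted(a + b)
```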

27 citations

Journal ArticleDOI
TL;DR: Completely pipelined inner product architectures are presented for FIR filtering and linear transformation, using only full adders, organized to form multipliers.
Abstract: Completely pipelined inner product architectures are presented for FIR filtering and linear transformation. The designs use only full adders, organized to form multipliers. By cascading these multiplier structures, no additional area or time is needed to sum their products. The merits of the FFT are briefly reconsidered in the context of high throughput VLSI structures for digital signal processing.
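
For orientation, the computation these structures pipeline is the running inner product of a fixed coefficient vector with a sliding window of samples. A direct software form, assuming samples before time zero are zero (a common convention, not something stated in the abstract), is:

```python
# Hypothetical direct-form FIR sketch of the inner product the hardware pipelines;
# samples before time zero are assumed to be zero.

def fir_filter(coeffs, samples):
    """y[n] = sum_k coeffs[k] * samples[n - k]."""
    out = []
    for n in range(len(samples)):
        acc = 0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * samples[n - k]
        out.append(acc)
    return out

if __name__ == "__main__":
    # 3-tap running sum: y[n] = x[n] + x[n-1] + x[n-2]
    print(fir_filter([1, 1, 1], [1, 2, 3, 4]))   # [1, 3, 6, 9]
```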

27 citations

Journal ArticleDOI
TL;DR: This paper begins by applying two standard cache-friendly optimizations to the Floyd--Warshall algorithm and shows limited performance improvements, then discusses the unidirectional space time representation (USTR), which can be used to reduce the amount of processor-memory traffic by a factor of O(√C), where C is the cache size.
Abstract: The topic of cache performance has been well studied in recent years. Compiler optimizations exist and optimizations have been done for many problems. Much of this work has focused on dense linear algebra problems. At first glance, the Floyd--Warshall algorithm appears to fall into this category. In this paper, we begin by applying two standard cache-friendly optimizations to the Floyd--Warshall algorithm and show limited performance improvements. We then discuss the unidirectional space time representation (USTR). We show analytically that the USTR can be used to reduce the amount of processor-memory traffic by a factor of O(√C), where C is the cache size, for a large class of algorithms. Since the USTR leads to a tiled implementation, we develop a tile size selection heuristic to intelligently narrow the search space for the tile size that minimizes total execution time. Using the USTR, we develop a cache-friendly implementation of the Floyd--Warshall algorithm. We show experimentally that this implementation minimizes the level-1 and level-2 cache misses and TLB misses and, therefore, exhibits the best overall performance. Using this implementation, we show a 2x improvement in performance over the best compiler optimized implementation on three different architectures. Finally, we show analytically that our implementation of the Floyd--Warshall algorithm is asymptotically optimal with respect to processor-memory traffic. We show experimental results for the Pentium III, Alpha, and MIPS R12000 machines using problem sizes between 1024 and 2048 vertices. We demonstrate improved cache performance using the Simplescalar simulator.
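
As a point of reference for the tiled implementation, a generic three-phase blocked Floyd--Warshall (the standard tiling, not the paper's USTR-derived code or its tile-size heuristic) restructures the computation so that most updates stay within a B x B tile: each stage finalizes the diagonal tile, then the tiles sharing its row and column, then the remaining tiles.

```python
# Hypothetical sketch of a standard blocked (tiled) Floyd--Warshall.
INF = float("inf")

def _relax_tile(d, ib, jb, kb, B, n):
    """Floyd--Warshall updates on tile (ib, jb) using pivots kb .. kb+B-1."""
    for k in range(kb, min(kb + B, n)):
        row_k = d[k]
        for i in range(ib, min(ib + B, n)):
            dik = d[i][k]
            if dik == INF:
                continue
            row_i = d[i]
            for j in range(jb, min(jb + B, n)):
                if dik + row_k[j] < row_i[j]:
                    row_i[j] = dik + row_k[j]

def blocked_floyd_warshall(d, B=64):
    """In-place all-pairs shortest paths on an n x n distance matrix d (tile size B)."""
    n = len(d)
    for kb in range(0, n, B):
        _relax_tile(d, kb, kb, kb, B, n)              # phase 1: diagonal tile
        for jb in range(0, n, B):
            if jb != kb:
                _relax_tile(d, kb, jb, kb, B, n)      # phase 2: tiles in the pivot row
                _relax_tile(d, jb, kb, kb, B, n)      #          and in the pivot column
        for ib in range(0, n, B):
            for jb in range(0, n, B):
                if ib != kb and jb != kb:
                    _relax_tile(d, ib, jb, kb, B, n)  # phase 3: all remaining tiles
    return d

if __name__ == "__main__":
    n = 4
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in [(0, 1, 3), (1, 2, 1), (2, 3, 2), (0, 3, 10)]:
        d[u][v] = w
    blocked_floyd_warshall(d, B=2)
    print(d[0][3])   # 6, via 0 -> 1 -> 2 -> 3
```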

27 citations