Book

Computational Aspects of VLSI

01 Jan 1984
About: The book was published on 1984-01-01 and is currently open access. It has received 862 citations to date. It focuses on the topic of very-large-scale integration (VLSI).
Citations
Journal ArticleDOI
TL;DR: A simulated annealing algorithm is introduced that can map any physical model onto an FPGA regardless of the model's underlying topology, and complex models that previously could not be routed are made routable when placement constraints are used.
Abstract: Physical models use mathematical equations to characterize physical systems such as airway mechanics, neuron networks, or chemical reactions. Previous work has shown that field-programmable gate arrays (FPGAs) execute physical models efficiently. To improve the implementation of physical models on FPGAs, this article leverages graph-theoretic techniques to synthesize physical models onto FPGAs. The first phase maps physical model equations onto a structured virtual processing element (PE) graph using graph-theoretic folding techniques. The second phase maps the structured virtual PE graph onto physical PE regions on an FPGA using graph embedding theory. A simulated annealing algorithm is introduced that can map any physical model onto an FPGA regardless of the model's underlying topology. We further extend the simulated annealing approach by leveraging existing graph drawing algorithms to generate the initial placement. Compared to previous work on physical model implementation on FPGAs, embedding increases clock frequency by 25% on average (for applicable topologies), whereas simulated annealing increases frequency by 13% on average. The embedding approach typically produces a circuit whose frequency is limited by the FPGA clock rather than by routing. Additionally, complex models that previously could not be routed were made routable when placement constraints were used.
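
The placement step described above lends itself to a compact illustration. The sketch below is a minimal, generic simulated-annealing placer that assigns the nodes of a PE graph to cells of a grid and swaps pairs to reduce total Manhattan wirelength; the cost function, cooling schedule, and all names are illustrative assumptions, not the paper's actual algorithm.

```python
import math
import random

# Hypothetical sketch of simulated-annealing placement: PEs of a virtual
# processing-element graph are assigned to cells of an FPGA-like grid and
# swapped to reduce total Manhattan wirelength. The cost model and cooling
# schedule are illustrative assumptions, not the paper's actual method.

def wirelength(placement, edges):
    """Sum of Manhattan distances between connected PEs."""
    total = 0
    for u, v in edges:
        (xu, yu), (xv, yv) = placement[u], placement[v]
        total += abs(xu - xv) + abs(yu - yv)
    return total

def anneal(num_pes, edges, grid, iters=20000, t0=5.0, alpha=0.9995):
    cells = [(x, y) for x in range(grid) for y in range(grid)]
    random.shuffle(cells)
    placement = {pe: cells[pe] for pe in range(num_pes)}  # random initial placement
    cost, temp = wirelength(placement, edges), t0
    for _ in range(iters):
        a, b = random.sample(range(num_pes), 2)           # propose a swap of two PEs
        placement[a], placement[b] = placement[b], placement[a]
        new_cost = wirelength(placement, edges)
        if new_cost <= cost or random.random() < math.exp((cost - new_cost) / temp):
            cost = new_cost                               # accept (Metropolis rule)
        else:
            placement[a], placement[b] = placement[b], placement[a]  # revert the swap
        temp *= alpha                                     # geometric cooling
    return placement, cost

# Example: place a small ring of 8 PEs on a 4x4 grid and report the final wirelength.
ring = [(i, (i + 1) % 8) for i in range(8)]
print(anneal(8, ring, grid=4)[1])
```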

5 citations

Journal ArticleDOI
TL;DR: The design of a special-purpose VLSI CMOS processor network for implementing the decimation-in-frequency FFT algorithm is described; it is composed of several pipelined processors that allow the user to perform complex-number addition, subtraction, or multiplication with single commands.
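
For context, the computation such a processor network pipelines is the standard radix-2 decimation-in-frequency FFT. The sketch below is a plain software rendering of that butterfly structure, assuming a power-of-two input length; it is not the paper's hardware design.

```python
import cmath

# Minimal sketch of the radix-2 decimation-in-frequency (DIF) FFT. Each stage
# performs the butterfly (a + b, (a - b) * w) on pairs of values; the
# bit-reversal at the end restores natural output order.

def dif_fft(x):
    n = len(x)                       # n must be a power of two
    a = list(x)
    span = n
    while span > 1:
        half = span // 2
        for start in range(0, n, span):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / span)   # twiddle factor
                u, v = a[start + k], a[start + k + half]
                a[start + k] = u + v                       # butterfly add
                a[start + k + half] = (u - v) * w          # subtract, then rotate
        span = half
    # outputs emerge in bit-reversed index order; permute back
    bits = n.bit_length() - 1
    return [a[int(format(i, f'0{bits}b')[::-1], 2)] for i in range(n)]

# Example: magnitudes of the 8-point transform of a rectangular pulse.
print([round(abs(c), 3) for c in dif_fft([1, 1, 1, 1, 0, 0, 0, 0])])
```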

5 citations

Journal Article
TL;DR: A refined step-wise assembly method is proposed, which provides control of the assembly in distinct steps, and the main results are LP-BMC algorithms for some fundamental problems that form the basis of many parallel computations.
Abstract: Biomolecular computation (BMC) is computation at the molecular scale, using biotechnology engineering techniques. Most proposed methods for BMC use distributed (molecular) parallelism (DP), where operations are executed in parallel on large numbers of distinct molecules. BMC done exclusively by DP requires that the computation execute sequentially within any given molecule (though in parallel across multiple molecules). In contrast, local parallelism (LP) allows operations to be executed in parallel on each given molecule. Winfree et al. [W96, WYS96] proposed an innovative method for LP-BMC, that of computation by unmediated self-assembly of 2D arrays of DNA molecules, applying known domino tiling techniques (see Buchi [B62], Berger [B66], Robinson [R71], and Lewis and Papadimitriou [LP81]) in combination with the DNA self-assembly techniques of Seeman et al. [SZC94]. We develop improved techniques to more fully exploit the potential power of LP-BMC. We propose a refined step-wise assembly method, which provides control of the assembly in distinct steps. Step-wise assembly may increase the likelihood of successful assembly, decrease the number of tiles required, and provide additional control of the assembly process. The assembly depth is the number of stages of assembly required, and the assembly size is the number of tiles required. We also introduce the assembly frame, a rigid nanostructure which binds the input DNA strands in place on its boundaries and constrains the shape of the assembly. Our main results are LP-BMC algorithms for some fundamental problems that form the basis of many parallel computations. For these problems we decrease the assembly size to linear in the input size and significantly decrease the assembly depth. We give LP-BMC algorithms with linear assembly size and logarithmic assembly depth for the parallel prefix computation problems, which include integer addition, subtraction, multiplication by a constant number, finite state automata simulation, and [...]

*A preliminary version of this paper appeared in Proc. DNA-Based Computers III, University of Pennsylvania, June 23-26, 1997. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, H. Rubin and D. H. Wood, editors, American Mathematical Society, Providence, RI, vol. 48, 1999, pp. 217-254.
†Department of Computer Science, Duke University, Durham, NC, USA, and Adjunct, King Abdulaziz University (KAU), Jeddah, Saudi Arabia.
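
As a software analogue of the logarithmic assembly depth claimed for the parallel prefix problems, the sketch below runs a Hillis-Steele style scan in which all positions combine values over doubling strides, so n inputs need only about log2(n) parallel rounds. This is purely illustrative and is not the paper's DNA construction.

```python
# Illustrative sketch of the parallel-prefix pattern: a Hillis-Steele scan
# combines values over doubling strides, so an n-element prefix sum needs only
# ceil(log2 n) parallel rounds -- the analogue of logarithmic assembly depth.

def parallel_prefix(values, op=lambda a, b: a + b):
    a = list(values)
    n = len(a)
    stride, rounds = 1, 0
    while stride < n:
        # in hardware or self-assembly all positions update simultaneously;
        # reading from the previous round's snapshot mimics that here
        prev = list(a)
        for i in range(stride, n):
            a[i] = op(prev[i - stride], prev[i])
        stride *= 2
        rounds += 1
    return a, rounds

sums, depth = parallel_prefix([3, 1, 4, 1, 5, 9, 2, 6])
print(sums)    # running totals: [3, 4, 8, 9, 14, 23, 25, 31]
print(depth)   # 3 rounds for 8 inputs: logarithmic depth
```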

5 citations

Proceedings ArticleDOI
23 Feb 1988
TL;DR: In this article, the authors present algorithms for transposing a matrix on a mesh-connected array processor (MCAP) that make very efficient use of the processing elements (PEs) in parallel.
Abstract: Matrix transposition is one of the major tasks in image and signal processing and in matrix decompositions. This paper presents algorithms for transposing a matrix on a mesh-connected array processor (MCAP). These algorithms make very efficient use of the processing elements (PEs) in parallel. We discuss both synchronous and asynchronous algorithms. In the synchronous approach, the algorithms use a global clock to synchronize the communications between PEs. The number of time units required by the synchronous algorithms for transposing an m x n matrix (n ≥ m) on an n x n MCAP is 2(n - 1). The synchronous algorithms eliminate simultaneous requests for the channels between PEs. Clock skew and delays are inevitable problems for large array sizes (large n). An asynchronous (self-timed) approach is proposed to circumvent this problem. The feasibility of the asynchronous algorithm has been demonstrated by simulating the algorithm for matrices of different sizes.
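
The 2(n - 1) time bound can be sanity-checked with a few lines of arithmetic: under row-then-column routing, element (i, j) travels 2|i - j| hops to reach position (j, i). The check below is my own illustration, not the paper's routing scheme.

```python
# Back-of-the-envelope check: on an n x n mesh, element (i, j) must reach
# (j, i). Routing first along its row and then along its column takes
# |i - j| + |j - i| = 2|i - j| hops, so no element needs more than
# 2(n - 1) steps, matching the stated time bound.

def max_transpose_hops(n):
    return max(abs(i - j) + abs(j - i) for i in range(n) for j in range(n))

for n in (4, 8, 16):
    print(n, max_transpose_hops(n), 2 * (n - 1))   # the two counts coincide
```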

5 citations

Book ChapterDOI
19 Apr 2010
TL;DR: It is shown that every layered Boolean circuit of size s can be simulated by a layered Boolean circuit of depth $O(\sqrt{s\log s})$; the proof uses an adaptive strategy based on the two-person pebble game introduced by Dymond and Tompa.
Abstract: We consider the relationship between size and depth for layered Boolean circuits, synchronous circuits, and planar circuits, as well as classes of circuits with small separators. In particular, we show that every layered Boolean circuit of size s can be simulated by a layered Boolean circuit of depth $O(\sqrt{s\log s})$. For planar circuits and synchronous circuits of size s, we obtain simulations of depth $O(\sqrt{s})$. The best previously known result, due to Paterson and Valiant [16] and Dymond and Tompa [6], holds for general Boolean circuits and states that $D(f) = O(C(f)/\log C(f))$, where C(f) and D(f) are the minimum size and depth, respectively, of Boolean circuits computing f. The proof of our main result uses an adaptive strategy based on the two-person pebble game introduced by Dymond and Tompa [6]. Improving any of our results by polylog factors would immediately improve the bounds for general circuits.
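
To make the gap concrete, the short script below plugs a few circuit sizes into the two depth bounds with constants ignored: the layered-circuit bound sqrt(s log s) from this paper versus the general-circuit bound s / log s of Paterson-Valiant and Dymond-Tompa. The numbers are purely illustrative.

```python
import math

# Numeric illustration (constants ignored): the layered-circuit depth bound
# sqrt(s * log s) versus the general-circuit bound s / log s.

def layered_depth(s):
    return math.sqrt(s * math.log2(s))

def general_depth(s):
    return s / math.log2(s)

for s in (2**10, 2**20, 2**30):
    print(f"s=2^{int(math.log2(s))}: sqrt(s log s) ~ {layered_depth(s):,.0f}, "
          f"s/log s ~ {general_depth(s):,.0f}")
```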

5 citations


Cites methods from "Computational Aspects of VLSI"

  • ...The following gives a formal definition of a node separator in the fashion of Ullman [23]....
