
Showing papers by Charles E. Leiserson published in 1997


Proceedings Article
01 Jun 1997
TL;DR: A provably efficient determinacy-race detector for Cilk, an algorithmic multithreaded programming language. Run on a given input data set, it either reports at least one location in the program that is subject to a determinacy race or certifies that the program is race free on that data set.
Abstract: A parallel multithreaded program that is ostensibly deterministic may nevertheless behave nondeterministically due to bugs in the code. These bugs are called determinacy races, and they result when one thread updates a location in shared memory while another thread is concurrently accessing the location. We have implemented a provably efficient determinacy-race detector for Cilk, an algorithmic multithreaded programming language. If a Cilk program is run on a given input data set, our debugging tool, which we call the “Nondeterminator,” either determines at least one location in the program that is subject to a determinacy race, or else it certifies that the program is race free when run on the data set.

106 citations
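The abstract does not describe how the Nondeterminator decides which accesses are logically parallel. One well-known approach to serial determinacy-race detection for Cilk-style spawn/sync programs is the SP-bags technique: each procedure keeps an S-bag and a P-bag of procedure ids in a disjoint-set structure, and each memory location keeps a shadow reader and writer. The Python sketch below is a simplified illustration under those assumptions, not the Nondeterminator's code. The class `SPBags` and its methods `spawn`, `finish`, `sync`, `read`, and `write` are hypothetical, and the caller supplies the serial, depth-first event trace by hand.

```python
class SPBags:
    """Simplified SP-bags-style race checker driven by a serial, depth-first trace.

    Hypothetical API for illustration; procedure ids must be unique and hashable.
    """

    def __init__(self):
        self.parent = {}  # union-find parent pointers over procedure ids
        self.kind = {}    # union-find root -> "S" (in series) or "P" (in parallel)
        self.s_bag = {}   # procedure -> member of its S-bag
        self.p_bag = {}   # procedure -> member of its P-bag, or None if empty
        self.reader = {}  # shadow space: location -> procedure that last read it
        self.writer = {}  # shadow space: location -> procedure that last wrote it
        self.races = []   # (location, accessing procedure) for each detected race

    def _find(self, x):
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while x != root:  # path compression
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def _union(self, a, b, kind):
        # Merge the bags containing a and b (None means an empty bag) and label the result.
        roots = [self._find(m) for m in (a, b) if m is not None]
        if not roots:
            return None
        root = roots[0]
        for other in roots[1:]:
            if other != root:
                self.parent[other] = root
        self.kind[root] = kind
        return root

    def _in_p_bag(self, proc):
        # An earlier access by proc is logically parallel with the current one
        # exactly when proc currently sits in some P-bag.
        return proc is not None and self.kind[self._find(proc)] == "P"

    def spawn(self, proc):
        # A newly spawned procedure F starts with S_F = {F} and an empty P_F.
        self.parent[proc] = proc
        self.kind[proc] = "S"
        self.s_bag[proc] = proc
        self.p_bag[proc] = None

    def sync(self, proc):
        # At a sync in F: S_F <- S_F U P_F, then P_F <- empty.
        self.s_bag[proc] = self._union(self.s_bag[proc], self.p_bag[proc], "S")
        self.p_bag[proc] = None

    def finish(self, child, parent_proc):
        # A Cilk procedure syncs before returning; its S-bag then joins the parent's P-bag.
        self.sync(child)
        self.p_bag[parent_proc] = self._union(
            self.p_bag[parent_proc], self.s_bag[child], "P")

    def read(self, proc, loc):
        if self._in_p_bag(self.writer.get(loc)):
            self.races.append((loc, proc))
        if not self._in_p_bag(self.reader.get(loc)):
            self.reader[loc] = proc

    def write(self, proc, loc):
        if self._in_p_bag(self.reader.get(loc)) or self._in_p_bag(self.writer.get(loc)):
            self.races.append((loc, proc))
        self.writer[loc] = proc


d = SPBags()
d.spawn("main")
d.spawn("A")           # main spawns A; serially, A runs to completion first
d.write("A", "x")
d.finish("A", "main")  # A returns; main continues before its sync
d.write("main", "x")   # logically parallel with A's write: race
d.sync("main")
d.write("main", "x")   # after the sync, A is in series with main: no new race
print(d.races)
```

The trace above records exactly one race, on `x`, at main's first write. Using a disjoint-set structure keeps each bag operation nearly constant time, which is what lets this style of checker add only a small overhead to a serial run.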


Journal Article
TL;DR: Two strategies for reducing the clock period of a two-phase, level-clocked circuit are investigated: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster level-clocked one.
Abstract: We investigate two strategies for reducing the clock period of a two-phase, level-clocked circuit: clock tuning, which adjusts the waveforms that clock the circuit, and retiming, which relocates circuit latches. These methods can be used to convert a circuit with edge-triggered latches into a faster level-clocked one. We model a two-phase circuit as a graph $G = (V, E)$ whose vertex set $V$ is a collection of combinational logic blocks, and whose edge set $E$ is a set of interconnections. Each interconnection passes through zero or more latches, where each latch is clocked by one of two periodic, nonoverlapping waveforms, or phases. We give efficient polynomial-time algorithms for problems involving the timing verification and optimization of two-phase circuitry. Included are algorithms for:
- verifying proper timing: $O(VE)$ time;
- minimizing the clock period by clock tuning: $O(VE)$ time;
- retiming to achieve a given clock period when the phases are symmetric: $O(VE + V^2 \lg V)$ time;
- retiming to achieve a given clock period when either the duty cycle (high time) of one phase or the ratio of the phases’ duty cycles is fixed: $O(V^3)$ time.
We give fully polynomial-time approximation schemes for clock period minimization, within any given relative error $\epsilon > 0$, by:
- retiming and tuning when the duty cycles of the two phases are required to be equal: $O((VE + V^2 \lg V) \lg(V/\epsilon))$ time;
- retiming and tuning when either the duty cycle of one phase is fixed or the ratio of the phases’ duty cycles is fixed: $O(V^3 \lg(V/\epsilon))$ time;
- simultaneous retiming and clock tuning with no conditions on the duty cycles of the two phases: $O(V^3 (1/\epsilon) \lg(1/\epsilon) + (VE + V^2 \lg V) \lg(V/\epsilon))$ time.
The first two of these approximation algorithms can be used to obtain the optimum clock period in the special case where all propagation delays are integers. We generalize most of the results for two-phase clocking schemes to simple multiphase clocking disciplines, including ones with overlapping phases.
Typically, the algorithms to verify and optimize the timing of k-phase circuitry are at most a factor of k slower than the corresponding algorithms for two-phase circuitry. Our algorithms have been implemented in TIM, a timing package for two-phase, level-clocked circuitry developed at MIT.
Journal of the ACM, Vol. 44, No. 1, January 1997, pp. 148–199. This research was supported in part by the Defense Advanced Research Projects Agency under Grant N00014-91-J-1698. Authors’ present addresses: A. T. Ishii, NEC USA C&C Research Laboratories, Princeton, NJ 08540; C. E. Leiserson, Massachusetts Institute of Technology, Laboratory for Computer Science, Cambridge, MA 02139; M. C. Papaefthymiou, Advanced Computer Architecture Laboratory, Room 2218 EECS Building, Ann Arbor, MI 48109-2122.

69 citations
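The two-phase, level-clocked model in the abstract is richer than a short example can carry. As a simpler point of reference, the sketch below swaps in the classic single-phase, edge-triggered model from Leiserson and Saxe's retiming work, not the paper's two-phase algorithms: each vertex v has a propagation delay d(v), each edge u->v carries w(e) latches, the clock period is the largest total delay along a latch-free path, and a retiming r moves latches so that w_r(e) = w(e) + r(v) - r(u). The function names `retime` and `clock_period` and the three-block ring are hypothetical.

```python
from collections import defaultdict, deque


def retime(edges, r):
    """Apply retiming r to (u, v, latches) edges: w_r(u->v) = w(u->v) + r[v] - r[u]."""
    return [(u, v, w + r[v] - r[u]) for (u, v, w) in edges]


def clock_period(delay, edges):
    """Largest total delay along a latch-free (weight-0) path, edge-triggered model."""
    if any(w < 0 for _, _, w in edges):
        raise ValueError("negative latch count: the retiming is not legal")
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for u, v, w in edges:
        if w == 0:  # only latch-free edges chain combinational delay
            succ[u].append(v)
            indeg[v] += 1
    # arrival[v]: delay of the slowest latch-free path ending at v, including d(v)
    arrival = dict(delay)
    queue = deque(v for v in delay if indeg[v] == 0)
    visited = 0
    while queue:  # Kahn's topological order over the latch-free subgraph
        u = queue.popleft()
        visited += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if visited != len(delay):
        raise ValueError("latch-free cycle: no well-defined clock period")
    return max(arrival.values())


# Hypothetical ring of three logic blocks with two latches on the c -> a edge.
delay = {"a": 3, "b": 3, "c": 3}
edges = [("a", "b", 0), ("b", "c", 0), ("c", "a", 2)]
r = {"a": 0, "b": 1, "c": 1}
retimed = retime(edges, r)
print(clock_period(delay, edges), clock_period(delay, retimed))
```

Here the retiming moves one latch from c->a onto a->b, so the longest latch-free path shrinks from a->b->c (delay 9) to b->c (delay 6). Clock tuning and level-sensitive latches, which the paper handles, have no counterpart in this simplified model.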


Book Chapter
TL;DR: Cilk (pronounced “silk”) is a C-based language for multithreaded parallel programming that makes it easy to program irregular parallel applications, especially as compared with data-parallel or message-passing programming systems.
Abstract: Cilk (pronounced “silk”) is a C-based language for multithreaded parallel programming. Cilk makes it easy to program irregular parallel applications, especially as compared with data-parallel or message-passing programming systems. A Cilk programmer need not worry about protocols and load balancing, which are handled by Cilk's provably efficient runtime system. Many regular and irregular Cilk applications run nearly as fast on one processor as comparable C programs, but the Cilk programs scale well to many processors.

22 citations
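To make “provably efficient” concrete, Cilk performance is usually reasoned about with the work T1 (total operations) and the span T∞ (longest chain of dependencies). A greedy scheduler finishes within T1/P + T∞ time on P processors, and Cilk's work-stealing scheduler achieves T1/P + O(T∞) expected time. The sketch below computes both quantities for the textbook Cilk fib, which spawns fib(n-1), calls fib(n-2), and then syncs, under an assumed cost model of one unit of work per call. The helpers `work`, `span`, and `greedy_bound` are hypothetical and not part of Cilk.

```python
from functools import lru_cache

# Assumed cost model: each fib call does one unit of work outside its children.


@lru_cache(maxsize=None)
def work(n):
    """T1: total unit-cost calls in the computation of fib(n)."""
    if n < 2:
        return 1
    return 1 + work(n - 1) + work(n - 2)


@lru_cache(maxsize=None)
def span(n):
    """T_inf: the spawned fib(n-1) and the called fib(n-2) run in parallel."""
    if n < 2:
        return 1
    return 1 + max(span(n - 1), span(n - 2))


def greedy_bound(n, p):
    """Upper bound T1/P + T_inf on P-processor time under any greedy scheduler."""
    return work(n) / p + span(n)


print(work(10), span(10), work(10) / span(10), greedy_bound(10, 4))
```

For fib(10) this model gives T1 = 177 and T∞ = 10, so the parallelism T1/T∞ is 17.7: no schedule can achieve a speedup above about 17.7, however many processors are available.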


Journal Article
TL;DR: An optimistic, asynchronous, parallel algorithm that runs in $O(W/P+D+\lg W + \lg P)$ expected time, where W and D are the size and embedded depth, respectively, of the “volatile” subcircuit: the subcircuit of elements that have inputs which either change or glitch as a result of the update.
Abstract: The circuit value update problem is the problem of updating values in a representation of a combinational circuit when some of the inputs are changed. We assume for simplicity that each combinational element has bounded fan-in and fan-out and can be evaluated in constant time. This problem is easily solved on an ordinary serial computer in O(W+D) time, where W is the number of elements in the altered subcircuit and D is the subcircuit's embedded depth (its depth measured in the original circuit). In this paper we show how to solve the circuit value update problem efficiently on a P-processor parallel computer. We give a straightforward synchronous, parallel algorithm that runs in $O(W/P + D\lg P)$ expected time. Our main contribution, however, is an optimistic, asynchronous, parallel algorithm that runs in $O(W/P+D+\lg W + \lg P)$ expected time, where W and D are the size and embedded depth, respectively, of the “volatile” subcircuit, the subcircuit of elements that have inputs which either change or glitch as a result of the update. To our knowledge, our analysis provides the first analytical bounds on the running time of an optimistic, asynchronous, parallel algorithm.

2 citations
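The serial baseline in the abstract can be illustrated with an event-driven update that re-evaluates only gates whose inputs actually changed, in increasing order of level (depth in the original circuit), so that every gate is evaluated after all of its changed inputs are final. This is a sketch under assumptions, not the paper's parallel algorithms: gates are hypothetical `Gate` records holding a level, a Boolean function, and input names, and each gate's level must exceed the levels of its inputs (primary inputs sit at level 0). Choosing the lowest pending level with `min()` is a simplification; meeting the O(W + D) bound would call for scanning level buckets in order instead.

```python
from collections import defaultdict, namedtuple

# Hypothetical gate record: level (depth in the original circuit), function, input names.
Gate = namedtuple("Gate", ["level", "func", "inputs"])


def build_fanout(gates):
    """Map each signal name to the gates that read it (computed once, reused per update)."""
    fanout = defaultdict(list)
    for name, gate in gates.items():
        for src in gate.inputs:
            fanout[src].append(name)
    return fanout


def update(gates, fanout, values, changes):
    """Apply changed primary inputs; return how many gate evaluations were needed."""
    pending = defaultdict(set)  # level -> gates with at least one changed input
    for name, new in changes.items():
        if values[name] != new:
            values[name] = new
            for g in fanout[name]:
                pending[gates[g].level].add(g)
    evaluations = 0
    while pending:
        level = min(pending)  # all inputs of gates at this level are already final
        for g in pending.pop(level):
            evaluations += 1
            gate = gates[g]
            new = gate.func(*(values[src] for src in gate.inputs))
            if new != values[g]:  # propagate only real changes
                values[g] = new
                for h in fanout[g]:
                    pending[gates[h].level].add(h)
    return evaluations


gates = {
    "g1": Gate(1, lambda a, b: a & b, ("a", "b")),
    "g2": Gate(2, lambda x, c: x | c, ("g1", "c")),
}
values = {"a": 1, "b": 1, "c": 0, "g1": 1, "g2": 1}  # consistent initial state
fanout = build_fanout(gates)
first = update(gates, fanout, values, {"a": 0})   # g1 and g2 both flip to 0
second = update(gates, fanout, values, {"b": 0})  # g1 stays 0, so g2 is never touched
print(first, second, values["g1"], values["g2"])
```

The second update shows why the cost depends on the altered subcircuit rather than the whole circuit: once g1's value does not change, propagation stops and g2 is never re-evaluated.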