scispace - formally typeset

Showing papers by "Gianfranco Bilardi" published in 1995


Journal Article
TL;DR: The ultimate impact of fundamental physical limitations on parallel computing machines is considered, and it is found that scalability holds only for neighborly interconnections of bounded-size synchronous modules, presumably of the area-universal type.

75 citations


Journal Article
TL;DR: It is shown that parallelism and locality combined may yield speedups superlinear in the number of processors; these speedups are inherent, owing to the optimality of the obtained tradeoffs as established in a companion paper.
Abstract: Upper bounds are derived for the processor-time tradeoffs of machines such as linear arrays and two-dimensional meshes, which are compatible with the physical limitation expressed by bounded-speed propagation of messages (due to the finiteness of the speed of light). It is shown that parallelism and locality combined may yield speedups superlinear in the number of processors. The speedups are inherent, due to the optimality of the obtained tradeoffs as established in a companion paper. Simulations of multiprocessor machines by analogous machines with fewer processors are developed. A crucial role is played by the hierarchical nature of the memory system. A divide-and-conquer technique for hierarchical memories is developed, based on the graph-theoretic notion of topological separator. For multiprocessors, this technique also requires a careful balance of memory access and interprocessor communication costs, which leads to non-intuitive orchestrations of the simulation process.

45 citations
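To see why the hierarchical memory plays such a role, consider a toy cost model (an assumption for illustration, not the paper's model): reaching cell x of a memory costs x**alpha time units, with alpha = 1/2 loosely mimicking a two-dimensional layout under bounded signal speed. Below, p processors each sweep their own m cells in parallel, while a naive one-processor simulation keeps all p*m cells in one memory. All function names are hypothetical, and the paper's separator-based simulations are far more careful than this naive one; the sketch only shows how lost locality can make the gap exceed p.

```python
def access_cost(address: int, alpha: float) -> float:
    """Assumed hierarchical-memory cost: reaching cell `address` (1-based)
    takes address**alpha time units."""
    return address ** alpha


def sweep_cost(num_cells: int, alpha: float) -> float:
    """Time for one processor to touch each of cells 1..num_cells once."""
    return sum(access_cost(x, alpha) for x in range(1, num_cells + 1))


def naive_slowdown(p: int, m: int, alpha: float = 0.5) -> float:
    """Ratio of naive one-processor simulation time to p-processor time.

    In the parallel machine each of the p processors sweeps its own m local
    cells concurrently. The naive simulation stores all p*m cells in a single
    hierarchical memory, so distant cells become more expensive to reach.
    """
    parallel_time = sweep_cost(m, alpha)
    sequential_time = sweep_cost(p * m, alpha)
    return sequential_time / parallel_time


if __name__ == "__main__":
    for p in (2, 4, 8, 16):
        # With alpha > 0 the ratio grows faster than p (roughly p**1.5 here).
        print(p, round(naive_slowdown(p, m=1000), 2))
```

Setting alpha = 0 (flat memory) makes the ratio exactly p, so any superlinear gap in this toy comes entirely from the assumed memory hierarchy.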


Journal Article
TL;DR: Two deterministic, on-line routing networks, the pruned butterfly and the sorting fat-tree, are presented; both can simulate any other routing network fitting in similar area with polylogarithmic slowdown.
Abstract: Two deterministic routing networks are presented: the pruned butterfly and the sorting fat-tree. Both networks are area-universal, that is, they can simulate any other routing network fitting in similar area with polylogarithmic slowdown. Previous area-universal networks were either for the off-line problem, where the message set to be routed is known in advance and substantial precomputation is permitted, or involved randomization, yielding results that hold only with high probability. The two networks introduced here are the first that are simultaneously deterministic and on-line, and they use two substantially different routing techniques. The performance of their routing algorithms depends on the difficulty of the problem instance, which is measured by a quantity l known as the load factor. The pruned butterfly runs in O(l log² N) time, where N is the number of possible sources and destinations for messages and l is assumed to be polynomial in N. The sorting fat-tree algorithm runs in O(l log N + log² N) time for a restricted class of message sets including partial permutations. Other results of this work include a “flexible” circuit that is area-time optimal across a range of different input sizes and an area-time lower bound for routers based on wire-length arguments.

25 citations
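The abstract measures instance difficulty by the load factor l. A minimal sketch, assuming the common fat-tree definition (the maximum over channels of the number of messages that must cross a channel divided by that channel's capacity); the paper's exact definition, especially for the pruned butterfly, may differ. The heap numbering of tree nodes, the default capacities that double per level, and all function names are hypothetical choices for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def node_height(v: int, n_leaves: int) -> int:
    """Height of heap-indexed node v in a complete binary tree (leaves have 0)."""
    return (n_leaves.bit_length() - 1) - (v.bit_length() - 1)


def load_factor(n_leaves: int,
                messages: Iterable[Tuple[int, int]],
                capacity: Callable[[int], int] = lambda h: 2 ** h) -> float:
    """Max over channels of (messages crossing the channel) / (its capacity).

    Leaves 0..n_leaves-1 (n_leaves a power of two) are heap nodes
    n_leaves..2*n_leaves-1, and the parent of node v is v // 2. Every non-root
    node v has an upward and a downward channel to its parent, each with
    capacity capacity(height of v).
    """
    use: Counter = Counter()
    for src, dst in messages:
        u, v = n_leaves + src, n_leaves + dst
        while u != v:                 # climb both endpoints until they meet at the LCA
            use[("up", u)] += 1       # message leaves u's subtree
            use[("down", v)] += 1     # message enters v's subtree
            u, v = u // 2, v // 2
    return max((count / capacity(node_height(node, n_leaves))
                for (_, node), count in use.items()), default=0.0)


if __name__ == "__main__":
    reversal = [(i, 7 - i) for i in range(8)]
    print(load_factor(8, reversal))                        # prints 1.0
    print(load_factor(8, reversal, capacity=lambda h: 1))  # prints 4.0
```

The usage lines contrast a tree whose channel capacities double toward the root with a "thin" tree of unit capacities: the same permutation is easy for the former and four times as loaded on the latter.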


Proceedings Article
01 Jun 1995
TL;DR: A data structure called the augmented post-dominator tree (APT) is introduced which is constructed in space and time proportional to the size of the program, and which can answer control dependence queries in time proportional to the size of the output.
Abstract: The control dependence relation is used extensively in restructuring compilers. This relation is usually represented using the control dependence graph; unfortunately, the size of this data structure can be quadratic in the size of the program, even for some structured programs. In this paper, we introduce a data structure called the augmented post-dominator tree (APT) which is constructed in space and time proportional to the size of the program, and which can answer control dependence queries in time proportional to the size of the output. Therefore, APT is an optimal representation of control dependence. We also show that using APT, we can compute SSA graphs, as well as sparse dataflow evaluator graphs, in time proportional to the size of the program. Finally, we put APT in perspective by showing that it can be viewed as a factored representation of the control dependence graph in which filtered search is used to answer queries.

22 citations
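The abstract does not spell out how APT answers queries. As a hedged illustration of the underlying idea only, here is the classical post-dominator-tree characterization of control dependence: node w is control dependent on CFG edge u → v exactly when w lies on the post-dominator-tree path from v up to, but excluding, ipdom(u). Walking that path costs time proportional to the answer, the kind of output-sensitive query the abstract describes. The augmentation APT adds for other query types is not reproduced, post-dominators are computed with a simple iterative method rather than in linear time, and every function name is hypothetical.

```python
from typing import Dict, List, Set


def postdominators(succ: Dict[str, List[str]], exit_node: str) -> Dict[str, Set[str]]:
    """Iterative dataflow: pdom(n) = {n} | intersection of pdom(s) over
    successors s of n. Assumes every node can reach exit_node."""
    nodes = set(succ) | {s for targets in succ.values() for s in targets}
    pdom = {n: set(nodes) for n in nodes}
    pdom[exit_node] = {exit_node}
    changed = True
    while changed:
        changed = False
        for n in nodes - {exit_node}:
            succs = succ.get(n, [])
            new = set.intersection(*(pdom[s] for s in succs)) | {n} if succs else {n}
            if new != pdom[n]:
                pdom[n], changed = new, True
    return pdom


def immediate_postdominators(pdom: Dict[str, Set[str]], exit_node: str) -> Dict[str, str]:
    """Parent links of the post-dominator tree: the strict post-dominators of n
    form a chain, and the closest one has the most post-dominators itself."""
    return {n: max(pdom[n] - {n}, key=lambda d: len(pdom[d]))
            for n in pdom if n != exit_node}


def control_dependent_on(u: str, v: str, ipdom: Dict[str, str]) -> List[str]:
    """Nodes control dependent on CFG edge u -> v: walk the post-dominator tree
    from v up to, but excluding, ipdom(u). Work is proportional to the output."""
    result, w = [], v
    while w != ipdom[u]:
        result.append(w)
        w = ipdom[w]
    return result


if __name__ == "__main__":
    cfg = {"entry": ["a"], "a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["exit"]}
    ipdom = immediate_postdominators(postdominators(cfg, "exit"), "exit")
    print(control_dependent_on("a", "b", ipdom))  # prints ['b']
```

For a loop such as `{"entry": ["h"], "h": ["body", "exit"], "body": ["h"]}`, the query on edge h → body returns both `body` and the header `h`, since taking that edge causes the header to execute again.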


Journal Article
TL;DR: It is shown that, when exactness is not required, prudence, consistency and responsiveness, even together, do not restrict the power of conservative learners.

17 citations


Proceedings Article
20 Jul 1995
TL;DR: Lower bounds are developed for the processor-time tradeoffs of machines such as linear arrays and two-dimensional meshes, which are compatible with the physical limitation on the propagation speed of messages.
Abstract: Lower bounds are developed for the processor-time tradeoffs of machines such as linear arrays and two-dimensional meshes, which are compatible with the physical limitation on the propagation speed of messages. It is shown that, under this limitation, parallelism and locality combined may yield speedups superlinear in the number of processors. The results are obtained by means of a novel technique, called the “closed-dichotomy-size technique”, designed to obtain lower bounds to the computation time for networks of processors, each of which is equipped with a local hierarchical memory.

11 citations


Book Chapter
16 Aug 1995
TL;DR: Lower bounds are developed for the processor-time tradeoffs of machines such as linear arrays and two-dimensional meshes, which are compatible with the physical limitation on the propagation speed of messages.
Abstract: Lower bounds are developed for the processor-time tradeoffs of machines such as linear arrays and two-dimensional meshes, which are compatible with the physical limitation on the propagation speed of messages. It is shown that, under this limitation, parallelism and locality combined may yield speedups superlinear in the number of processors. The results are obtained by means of a novel technique, called the “closed-dichotomy-size technique”, designed to obtain lower bounds to the computation time for networks of processors, each of which is equipped with a local hierarchical memory.

6 citations


Book Chapter
29 Aug 1995
TL;DR: Under the assumption that memory cells containing list nodes bear no tags distinguishing them from other cells, an Ω(min{l, n/p}) randomized lower bound for l-node lists is established, and a deterministic algorithm whose running time is within a logarithmic additive term of this bound is presented.
Abstract: The list marking problem involves marking the nodes of an l-node linked list stored in the memory of a (p, n)-PRAM, when only the location of the head of the list is initially known. Under the assumption that memory cells containing list nodes bear no tags distinguishing them from other cells, we establish an Ω(min{l, n/p}) randomized lower bound for l-node lists and present a deterministic algorithm whose running time is within a logarithmic additive term of this bound. In the case where list cells are tagged in a way that differentiates them from other cells, we establish a tight Θ(min{l, l/p + √(n/p) log n}) bound for randomized algorithms.

1 citation
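The l term in the list-marking bound corresponds to the obvious strategy of following next-pointers from the head. A minimal sketch of that single-processor baseline, assuming memory is modeled as a Python list in which each list cell stores the index of the next node and None marks the tail; cells outside the list hold arbitrary, untagged values, so they may even look like pointers. This is not the paper's deterministic PRAM algorithm for the n/p regime, which the abstract does not describe, and `mark_list` is a hypothetical name.

```python
from typing import List, Optional


def mark_list(memory: List[Optional[int]], head: int) -> List[bool]:
    """Pointer-chasing baseline for list marking on a single processor.

    memory[i] holds the index of the next list node, or None at the tail.
    Cells outside the list may hold arbitrary values (they are untagged), so
    only cells reached from head are marked. Runs in exactly l iterations
    for an l-node list, independent of the memory size n.
    """
    marked = [False] * len(memory)
    cell: Optional[int] = head
    while cell is not None:
        marked[cell] = True
        cell = memory[cell]
    return marked


if __name__ == "__main__":
    # List 5 -> 1 -> 4 -> 3 -> 0; cells 2, 6, 7 form a pointer-like decoy cycle.
    memory = [None, 4, 7, 0, 3, 1, 2, 6]
    print([i for i, m in enumerate(mark_list(memory, head=5)) if m])  # prints [0, 1, 3, 4, 5]
```

The decoy cycle among cells 2, 6 and 7 shows why untagged memory is hard: a processor scanning arbitrary cells cannot tell them from genuine list nodes without reaching them from the head.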