scispace - formally typeset
Search or ask a question
Conference

ACM Symposium on Parallel Algorithms and Architectures 

About: ACM Symposium on Parallel Algorithms and Architectures is an academic conference. The conference publishes majorly in the area(s): Parallel algorithm & Scheduling (computing). Over the lifetime, 1458 publications have been published by the conference receiving 46163 citations.


Papers
More filters
Proceedings ArticleDOI
01 Jun 1997
TL;DR: A simple randomized algorithm for accessing shared objects that tends to satisfy each access request with a nearby copy is designed, based on a novel mechanism to maintain and distribute information about object locations, and requires only a small amount of additional memory at each node.
Abstract: Consider a set of shared objects in a distributed network, where several copies of each object may exist at any given time. To ensure both fast access to the objects as well as efficient utilization of network resources, it is desirable that each access request be satisfied by a copy "close" to the requesting node. Unfortunately, it is not clear how to efficiently achieve this goal in a dynamic, distributed environment in which large numbers of objects are continuously being created, replicated, and destroyed. In this paper, we design a simple randomized algorithm for accessing shared objects that tends to satisfy each access request with a nearby copy. The algorithm is based on a novel mechanism to maintain and distribute information about object locations, and requires only a small amount of additional memory at each node. We analyze our access scheme for a class of cost functions that captures the hierarchical nature of wide-area networks. We show that under the particular cost model considered: (i) the expected cost of an individual access is asymptotically optimal, and (ii) if objects are sufficiently large, the memory used for objects dominates the additional memory used by our algorithm with high probability. We also address dynamic changes in both the network as well as the set of object copies.

792 citations

Proceedings ArticleDOI
03 Jul 2001
TL;DR: Several compact routing schemes for general weighted undirected networks are described, which achieve a near-optimal tradeoff between the size of the routing tables used and the resulting stretch.
Abstract: We describe several compact routing schemes for general weighted undirected networks. Our schemes are simple and easy to implement. The routing tables stored at the nodes of the network are all very small. The headers attached to the routed messages, including the name of the destination, are extremely short. The routing decision at each node takes constant time. Yet, the stretch of these routing schemes, i.e., the worst ratio between the cost of the path on which a packet is routed and the cost of the cheapest path from source to destination, is a small constant. Our schemes achieve a near-optimal tradeoff between the size of the routing tables used and the resulting stretch. More specifically, we obtain: A routing scheme that uses only O (n 1/2) bits of memory at each node of an n-node network that has stretch 3. The space is optimal, up to logarithmic factors, in the sense that every routing scheme with stretch n2), and every routing scheme with stretch n3/2). The headers used are only (1 + O(1)) log2> n-bits long and each routing decision takes constant time. A variant of this scheme with [log2 n] -bit headers makes routing decisions in O(log log n) time. Also, for every integer k > 2, a general handshaking based routing scheme that uses O (n1/k) bits of memory at each node that has stretch 2k - 1. A conjecture of Erdos from 1963, settled for k = 3, 5, implies that the routing tables are of near-optimal size relative to the stretch. The handshaking is similar in spirit to a DNS lookup in TCP/IP. Headers are O(log2 n) bits long and each routing decision takes constant time. Without handshaking, the stretch of the scheme increases to 4k - 5. One ingredient used to obtain the routing schemes mentioned above, may be of independent practical and theoretical interest: A shortest path routing scheme for trees of arbitrary degree and diameter that assigns each vertex of an n-node tree a (1 + O(1)) log2 n-bit label. Given the label of a source node and the label of a destination it is possible to compute, in constant time, the port number of the edge from the source that heads in the direction of the destination. The general scheme for k > 2 also uses a clustering technique introduced recently by the authors. The clusters obtained using this technique induce a sparse and low stretch tree cover of the network. This essentially reduces routing in general networks into routing problems in trees that could be solved using the above technique.

560 citations

Proceedings ArticleDOI
20 Jul 1995
TL;DR: This paper presents a new model of parallel computation---the LogGP model---and uses it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast), examining the all- to-all remap, FFT, and radix sort.
Abstract: We present a new model of parallel computation---the LogGP model---and use it to analyze a number of algorithms, most notably, the single node scatter (one-to-all personalized broadcast). The LogGP model is an extension of the LogP model for parallel computation which abstracts the communication of fixed-sized short messages through the use of four parameters: the communication latency (L), overhead (o), bandwidth (g), and the number of processors (P). As evidenced by experimental data, the LogP model can accurately predict communication performance when only short messages are sent (as on the CM-5). However, many existing parallel machines have special support for long messages and achieve a much higher bandwidth for long messages compared to short messages (e.g., IBM SP-2, Paragon, Meiko CS-2, Ncube/2). We extend the basic LogP model with a linear model for long messages. This combination, which we call the LogGP model of parallel computation, has one additional parameter, G, which captures the bandwidth obtained for long messages. Experimental data collected on the Meiko CS-2 shows that this simple extension of the LogP model can quite accurately predict communication performance for both short and long messages. This paper discusses algorithm design and analysis under the new model, examining the all-to-all remap, FFT, and radix sort. We also examine, in more detail, the single node scatter problem. We derive solutions for this problem and prove their optimality under the LogGP model. These solutions are qualitatively different from those obtained under the simpler LogP model, reflecting the importance of capturing long messages in a model.

555 citations

Proceedings ArticleDOI
Maged M. Michael1
10 Aug 2002
TL;DR: The experimental results show that the new algorithm outperforms the best known lock-free as well as lock-based hash table implementations by significant margins, and indicate that it is the algorithm of choice for implementing shared hash tables.
Abstract: Lock-free (non-blocking) shared data structures promise more robust performance and reliability than conventional lock-based implementations. However, all prior lock-free algorithms for sets and hash tables suffer from serious drawbacks that prevent or limit their use in practice. These drawbacks include size inflexibility, dependence on atomic primitives not supported on any current processor architecture, and dependence on highly-inefficient or blocking memory management techniques.Building on the results of prior researchers, this paper presents the first CAS-based lock-free list-based set algorithm that is compatible with all lock-free memory management methods. We use it as a building block of an algorithm for lock-free hash tables. In addition to being lock-free, the new algorithm is dynamic, linearizable, and space-efficient.Our experimental results show that the new algorithm outperforms the best known lock-free as well as lock-based hash table implementations by significant margins, and indicate that it is the algorithm of choice for implementing shared hash tables.

475 citations

Proceedings ArticleDOI
01 Jun 1998
TL;DR: A user-level thread scheduler for shared-memory multiprocessors, which achieves linear speedup whenever P is small relative to the parallelism T1/T∈fty .
Abstract: We present a user-level thread scheduler for shared-memory multiprocessors, and we analyze its performance under multiprogramming. We model multiprogramming with two scheduling levels: our scheduler runs at user-level and schedules threads onto a fixed collection of processes, while below this level, the operating system kernel schedules processes onto a fixed collection of processors. We consider the kernel to be an adversary, and our goal is to schedule threads onto processes such that we make efficient use of whatever processor resources are provided by the kernel. Our thread scheduler is a non-blocking implementation of the work-stealing algorithm. For any multithreaded computation with work T1 and critical-path length T∈fty , and for any number P of processes, our scheduler executes the computation in expected time O(T1/PA + T∈fty P/PA) , where PA is the average number of processors allocated to the computation by the kernel. This time bound is optimal to within a constant factor, and achieves linear speedup whenever P is small relative to the parallelism T1/T∈fty .

395 citations

Performance
Metrics
No. of papers from the Conference in previous years
YearPapers
202149
202068
201945
201848
201747
201654