
Showing papers on "Distributed algorithm published in 1990"


Book
01 Aug 1990
TL;DR: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels and concentrates on fundamental theories as well as techniques and algorithms in distributed data management.
Abstract: This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels. The material concentrates on fundamental theories as well as techniques and algorithms. The advent of the Internet and the World Wide Web, and, more recently, the emergence of cloud computing and streaming data applications, has forced a renewal of interest in distributed and parallel data management, while, at the same time, requiring a rethinking of some of the traditional techniques. This book covers the breadth and depth of this re-emerging field. The coverage consists of two parts. The first part discusses the fundamental principles of distributed data management and includes distribution design, data integration, distributed query processing and optimization, distributed transaction management, and replication. The second part focuses on more advanced topics and includes discussion of parallel database systems, distributed object management, peer-to-peer data management, web data management, data stream systems, and cloud computing. New in this edition: new chapters covering database replication, database integration, multidatabase query processing, peer-to-peer data management, and web data management; coverage of emerging topics such as data streams and cloud computing; extensive revisions and updates based on years of class testing and feedback. Ancillary teaching materials are available.

2,395 citations


Journal ArticleDOI
TL;DR: In this article, a comprehensive study of the problem of scheduling broadcast transmissions in a multihop, mobile packet radio network is provided that is based on throughput optimization subject to freedom from interference.
Abstract: A comprehensive study of the problem of scheduling broadcast transmissions in a multihop, mobile packet radio network is provided that is based on throughput optimization subject to freedom from interference. It is shown that the problem is NP-complete. A centralized algorithm that runs in polynomial time and results in efficient (maximal) schedules is proposed. A distributed algorithm that achieves the same schedules is then proposed. The algorithm results in a maximal broadcasting zone in every slot.
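The paper's centralized algorithm is not reproduced in the abstract; purely to fix ideas, one maximal interference-free slot can be built greedily, assuming the usual packet-radio model in which two stations conflict when they are adjacent or share a common neighbor. All names and the graph encoding below are invented for this sketch:

```python
def two_hop_conflict(u, v, adj):
    """Two stations conflict if they are adjacent or share a neighbor
    (a broadcast by one would interfere at some receiver of the other)."""
    return v in adj[u] or bool(adj[u] & adj[v])

def greedy_maximal_slot(nodes, adj):
    """Greedily build a maximal set of stations that can all broadcast
    in the same slot without interference."""
    slot = []
    for u in nodes:
        if all(not two_hop_conflict(u, v, adj) for v in slot):
            slot.append(u)
    return slot
```

The returned set is maximal (no further station can be added), though not necessarily maximum; the paper's contribution is doing this efficiently and distributively with throughput guarantees.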

498 citations


Journal ArticleDOI
TL;DR: A distributed version of GENITOR which uses many smaller distributed populations in place of a single large population is introduced, and is able to optimize a broad range of sample problems more accurately and more consistently than GENITOR with a single population.
Abstract: GENITOR is a genetic algorithm which employs one-at-a-time reproduction and allocates reproductive opportunities according to rank to achieve selective pressure. Theoretical arguments and empirical evidence suggest that GENITOR is less vulnerable to some of the biases that degrade performance in standard genetic algorithms. A distributed version of GENITOR which uses many smaller distributed populations in place of a single large population is introduced. GENITOR II is able to optimize a broad range of sample problems more accurately and more consistently than GENITOR with a single population. GENITOR II also appears to be more robust than a single population genetic algorithm, yielding better performance without parameter tuning. We present some preliminary analyses to explain the performance advantage of the distributed algorithm. A distributed search is shown to yield improved search on several classes of problems, including binary encoded feedforward neural networks and the Traveling Salesman Problem.
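GENITOR II's exact operators are not given in the abstract. The following toy sketch (invented objective, parameters, and names) shows the two ideas it combines: steady-state, rank-based one-at-a-time reproduction, and several small populations ("islands") with occasional migration of good individuals:

```python
import random

def fitness(bits):          # toy objective: maximize the number of 1s
    return sum(bits)

def one_offspring(pop, rng):
    """GENITOR-style step: rank-based parent selection, one child per
    step, and the child replaces the worst individual if it is better."""
    pop.sort(key=fitness, reverse=True)
    ranks = range(len(pop), 0, -1)            # higher weight -> better rank
    p1, p2 = rng.choices(pop, weights=ranks, k=2)
    cut = rng.randrange(1, len(p1))           # one-point crossover
    child = p1[:cut] + p2[cut:]
    if rng.random() < 0.1:                    # light mutation
        i = rng.randrange(len(child))
        child[i] ^= 1
    if fitness(child) > fitness(pop[-1]):
        pop[-1] = child

def island_ga(n_islands=4, pop_size=10, length=16, steps=200, seed=1):
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(length)]
                for _ in range(pop_size)] for _ in range(n_islands)]
    for step in range(steps):
        for pop in islands:
            one_offspring(pop, rng)
        if step % 50 == 49:                   # occasional migration: the
            for i, pop in enumerate(islands): # best moves to the next island
                best = max(pop, key=fitness)
                nxt = islands[(i + 1) % n_islands]
                worst = min(range(len(nxt)), key=lambda k: fitness(nxt[k]))
                nxt[worst] = best[:]
    return max((max(p, key=fitness) for p in islands), key=fitness)
```

Because replacement is elitist, the best individual never gets worse; the distributed populations are what the paper credits for more consistent optimization.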

294 citations


08 Aug 1990
TL;DR: An important problem in distributed computing is to provide a user with a non-distributed view of a distributed system, for example by implementing a distributed file system that allows the client programmer to ignore the physical location of his data.
Abstract: Jan van Leeuwen asked me to write a chapter on distributed systems for this handbook. I realized that I wasn’t familiar enough with the literature on distributed algorithms to write it by myself, so I asked Nancy Lynch to help. I also observed that there was no chapter on assertional verification of concurrent algorithms. (That was probably due to the handbook’s geographical origins, since process algebra rules in Europe.) So I included a major section on proof methods. As I recall, I wrote most of the first three sections and Lynch wrote the fourth section on algorithms pretty much by herself.

160 citations


Journal ArticleDOI
TL;DR: Two new translation mechanisms for synchronous systems are described that can be used to translate any protocol tolerant of the most benign failures into a protocol tolerant of the most severe failures.

133 citations


Journal ArticleDOI
David Peleg1
TL;DR: This note presents a simple time-optimal distributed algorithm for electing a leader in a general network that is also message-optimal and thus performs better than previous algorithms for the problem.
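Peleg's algorithm is not spelled out in the abstract; as background only, here is the naive synchronous max-id flooding scheme that such algorithms refine. It is time-optimal in spirit (it finishes within a number of rounds bounded by the network size) but far from message-optimal. All names below are invented:

```python
def elect_leader(adj, ids):
    """Synchronous max-id flooding: each round, every node forwards the
    largest id it has heard to its neighbors; after enough rounds all
    nodes agree, and the node owning that id is the leader."""
    known = dict(ids)                     # node -> largest id seen so far
    for _ in range(len(adj)):             # n rounds >= network diameter
        nxt = dict(known)
        for u, nbrs in adj.items():
            for v in nbrs:
                nxt[v] = max(nxt[v], known[u])
        known = nxt
    leader_id = max(ids.values())
    leader = next(u for u, i in ids.items() if i == leader_id)
    return leader, known
```

This sends a message on every edge in every round, which is exactly the waste a message-optimal election algorithm avoids.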

113 citations



Journal ArticleDOI
TL;DR: A column-oriented distributed algorithm for factoring a large sparse symmetric positive definite matrix on a local-memory parallel processor is presented; the method achieves good speedups on an Intel iPSC/2 hypercube.
Abstract: This paper presents a column-oriented distributed algorithm for factoring a large sparse symmetric positive definite matrix on a local-memory parallel processor. Processors cooperate in computing each column of the Cholesky factor by calculating independent updates to the corresponding column of the original matrix. These updates are sent in a fan-in manner to the processor assigned to the column, which then completes the computation. Experimental results on an Intel iPSC/2 hypercube demonstrate that the method is effective and achieves good speedups.
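As a rough serial simulation of the fan-in idea (dense rather than sparse, with an invented round-robin column-to-processor mapping), each "processor" aggregates the updates it owes a column into one message, and the column's owner combines the messages and completes the column:

```python
import math

def fanin_cholesky(A, n_procs=2):
    """Column-oriented fan-in Cholesky sketch (dense, serially simulated).
    Columns are assigned round-robin; for column j, each processor sums
    the updates from the prior columns it owns and 'sends' that aggregate
    to owner(j), who finishes the column."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    owner = lambda j: j % n_procs
    for j in range(n):
        inbox = []                        # one fan-in message per processor
        for p in range(n_procs):
            upd = [0.0] * (n - j)
            for k in range(j):
                if owner(k) == p:
                    for i in range(j, n):
                        upd[i - j] += L[i][k] * L[j][k]
            inbox.append(upd)
        # owner(j) combines the aggregates and completes column j
        col = [A[i][j] - sum(u[i - j] for u in inbox) for i in range(j, n)]
        L[j][j] = math.sqrt(col[0])
        for i in range(j + 1, n):
            L[i][j] = col[i - j] / L[j][j]
    return L
```

The point of fanning in is that each processor sends one aggregated update per column rather than one message per prior column it owns.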

83 citations


Proceedings ArticleDOI
01 Aug 1990
TL;DR: The control protocols of the PARIS experimental network are described, which is currently operational as a laboratory prototype and will also be deployed within the AURORA Testbed that is part of the NSF/DARPA Gigabit Networking program.
Abstract: We describe the control protocols of the PARIS experimental network. This high-bandwidth network for integrated communication (data, voice, video) is currently operational as a laboratory prototype. It will also be deployed within the AURORA Testbed that is part of the NSF/DARPA Gigabit Networking program. The high bandwidth dictates the need for specialized hardware to support faster packet handling and control protocols. A new network control architecture is presented which exploits the specialized hardware in order to support the expected real-time needs of future traffic. In particular, since control information can be distributed quickly, decisions can be made based upon more complete and accurate information. In some respects, this has the effect of having the benefits of centralized control (e.g., easier bandwidth resource allocation to connections), while retaining the fault-tolerance and scalability of a distributed architecture. Packet switching networks have changed considerably in recent years. One factor has been the dramatic increase in the capacity of the communication links. The advent of fiber optic media has pushed the transmission speed of communication links to more than a Gigabit/sec, representing an increase of several orders of magnitude over typical links in most packet switching networks ([KMS87]) that are still in use today. Increases in link speeds have not been matched by proportionate increases in the processing speeds of communication nodes. Another factor is the changed nature of traffic carried by these networks. As opposed to solely data networks, or solely voice networks, it is now accepted that packet switching networks (or variants of packet switching networks like ATM) will form the basis for multimedia high speed networks that will carry voice, data and video through a common set of nodes and links.
The disparity between communication and processing speeds suggests that processing may become the main bottleneck in future networks. To avoid this possibility, these networks will be built with high-speed switching hardware to off-load the routine packet handling and routing functions from the processor ([CGK88]). In addition, real-time traffic (e.g., voice) requires that the route selection function be capable of guaranteeing the availability of bandwidth on the links along the chosen path for a particular traffic stream. …

79 citations


Proceedings ArticleDOI
26 Jun 1990
TL;DR: A DSD project that consists of the implementation of a distributed self-diagnosis algorithm and its application to distributed computer networks is presented and the EVENT-SELF algorithm presented combines the rigor associated with theoretical results with the resource limitations associated with actual systems.
Abstract: A DSD (distributed self-diagnosing) project that consists of the implementation of a distributed self-diagnosis algorithm and its application to distributed computer networks is presented. The EVENT-SELF algorithm combines the rigor associated with theoretical results with the resource limitations associated with actual systems. Resource limitations identified in real systems include available message capacity for the communication network and limited processor execution speed. The EVENT-SELF algorithm differs from previously published algorithms by adopting an event-driven approach to self-diagnosability: algorithm messages are reduced to those required to indicate changes in system state. Practical issues regarding the CMU-ECE DSD implementation are considered, including the reconfiguration of the testing subnetwork for environments in which processors can be added and removed. One of the goals of this work is to utilize the developed CMU-ECE DSD system as an experimental test-bed environment for distributed applications.

67 citations


Journal ArticleDOI
TL;DR: Project Athena, established in 1983 to improve the quality of education at MIT by providing campuswide, high-quality computing based on a large network of workstations, is discussed, focusing on the design of Athena's distributed workstation system.
Abstract: Project Athena, established in 1983 to improve the quality of education at MIT (Massachusetts Institute of Technology) by providing campuswide, high-quality computing based on a large network of workstations, is discussed, focusing on the design of Athena's distributed workstation system. The requirements of the system are outlined, distributed-system models are reviewed, other distributed operating systems are described, and issues in distributed systems are examined. The distributed-system model for Athena is discussed. Athena has three major components: workstations, a network, and servers. The approach taken by the Athena developers was to implement a set of network services to replace equivalent time-sharing services, in essence converting the time-sharing Unix model into a distributed operating system.

Proceedings ArticleDOI
09 Oct 1990
TL;DR: There is a tradeoff between efficiency and reliability, and a system can be designed to balance these two criteria properly and achieve a higher degree of fault tolerance at the expense of increased message traffic.
Abstract: A fault-tolerant mutual exclusion algorithm for distributed systems is presented. The algorithm uses a distributed queue strategy and maintains alternative paths at each site to provide a high degree of fault tolerance. However, owing to these alternative paths, the algorithm must use reverse messages to avoid the occurrence of directed cycles, which may form when the direction of edges is reversed after the token passes through. If there is no alternative path, the total number of messages exchanged is O(2 log N) in light traffic and two messages in heavy traffic; however, in this case the system cannot tolerate even a single communication link or site failure. If there are alternative paths between sites, the system can achieve a higher degree of fault tolerance at the expense of increased message traffic (owing to reverse messages). Thus, there is a tradeoff between efficiency and reliability, and a system can be designed to balance these two criteria properly. A recovery procedure for restoring a recovering site consistently into the system is also presented.

Journal ArticleDOI
M. Ahuja1
TL;DR: Three channel primitives for sending messages are presented: two-way-flush, forward-flush, and backward-flush, collectively termed Flush, which can permit as much concurrency as non-FIFO channels and yet retain the properties of FIFO channels.

Journal ArticleDOI
TL;DR: A novel neural network parallel algorithm for sorting problems is presented that requires only two steps and does not depend on the size of the problem, while the conventional parallel sorting algorithm using O(n) processors by F.T. Leighton (1984) needs O(log^2 n) computation time.
Abstract: A novel neural network parallel algorithm for sorting problems is presented. The proposed algorithm using O(n^2) processors requires only two steps, and does not depend on the size of the problem, while the conventional parallel sorting algorithm using O(n) processors by F.T. Leighton (1984) needs O(log^2 n) computation time. A set of simulation results substantiates the proposed algorithm. The hardware system based on the proposed parallel algorithm is also presented.
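The paper's neural hardware cannot be reconstructed from the abstract, but the constant-step idea can be mimicked: step 1 evaluates all O(n^2) pairwise comparisons (conceptually in parallel), and step 2 places each element at its rank. A toy serial simulation, with invented names:

```python
def rank_sort(xs):
    """Two-'step' rank sort: step 1 computes every pairwise comparison
    (in hardware these are all evaluated at once); step 2 places each
    element at its rank. Ties are broken by index so ranks form a
    permutation."""
    n = len(xs)
    comp = [[(xs[j] < xs[i]) or (xs[j] == xs[i] and j < i)
             for j in range(n)] for i in range(n)]   # step 1
    out = [None] * n
    for i in range(n):                               # step 2
        out[sum(comp[i])] = xs[i]
    return out
```

Serially this is O(n^2) work; the paper's point is that with O(n^2) processing elements both steps take constant time regardless of n.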

Journal ArticleDOI
TL;DR: Simple distributed algorithms for successfully embedding a ring of size at least 2^n − 2f in an n-cube with f ≤ ⌊(n + 1)/2⌋ faults are contributed.
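For the fault-free case (f = 0), the classical construction behind such embeddings is the reflected Gray code, which lists all 2^n hypercube node labels so that consecutive labels (including the wraparound) differ in exactly one bit, i.e., are hypercube neighbors; the fault-tolerant algorithms route around faulty nodes at the cost of a shorter ring:

```python
def gray_ring(n):
    """Reflected Gray code: a Hamiltonian ring on the fault-free n-cube.
    Node i of the ring gets hypercube label i XOR (i >> 1)."""
    return [i ^ (i >> 1) for i in range(2 ** n)]
```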

Journal ArticleDOI
TL;DR: In this article, a distributed algorithm is simplified by ignoring the time needed to send and deliver messages and instead pretending that a process sends a collection of messages as a single atomic action, with the messages delivered instantaneously as part of the action.
Abstract: Reasoning about a distributed algorithm is simplified if we can ignore the time needed to send and deliver messages and can instead pretend that a process sends a collection of messages as a single atomic action, with the messages delivered instantaneously as part of the action. A theorem is derived that proves the validity of such reasoning for a large class of algorithms. It generalizes and corrects a well-known folk theorem about when an operation in a multiprocess program can be considered atomic.

Proceedings ArticleDOI
01 Jan 1990
TL;DR: The problem of finding an optimal assignment of task modules with a precedence relationship in a distributed computing system is considered, and a well-known state-space reduction technique, branch-and-bound-with-underestimates, is applied, and two underestimate functions are defined.
Abstract: The problem of finding an optimal assignment of task modules with a precedence relationship in a distributed computing system is considered. The objective of task assignment is to minimize the task turnaround time. The problem is known to be NP-complete for more than three processors. To solve the problem, a well-known state-space reduction technique, branch-and-bound-with-underestimates, is applied, and two underestimate functions are defined. Through experiments, their effectiveness is shown by comparing the proposed algorithm with both Wang and Tsai's (1988) algorithm and the A* algorithm with h(x)=0.

01 Jan 1990
TL;DR: In this paper, a large class of problems that can be solved using logical clocks as if they were perfectly synchronized clocks is formally characterized, and a broadcast primitive is also proposed to simplify the task of designing and verifying distributed algorithms.
Abstract: Time and knowledge are studied in synchronous and asynchronous distributed systems. A large class of problems that can be solved using logical clocks as if they were perfectly synchronized clocks is formally characterized. For the same class of problems, a broadcast primitive that can be used as if it achieves common knowledge is also proposed. Thus, logical clocks and the broadcast primitive simplify the task of designing and verifying distributed algorithms: The designer can assume that processors have access to perfectly synchronized clocks and the ability to achieve common knowledge.
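The logical clocks the abstract refers to are the standard Lamport clocks; a minimal sketch of the mechanism (class and method names are mine):

```python
class LamportClock:
    """Minimal logical clock: local events and sends tick the clock; a
    receive jumps past the sender's timestamp. Causally ordered events
    therefore always carry strictly increasing timestamps, which is what
    lets algorithms treat them 'as if' clocks were synchronized."""
    def __init__(self):
        self.t = 0

    def tick(self):                 # local event or message send
        self.t += 1
        return self.t

    def recv(self, sender_ts):      # message receipt
        self.t = max(self.t, sender_ts) + 1
        return self.t
```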

Proceedings ArticleDOI
28 May 1990
TL;DR: A straightforward and efficient algorithm for optimal load balancing of multiclass jobs is derived and it is shown that for obtaining the optimal solution the authors' algorithm and the Dafermos algorithm require comparable computation times that are far less than that of the FD algorithm.
Abstract: The model considered is an extension of the Tantawi and Towsley (1985) single-job-class model to a multiple-job-class model. Some properties of the optimal solution are shown. On the basis of these properties, a straightforward and efficient algorithm for optimal load balancing of multiclass jobs is derived. The performance of this algorithm is compared with that of two other well-known algorithms for multiclass jobs, the flow deviation (FD) algorithm and the Dafermos algorithm. The authors' algorithm and the FD algorithm both require a comparable amount of storage that is far less than that required by the Dafermos algorithm. Numerical experiments show that for obtaining the optimal solution the authors' algorithm and the Dafermos algorithm require comparable computation times that are far less than that of the FD algorithm.

Proceedings ArticleDOI
05 Dec 1990
TL;DR: An algorithm is presented by which nonfaulty processors of a group of fixed size will be able to maintain a consistent and timely knowledge of the group membership.
Abstract: An algorithm is presented by which nonfaulty processors of a group of fixed size will be able to maintain a consistent and timely knowledge of the group membership. The authors assume an architecture in which the broadcast network is accessed by some time domain multiplexing technique where the exclusive right to transmit messages is granted to each processor once in every 'cycle'. In an execution of the proposed algorithm, every nonfaulty processor knows of any processor failure within at most two cycles following the cycle in which the failure occurred, and a restarted processor can join the group in two cycles. Fewer than half of the processors are assumed to fail in any three consecutive cycles.

Book
01 Jan 1990
TL;DR: This book presents three case studies of Gaussian elimination on vector and parallel architectures, a task-system model for Gaussian elimination, and design methodologies for systolic arrays: dependence mapping method, complexity results, and folding.
Abstract: Introduction: background - Gaussian elimination, speedup and efficiency; vector and parallel architectures: pipeline computers, vector computers, parallel computers; three case studies. Part 1, Parallel algorithm design - vector multiprocessor computing: vectorization of vector-vector operations, Gaussian elimination in terms of vector-vector kernels, vector register re-use, Gaussian elimination in terms of matrix-vector kernels, cache re-use, Gaussian elimination in terms of matrix-matrix kernels, vectorization epilogue, fine-grain parallelism, parallel Gaussian elimination; hypercube computing: topological properties of hypercubes, broadcasting, centralized Gaussian elimination, local pipelined algorithms, a word on speedup evaluation, matrices over finite fields; systolic computing: 2D arrays, solving the triangular system on the fly, 1D arrays, matrices over finite fields. Part 2, Models and tools: task graph scheduling - task system for Gaussian elimination, bounds for parallel execution, an optimal schedule with an arbitrary number of processors; analysis of distributed algorithms - data allocation strategies, speedup evaluation on distributed memory machines; design methodologies for systolic arrays - dependence mapping method, complexity results, folding.

Proceedings ArticleDOI
03 Jul 1990
TL;DR: The relationship between CRS and distributed computing is discussed and solutions to two problems encountered in designing pattern generation protocols for CRS, related to distributed mutual exclusion problem and distributed deadlock detection problem, are presented.
Abstract: Cellular robotic systems (CRS) employ a large number of robots operating in cellular spaces under distributed control. In this paper, the relationship between CRS and distributed computing is discussed. Two problems encountered in designing pattern generation protocols for CRS, the n-way intersection problem and the knot detection problem, are related to the distributed mutual exclusion problem and the distributed deadlock detection problem, respectively. Solutions to these two problems, derived from their counterparts in distributed computing, are presented in the CRS context.

Journal ArticleDOI
TL;DR: Two algorithms developed utilizing a priority-based event-ordering which manage mutual exclusion in distributed systems—computer networks—are proposed, which are fully distributed and are insensitive to the relative speeds of node computers and communication links.

Journal ArticleDOI
TL;DR: This paper proposes that a resource management system for large distributed systems should have two levels --- a lower one, responsible for export and allocation of resources in local distributed systems, and an upper one, which manages special resources/services that are not provided locally.
Abstract: In this paper, we propose that a resource management system for large distributed systems should have two levels --- a lower one, responsible for export and allocation of resources in local distributed systems, and an upper one, which manages special resources/services that are not provided locally. For a local environment, load balancing (implementing export and allocation of computational resources) is realized in a distributed way, and management of peripheral resources is developed based on a name server, which can be centralized, or distributed and replicated. The upper level has a centralized resource management center, which is responsible for export and allocation of both peripheral and computational resources. It contains two parts: a name server, which stores attributed names of all shareable resources, and a resource manager, which allocates resources to requesting users of a large distributed system. Communication between the resource management center and the local systems is facilitated through integrating modules. This system is now designed based on the RHODOS distributed operating system.

Proceedings ArticleDOI
09 Oct 1990
TL;DR: Replicated execution of distributed programs, which provides a means of masking hardware (processor) failures in a distributed system, is discussed and a generic mechanism for ensuring that nonfaulty replicas process messages in identical order, thereby preventing state divergence among such replicate entities, is presented.
Abstract: Replicated execution of distributed programs, which provides a means of masking hardware (processor) failures in a distributed system, is discussed. Application-level entities (processes, objects) are replicated to execute on distinct processors. Such replica entities communicate by message passing. Nondeterminism within the replicas could cause messages to be processed in nonidentical order, producing a divergence of state. Possible sources of nondeterminism are identified, and a generic mechanism for ensuring that nonfaulty replicas process messages in identical order, thereby preventing state divergence among such replicate entities, is presented.

Book ChapterDOI
01 Mar 1990
TL;DR: A distributed algorithm for searching game trees is presented, using a general strategy for distributed computing that can also be applied to other search algorithms; two new concepts, the "Young Brothers Wait Concept" and the "Helpful Master Concept", are introduced to reduce search overhead and communication overhead.
Abstract: We present a distributed algorithm for searching game trees. A general strategy for distributed computing is used that can be applied also to other search algorithms. Two new concepts are introduced in order to reduce search overhead and communication overhead: the “Young Brothers Wait Concept” and the “Helpful Master Concept”. We describe some properties of our distributed algorithm including optimal speedup on best ordered game trees.

Book ChapterDOI
01 Nov 1990
TL;DR: It is shown that O(kn) messages are sufficient for rolling back all of the processors to the maximum consistent states when there are k failures, and that for recovery in general networks O(n^2) messages are sufficient while in ring networks Θ(n) messages are necessary and sufficient when an arbitrary number of processors fail.
Abstract: We consider the problem of recovering from processor failures efficiently in distributed systems. Each message received is logged in volatile storage when it is processed. At irregular intervals, each processor independently saves the contents of its volatile storage in stable storage. By appending only O(1) extra information to each message, we show that for recovery in general networks O(n^2) messages are sufficient and in ring networks Θ(n) messages are necessary and sufficient when an arbitrary number of processors fail. By appending O(n) extra information to each message that is sent, we show that O(kn) messages are sufficient for rolling back all of the processors to the maximum consistent states when there are k failures.
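The paper's message-optimal protocols are not given in the abstract, but the target notion of "maximum consistent states" can be sketched. Assuming each checkpoint records per-partner send/receive counts (an invented encoding, with checkpoint 0 taken as the always-consistent initial state), a cut is inconsistent when some processor has received a message that, on the cut, was never sent; rolling the receiver back and iterating reaches the unique maximum consistent cut:

```python
def max_consistent_cut(sent, recv, latest):
    """Roll processors back to their maximum consistent checkpoints.
    sent[p][c][q] / recv[p][c][q]: messages p has sent to / received from
    q as of its checkpoint c; latest[p]: index of p's newest surviving
    checkpoint. Each rollback only removes 'orphan' receives, so the
    result is the componentwise-maximum consistent cut."""
    cut = dict(latest)                    # p -> chosen checkpoint index
    changed = True
    while changed:
        changed = False
        for p in cut:
            for q in cut:
                if p == q:
                    continue
                # orphan: p received more from q than q (on the cut) sent
                while recv[p][cut[p]][q] > sent[q][cut[q]][p]:
                    cut[p] -= 1           # roll back the receiver
                    changed = True
    return cut
```

This iterative rollback is the classic "domino" computation; the paper's contribution is bounding the messages needed to realize it distributively.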

Proceedings ArticleDOI
26 Jun 1990
TL;DR: By utilizing the structure of objects and operation invocations, the authors have derived efficient algorithms that involve fewer participants than when invocations are treated as messages and existing algorithms for message-based systems are used.
Abstract: Checkpointing and rollback-recovery algorithms in distributed object-based systems are presented. By utilizing the structure of objects and operation invocations, the authors have derived efficient algorithms that involve fewer participants than when invocations are treated as messages and existing algorithms for message-based systems are used. It is planned to implement these algorithms and evaluate their performance in the context of the Clouds project at Georgia Tech.