Showing papers in "ACM Transactions on Computer Systems" in 1985

TL;DR: An algorithm by which a process in a distributed system determines a global state of the system during a computation; global state detection helps solve an important class of problems, namely stable property detection.

Abstract: This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “the system is deadlocked,” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.

2,738 citations

Journal Article

Optimistic recovery in distributed systems

Rob Strom¹, Shaula Yemini¹

IBM¹

01 Aug 1985-ACM Transactions on Computer Systems

TL;DR: Optimistic recovery is a new technique for application-independent, transparent recovery from processor failures in distributed systems. It can tolerate the failure of an arbitrary number of processors and yields better throughput and response time than other general recovery techniques whenever failures are infrequent.


Abstract: Optimistic Recovery is a new technique supporting application-independent transparent recovery from processor failures in distributed systems. In optimistic recovery, communication, computation, and checkpointing proceed asynchronously. Synchronization is replaced by causal dependency tracking, which enables a posteriori reconstruction of a consistent distributed system state following a failure using process rollback and message replay. Because there is no synchronization among computation, communication, and checkpointing, optimistic recovery can tolerate the failure of an arbitrary number of processors and yields better throughput and response time than other general recovery techniques whenever failures are infrequent.


784 citations
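The causal dependency tracking described in the optimistic recovery abstract can be illustrated with a loose Python sketch. It is not Strom and Yemini's protocol; it rests on several assumptions of our own. Every message delivery starts a new state interval, `logged_upto` stands for the last interval that can be rebuilt from stable storage by message replay, and the bookkeeping needed to tell a process's successive incarnations apart after a rollback is omitted. `Process` and `orphans` are hypothetical names. After a failure, any surviving process whose dependency vector records a later interval of the failed process than that process can recover is an orphan and must roll back.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical, simplified participant in optimistic recovery.

    deps[j] is the highest state-interval index of process j that this
    process's current state causally depends on (-1 means no dependency).
    """
    pid: int
    n: int
    interval: int = 0       # index of the current state interval
    logged_upto: int = 0    # last interval recoverable from the stable log
    deps: list = field(default_factory=list)

    def __post_init__(self):
        self.deps = [-1] * self.n
        self.deps[self.pid] = self.interval

    def send(self):
        # A message piggybacks a copy of the sender's dependency vector.
        return list(self.deps)

    def receive(self, msg_deps):
        # Delivery starts a new interval; dependencies merge element-wise by max.
        self.interval += 1
        self.deps = [max(a, b) for a, b in zip(self.deps, msg_deps)]
        self.deps[self.pid] = self.interval

    def checkpoint(self):
        # Logging runs asynchronously; here it simply marks the current
        # interval as recoverable.
        self.logged_upto = self.interval


def orphans(procs, failed):
    """Return pids of survivors that depend on an interval `failed` cannot recover."""
    recoverable = procs[failed].logged_upto
    return [p.pid for p in procs
            if p.pid != failed and p.deps[failed] > recoverable]


if __name__ == "__main__":
    procs = [Process(i, 3) for i in range(3)]
    procs[0].receive(procs[2].send())   # P0 enters interval 1
    procs[0].checkpoint()               # interval 1 is now recoverable
    procs[0].receive(procs[2].send())   # P0 enters interval 2 (not logged)
    procs[1].receive(procs[0].send())   # P1 now depends on P0's interval 2
    print(orphans(procs, failed=0))     # → [1]
```

No process waits for the log before sending, which mirrors the "optimistic" bet the abstract describes: the cost is paid only after a failure, when orphans are detected and rolled back.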

Journal Article

A √N algorithm for mutual exclusion in decentralized systems

Mamoru Maekawa¹

University of Tokyo¹

TL;DR: An algorithm is presented that uses only c√N messages to create mutual exclusion in a computer network, where N is the number of nodes and c is a constant between 3 and 5.

Abstract: An algorithm is presented that uses only c√N messages to create mutual exclusion in a computer network, where N is the number of nodes and c a constant between 3 and 5. The algorithm is symmetric and allows fully parallel operation.

782 citations

Journal Article

Distributed process groups in the V Kernel

David R. Cheriton¹, Willy Zwaenepoel²

Stanford University¹, Rice University²

TL;DR: The V kernel is a distributed kernel that supports an abstraction of processes, with operations for interprocess communication, process management, and memory management; this paper describes its extension to support process groups, including group interprocess communication as an application-level abstraction of network multicast.

Abstract: The V kernel supports an abstraction of processes, with operations for interprocess communication, process management, and memory management. This abstraction is used as a software base for constructing distributed systems. As a distributed kernel, the V kernel makes intermachine boundaries largely transparent. In this environment of many cooperating processes on different machines, there are many logical groups of processes. Examples include the group of file servers, a group of processes executing a particular job, and a group of processes executing a distributed parallel computation. In this paper we describe the extension of the V kernel to support process groups. Operations on groups include group interprocess communication, which provides an application-level abstraction of network multicast. Aspects of the implementation and performance, and initial experience with applications are discussed.

410 citations

Journal Article

A distributed mutual exclusion algorithm

Ichiro Suzuki¹, Tadao Kasami²

Texas Tech University¹, Osaka University²

TL;DR: A distributed algorithm is presented that realizes mutual exclusion among N nodes in a computer network and requires at most N message exchanges per mutual exclusion invocation.

Abstract: A distributed algorithm is presented that realizes mutual exclusion among N nodes in a computer network. The algorithm requires at most N message exchanges for one mutual exclusion invocation. Accordingly, the delay to invoke mutual exclusion is smaller than in an algorithm of Ricart and Agrawala, which requires 2(N − 1) message exchanges per invocation. A drawback of the algorithm is that the sequence numbers contained in the messages are unbounded. It is shown that this problem can be overcome by slightly increasing the number of message exchanges.

333 citations

Journal Article

Disk cache—miss ratio analysis and design considerations

Alan Jay Smith¹

University of California, Berkeley¹

01 Aug 1985-ACM Transactions on Computer Systems

TL;DR: It is found that disk cache is a powerful means of extending the performance limits of high-end computer systems.


Abstract: The current trend of computer system technology is toward CPUs with rapidly increasing processing power and toward disk drives of rapidly increasing density, but with disk performance increasing very slowly if at all. The implication of these trends is that at some point the processing power of computer systems will be limited by the throughput of the input/output (I/O) system. A solution to this problem, which is described and evaluated in this paper, is disk cache. The idea is to buffer recently used portions of the disk address space in electronic storage. Empirically, it is shown that a large (e.g., 80–90 percent) fraction of all I/O requests are captured by a cache of an 8-Mbyte order-of-magnitude size for our workload sample. This paper considers a number of design parameters for such a cache (called cache disk or disk cache), including those that can be examined experimentally (cache location, cache size, migration algorithms, block sizes, etc.) and others (access time, bandwidth, multipathing, technology, consistency, error recovery, etc.) for which we have no relevant data or experiments. Consideration is given to both caches located in the I/O system, as with the storage controller, and those located in the CPU main memory. Experimental results are based on extensive trace-driven simulations using traces taken from three large IBM or IBM-compatible mainframe data processing installations. We find that disk cache is a powerful means of extending the performance limits of high-end computer systems.


233 citations
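The trace-driven methodology in the disk cache abstract can be illustrated with a minimal Python sketch. It assumes a fully associative cache with LRU replacement and a trace already reduced to disk block numbers. `lru_miss_ratio` is a hypothetical name, and the sketch varies only cache size; the paper also examines cache location, migration algorithms, and block size. The trace in the usage lines is a toy example, not data from the paper.

```python
from collections import OrderedDict


def lru_miss_ratio(trace, cache_blocks):
    """Trace-driven simulation of an LRU disk cache (illustrative sketch).

    trace: sequence of referenced disk block numbers, in order.
    cache_blocks: cache capacity, in blocks.
    Returns the fraction of references that miss in the cache.
    """
    cache = OrderedDict()  # keys ordered from least to most recently used
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = True            # miss: fetch block into the cache
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return misses / len(trace) if trace else 0.0


if __name__ == "__main__":
    trace = [1, 2, 3, 1, 2, 4, 1, 2, 5, 1]
    for size in (2, 3):
        print(size, lru_miss_ratio(trace, size))  # prints "2 1.0", then "3 0.5"
```

On this toy trace, growing the cache from two to three blocks halves the miss ratio, a small-scale version of the capacity effect the paper measures on real mainframe traces.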

Journal Article

The Alpine file system

M. R. Brown¹, K. N. Kolling¹, E. A. Taft¹

Xerox¹

TL;DR: Alpine is a file system that supports atomic transactions and is designed to operate as a service on a computer network. It is written in Cedar, a strongly typed modular programming language that includes garbage-collected storage.

Abstract: Alpine is a file system that supports atomic transactions and is designed to operate as a service on a computer network. Alpine's primary purpose is to store files that represent databases. An important secondary goal is to store ordinary files representing documents, program modules, and the like. Unlike other file servers described in the literature, Alpine uses a log-based technique to implement atomic file update. Another unusual aspect of Alpine is that it performs all communication via a general-purpose remote procedure call facility. Both of these decisions have worked out well. This paper describes Alpine's design and implementation, and evaluates the system in light of our experience to date. Alpine is written in Cedar, a strongly typed modular programming language that includes garbage-collected storage. We report on using the Cedar language and programming environment to develop Alpine.

145 citations

Journal Article

Performance of the VAX-11/780 translation buffer: simulation and measurement

Douglas W. Clark, Joel Emer

TL;DR: The authors present the results of measurements and simulations of translation buffer performance in the VAX-11/780; measurements were made under normal time-sharing use and while running reproducible synthetic time-sharing workloads.

Abstract: A virtual-address translation buffer (TB) is a hardware cache of recently used virtual-to-physical address mappings. The authors present the results of a set of measurements and simulations of translation buffer performance in the VAX-11/780. Two different hardware monitors were attached to VAX-11/780 computers, and translation buffer behavior was measured. Measurements were made under normal time-sharing use and while running reproducible synthetic time-sharing work loads. Reported measurements include the miss ratios of data and instruction references, the rate of TB invalidations due to context switches, and the amount of time taken to service TB misses. Additional hardware measurements were made with half the TB disabled. Trace-driven simulations of several programs were also run; the traces captured system activity as well as user-mode execution. Several variants of the 11/780 TB structure were simulated.

128 citations

Journal Article

Secure communication using remote procedure calls

Andrew D. Birrell¹

Xerox¹

TL;DR: The design of an end-to-end secure protocol, built as part of a remote procedure call package, is described, together with the security abstraction presented to users, the authentication mechanisms, and the protocol for encrypting and verifying remote calls.

Abstract: Research on encryption-based secure communication protocols has reached a stage where it is feasible to construct end-to-end secure protocols. The design of such a protocol, built as part of a remote procedure call package, is described. The security abstraction presented to users of the package, the authentication mechanisms, and the protocol for encrypting and verifying remote calls are also described.

124 citations

Journal Article

On the power of cascade ciphers

Shimon Even¹, Oded Goldreich¹

Technion – Israel Institute of Technology¹

TL;DR: It is shown that, with high probability, the number of permutations realizable by a cascade of l random ciphers, each having k key bits, is 2^(lk), and that two stages are not worse than one.

Abstract: The unicity distance of a cascade of random ciphers, with respect to known plaintext attack, is shown to be the sum of the key lengths. A time-space trade-off for the exhaustive cracking of a cascade of ciphers is shown. The structure of the set of permutations realized by a cascade is studied; it is shown that only l·2^k exhaustive experiments are necessary to determine the behavior of a cascade of l stages, each having k key bits. It is concluded that the cascade of random ciphers is not a random cipher. Yet, it is shown that, with high probability, the number of permutations realizable by a cascade of l random ciphers, each having k key bits, is 2^(lk). Next, it is shown that two stages are not worse than one, by a simple reduction of the cracking problem of any of the stages to the cracking problem of the cascade. Finally, it is shown that proving a nonpolynomial lower bound on the cracking problem of long cascades is a hard task, since such a bound implies that P ≠ NP.

89 citations

Journal Article

A discipline for constructing multiphase communication protocols

Ching-Hua Chow¹, Mohamed G. Gouda¹, Simon S. Lam¹

University of Texas at Austin¹

TL;DR: A multiphase model for communication protocols is presented, and it is shown how separately constructed phases can be connected to realize a multifunction protocol.

Abstract: Many communication protocols can be observed to go through different phases, performing a distinct function in each phase. A multiphase model for such protocols is presented. A phase is formally defined to be a network of communicating finite-state machines with certain desirable correctness properties; these include proper termination and freedom from deadlocks and unspecified receptions. A multifunction protocol is constructed by first constructing separate phases to perform its different functions. It is shown how to connect these phases together to realize the multifunction protocol so that the resulting network of communicating finite-state machines is also a phase (i.e., it possesses the desirable properties defined for phases). The modularity inherent in multiphase protocols facilitates not only their construction but also their understanding and modification. An abundance of protocols have been found in the literature that can be constructed as multiphase protocols. Three examples are presented here: two versions of IBM's BSC protocol for data link control and a token ring network protocol.

Journal Article

Determining the last process to fail

Dale Skeen¹

Cornell University¹

TL;DR: Necessary and sufficient conditions are derived for computing LAST, the last set of processes to fail, from the local failure data of recovered processes; these conditions are then translated into procedures for deciding LAST membership, using either complete or incomplete failure data.

Abstract: A total failure occurs whenever all processes cooperatively executing a distributed task fail before the task completes. A frequent prerequisite for recovery from a total failure is identification of the last set (LAST) of processes to fail. Necessary and sufficient conditions are derived here for computing LAST from the local failure data of recovered processes. These conditions are then translated into procedures for deciding LAST membership, using either complete or incomplete failure data. The choice of failure data is itself dictated by two requirements: (1) it can be cheaply maintained, and (2) it must afford maximum fault-tolerance in the sense that the expected number of recoveries required for identifying LAST is minimized.

Journal Article

Error bounds for performance prediction in queuing networks

Y. C. Tay¹, Rajan Suri²

National University of Singapore¹, Harvard University²

01 Aug 1985-ACM Transactions on Computer Systems

TL;DR: This paper studies the effect of small estimation errors and provides bounds on prediction errors based on bounds on estimation errors; the results are illustrated with three examples.


Abstract: Analytic models based on closed queuing networks (CQNs) are widely used for performance prediction in practical systems. In using such models, there is always a prediction error, that is, a difference between the predicted performance and the actual outcome. This prediction error is due both to modeling errors and to estimation errors, the latter being the difference between the estimated values of the CQN parameters and the actual outcomes. This paper considers the second class of errors; in particular, it studies the effect of small estimation errors and provides bounds on prediction errors based on bounds on estimation errors. Estimation errors may be divided into two types: (1) the difference between the estimated value and the average value of the outcome, and (2) the deviation of the actual value from its average. The analysis first studies the sum of both types of errors, then the second type alone. The results are illustrated with three examples.
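The link between estimation error and prediction error in the abstract can be probed numerically. The Python sketch below solves a single-class closed queuing network by exact Mean Value Analysis (MVA), then perturbs each estimated service demand by a relative error and reports the spread of predicted throughput. It is an empirical probe under our own assumptions (load-independent queuing stations, optional think time, only the corner points of the error box tried), not the analytic bounds Tay and Suri derive. `mva` and `throughput_range` are hypothetical names, and the demands in the usage lines are made-up values.

```python
from itertools import product


def mva(demands, n_customers, think_time=0.0):
    """Exact single-class MVA for load-independent queuing stations.

    demands: service demand (visits x service time) at each station.
    Returns (throughput, response_time) with n_customers in the network.
    """
    queue = [0.0] * len(demands)
    throughput, response = 0.0, 0.0
    for n in range(1, n_customers + 1):
        # An arriving customer sees the queue lengths of the (n-1)-customer network.
        residence = [d * (1 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        throughput = n / (think_time + response)
        queue = [throughput * r for r in residence]  # Little's law per station
    return throughput, response


def throughput_range(demands, n_customers, rel_err, think_time=0.0):
    """Min and max MVA throughput when each demand is off by +/- rel_err.

    Only the corners of the error box are evaluated.
    """
    results = []
    for signs in product((-1, 1), repeat=len(demands)):
        perturbed = [d * (1 + s * rel_err) for d, s in zip(demands, signs)]
        results.append(mva(perturbed, n_customers, think_time)[0])
    return min(results), max(results)


if __name__ == "__main__":
    demands = [0.04, 0.03]  # seconds per job at a CPU and a disk (made-up)
    print(mva(demands, 10))
    print(throughput_range(demands, 10, 0.05))
```

Checking only the corners is a shortcut: it yields the true extremes only if throughput is monotone in each demand, which the sketch assumes rather than proves.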


Journal Article

Performance analysis of redundant-path networks for multiprocessor systems

Krishnan Padmanabhan¹, Duncan H. Lawrie²

Bell Labs¹, University of Illinois at Urbana–Champaign²

TL;DR: Improvements in performance and very graceful degradation are also shown to result from the availability of redundant paths.

Abstract: Performance of a class of multistage interconnection networks employing redundant paths is investigated. Redundant path networks provide significant tolerance to faults at minimal costs; in this paper improvements in performance and very graceful degradation are also shown to result from the availability of redundant paths. A Markov model is introduced for the operation of these networks in the circuit-switched mode and is solved numerically to obtain the performance measures of interest. The structure of the networks that provide maximal performance is also characterized.

Journal Article

A recursive algorithm for binary multiplication and its implementation

Renato De Mori¹, Régis Cardin¹

Concordia University Wisconsin¹