Virtual time

doi:10.1145/3916.3988

Home
/
Papers
/
Virtual time

Journal Article•DOI•

Virtual time

David Jefferson¹•Institutions (1)

University of Southern California¹

01 Jul 1985-ACM Transactions on Programming Languages and Systems-Vol. 7, Iss: 3, pp 404-425

TL;DR: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control.

read less

Abstract: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control. Virtual time provides a flexible abstraction of real time in much the same way that virtual memory provides an abstraction of real memory. It is implemented using the Time Warp mechanism, a synchronization protocol distinguished by its reliance on lookahead-rollback, and by its implementation of rollback via antimessages.

...read moreread less

Citations

PDF

Open Access

More filters

Journal Article•DOI•

A survey of rollback-recovery protocols in message-passing systems

[...]

Elmootazbellah Nabil Elnozahy¹, Lorenzo Alvisi², Yi-Min Wang³, David B. Johnson⁴•Institutions (4)

IBM¹, University of Texas at Austin², Microsoft³, Rice University⁴

01 Sep 2002-ACM Computing Surveys

TL;DR: This survey covers rollback-recovery techniques that do not require special language constructs and distinguishes between checkpoint-based and log-based protocols, which rely solely on checkpointing for system state restoration.

...read moreread less

Abstract: This survey covers rollback-recovery techniques that do not require special language constructs. In the first part of the survey we classify rollback-recovery protocols into checkpoint-based and log-based.Checkpoint-based protocols rely solely on checkpointing for system state restoration. Checkpointing can be coordinated, uncoordinated, or communication-induced. Log-based protocols combine checkpointing with logging of nondeterministic events, encoded in tuples called determinants. Depending on how determinants are logged, log-based protocols can be pessimistic, optimistic, or causal. Throughout the survey, we highlight the research issues that are at the core of rollback-recovery and present the solutions that currently address them. We also compare the performance of different rollback-recovery protocols with respect to a series of desirable properties and discuss the issues that arise in the practical implementations of these protocols.

...read moreread less

1,772 citations

Additional excerpts

...In the area of distributed discrete-event simulation [59, 124], the Time Warp optimistic approach, which inspired the seminal work on optimistic message logging [168], uses rollbacks to cancel erroneous computations due to the out-of-order arrivals of time-stamped event messages [59, 60, 85, 118, 141]....
[...]

Journal Article•DOI•

Parallel discrete event simulation

[...]

Richard M. Fujimoto¹•Institutions (1)

Georgia Institute of Technology¹

01 Oct 1990-Communications of The ACM

TL;DR: This article deals with the execution of a simulation program on a parallel computer by decomposing the simulation application into a set of concurrently executing processes and introduces interesting synchronization problems that are at the heart of the PDES problem.

...read moreread less

Abstract: Parallel discrete event simulation (PDES), sometimes called distributed simulation, refers to the execution of a single discrete event simulation program on a parallel computer. PDES has attracted a considerable amount of interest in recent years. From a pragmatic standpoint, this interest arises from the fact that large simulations in engineering, computer science, economics, and military applications, to mention a few, consume enormous amounts of time on sequential machines. From an academic point of view, parallel simulation is interesting because it represents a problem domain that often contains substantial amounts of parallelism (e.g., see [59]), yet paradoxically, is surprisingly difficult to parallelize in practice. A sufficiently general solution to the PDES problem may lead to new insights in parallel computation as a whole. Historically, the irregular, data-dependent nature of PDES programs has identified it as an application where vectorization techniques using supercomputer hardware provide little benefit [14].A discrete event simulation model assumes the system being simulated only changes state at discrete points in simulated time. The simulation model jumps from one state to another upon the occurrence of an event. For example, a simulator of a store-and-forward communication network might include state variables to indicate the length of message queues, the status of communication links (busy or idle), etc. Typical events might include arrival of a message at some node in the network, forwarding a message to another network node, component failures, etc.We are especially concerned with the simulation of asynchronous systems where events are not synchronized by a global clock, but rather, occur at irregular time intervals. For these systems, few simulator events occur at any single point in simulated time; therefore parallelization techniques based on lock-step execution using a global simulation clock perform poorly or require assumptions in the timing model that may compromise the fidelity of the simulation. Concurrent execution of events at different points in simulated time is required, but as we shall soon see, this introduces interesting synchronization problems that are at the heart of the PDES problem.This article deals with the execution of a simulation program on a parallel computer by decomposing the simulation application into a set of concurrently executing processes. For completeness, we conclude this section by mentioning other approaches to exploiting parallelism in simulation problems.Comfort and Shepard et al. have proposed using dedicated functional units to implement specific sequential simulation functions, (e.g., event list manipulation and random number generation [20, 23, 47]). This method can provide only a limited amount of speedup, however. Zhang, Zeigler, and Concepcion use the hierarchical decomposition of the simulation model to allow an event consisting of several subevents to be processed concurrently [21, 98]. A third alternative is to execute independent, sequential simulation programs on different processors [11, 39]. This replicated trials approach is useful if the simulation is largely stochastic and one is performing long simulation runs to reduce variance, or if one is attempting to simulate a specific simulation problem across a large number of different parameter settings. However, one drawback with this approach is that each processor must contain sufficient memory to hold the entire simulation. Furthermore, this approach is less suitable in a design environment where results of one experiment are used to determine the experiment that should be performed next because one must wait for a sequential execution to be completed before results are obtained.

...read moreread less

1,615 citations

Book•

Distributed Operating Systems

[...]

Andrew S. Tanenbaum¹, Robbert van Renesse¹•Institutions (1)

University of Amsterdam¹

30 Jan 2009

TL;DR: What constitutes a distributed operating system and how it is distinguished from a computer network are discussed, and several examples of current research projects are examined in some detail.

...read moreread less

Abstract: Distributed operating systems have many aspects in common with centralized ones, but they also differ in certain ways This paper is intended as an introduction to distributed operating systems, and especially to current university research about them After a discussion of what constitutes a distributed operating system and how it is distinguished from a computer network, various key design issues are discussed Then several examples of current research projects are examined in some detail, namely, the Cambridge Distributed Computing System, Amoeba, V, and Eden

...read moreread less

1,327 citations

Cites background from "Virtual time"

...[1985], Sturgis et al. [1980], Svobodova [1981], and Swinehart et al....
[...]

Book•

Parallel and Distributed Simulation Systems

[...]

Richard M. Fujimoto¹•Institutions (1)

Georgia Institute of Technology¹

01 Jan 2000

TL;DR: The article gives an overview of technologies to distribute the execution of simulation programs over multiple computer systems, with particular emphasis on synchronization (also called time management) algorithms as well as data distribution techniques.

...read moreread less

Abstract: Originating from basic research conducted in the 1970's and 1980's, the parallel and distributed simulation field has matured over the last few decades. Today, operational systems have been fielded for applications such as military training, analysis of communication networks, and air traffic control systems, to mention a few. The article gives an overview of technologies to distribute the execution of simulation programs over multiple computer systems. Particular emphasis is placed on synchronization (also called time management) algorithms as well as data distribution techniques.

...read moreread less

1,217 citations

Cites background or methods from "Virtual time"

...The Time Warp mechanism (Jefferson 1985) is the most well known optimistic method....
[...]
...A few years later, seminal work by Jefferson developed the Time Warp algorithm (Jefferson 1985)....
[...]
...Memory management in Time Warp dates back to Jefferson's original paper where the message sendback mechanism for flow control is discussed (Jefferson 1985)....
[...]

Proceedings Article•DOI•

Parallel Discrete Event Simulation

[...]

Richard M. Fujimoto¹•Institutions (1)

Georgia Institute of Technology¹

01 Oct 1989

TL;DR: This tutorial surveys the state of the art in executing discrete event simulation programs on a parallel computer, and focuses attention on asynchronous simulation programs where few events occur at any single point in simulated time.

...read moreread less

Abstract: This tutorial surveys the state of the art in executing discrete event simulation programs on a parallel computer. Specifically, we will focus attention on asynchronous simulation programs where few events occur at any single point in simulated time, necessitating the concurrent execution of events occurring at different points in time. We first describe the parallel discrete event simulation problem, and examine why it so difficult. We review several simulation strategies that have been proposed, and discuss the underlying ideas on which they are based. We critique existing approaches in order to clarify their respective strengths and weaknesses.

...read moreread less

1,201 citations

Additional excerpts

...The Time Warp mechanism, based on the Virtual Time paradigm, is the most well known optimistic protocol [26]....
[...]

1
2
3
4
…
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200

Collapse

References

PDF

Open Access

More filters

Book Chapter•DOI•

Time, clocks, and the ordering of events in a distributed system

[...]

Leslie Lamport¹•Institutions (1)

CA Technologies¹

04 Oct 2019-Concurrency and Computation: Practice and Experience

TL;DR: In this paper, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.

...read moreread less

Abstract: The concept of one event happening before another in a distributed system is examined, and is shown to define a partial ordering of the events. A distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events. The use of the total ordering is illustrated with a method for solving synchronization problems. The algorithm is then specialized for synchronizing physical clocks, and a bound is derived on how far out of synchrony the clocks can become.

...read moreread less

8,381 citations

Journal Article•DOI•

Time, clocks, and the ordering of events in a distributed system

[...]

Leslie Lamport¹•Institutions (1)

CA Technologies¹

01 Jul 1978-Communications of The ACM

TL;DR: In this article, the concept of one event happening before another in a distributed system is examined, and a distributed algorithm is given for synchronizing a system of logical clocks which can be used to totally order the events.

...read moreread less

6,804 citations

Journal Article•DOI•

Distributed snapshots: determining global states of distributed systems

[...]

K. Mani Chandy¹, Leslie Lamport²•Institutions (2)

University of Texas at Austin¹, SRI International²

01 Feb 1985-ACM Transactions on Computer Systems

TL;DR: An algorithm by which a process in a distributed system determines a global state of the system during a computation, which helps to solve an important class of problems: stable property detection.

...read moreread less

Abstract: This paper presents an algorithm by which a process in a distributed system determines a global state of the system during a computation. Many problems in distributed systems can be cast in terms of the problem of detecting global states. For instance, the global state detection algorithm helps to solve an important class of problems: stable property detection. A stable property is one that persists: once a stable property becomes true it remains true thereafter. Examples of stable properties are “computation has terminated,” “ the system is deadlocked” and “all tokens in a token ring have disappeared.” The stable property detection problem is that of devising algorithms to detect a given stable property. Global state detection can also be used for checkpointing.

...read moreread less

2,738 citations

Journal Article•DOI•

Ethernet: distributed packet switching for local computer networks

[...]

Robert M. Metcalfe¹, David R. Boggs¹•Institutions (1)

PARC¹

01 Jul 1976-Communications of The ACM

TL;DR: The design principles and implementation are described, based on experience with an operating Ethernet of 100 nodes along a kilometer of coaxial cable, of a model for estimating performance under heavy loads and a packet protocol for error controlled communication.

...read moreread less

Abstract: Ethernet is a branching broadcast communication system for carrying digital data packets among locally distributed computing stations. The packet transport mechanism provided by Ethernet has been used to build systems which can be viewed as either local computer networks or loosely coupled multiprocessors. An Ethernet's shared communication facility, its Ether, is a passive broadcast medium with no central control. Coordination of access to the Ether for packet broadcasts is distributed among the contending transmitting stations using controlled statistical arbitration. Switching of packets to their destinations on the Ether is distributed among the receiving stations using packet address recognition. Design principles and implementation are described, based on experience with an operating Ethernet of 100 nodes along a kilometer of coaxial cable. A model for estimating performance under heavy loads and a packet protocol for error controlled communication are included for completeness.

...read moreread less

1,701 citations

Journal Article•DOI•

Concurrency Control in Distributed Database Systems

[...]

Philip A. Bernstein, Nathan Goodman

01 Jun 1981-ACM Computing Surveys

TL;DR: A survey of concurrency control methods for distributed database concurrency can be found in this paper, where the authors decompose the problem into two major subproblems, read-write and write-write synchronization, and describe a series of synchromzation techniques for solving each subproblem.

...read moreread less

Abstract: In this paper we survey, consolidate, and present the state of the art in distributed database concurrency control. The heart of our analysts is a decomposition of the concurrency control problem into two major subproblems: read-write and write-write synchronization. We describe a series of synchromzation techniques for solving each subproblem and show how to combine these techniques into algorithms for solving the entire concurrency control problem. Such algorithms are called "concurrency control methods." We describe 48 principal methods, including all practical algorithms that have appeared m the literature plus several new ones. We concentrate on the structure and correctness of concurrency control algorithms. Issues of performance are given only secondary treatment.

...read moreread less

1,124 citations