Journal ArticleDOI

The cost of conservative synchronization in parallel discrete event simulations

01 Apr 1993-Journal of the ACM (ACM)-Vol. 40, Iss: 2, pp 304-333
TL;DR: It is shown that on large problems—those for which parallel processing is ideally suited—there is often enough parallel workload so that processors are not usually idle, and the method is within a constant factor of optimal.
Abstract: This paper analytically studies the performance of a synchronous conservative parallel discrete-event simulation protocol. The class of models considered simulates activity in a physical domain, and possesses a limited ability to predict future behavior. Using a stochastic model, it is shown that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation, sometimes rapidly. The method is therefore within a constant factor of optimal. The result holds for the worst-case "fully-connected" communication topology, where an event in any portion of the domain can cause an event in any other portion of the domain. Our analysis demonstrates that on large problems—those for which parallel processing is ideally suited—there is often enough parallel workload so that processors are not usually idle. We also demonstrate the viability of the method empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

Summary (1 min read)


  • Under the assumption of non-zero duration times, it will always be true that w_n < δ(w_n).
  • Simulation time advances each window (even if no events occur in the window), and deadlock never occurs; a minimal sketch of this loop follows the list.
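A minimal sketch of the windowed execution these bullets describe, under the stated assumptions (the Processor class and every name in it are illustrative stand-ins, not the paper's code):

    import heapq, itertools

    _tie = itertools.count()   # tie-breaker: equal timestamps never compare actions

    class Processor:
        """Hypothetical stand-in for one processor's event list and lookahead."""
        def __init__(self, events, lookahead):
            self.events = [(t, next(_tie), a) for t, a in events]
            heapq.heapify(self.events)
            self.lookahead = lookahead            # minimum activity duration, > 0

        def schedule(self, t, action):
            heapq.heappush(self.events, (t, next(_tie), action))

        def lookahead_bound(self, w):
            # Earliest time this processor could newly affect another one.
            t_next = self.events[0][0] if self.events else float("inf")
            return max(w, t_next) + self.lookahead

        def process_events_before(self, bound):
            while self.events and self.events[0][0] < bound:
                t, _, action = heapq.heappop(self.events)
                action(t)                         # may schedule new events at >= t

    def simulate(processors, end_time):
        w = 0.0                                   # left edge of the current window
        while w < end_time:
            # delta(w): a global bound on the earliest cross-processor effect.
            delta_w = min(p.lookahead_bound(w) for p in processors)
            for p in processors:                  # run in parallel on a real machine
                p.process_events_before(delta_w)
            w = delta_w                           # non-zero durations give delta_w > w

Because every duration is strictly positive, delta_w always exceeds w, so each pass advances simulation time even when the window contains no events, and no deadlock can arise.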

3.3 Example

  • All measurements reported are taken from a thirty-two processor machine.
  • Each simulation model was run long enough to generate several million events.
  • The execution time was typically a minute or two, once the problem was loaded and running.
  • The measured performance supports their analysis, and actually becomes quite good on large problems.
  • One can translate such efficiencies into "speedup" figures by multiplying by the number of processors used, provided the resulting numbers are properly interpreted (a worked example follows the list).
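As a purely hypothetical illustration of that last point (the efficiency figure here is invented, not one of the paper's measurements): an efficiency of 0.8 observed on 32 processors translates to

    speedup = efficiency × P = 0.8 × 32 = 25.6

and one common reading of the caveat about interpretation is that the baseline must be a comparably tuned serial simulator, not the parallel code run on a single processor.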


NASA Contractor Report 182034
ICASE Report No. 90-20
ICASE
THE COST OF CONSERVATIVE SYNCHRONIZATION IN
PARALLEL DISCRETE EVENT SIMULATIONS
David M. Nicol
Contract No. NAS1-18605
May 1990
Institute for Computer Applications in Science and Engineering
NASA Langley Research Center
Hampton, Virginia 23665-5225
Operated by the Universities Space Research Association
(NASA-CR-182034) THE COST OF CONSERVATIVE SYNCHRONIZATION IN PARALLEL DISCRETE EVENT SIMULATIONS Final Report (ICASE) 32 p CSCL 09B N90-23912

National Aeronautics and Space Administration
Langley Research Center
Hampton, Virginia 23665-5225


The Cost of Conservative Synchronization in
Parallel Discrete Event Simulations
David M. Nicol
Department of Computer Science *
College of William and Mary
May 7, 1990
Abstract

This paper analytically studies the performance of a synchronous conservative parallel discrete-event simulation protocol. The class of simulation models considered is oriented around a physical domain, and possesses a limited ability to predict future behavior. Using a stochastic model we show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. Our analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. We also demonstrate the viability of the method empirically, showing how good performance is achieved on large problems using a thirty-two node Intel iPSC/2 distributed memory multiprocessor.

*Supported in part by the Virginia Center for Innovative Technology, by NASA grants NAG-1-060 and NAS1-18605, and by NSF grant ASC 8819393.


1 Introduction
The problem of parallelizing discrete-event simulations has received a great deal of attention
in the last several years. Simulations pose unique synchronization constraints due to their
underlying sense of time. When the simulation model can be simultaneously changed by
different processors, actions by one processor can affect actions by another. One must not
simulate any element of the model too far ahead of any other in simulation time, to avoid the
risk of having its logical past affected. Alternatively, one must be prepared to fix the logical
past of any element determined to have been simulated too far.
Two schools of thought have emerged concerning synchronization. The conservative
school [5], [13], [23], [24] employs methods which prevent any processor from simulating
beyond a point at which another processor might affect it. These synchronization points
need to be re-established periodically to allow the simulation to progress. Early efforts
focused on finding protocols which were either free from deadlock, or which detected and
corrected deadlock [17]. The optimistic school [7] allows a processor to simulate as far forward
in time as it wants, without regard for the risk of having its simulation past affected. If its
past is changed (due to interaction with a processor farther behind in simulation time) it
must then be able to "rollback" in time at least that far, and must cancel any erroneous
actions it has taken in its false future.
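A rough sketch of this optimistic rollback discipline, in the style of Time Warp (the structure and every name here are invented for illustration; a real implementation also needs global-virtual-time computation, fossil collection, and re-execution of rolled-back input events):

    class OptimisticLP:
        """One logical process executing speculatively; all names are invented."""
        def __init__(self, state):
            self.now = 0.0
            self.state = state          # assumed to be a dict, copied on checkpoint
            self.checkpoints = []       # (timestamp, saved copy of state)
            self.sent = []              # (timestamp, message) kept for cancellation

        def execute(self, t, event):
            self.checkpoints.append((t, dict(self.state)))   # checkpoint first
            self.now = t
            event(self.state)           # may append outgoing messages to self.sent

        def receive(self, t, event):
            if t < self.now:            # straggler: our logical past was affected
                self.rollback(t)
            self.execute(t, event)

        def rollback(self, t):
            # Restore the last state saved before time t.
            while self.checkpoints and self.checkpoints[-1][0] >= t:
                _, saved = self.checkpoints.pop()
                self.state = saved
            self.now = t
            # Cancel erroneous actions in the false future with antimessages.
            for _, msg in [m for m in self.sent if m[0] >= t]:
                msg.cancel()            # annihilates the earlier message at its receiver
            self.sent = [m for m in self.sent if m[0] < t]
            # Rolled-back events would be re-enqueued for re-execution (omitted).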
Conservative protocols are sometimes faulted for leaving processors idle, due to overly
pessimistic synchronization assumptions. It is almost always true that individual model
elements are blocked because of pessimistic synchronization; the conclusion that processors
tend to be blocked requires the assumption that all model elements assigned to a processor
tend to be blocked simultaneously, or that each processor has only one model element. The
latter assumption pervades many performance studies, and is unrealistic for fine-grained
simulation models executed on coarser grained multiprocessors. Intuition suggests that if
there are many model elements assigned to each processor, then it is unlikely that all model
elements on a processor will be blocked. Given sufficient workload, a properly designed
conservative method should not leave processors idle, because there is so much work to do.
While some model elements are blocked due to synchronization concerns, other elements,
with high probability, are not.
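To make that intuition concrete with a deliberately simplified model (independence is assumed here purely for illustration and is not a claim from the paper): if each of m elements on a processor is blocked with probability p independently of the others, the processor itself idles only when all m are blocked, which happens with probability p^m.

    # Illustrative only: chance that ALL m co-resident elements block at once.
    p = 0.5
    for m in (1, 8, 64):
        print(m, p ** m)   # 0.5, then about 3.9e-3, then about 5.4e-20

Even a modest number of elements per processor drives the idling probability toward zero.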
It is natural to ask how much performance degradation due to blocking a conservative
method suffers. We answer that question, by analyzing a simple conservative synchronization
method. The method assumes the ability to pre-sample activity duration times [20], and
assumes that any queueing discipline used is non-preemptive. The protocol itself is quite
simple. As applied to a queueing network it works as follows. First, whenever a job enters
service, the queue to which the job will be routed is immediately notified of that arrival
(sometime in the future), and the receiving queue computes a service time for the new arrival.
These two actions constitute lookahead, a concept which is key to the protocol's success. Now
imagine that all events with time-stamps less than t have already been processed and that
the processors are globally synchronized. For each queue we determine the time-stamp of the
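A rough sketch of the lookahead step described in this passage, for a queueing network with non-preemptive FCFS service and pre-sampled service times (the names, the exponential service assumption, and the structure are illustrative, not the paper's implementation):

    import random

    class Queue:
        """One non-preemptive FCFS queue; 'route' picks a destination queue."""
        def __init__(self, rate, route):
            self.rate = rate                 # exponential service rate (assumed)
            self.route = route
            self.pending = []                # pre-announced (arrival time, service time)

        def sample_service(self):
            return random.expovariate(self.rate)   # pre-sampled, strictly positive

        def begin_service(self, now):
            service = self.sample_service()
            finish = now + service           # departure time = future arrival time
            dest = self.route()
            # Lookahead: tell the destination about the arrival immediately,
            # and let it pre-sample that job's service time right away.
            dest.pending.append((finish, dest.sample_service()))
            return finish

Because the destination learns of the arrival, and of its service demand, as soon as service begins, it can bound its own future behavior over the coming window rather than waiting for the job to actually arrive.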

Citations
Journal ArticleDOI
TL;DR: This article deals with the execution of a simulation program on a parallel computer by decomposing the simulation application into a set of concurrently executing processes and introduces interesting synchronization problems that are at the heart of the PDES problem.
Abstract: Parallel discrete event simulation (PDES), sometimes called distributed simulation, refers to the execution of a single discrete event simulation program on a parallel computer. PDES has attracted a considerable amount of interest in recent years. From a pragmatic standpoint, this interest arises from the fact that large simulations in engineering, computer science, economics, and military applications, to mention a few, consume enormous amounts of time on sequential machines. From an academic point of view, parallel simulation is interesting because it represents a problem domain that often contains substantial amounts of parallelism (e.g., see [59]), yet paradoxically, is surprisingly difficult to parallelize in practice. A sufficiently general solution to the PDES problem may lead to new insights in parallel computation as a whole. Historically, the irregular, data-dependent nature of PDES programs has identified it as an application where vectorization techniques using supercomputer hardware provide little benefit [14].

A discrete event simulation model assumes the system being simulated only changes state at discrete points in simulated time. The simulation model jumps from one state to another upon the occurrence of an event. For example, a simulator of a store-and-forward communication network might include state variables to indicate the length of message queues, the status of communication links (busy or idle), etc. Typical events might include arrival of a message at some node in the network, forwarding a message to another network node, component failures, etc.

We are especially concerned with the simulation of asynchronous systems where events are not synchronized by a global clock, but rather, occur at irregular time intervals. For these systems, few simulator events occur at any single point in simulated time; therefore parallelization techniques based on lock-step execution using a global simulation clock perform poorly or require assumptions in the timing model that may compromise the fidelity of the simulation. Concurrent execution of events at different points in simulated time is required, but as we shall soon see, this introduces interesting synchronization problems that are at the heart of the PDES problem.

This article deals with the execution of a simulation program on a parallel computer by decomposing the simulation application into a set of concurrently executing processes. For completeness, we conclude this section by mentioning other approaches to exploiting parallelism in simulation problems. Comfort and Shepard et al. have proposed using dedicated functional units to implement specific sequential simulation functions (e.g., event list manipulation and random number generation [20, 23, 47]). This method can provide only a limited amount of speedup, however. Zhang, Zeigler, and Concepcion use the hierarchical decomposition of the simulation model to allow an event consisting of several subevents to be processed concurrently [21, 98]. A third alternative is to execute independent, sequential simulation programs on different processors [11, 39]. This replicated trials approach is useful if the simulation is largely stochastic and one is performing long simulation runs to reduce variance, or if one is attempting to simulate a specific simulation problem across a large number of different parameter settings. However, one drawback with this approach is that each processor must contain sufficient memory to hold the entire simulation. Furthermore, this approach is less suitable in a design environment where results of one experiment are used to determine the experiment that should be performed next, because one must wait for a sequential execution to be completed before results are obtained.

1,615 citations
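A minimal sequential event loop of the kind this survey describes (illustrative, not code from the article): the simulator repeatedly removes the earliest pending event, jumps the clock to its timestamp, and lets the handler schedule further events.

    import heapq, itertools

    def run(initial_events, horizon):
        """Discrete event loop: state changes only at discrete event times."""
        tie = itertools.count()              # break timestamp ties deterministically
        fel = [(t, next(tie), h) for t, h in initial_events]
        heapq.heapify(fel)                   # future event list, ordered by time
        clock = 0.0
        while fel and fel[0][0] <= horizon:
            clock, _, handler = heapq.heappop(fel)
            for t, h in handler(clock):      # handler returns newly scheduled events
                heapq.heappush(fel, (t, next(tie), h))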

Book
01 Jan 2000
TL;DR: The article gives an overview of technologies to distribute the execution of simulation programs over multiple computer systems, with particular emphasis on synchronization (also called time management) algorithms as well as data distribution techniques.
Abstract: Originating from basic research conducted in the 1970's and 1980's, the parallel and distributed simulation field has matured over the last few decades. Today, operational systems have been fielded for applications such as military training, analysis of communication networks, and air traffic control systems, to mention a few. The article gives an overview of technologies to distribute the execution of simulation programs over multiple computer systems. Particular emphasis is placed on synchronization (also called time management) algorithms as well as data distribution techniques.

1,217 citations


Cites background from "The cost of conservative synchroniz..."

  • ...More recently, the problem is discussed in Nicol and Liu (1997)....

    [...]

  • ...5, which is similar to Nicol's YAWNS protocol (Nicol 1993) and Steinman's Time Buckets protocol (Steinman 1991), all exploit this fact....

    [...]

Proceedings ArticleDOI
01 Oct 1989
TL;DR: This tutorial surveys the state of the art in executing discrete event simulation programs on a parallel computer, and focuses attention on asynchronous simulation programs where few events occur at any single point in simulated time.
Abstract: This tutorial surveys the state of the art in executing discrete event simulation programs on a parallel computer. Specifically, we will focus attention on asynchronous simulation programs where few events occur at any single point in simulated time, necessitating the concurrent execution of events occurring at different points in time. We first describe the parallel discrete event simulation problem, and examine why it is so difficult. We review several simulation strategies that have been proposed, and discuss the underlying ideas on which they are based. We critique existing approaches in order to clarify their respective strengths and weaknesses.

1,201 citations

BookDOI
TL;DR: The Abstract Object class defines and characterizes all the essential properties every class in this design has.
Abstract: Objects. The Abstract Object forms the fundamental base class for the entire design, and all other classes are derived from this base class. The Abstract Object class defines and characterizes all the essential properties every class in the design.

879 citations

Journal ArticleDOI
TL;DR: This paper surveys topics that presently define the state of the art in parallel simulation and includes discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.
Abstract: This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

142 citations

References
Book
06 Dec 1982

6,033 citations

Journal ArticleDOI
TL;DR: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, offers a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics.
Abstract: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, offers a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics. Its wealth of practical examples and up-to-date information makes it an excellent resource for practitioners as well.

2,738 citations

Book
01 Jan 1982
TL;DR: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, as discussed by the authors, is a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics.
Abstract: Probability and Statistics with Reliability, Queuing and Computer Science Applications, Second Edition, offers a comprehensive introduction to probability, stochastic processes, and statistics for students of computer science, electrical and computer engineering, and applied mathematics. Its wealth of practical examples and up-to-date information makes it an excellent resource for practitioners as well.

2,629 citations

Journal ArticleDOI
TL;DR: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control.
Abstract: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control. Virtual time provides a flexible abstraction of real time in much the same way that virtual memory provides an abstraction of real memory. It is implemented using the Time Warp mechanism, a synchronization protocol distinguished by its reliance on lookahead-rollback, and by its implementation of rollback via antimessages.

2,280 citations

Journal ArticleDOI
TL;DR: This article deals with the execution of a simulation program on a parallel computer by decomposing the simulation application into a set of concurrently executing processes and introduces interesting synchronization problems that are at the heart of the PDES problem.

1,615 citations