Proceedings Article•DOI•

Parallel Discrete Event Simulation

01 Oct 1989, pp. 19-28
TL;DR: This tutorial surveys the state of the art in executing discrete event simulation programs on a parallel computer, and focuses attention on asynchronous simulation programs where few events occur at any single point in simulated time.
Abstract: This tutorial surveys the state of the art in executing discrete event simulation programs on a parallel computer. Specifically, we will focus attention on asynchronous simulation programs where few events occur at any single point in simulated time, necessitating the concurrent execution of events occurring at different points in time. We first describe the parallel discrete event simulation problem, and examine why it is so difficult. We review several simulation strategies that have been proposed, and discuss the underlying ideas on which they are based. We critique existing approaches in order to clarify their respective strengths and weaknesses.

Citations
Proceedings Article•DOI•
01 Jul 1998
TL;DR: The paper describes the GloMoSim library, addresses a number of issues relevant to its parallelization, and presents a set of experimental results on the IBM 9076 SP, a distributed memory multicomputer.
Abstract: A number of library based parallel and sequential network simulators have been designed. The paper describes a library, called GloMoSim (Global Mobile system Simulator), for parallel simulation of wireless networks. GloMoSim has been designed to be extensible and composable: the communication protocol stack for wireless networks is divided into a set of layers, each with its own API. Models of protocols at one layer interact with those at a lower (or higher) layer only via these APIs. The modular implementation enables consistent comparison of multiple protocols at a given layer. The parallel implementation of GloMoSim can be executed using a variety of conservative synchronization protocols, which include the null message and conditional event algorithms. The paper describes the GloMoSim library, addresses a number of issues relevant to its parallelization, and presents a set of experimental results on the IBM 9076 SP, a distributed memory multicomputer. These experiments use models constructed from the library modules.

1,462 citations

Book•DOI•
TL;DR: The Abstract Object class defines and characterizes all the essential properties every class in this design has.
Abstract: Objects. The Abstract Object forms the fundamental base class for the entire design and all other classes are derived from this base class. The Abstract Object class defines and characterizes all the essential properties every class in this design has in this ...

879 citations

Proceedings Article•DOI•
12 Nov 2011
TL;DR: Interval simulation provides a balance between detailed cycle-accurate simulation and one-IPC simulation, allowing long-running simulations to be modeled much faster than with detailed cycle-accurate simulation, while still providing the detail necessary to observe core-uncore interactions across the entire system.
Abstract: Two major trends in high-performance computing, namely, larger numbers of cores and the growing size of on-chip cache memory, are creating significant challenges for evaluating the design space of future processor architectures. Fast and scalable simulations are therefore needed to allow for sufficient exploration of large multi-core systems within a limited simulation time budget. By bringing together accurate high-abstraction analytical models with fast parallel simulation, architects can trade off accuracy with simulation speed to allow for longer application runs, covering a larger portion of the hardware design space. Interval simulation provides this balance between detailed cycle-accurate simulation and one-IPC simulation, allowing long-running simulations to be modeled much faster than with detailed cycle-accurate simulation, while still providing the detail necessary to observe core-uncore interactions across the entire system. Validations against real hardware show average absolute errors within 25% for a variety of multi-threaded workloads; more than twice as accurate on average as one-IPC simulation. Further, we demonstrate scalable simulation speed of up to 2.0 MIPS when simulating a 16-core system on an 8-core SMP machine.

818 citations


Cites background from "Parallel Discrete Event Simulation"

  • ...There exist a number of approaches to relax the synchronization imposed by cycle-by-cycle simulation [13]....

Journal Article•DOI•
TL;DR: The simulation environment the authors developed at UCLA attempts to address some of the issues facing widespread use of parallel simulation, including a lack of tools for integrating parallel model execution into the overall framework of system simulation.
Abstract: Design and development costs for extremely large systems could be significantly reduced if only there were efficient techniques for evaluating design alternatives and predicting their impact on overall system performance metrics. Due to the systems' analytical intractability, simulation is the most common performance evaluation technique for such systems. However, the long execution times needed for sequential simulation models often hamper evaluation. The slow speeds of sequential model execution have led to growing interest in the use of parallel execution for simulating large-scale systems. Widespread use of parallel simulation, however, has been significantly hindered by a lack of tools for integrating parallel model execution into the overall framework of system simulation. Another drawback to widespread use of simulations is the cost of model design and maintenance. The simulation environment the authors developed at UCLA attempts to address some of these issues. It consists of three primary components: a parallel simulation language called Parsec (parallel simulation environment for complex systems), its GUI, called Pave, and the portable runtime system that implements the simulation algorithms.

699 citations

Proceedings Article•DOI•
23 Jun 2013
TL;DR: Zsim, a fast, scalable, and accurate simulator, is built using bound-weave, a two-phase parallelization technique that scales parallel simulation on multicore hosts efficiently with minimal loss of accuracy, and lightweight user-level virtualization is implemented to support complex workloads.
Abstract: Architectural simulation is time-consuming, and the trend towards hundreds of cores is making sequential simulation even slower. Existing parallel simulation techniques either scale poorly due to excessive synchronization, or sacrifice accuracy by allowing event reordering and using simplistic contention models. As a result, most researchers use sequential simulators and model small-scale systems with 16-32 cores. With 100-core chips already available, developing simulators that scale to thousands of cores is crucial. We present three novel techniques that, together, make thousand-core simulation practical. First, we speed up detailed core models (including OOO cores) with instruction-driven timing models that leverage dynamic binary translation. Second, we introduce bound-weave, a two-phase parallelization technique that scales parallel simulation on multicore hosts efficiently with minimal loss of accuracy. Third, we implement lightweight user-level virtualization to support complex workloads, including multiprogrammed, client-server, and managed-runtime applications, without the need for full-system simulation, sidestepping the lack of scalable OSs and ISAs that support thousands of cores. We use these techniques to build zsim, a fast, scalable, and accurate simulator. On a 16-core host, zsim models a 1024-core chip at speeds of up to 1,500 MIPS using simple cores and up to 300 MIPS using detailed OOO cores, 2-3 orders of magnitude faster than existing parallel simulators. Simulator performance scales well with both the number of modeled cores and the number of host cores. We validate zsim against a real Westmere system on a wide variety of workloads, and find performance and microarchitectural events to be within a narrow range of the real system.

481 citations


Cites background from "Parallel Discrete Event Simulation"

  • ...Multicore timing models are extremely challenging to parallelize using pessimistic PDES, as cores and caches are a few cycles away and interact often....

  • ...Since the possible interactions between memory accesses are known from the first phase, this timing simulation incurs much lower synchronization overheads than conventional PDES techniques, while still being highly accurate....

  • ...Approximation strategies: Since accurate PDES is hard, a scalable alternative is to relax accuracy and allow order violations....

  • ...HORNET is a pessimistic PDES simulator [34], while SlackSim (unreleased) uses both pessimistic and optimistic PDES [10]....

  • ...In PDES, events are distributed among host cores and executed concurrently while maintaining the illusion of full order....

References
Journal Article•DOI•
TL;DR: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control.
Abstract: Virtual time is a new paradigm for organizing and synchronizing distributed systems which can be applied to such problems as distributed discrete event simulation and distributed database concurrency control. Virtual time provides a flexible abstraction of real time in much the same way that virtual memory provides an abstraction of real memory. It is implemented using the Time Warp mechanism, a synchronization protocol distinguished by its reliance on lookahead-rollback, and by its implementation of rollback via antimessages.
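The rollback-via-antimessages idea named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the class `TimeWarpProcess`, its toy state (a running sum of payloads), and the rule that every processed event sends one message stamped one time unit later are all assumptions made for illustration.

```python
import heapq

class TimeWarpProcess:
    """Toy optimistic process (hypothetical names; not the paper's code)."""

    def __init__(self):
        self.lvt = 0.0      # local virtual time
        self.state = 0      # toy state: running sum of payloads
        self.pending = []   # min-heap of unprocessed (timestamp, payload)
        self.history = []   # (timestamp, payload, state_before) per processed event
        self.sent = []      # output log: (send_time, (recv_time, value))

    def receive(self, ts, payload):
        """Enqueue an event; a straggler (ts < lvt) triggers a rollback first."""
        anti = self.rollback(ts) if ts < self.lvt else []
        heapq.heappush(self.pending, (ts, payload))
        return anti  # antimessages the caller would forward to receivers

    def rollback(self, ts):
        # Undo every processed event later than the straggler, restoring
        # the state saved before it and putting it back on the pending heap.
        while self.history and self.history[-1][0] > ts:
            ev_ts, payload, before = self.history.pop()
            self.state = before
            heapq.heappush(self.pending, (ev_ts, payload))
        self.lvt = self.history[-1][0] if self.history else 0.0
        # Cancel messages sent from the undone future with antimessages.
        cancelled = [msg for t, msg in self.sent if t > ts]
        self.sent = [(t, msg) for t, msg in self.sent if t <= ts]
        return [("anti",) + msg for msg in cancelled]

    def run(self):
        # Process optimistically, without waiting for any safety guarantee.
        while self.pending:
            ts, payload = heapq.heappop(self.pending)
            self.history.append((ts, payload, self.state))
            self.lvt = ts
            self.state += payload
            self.sent.append((ts, (ts + 1.0, self.state)))

p = TimeWarpProcess()
p.receive(1.0, 10)
p.receive(3.0, 30)
p.run()                      # optimistic execution up to virtual time 3.0
anti = p.receive(2.0, 20)    # straggler arrives in the process's past
p.run()                      # re-execute events 2.0 and 3.0 in order
```

In this sketch the straggler at virtual time 2.0 undoes the event at 3.0, cancels the message that event had sent, and then both events are re-executed in timestamp order. Saving the full state before every event is the simplest choice here; a real system would also need to reclaim old history, which the sketch omits.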

2,280 citations


Additional excerpts

  • ...The Time Warp mechanism, based on the Virtual Time paradigm, is the most well known optimistic protocol [26]....

Book•
01 Nov 1988

1,941 citations


Additional excerpts

  • ...The conditional knowledge approach to simulation arises from the Unity theory of parallel programming [11]....

Journal Article•DOI•
TL;DR: This work proposes a distributed solution where processes communicate only through messages with their neighbors; there are no shared variables and there is no central process for message routing or process scheduling.
Abstract: The problem of system simulation is typically solved in a sequential manner due to the wide and intensive sharing of variables by all parts of the system. We propose a distributed solution where processes communicate only through messages with their neighbors; there are no shared variables and there is no central process for message routing or process scheduling. Deadlock is avoided in this system despite the absence of global control. Each process in the solution requires only a limited amount of memory. The correctness of a distributed system is proven by proving the correctness of each of its component processes and then using inductive arguments. The proposed solution has been empirically found to be efficient in preliminary studies. The paper presents formal, detailed proofs of correctness.
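How message-only communication can avoid deadlock without global control can be sketched with null messages, the mechanism usually associated with this line of work and mentioned elsewhere on this page. The abstract itself does not spell out the mechanism, so the class `LogicalProcess`, the `lookahead` parameter, and the two-process topology below are assumptions made for illustration only.

```python
from collections import deque

class LogicalProcess:
    """Toy conservative process (hypothetical names; not the paper's code)."""

    def __init__(self, name, in_links, lookahead):
        self.name = name
        self.lookahead = lookahead   # minimum delay this process adds to output
        self.clock = 0.0
        self.link_clock = {link: 0.0 for link in in_links}
        self.inbox = {link: deque() for link in in_links}
        self.processed = []

    def receive(self, link, timestamp, payload=None):
        # Timestamps on one link never decrease; payload=None is a null message.
        assert timestamp >= self.link_clock[link]
        self.link_clock[link] = timestamp
        if payload is not None:
            self.inbox[link].append((timestamp, payload))

    def step(self):
        """Process every safe event, then return a null-message timestamp."""
        # No link can still deliver anything earlier than its link clock,
        # so events up to the smallest link clock are safe to process.
        horizon = min(self.link_clock.values())
        ready = []
        for queue in self.inbox.values():
            while queue and queue[0][0] <= horizon:
                ready.append(queue.popleft())
        for ts, payload in sorted(ready):
            self.clock = ts
            self.processed.append((ts, payload))
        self.clock = max(self.clock, horizon)
        # Promise: nothing sent from here will carry an earlier timestamp.
        return self.clock + self.lookahead

# A and B feed each other; A also holds a real event from a source link.
a = LogicalProcess("A", in_links=["src", "B"], lookahead=1.0)
b = LogicalProcess("B", in_links=["A"], lookahead=1.0)
a.receive("src", 3.0, "job")

rounds = 0
while not a.processed:
    a.receive("B", b.step())   # null message B -> A
    b.receive("A", a.step())   # null message A -> B
    rounds += 1
```

Without the null messages, A would wait on B and B on A indefinitely, because neither link clock would ever advance. With them, the promises creep forward by the lookahead on each exchange until A's event at time 3.0 becomes safe. The sketch also shows the known cost of this approach: a small lookahead means many null messages per real event.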

1,005 citations


Additional excerpts

  • ...[10] K.M. Chandy and J. Misra....

  • ...They also identify situations where Time Warp will outperform the Chandy-Misra algorithms....

  • ...[33] J. Misra....

  • ...[11] K.M. Chandy and J. Misra....

  • ...Variants of the Chandy-Misra-Bryant Distributed Discrete-Event Simulation Algorithm....

Journal Article•DOI•
TL;DR: The focus of this work is on the theory of distributed discrete-event simulation, which may provide better performance by partitioning the simulation among the component processors.
Abstract: Traditional discrete-event simulations employ an inherently sequential algorithm. In practice, simulations of large systems are limited by this sequentiality, because only a modest number of events can be simulated. Distributed discrete-event simulation (carried out on a network of processors with asynchronous message-communicating capabilities) is proposed as an alternative; it may provide better performance by partitioning the simulation among the component processors. The basic distributed simulation scheme, which uses time encoding, is described. Its major shortcoming is a possibility of deadlock. Several techniques for deadlock avoidance and deadlock detection are suggested. The focus of this work is on the theory of distributed discrete-event simulation.
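The "inherently sequential algorithm" this abstract refers to is commonly built around a single time-ordered event list. The following is a minimal sketch of that loop, not code from the paper; the function `run_sequential`, the handler table, and the arrive/depart model are hypothetical names chosen for illustration.

```python
import heapq
from itertools import count

def run_sequential(initial_events, handlers, end_time):
    """Pop the earliest event, run its handler, schedule what it creates."""
    tie = count()  # tie-breaker so equal timestamps pop in insertion order
    event_list = [(t, next(tie), kind) for t, kind in initial_events]
    heapq.heapify(event_list)
    log = []
    while event_list:
        now, _, kind = heapq.heappop(event_list)
        if now > end_time:
            break
        log.append((now, kind))
        # A handler returns (delay, kind) pairs for the events it schedules.
        for delay, new_kind in handlers[kind](now):
            heapq.heappush(event_list, (now + delay, next(tie), new_kind))
    return log

# Hypothetical model: each arrival schedules a departure and the next arrival.
handlers = {
    "arrive": lambda now: [(2.0, "depart"), (3.0, "arrive")],
    "depart": lambda now: [],
}
log = run_sequential([(0.0, "arrive")], handlers, end_time=7.0)
```

Every iteration depends on the single shared event list and the single clock value `now`, which is why only one event is simulated at a time. Partitioning the simulation among processors means splitting that list, and that in turn is what creates the synchronization problem the abstract describes.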

968 citations


Additional excerpts

  • ...An alternative to sending null messages whenever a process blocks is to have processes query other processes when they need to receive a better link clock value [2, 33]....

  • ...Deadlock detection mechanisms are described in [13, 21, 33]....

Journal Article•DOI•

867 citations


Additional excerpts

  • ...Deadlock detection mechanisms are described in [13, 21, 33]....
