
Showing papers by "Richard M. Fujimoto published in 1997"


Proceedings ArticleDOI
01 Dec 1997
TL;DR: The motivations and processes used to develop the High Level Architecture are described and a technical description of key elements of the architecture and supporting software are provided.
Abstract: The High Level Architecture (HLA) provides the specification of a common technical architecture for use across all classes of simulations in the US Department of Defense. It provides the structural basis for simulation interoperability. The baseline definition of the HLA includes (1) the HLA Rules, (2) the HLA Interface Specification, and (3) the HLA Object Model Template. This paper describes the motivations and processes used to develop the High Level Architecture and provides a technical description of key elements of the architecture and supporting software. Services defined in the interface specification for providing time management (TM) and data distribution management (DDM) for distributed simulations are described.

260 citations


Journal ArticleDOI
TL;DR: An efficient implementation is described that eliminates the need for a processor to explicitly compute a local minimum in Time Warp systems using a lowest-timestamp-first scheduling policy on each processor, and a new mechanism called on-the-fly fossil collection is proposed that enables efficient storage reclamation for simulations containing large numbers of objects.
Abstract: Global virtual time (GVT) is used in the Time Warp synchronization mechanism to perform irrevocable operations such as I/O and to reclaim storage. Most existing algorithms for computing GVT assume a message-passing programming model. Here, GVT computation is examined in the context of a shared-memory model. We observe that computation of GVT is much simpler in shared-memory multiprocessors because these machines normally guarantee that no two processors will observe a set of memory operations as occurring in different orders. Exploiting this fact, we propose an efficient, asynchronous, shared-memory GVT algorithm and prove its correctness. This algorithm does not require message acknowledgments, special GVT messages, or FIFO delivery of messages, and requires only a minimal number of shared variables and data structures. The algorithm requires only one round of interprocessor communication to compute GVT, in contrast to many message-based algorithms that require two. An efficient implementation is described that eliminates the need for a processor to explicitly compute a local minimum in Time Warp systems using a lowest-timestamp-first scheduling policy on each processor. In addition, we propose a new mechanism called on-the-fly fossil collection that enables efficient storage reclamation for simulations containing large numbers of simulator objects, e.g., hundreds of thousands or even millions. On-the-fly fossil collection can be used in Time Warp systems executing on either shared-memory or message-based machines. Performance measurements of the GVT algorithm and the on-the-fly fossil collection mechanism on a Kendall Square Research KSR-2 machine demonstrate that these techniques enable frequent GVT computations and fossil collections, e.g., every millisecond, without incurring a significant performance penalty.
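The core observation above — that a sequentially consistent shared memory lets one pass over per-processor minima yield a valid GVT bound — can be sketched as follows. This is a minimal illustrative sketch, not the paper's algorithm; the class and method names are hypothetical, and a lock stands in for the memory-ordering guarantees the hardware provides.

```python
import threading

class GvtSharedMemory:
    """Illustrative sketch of a shared-memory GVT computation.

    Each processor publishes a local minimum: its next unprocessed event
    timestamp folded with the smallest timestamp of any message it has
    sent whose receiver may not yet have observed it.  One pass over the
    published minima then yields a lower bound on global virtual time.
    """

    def __init__(self, n_procs):
        self.local_min = [float("inf")] * n_procs
        self.lock = threading.Lock()  # stand-in for memory-model guarantees

    def publish_local_min(self, proc_id, next_event_ts, min_in_flight_send_ts):
        # A processor folds unprocessed events and in-flight sends into
        # one published value before the GVT pass reads it.
        with self.lock:
            self.local_min[proc_id] = min(next_event_ts, min_in_flight_send_ts)

    def compute_gvt(self):
        # Single round: read every processor's published minimum.
        with self.lock:
            return min(self.local_min)

gvt = GvtSharedMemory(3)
gvt.publish_local_min(0, 10.0, 12.5)
gvt.publish_local_min(1, 8.0, float("inf"))
gvt.publish_local_min(2, 15.0, 9.0)
print(gvt.compute_gvt())  # 8.0
```

Events with timestamps below the value returned by `compute_gvt` can never be rolled back, so their storage may be reclaimed.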

93 citations


Proceedings ArticleDOI
01 Dec 1997
TL;DR: Inefficiencies in the air traffic flow can be discovered by monitoring the simulation and then allowing backward execution to evaluate cause-and-effect relationships.
Abstract: When an unexpected event occurs, such as a runway closure, the controller interacting with the simulation may want to evaluate the effect of activating different scheduling policies. This can be done by dynamically cloning (replicating) the simulation with different scheduling policies. The effect of each policy within each clone is monitored to determine which policy offers the most effective solution. Inefficiencies in the air traffic flow can be discovered by monitoring the simulation and then allowing backward execution to evaluate cause-and-effect relationships.

53 citations



Journal ArticleDOI
TL;DR: An adaptive mechanism is proposed based on the Cancelback memory management protocol for shared-memory multiprocessors that dynamically controls the amount of memory used in the simulation in order to maximize performance and track the time-varying nature of a communication network simulation.
Abstract: It is widely believed that the Time Warp protocol for parallel discrete event simulation is prone to two potential problems: an excessive amount of wasted, rolled back computation resulting from "rollback thrashing" behaviors, and inefficient use of memory, leading to poor performance of virtual memory and/or multiprocessor cache systems. An adaptive mechanism is proposed based on the Cancelback memory management protocol for shared-memory multiprocessors that dynamically controls the amount of memory used in the simulation in order to maximize performance. The proposed mechanism is adaptive in the sense that it monitors the execution of the Time Warp program and, using simple models, automatically adjusts the amount of memory used to reduce Time Warp overheads (fossil collection, Cancelback, the amount of rolled back computation, etc.) to a manageable level. We describe an implementation of this mechanism on a shared-memory Kendall Square Research KSR-1 multiprocessor and demonstrate its effectiveness in automatically maximizing performance while minimizing memory utilization, for several synthetic and benchmark discrete event simulation applications. We also demonstrate the adaptive ability of the mechanism by showing that it "tracks" the time-varying nature of a communication network simulation.

51 citations


Journal ArticleDOI
01 Jun 1997
TL;DR: Experimental data indicate that the adaptive flow control scheme maintains high performance for balanced workloads, and achieves as much as a factor of 7 speedup over unthrottled Time Warp for certain irregular workloads.
Abstract: It is well known that Time Warp may suffer from poor performance due to excessive rollbacks caused by overly optimistic execution. Here we present a simple flow control mechanism, using only local information and GVT, that limits the number of uncommitted messages generated by a processor, thus throttling overly optimistic Time Warp execution. The flow control scheme is analogous to traditional networking flow control mechanisms. A "window" of messages defines the maximum number of uncommitted messages a process is allowed to schedule. Committing messages is analogous to acknowledgments in networking flow control. The initial size of the window is calculated using a simple analytical model that estimates the instantaneous number of messages that a process will eventually commit. This window is expanded so that the process may progress up to the next commit point (generally the next fossil collection), and to accommodate optimistic execution. The expansions to the window are based on monitoring Time Warp performance statistics, so the window size automatically adapts to changing program behaviors. The flow control technique presented here is simple and fully automatic. No global knowledge or synchronization (other than GVT) is required. We also develop an implementation of the flow control scheme for shared-memory multiprocessors that uses dynamically sized pools of free message buffers. Experimental data indicate that the adaptive flow control scheme maintains high performance for balanced workloads, and achieves as much as a factor of 7 speedup over unthrottled Time Warp for certain irregular workloads.
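The window mechanism described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the class name, the adaptation thresholds, and the grow/shrink increments are all invented for the example, and the paper's analytical model for the initial window size is replaced by a constant.

```python
class TimeWarpFlowControl:
    """Sketch of a window-based throttle for optimistic execution.

    A process may hold at most `window` uncommitted messages; sends
    beyond that are deferred.  Commits (GVT advancing past a message's
    timestamp) free slots, analogous to acknowledgments in networking
    flow control.
    """

    def __init__(self, initial_window):
        self.window = initial_window  # paper derives this from a model
        self.uncommitted = 0

    def try_send(self):
        # True if the process may schedule another message right now.
        if self.uncommitted < self.window:
            self.uncommitted += 1
            return True
        return False  # throttle: defer further optimistic execution

    def commit(self, n):
        # Fossil collection committed n messages; release their slots.
        self.uncommitted = max(0, self.uncommitted - n)

    def adapt(self, rollback_rate, grow=4, shrink=2):
        # Illustrative adaptive rule (thresholds are assumptions):
        # expand when rollbacks are rare, contract when thrashing.
        if rollback_rate < 0.1:
            self.window += grow
        elif rollback_rate > 0.5:
            self.window = max(1, self.window - shrink)

fc = TimeWarpFlowControl(initial_window=2)
print(fc.try_send(), fc.try_send(), fc.try_send())  # True True False
fc.commit(1)
print(fc.try_send())  # True
```

The key property is that no global synchronization beyond GVT is needed: `commit` is driven entirely by the locally observed GVT advance.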

37 citations


Proceedings ArticleDOI
01 Dec 1997
TL;DR: The design and algorithms used to implement the HLA Time Management Services in F.0, the familiarization version of the RTI developed at the MITRE Corporation, are presented.
Abstract: The DoD High Level Architecture (HLA) has recently become the required method for the interconnection of all DoD computer simulations. The HLA addresses the rules by which simulations are designed to facilitate interoperability, the method by which information exchanged between simulations is described, and a standard set of software services provided by a common Runtime Infrastructure (RTI). The RTI is responsible for the coordination of collections of cooperating simulations. The familiarization version of the RTI, dubbed F.0, was developed at the MITRE Corporation. One of the core components of the RTI is Time Management, which is the focus of this paper. In particular, we present the design and algorithms used to implement the HLA Time Management Services in F.0.

31 citations


Proceedings ArticleDOI
05 Aug 1997
TL;DR: A novel networking architecture designed for communication-intensive parallel applications running on clusters of workstations (COWs) connected by a high-speed network that admits low-cost implementations based only on off-the-shelf hardware components and can be used to communicate with any ATM-enabled host.
Abstract: This paper presents a novel networking architecture designed for communication-intensive parallel applications running on clusters of workstations (COWs) connected by a high-speed network. This architecture permits: (1) the transfer of selected communication-related functionality from the host machine to the network interface coprocessor and (2) the exposure of this functionality directly to applications as instructions of a Virtual Communication Machine (VCM) implemented by the coprocessor. The user-level code interacts directly with the network coprocessor, as the host kernel only 'connects' the application to the VCM and does not participate in the data transfers. The distinctive feature of our design is its flexibility: the integration of the network with the application can be varied to maximize performance. The resulting communication architecture is characterized by a very low overhead on the host processor, by latency and bandwidth close to the hardware limits, and by an application interface which enables zero-copy messaging and eases the porting of some shared-memory parallel applications to COWs. The architecture admits low-cost implementations based only on off-the-shelf hardware components. Additionally, its current ATM-based implementation can be used to communicate with any ATM-enabled host.

26 citations


Journal ArticleDOI
TL;DR: Results indicate that contrary to the common belief, memory usage by Time Warp can be controlled within reasonable limits without any significant loss of performance.
Abstract: The performance of the Time Warp mechanism is experimentally evaluated when only a limited amount of memory is available to the parallel computation. An implementation of the Cancelback protocol is used for memory management on a shared-memory architecture (a KSR machine) to evaluate the performance vs. memory tradeoff. The implementation of the Cancelback protocol supports canceling back more than one memory object when memory has been exhausted (the precise number is referred to as the salvage parameter) and incorporates a non-work-conserving processor scheduling technique to prevent starvation. Several synthetic and benchmark programs are used that provide interesting stress cases for evaluating the limited-memory behavior. The experiments are extensively monitored to determine the extent to which various factors may affect performance. Several observations are made by analyzing the behavior of Time Warp under limited memory: (1) Depending on the available memory and asymmetry in the workload, canceling back several memory objects at one time (i.e., a salvage parameter value of more than one) improves performance significantly, by reducing certain overheads. However, performance is relatively insensitive to the salvage parameter except at extreme values. (2) The speedup vs. memory curve for Time Warp programs has a well-defined knee, before which speedup increases very rapidly with memory and beyond which there is little performance gain with increased memory. (3) Performance nearly equivalent to that with large amounts of memory can be achieved with only a modest amount of additional memory beyond that required for sequential execution, if memory management overheads are small compared to the event granularity. These results indicate that, contrary to the common belief, memory usage by Time Warp can be controlled within reasonable limits without any significant loss of performance.
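The salvage parameter above controls how many objects Cancelback reclaims per invocation. A minimal sketch of that selection step, under the assumption (consistent with the Cancelback protocol) that the objects farthest in the simulated future are the ones returned to their senders; the function name and data layout are invented for the example:

```python
import heapq

def cancel_back(allocated, salvage):
    """Illustrative Cancelback step with a salvage parameter.

    `allocated` is a list of (timestamp, obj) pairs currently holding
    memory.  When the pool is exhausted, the `salvage` objects with the
    largest timestamps (farthest in the simulated future) are canceled
    back to their senders so their memory can be reclaimed; optimistic
    execution will regenerate them later.
    """
    victims = heapq.nlargest(salvage, allocated, key=lambda p: p[0])
    survivors = [p for p in allocated if p not in victims]
    return survivors, victims

allocated = [(5.0, "e1"), (9.0, "e2"), (2.0, "e3"), (7.0, "e4")]
remaining, returned = cancel_back(allocated, salvage=2)
print([ts for ts, _ in returned])  # [9.0, 7.0]
```

A salvage parameter greater than one amortizes the fixed cost of a Cancelback invocation over several reclaimed objects, which is the overhead reduction observation (1) in the abstract refers to.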

25 citations


Proceedings ArticleDOI
01 Dec 1997
TL;DR: A general-purpose network computing visualization system is extended into a new system, called PVaniM-GTW, by adding middleware-specific views to better satisfy the needs of PDES middleware than general- Purpose visualization systems while also not requiring the development of application specific visualizations by the end user.
Abstract: Parallel discrete event simulation systems (PDES) are used to simulate large-scale applications such as modeling telecommunication networks, transportation grids, and battlefield scenarios. While a large amount of PDES research has focused on employing multiprocessors and multicomputers, the use of networks of workstations interconnected through Ethernet or ATM has evolved into a popular and effective platform for PDES. To improve performance in these environments, we investigate the use of graphical visualization to provide insight into performance evaluation and simulator execution. We began with a general-purpose network computing visualization system, PVaniM, and used it to investigate the execution of an advanced version of Time Warp, called Georgia Tech Time Warp (GTW), which executes in network computing environments. Because PDES systems such as GTW are essentially middleware that support their own applications, we soon realized these systems require their own middleware-specific visualization support. To this end we have extended PVaniM into a new system, called PVaniM-GTW by adding middleware-specific views. Our experiences with PVaniM-GTW indicate that these enhancements enable one to better satisfy the needs of PDES middleware than general-purpose visualization systems while also not requiring the development of application specific visualizations by the end user.

16 citations


01 Jan 1997
TL;DR: This manual gives an introduction to writing parallel discrete event simulation programs for the Georgia Tech Time Warp (GTW) system (version 3.1).
Abstract: This manual gives an introduction to writing parallel discrete event simulation programs for the Georgia Tech Time Warp (GTW) system (version 3.1). Time Warp is a synchronization mechanism for parallel discrete event simulation programs. GTW is a Time Warp simulation kernel implemented on distributed networks of uniprocessor and shared-memory multiprocessor machines. Use of this program shall be restricted to internal research purposes only, and it may not be redistributed in any form without authorization from the Georgia Tech Research Corporation. Derivative works must carry this Copyright notice. This program is provided as is and Georgia Tech Research Corporation disclaims all warranties with regard to this program. In no event shall Georgia Tech Research Corporation be liable for any damages arising out of or in connection with the use or performance of this program.

Journal ArticleDOI
TL;DR: It is illustrated that if only a few sources with heavy-tailed active periods are multiplexed and the utilizations are low, then the end result is not necessarily the worse performance one would anticipate when larger aggregations of such sources asymptotically approximate a self-similar process.
Abstract: The problem of end-to-end connection performance in Asynchronous Transfer Mode (ATM) networks is extremely important because its solution provides the information (such as end-to-end delay distribution and cell loss ratio) necessary for the definition of the Quality of Service (QoS) guarantees for real-time connections. Due to the analytical complexities inherent in this problem, we are seeking to develop an efficient simulation technique for the fast production of results. The presented approach is a combination of an earlier time-parallel technique and probabilistic routing. An implementation of this technique on a multiprocessor with modest capabilities shows promising speedup. Using the developed simulator, several experiments are conducted for the end-to-end behavior of ATM connections using bursty traffic models, including ones with active periods derived from heavy-tailed distributions. An important feature of the study is that the interfering traffic is of the same type as the end-to-end traffic. Thus, no assumptions are made about traffic smoothing or approximations of the aggregate interference process by Poisson arrivals. The presented end-to-end performance results include the Cell Loss Ratio (CLR) and the delay distribution at the multiplexers and at each intermediate switch of the connection path. Among the presented findings, it is illustrated that if only a few sources with heavy-tailed active periods are multiplexed and the utilizations are low, then the end result is not necessarily the worse performance one would anticipate when larger aggregations of such sources asymptotically approximate a self-similar process.

Proceedings ArticleDOI
01 Dec 1997
TL;DR: Results obtained from a message-passing implementation on a cluster of workstations confirm that it is possible to generate self-similar ATM traffic in real time for 155 Mbps (or even faster) links and that the technique achieves an almost linear speedup with respect to the number of available workstations.
Abstract: We present a time-parallel technique for the fast generation of self-similar traffic which is suitable for performance studies of Asynchronous Transfer Mode (ATM) networks. The technique is based on the well-known result according to which the aggregation of a large number of heavy-tailed ON/OFF-type renewal/reward processes asymptotically approximates a Fractional Gaussian Noise (FGN) process and, therefore, possesses the characteristics of self-similarity and long-range dependence. The technique parallelizes both the generation of the individual renewal/reward processes as well as the merging of these processes in a per-time-slice manner. Results obtained from a message-passing implementation on a cluster of workstations confirm that it is possible to generate self-similar ATM traffic in real time for 155 Mbps (or even faster) links and that, furthermore, the technique achieves an almost linear speedup with respect to the number of available workstations.
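The aggregation result underlying the technique can be sketched as follows. This is a sequential, illustrative stand-in for the paper's time-parallel, per-time-slice generation: the function names and parameters (Pareto shape, rates) are assumptions chosen for the example, and no speedup claim is implied.

```python
import random

def pareto(alpha, xm=1.0, rnd=random.random):
    # Heavy-tailed Pareto sample: P(X > x) = (xm / x)^alpha for x >= xm.
    # Using 1 - rnd() keeps the base in (0, 1], avoiding division by zero.
    return xm / ((1.0 - rnd()) ** (1.0 / alpha))

def on_off_source(n_slots, alpha=1.4, rate=1):
    """One heavy-tailed ON/OFF renewal process: emits `rate` cells per
    slot while ON, nothing while OFF; period lengths are Pareto."""
    cells = [0] * n_slots
    t, on = 0, True
    while t < n_slots:
        length = int(pareto(alpha)) + 1
        if on:
            for s in range(t, min(t + length, n_slots)):
                cells[s] = rate
        t += length
        on = not on
    return cells

def aggregate_traffic(n_sources, n_slots, seed=42):
    """Summing many heavy-tailed ON/OFF sources approximates FGN, hence
    self-similar, long-range-dependent traffic per slot."""
    random.seed(seed)
    total = [0] * n_slots
    for _ in range(n_sources):
        for s, c in enumerate(on_off_source(n_slots)):
            total[s] += c
    return total

traffic = aggregate_traffic(n_sources=50, n_slots=200)
print(len(traffic), max(traffic) <= 50)  # 200 True
```

In the paper's scheme each workstation would generate its own sources and time slices independently, which is what makes the merging step parallelizable per time slice.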