Posted Content

A Survey on Agent-based Simulation using Hardware Accelerators

TL;DR: This survey targets modellers seeking an overview of suitable hardware platforms and execution techniques for a specific simulation model, as well as methodology researchers interested in potential research gaps requiring further exploration.
Abstract: Due to decelerating gains in single-core CPU performance, computationally expensive simulations are increasingly executed on highly parallel hardware platforms. Agent-based simulations, where simulated entities act with a certain degree of autonomy, frequently provide ample opportunities for parallelisation. Thus, a vast variety of approaches proposed in the literature demonstrated considerable performance gains using hardware platforms such as many-core CPUs and GPUs, merged CPU-GPU chips as well as FPGAs. Typically, a combination of techniques is required to achieve high performance for a given simulation model, putting substantial burden on modellers. To the best of our knowledge, no systematic overview of techniques for agent-based simulations on hardware accelerators has been given in the literature. To close this gap, we provide an overview and categorisation of the literature according to the applied techniques. Since at the current state of research, challenges such as the partitioning of a model for execution on heterogeneous hardware are still a largely manual process, we sketch directions for future research towards automating the hardware mapping and execution. This survey targets modellers seeking an overview of suitable hardware platforms and execution techniques for a specific simulation model, as well as methodology researchers interested in potential research gaps requiring further exploration.
Citations
Journal ArticleDOI
TL;DR: A comprehensive up-to-date survey identifies the main trade-offs and limitations of the existing hardware-accelerated platforms and infrastructures for NFs and outlines directions for future research.
Abstract: In order to facilitate flexible network service virtualization and migration, network functions (NFs) are increasingly executed by software modules as so-called “softwarized NFs” on General-Purpose Computing (GPC) platforms and infrastructures. GPC platforms are not specifically designed to efficiently execute NFs with their typically intense Input/Output (I/O) demands. Recently, numerous hardware-based accelerations have been developed to augment GPC platforms and infrastructures, e.g., the central processing unit (CPU) and memory, to efficiently execute NFs. This article comprehensively surveys hardware-accelerated platforms and infrastructures for executing softwarized NFs. This survey covers both commercial products, which we consider to be enabling technologies, as well as relevant research studies. We have organized the survey into the main categories of enabling technologies and research studies on hardware accelerations for the CPU, the memory, and the interconnects (e.g., between CPU and memory), as well as custom and dedicated hardware accelerators (that are embedded on the platforms); furthermore, we survey hardware-accelerated infrastructures that connect GPC platforms to networks (e.g., smart network interface cards). We find that the CPU hardware accelerations have mainly focused on extended instruction sets and CPU clock adjustments, as well as cache coherency. Hardware accelerated interconnects have been developed for on-chip and chip-to-chip connections. Our comprehensive up-to-date survey identifies the main trade-offs and limitations of the existing hardware-accelerated platforms and infrastructures for NFs and outlines directions for future research.

51 citations


Additional excerpts

  • ..., simulations [208] and graph processing [209]....


Proceedings ArticleDOI
15 Oct 2018
TL;DR: Results show that a CPU-based parallelisation closely approaches the results of partial offloading, while full offloading substantially outperforms the other approaches and achieves a speedup of up to 28.7x over the sequential execution on a CPU.
Abstract: Microscopic traffic simulation is associated with substantial runtimes, limiting the feasibility of large-scale evaluation of traffic scenarios. Even though today heterogeneous hardware comprised of CPUs, graphics processing units (GPUs) and fused CPU-GPU devices is inexpensive and widely available, common traffic simulators still rely purely on CPU-based execution, leaving substantial acceleration potentials untapped. A number of existing works have considered the execution of traffic simulations on accelerators, but have relied on simplified models of road networks and driver behaviour tailored to the given hardware platform. Thus, the existing approaches cannot directly benefit from the vast body of research on the validity of common traffic simulation models. In this paper, we explore the performance gains achievable through the use of heterogeneous hardware when relying on typical traffic simulation models used in CPU-based simulators. We propose a partial offloading approach that relies either on a dedicated GPU or a fused CPU-GPU device. Further, we present a traffic simulation running fully on a many-core GPU and discuss the challenges of this approach. Our results show that a CPU-based parallelisation closely approaches the results of partial offloading, while full offloading substantially outperforms the other approaches. We achieve a speedup of up to 28.7x over the sequential execution on a CPU.

13 citations


Cites background from "A Survey on Agent-based Simulation ..."

  • ...A survey of techniques for general agent-based simulation on heterogeneous hardware is given in [13]....


Journal ArticleDOI
TL;DR: A modified brain storm optimization (BSO) algorithm is used to solve the knowledge spillover problem (KSP) from the perspective of evolutionary game theory, enabling a better understanding of the properties of different swarm optimization algorithms.
Abstract: The evolutionary game theory aims to simulate different decision strategies in populations of individuals and to determine how the population evolves. Compared to strategies between two agents, such as cooperation or noncooperation, strategies on multiple agents are rather challenging and difficult to be simulated via traditional methods. Particularly, in a knowledge spillover problem (KSP), cooperation strategies among more than hundreds of individuals need to be simulated. At the same time, the brain storm optimization (BSO) algorithm, which is a data-driven and model-driven hybrid paradigm, has the potential to simulate the complex behaviors in a group of simple individuals. In this paper, a modified BSO algorithm has been used to solve KSP from the perspective of evolutionary game theory. Knowledge spillover (KS) is the sharing or exchanging of knowledge resources among individuals. Firstly, the KS and evolutionary game theory were introduced. Then, the KS model and KS optimization problems were built from the evolutionary game perspective. Lastly, the modified BSO algorithms were utilized to solve KS optimization problems. Based on the applications of BSO algorithms for KSP, the properties of different swarm optimization algorithms can be understood better. More efficient algorithms could be designed to solve different real-world evolutionary game problems.

8 citations

Journal ArticleDOI
TL;DR: A novel online dispatching method efficiently profiles partitions of the simulation at run-time to optimize the hardware assignment, uses the profiling results to advance the simulation itself, and illustrates how co-execution can further lower execution times.
Abstract: The execution of agent‐based simulations (ABSs) on hardware accelerator devices such as graphics processing units (GPUs) has been shown to offer great performance potentials. However, in heterogeneous hardware environments, it can become increasingly difficult to find viable partitions of the simulation and provide implementations for different hardware devices. To automate this process, we present OpenABLext, an extension to OpenABL, a model specification language for ABSs. By providing a device‐aware OpenCL backend, OpenABLext enables the co‐execution of ABS on heterogeneous hardware platforms consisting of central processing units, GPUs, and field programmable gate arrays (FPGAs). We present a novel online dispatching method that efficiently profiles partitions of the simulation during run‐time to optimize the hardware assignment while using the profiling results to advance the simulation itself. In addition, OpenABLext features automated conflict resolution based on user‐specified rules, supports graph‐based simulation spaces, and utilizes an efficient neighbor search algorithm. We show the improved performance of OpenABLext and demonstrate the potential of FPGAs in the context of ABS. We illustrate how co‐execution can be used to further lower execution times. OpenABLext can be seen as an enabler to tap the computing power of heterogeneous hardware platforms for ABS.

7 citations

Proceedings ArticleDOI
29 May 2019
TL;DR: This paper proposes a transition approach for CPU-based SNN simulators to enable the execution on heterogeneous hardware with only limited modifications to an existing simulator code base, and without changes to model code.
Abstract: Spiking neural networks (SNN) are among the most computationally intensive types of simulation models, with node counts on the order of up to 10^11. Currently, there is intensive research into hardware platforms suitable to support large-scale SNN simulations, whereas several of the most widely used simulators still rely purely on the execution on CPUs. Enabling the execution of these established simulators on heterogeneous hardware allows new studies to exploit the many-core hardware prevalent in modern supercomputing environments, while still being able to reproduce and compare with results from a vast body of existing literature. In this paper, we propose a transition approach for CPU-based SNN simulators to enable the execution on heterogeneous hardware (e.g., CPUs, GPUs, and FPGAs) with only limited modifications to an existing simulator code base, and without changes to model code. Our approach relies on manual porting of a small number of core simulator functionalities as found in common SNN simulators, whereas unmodified model code is analyzed and transformed automatically. We apply our approach to the well-known simulator NEST and make a version executable on heterogeneous hardware available to the community. Our measurements show that at full utilization, a single GPU achieves the performance of about 9 CPU cores.

4 citations


Cites background from "A Survey on Agent-based Simulation ..."

  • ...In recent years, these accelerators have shown promising performance results for various simulation problems [47]....


References
Journal ArticleDOI
TL;DR: It is suggested that input and output are basic primitives of programming and that parallel composition of communicating sequential processes is a fundamental program structuring method.
Abstract: This paper suggests that input and output are basic primitives of programming and that parallel composition of communicating sequential processes is a fundamental program structuring method. When combined with a development of Dijkstra's guarded command, these concepts are surprisingly versatile. Their use is illustrated by sample solutions to a variety of familiar programming exercises.

11,419 citations


"A Survey on Agent-based Simulation ..." refers methods in this paper

  • ...Common approaches include specifying software systems using formalisms that express parallelism explicitly [81, 102, 116] or annotating programs with parallelisation hints [42]....


Journal ArticleDOI
TL;DR: This article developed models of collective behavior for situations where actors have two alternatives and the costs and/or benefits of each depend on how many other actors choose which alternative, and the key...
Abstract: Models of collective behavior are developed for situations where actors have two alternatives and the costs and/or benefits of each depend on how many other actors choose which alternative. The key...

5,195 citations


"A Survey on Agent-based Simulation ..." refers background in this paper

  • ...Two categories of approaches are developed for the cascade model [60] and the threshold model [62], which both simulate the propagation of information among nodes in a graph: vertex-oriented processing and edge-oriented processing....


Book
01 Jan 1978
TL;DR: Micromotives and Macrobehavior was originally published over twenty-five years ago, yet the stories it tells feel just as fresh today, and their subject, how small and seemingly meaningless decisions and actions by individuals often lead to significant unintended consequences for a large group, is more important than ever.
Abstract: "Schelling here offers an early analysis of 'tipping' in social situations involving a large number of individuals." -official citation for the 2005 Nobel Prize Micromotives and Macrobehavior was originally published over twenty-five years ago, yet the stories it tells feel just as fresh today. And the subject of these stories-how small and seemingly meaningless decisions and actions by individuals often lead to significant unintended consequences for a large group-is more important than ever. In one famous example, Thomas C. Schelling shows that a slight-but-not-malicious preference to have neighbors of the same race eventually leads to completely segregated populations. The updated edition of this landmark book contains a new preface and the author's Nobel Prize acceptance speech.

4,122 citations


"A Survey on Agent-based Simulation ..." refers methods in this paper

  • ...They ported the cellular models Mood Diffusion [77, 125], Game of Life [58] and Schelling Segregation [158]....


Journal ArticleDOI
01 Jan 1998
TL;DR: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and, separately, C and C++) to express shared-memory parallelism, while leaving the base language unspecified.
Abstract: At its most elemental level, OpenMP is a set of compiler directives and callable runtime library routines that extend Fortran (and, separately, C and C++) to express shared-memory parallelism. It leaves the base language unspecified, and vendors can implement OpenMP in any Fortran compiler. Naturally, to support pointers and allocatables, Fortran 90 and Fortran 95 require the OpenMP implementation to include additional semantics over Fortran 77. OpenMP leverages many of the X3H5 concepts while extending them to support coarse-grain parallelism. The standard also includes a callable runtime library with accompanying environment variables.

3,318 citations


"A Survey on Agent-based Simulation ..." refers methods in this paper

  • ...Common approaches include specifying software systems using formalisms that express parallelism explicitly [81, 102, 116] or annotating programs with parallelisation hints [42]....


Posted Content
TL;DR: This paper evaluates a custom ASIC-called a Tensor Processing Unit (TPU)-deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN) and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
Abstract: Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN). The heart of the TPU is a 65,536 8-bit MAC matrix multiply unit that offers a peak throughput of 92 TeraOps/second (TOPS) and a large (28 MiB) software-managed on-chip memory. The TPU's deterministic execution model is a better match to the 99th-percentile response-time requirement of our NN applications than are the time-varying optimizations of CPUs and GPUs (caches, out-of-order execution, multithreading, multiprocessing, prefetching, ...) that help average throughput more than guaranteed latency. The lack of such features helps explain why, despite having myriad MACs and a big memory, the TPU is relatively small and low power. We compare the TPU to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters. Our workload, written in the high-level TensorFlow framework, uses production NN applications (MLPs, CNNs, and LSTMs) that represent 95% of our datacenters' NN inference demand. Despite low utilization for some applications, the TPU is on average about 15X - 30X faster than its contemporary GPU or CPU, with TOPS/Watt about 30X - 80X higher. Moreover, using the GPU's GDDR5 memory in the TPU would triple achieved TOPS and raise TOPS/Watt to nearly 70X the GPU and 200X the CPU.

3,067 citations


"A Survey on Agent-based Simulation ..." refers background in this paper

  • ...At such tasks, TPUs can outperform recent CPUs or GPUs by a factor of up to 30 [94]....
