Open Access, Proceedings Article

Exploiting the parallelism of large-scale application-layer networks by adaptive GPU-based simulation

TLDR
A GPU-based simulator engine that performs all steps of large-scale network simulations on a commodity many-core GPU and adapts its configuration at runtime in order to balance parallelism and overheads to achieve high performance for a given network model and scenario is presented.
Abstract
We present a GPU-based simulator engine that performs all steps of large-scale network simulations on a commodity many-core GPU. Overhead is reduced by avoiding unnecessary data transfers between graphics memory and main memory. Using a widely deployed peer-to-peer network as an example, we analyze the parallelism in large-scale application-layer networks; the analysis suggests the use of thousands of concurrent processor cores for simulation. The proposed simulator employs the vast number of parallel cores in modern GPUs to exploit the identified parallelism and enables substantial simulation speedup. The simulator adapts its configuration at runtime to balance parallelism against overheads and achieve high performance for a given network model and scenario. A performance evaluation for simulations of networks comprising up to one million peers demonstrates a speedup of up to 19.5 compared with an efficient sequential implementation and shows the effectiveness of the runtime adaptation to different network conditions.
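
The abstract does not say how the available parallelism was measured. As a rough illustration only, the sketch below uses a common window-based conservative technique, which is not necessarily the authors' method: events whose timestamps fall within one lookahead interval of the earliest pending event are treated as independent and counted as one parallel batch. The function name `window_parallelism`, the integer timestamps, and the `lookahead` parameter (standing in for a minimum network latency) are assumptions made for this example.

```python
def window_parallelism(timestamps, lookahead):
    """Return the number of events in each conservative time window.

    Assumption (not from the paper): an event at time t can only create
    new events at time >= t + lookahead, so all events inside a window
    [t_min, t_min + lookahead) are independent of one another.
    """
    if lookahead <= 0:
        raise ValueError("lookahead must be positive")

    pending = sorted(timestamps)
    sizes = []
    i = 0
    while i < len(pending):
        # The window opens at the earliest unprocessed event.
        window_end = pending[i] + lookahead
        j = i
        # Collect every event strictly before the end of the window.
        while j < len(pending) and pending[j] < window_end:
            j += 1
        sizes.append(j - i)  # events that could run concurrently
        i = j
    return sizes


# Hypothetical trace: event timestamps in milliseconds, 10 ms lookahead.
batch_sizes = window_parallelism([0, 5, 9, 10, 25], lookahead=10)
mean_parallelism = sum(batch_sizes) / len(batch_sizes)
```

Under these assumptions, a mean batch size in the thousands would match the kind of workload the abstract describes as suited to a many-core GPU, while small batches would leave most cores idle.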


Citations
Proceedings Article

Simian integrated framework for parallel discrete event simulation on GPUs

TL;DR: A new framework integrated in the Simian engine is presented, which enables efficient use of GPUs for computationally intense sections of code and allows modellers to offset some or all handlers to the GPU by efficiently grouping and scheduling these handlers.
Journal Article

Synchronous speculative simulation of tightly coupled agents in continuous time on CPUs and GPUs

TL;DR: In this article, the authors propose synchronous optimistic synchronization algorithms tailored toward simulations of fine-grained interactions among tightly coupled agents in highly dynamic topologies and present implementations targeting multicore central processing units (CPUs) as well as many-core GPUs.
Proceedings Article

Model-Based Concurrency Analysis of Network Simulations

TL;DR: An analytical model is proposed that enables concurrency estimations based on model knowledge and on statistics gathered from sequential simulation runs, and provides insights into the relationship between the topology and communication patterns of the simulated network and the resulting concurrency.
Proceedings Article

Advanced Tutorial: Parallel and Distributed Methods for Scalable Discrete Simulation

TL;DR: In this article, the fundamental notions of parallel and distributed simulation and synchronization algorithms are described and summarized under the constraints of the domains of transportation and spiking neural networks, and current research directions and challenges are discussed in light of the tension between efficiency through specialization and wide applicability through generalization.
Journal Article

Transitioning Spiking Neural Network Simulators to Heterogeneous Hardware

TL;DR: This article proposes a transition approach for CPU-based SNN simulators to enable the execution on heterogeneous hardware, with only limited modifications to an existing simulator code base and without changes to model code.
References
Book Chapter

Kademlia: A Peer-to-Peer Information System Based on the XOR Metric

TL;DR: In this paper, the authors describe a peer-to-peer distributed hash table with provable consistency and performance in a fault-prone environment, which routes queries and locates nodes using a novel XOR-based metric topology.
Proceedings Article

Inter-block GPU communication via fast barrier synchronization

TL;DR: This work proposes two approaches for inter-block GPU communication via barrier synchronization, GPU lock-based synchronization and GPU lock-free synchronization, and evaluates the efficacy of each approach via a micro-benchmark as well as three well-known algorithms: Fast Fourier Transform, dynamic programming, and bitonic sort.
Journal Article

The cost of conservative synchronization in parallel discrete event simulations

TL;DR: It is shown that on large problems (those for which parallel processing is ideally suited) there is often enough parallel workload so that processors are not usually idle, and the method is within a constant factor of optimal.

Efficient Parallel Scan Algorithms for GPUs

TL;DR: This paper describes the design of efficient scan and segmented scan parallel primitives in CUDA for execution on GPUs using a divide-and-conquer approach and demonstrates that this design methodology results in routines that are simple, highly efficient, and free of irregular access patterns that lead to memory bank conflicts.
Proceedings Article

Discrete-event Execution Alternatives on General Purpose Graphical Processing Units (GPGPUs)

TL;DR: Initial performance results on simulation of a diffusion process show that DES-style execution on GPGPU runs faster than DES on CPU and also significantly faster than time-stepped simulations on either CPU or GPGPU.