Proceedings ArticleDOI

Exploiting parallelism and structure to accelerate the simulation of chip multi-processors

TLDR
It is shown that automated parallelization can achieve a speedup of 7.60 for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor, and the power of hardware integration is demonstrated by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82.
Abstract
Simulation is an important means of evaluating new microarchitectures. Current trends toward chip multiprocessors (CMPs) test the ability of designers to develop efficient simulators. CMP simulation speed can be improved by exploiting parallelism in the CMP simulation model. This may be done either by running the simulation on multiple processors or by integrating multiple processors into the simulation to replace simulated processors. Doing so usually requires tedious manual parallelization or re-design to encapsulate processors. This paper presents techniques to perform automated simulator parallelization and hardware integration for CMP structural models. We show that automated parallelization can achieve a speedup of 7.60 for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor. We demonstrate the power of hardware integration by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82.
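The first of these ideas, running one structural model on several host processors, can be pictured with a small sketch. This is a minimal illustration under assumptions of my own, not the paper's simulator: the CoreModel type, its tick() method, and the one-host-thread-per-simulated-core split with a barrier at the end of every simulated cycle are all invented here to show how model-level parallelism can be exploited.

    // Hedged sketch: partition a CMP structural model across host threads,
    // one simulated core per thread, kept in lockstep by a per-cycle barrier.
    // CoreModel, tick(), and the fixed partition are illustrative assumptions.
    #include <barrier>
    #include <thread>
    #include <vector>

    struct CoreModel {
        void tick(long cycle) { /* evaluate this core's pipeline, caches, ... */ }
    };

    int main() {
        constexpr int  kNumCores = 16;      // simulated CMP cores
        constexpr long kCycles   = 10'000;  // simulated cycles (kept small here)
        std::vector<CoreModel> cores(kNumCores);
        std::barrier sync(kNumCores);       // end-of-cycle synchronization

        std::vector<std::jthread> workers;
        for (int c = 0; c < kNumCores; ++c) {
            workers.emplace_back([&, c] {
                for (long cycle = 0; cycle < kCycles; ++cycle) {
                    cores[c].tick(cycle);    // advance this core by one cycle
                    sync.arrive_and_wait();  // wait for all partitions
                }
            });
        }
        return 0;  // jthreads join on destruction
    }

In the paper the partitioning is automated rather than fixed per core, and the hardware-integration path replaces some of these simulated cores with physical PowerPC cores; the sketch only shows the simplest possible split.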


Citations
Proceedings ArticleDOI

Graphite: A distributed parallel simulator for multicores

TL;DR: This paper introduces the Graphite open-source distributed parallel multicore simulator infrastructure and demonstrates that Graphite can simulate target architectures containing over 1000 cores on ten 8-core servers with near linear speedup.
Proceedings ArticleDOI

CoRAM: an in-fabric memory architecture for FPGA-based computing

TL;DR: A new FPGA memory architecture called Connected RAM (CoRAM) is proposed to serve as a portable bridge between distributed computation kernels and external memory interfaces, improving performance and efficiency as well as an application's portability and scalability.
Proceedings ArticleDOI

Interval simulation: Raising the level of abstraction in architectural simulation

TL;DR: In this paper, the authors propose interval simulation, which takes a completely different approach: it raises the level of abstraction and replaces the core-level cycle-accurate simulation model with a mechanistic analytical model.
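As a rough illustration of what such a mechanistic analytical model computes (the notation here is mine, not the cited paper's): the core's execution time is estimated as a base dispatch term plus penalties for the miss events that interrupt smooth instruction flow,

    C ≈ N / D + Σ_e penalty(e)

where N is the number of dynamically executed instructions, D is the dispatch width, and e ranges over branch mispredictions and cache and TLB misses. The branch predictors and memory hierarchy are still simulated; only the core pipeline's cycle-accurate model is replaced by this analytical estimate.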
Proceedings ArticleDOI

FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators

TL;DR: FAST is a simulation methodology that produces simulators that are orders of magnitude faster than comparable simulators, are cycle-accurate, and can model an entire x86 system running unmodified applications and operating systems.
Proceedings ArticleDOI

HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing

TL;DR: This paper describes the scaling techniques that allow HAsim to model a shared-memory multicore system, including detailed core pipelines, a cache hierarchy, and an on-chip network, on a single FPGA, and compares the time-multiplexed approach to a direct implementation.
References
Proceedings ArticleDOI

The SPLASH-2 programs: characterization and methodological considerations

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understanding them well, including computational load balance, communication-to-computation ratio and traffic needs, important working-set sizes, and issues related to spatial locality.
Book ChapterDOI

Ptolemy: a framework for simulating and prototyping heterogeneous systems

TL;DR: Ptolemy is an environment for the simulation and prototyping of heterogeneous systems; it uses object-oriented software technology to model each subsystem in a natural and efficient manner and to integrate these subsystems into a whole.
Journal ArticleDOI

DSC: scheduling parallel tasks on an unbounded number of processors

TL;DR: The dominant sequence clustering (DSC) algorithm is a low-complexity heuristic for scheduling parallel tasks on an unbounded number of completely connected processors; it guarantees performance within a factor of two of the optimum for general coarse-grain DAGs.
Book

Embedded Multiprocessors: Scheduling and Synchronization

TL;DR: This work presents architectures and design methodologies for parallel systems in embedded DSP applications, and describes unique techniques for optimizing communication and synchronization.

Parallel and distributed simulation of discrete event systems

TL;DR: In the context of conservative LP simulation (Chandy/Misra/Bryant), deadlock avoidance and deadlock detection/recovery strategies, conservative time windows, and the carrier null-message protocol are presented.
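To make the conservative null-message idea above concrete, here is a minimal sketch; the Lp and InputChannel types, the lookahead value, and the channel clocks are illustrative assumptions rather than code from this reference. A logical process (LP) may only process events with timestamps up to the minimum clock of its input channels, and when it cannot advance it sends a null message promising to emit nothing earlier than its own clock plus its lookahead, which lets its neighbours advance and prevents deadlock.

    // Hedged sketch of the core null-message rule for one logical process (LP).
    // Types, values, and the lookahead are invented for illustration.
    #include <algorithm>
    #include <cstdio>
    #include <limits>
    #include <vector>

    struct InputChannel {
        double clock = 0.0;  // timestamp of the last (possibly null) message received
    };

    struct Lp {
        double clock = 0.0;      // local simulation time
        double lookahead = 1.0;  // minimum delay this LP adds to any output
        std::vector<InputChannel> inputs;

        // Largest timestamp up to which local events can safely be processed.
        double safe_horizon() const {
            double h = std::numeric_limits<double>::infinity();
            for (const auto& in : inputs) h = std::min(h, in.clock);
            return h;
        }

        // Lower-bound timestamp carried by an outgoing null message.
        double null_message_time() const { return clock + lookahead; }
    };

    int main() {
        Lp lp;
        lp.inputs = {{3.0}, {5.0}};  // channel clocks from two upstream LPs
        lp.clock  = 2.0;
        std::printf("safe to process events up to t = %.1f\n", lp.safe_horizon());
        std::printf("outgoing null message timestamp = %.1f\n", lp.null_message_time());
        return 0;
    }

Deadlock detection/recovery and conservative time windows, also covered by this reference, are alternative ways of bounding how far each LP may safely advance.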