Proceedings ArticleDOI

Exploiting parallelism and structure to accelerate the simulation of chip multi-processors

TLDR
It is shown that automated parallelization can achieve a speedup of 7.60 for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor, and the power of hardware integration is demonstrated by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82.
Abstract
Simulation is an important means of evaluating new microarchitectures. Current trends toward chip multiprocessors (CMPs) test the ability of designers to develop efficient simulators. CMP simulation speed can be improved by exploiting parallelism in the CMP simulation model. This may be done either by running the simulation on multiple processors or by integrating multiple processors into the simulation to replace simulated processors. Doing so usually requires tedious manual parallelization or re-design to encapsulate processors. This paper presents techniques to perform automated simulator parallelization and hardware integration for CMP structural models. We show that automated parallelization can achieve a speedup of 7.60 for a 16-processor CMP model on a conventional 4-processor shared-memory multiprocessor. We demonstrate the power of hardware integration by integrating eight hardware PowerPC cores into a CMP model, achieving a speedup of up to 5.82.
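The first of these ideas, running one structural model on several host processors, can be pictured with a small sketch. This is a minimal illustration under assumptions of my own, not the paper's simulator: the CoreModel type, its tick() method, and the one-host-thread-per-simulated-core split with a barrier at the end of every simulated cycle are all invented here to show how model-level parallelism can be exploited.

    // Hedged sketch: partition a CMP structural model across host threads,
    // one simulated core per thread, kept in lockstep by a per-cycle barrier.
    // CoreModel, tick(), and the fixed partition are illustrative assumptions.
    #include <barrier>
    #include <thread>
    #include <vector>

    struct CoreModel {
        void tick(long cycle) { /* evaluate this core's pipeline, caches, ... */ }
    };

    int main() {
        constexpr int  kNumCores = 16;      // simulated CMP cores
        constexpr long kCycles   = 10'000;  // simulated cycles (kept small here)
        std::vector<CoreModel> cores(kNumCores);
        std::barrier sync(kNumCores);       // end-of-cycle synchronization

        std::vector<std::jthread> workers;
        for (int c = 0; c < kNumCores; ++c) {
            workers.emplace_back([&, c] {
                for (long cycle = 0; cycle < kCycles; ++cycle) {
                    cores[c].tick(cycle);    // advance this core by one cycle
                    sync.arrive_and_wait();  // wait for all partitions
                }
            });
        }
        return 0;  // jthreads join on destruction
    }

In the paper the partitioning is automated rather than fixed per core, and the hardware-integration path replaces some of these simulated cores with physical PowerPC cores; the sketch only shows the simplest possible split.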


Citations
Proceedings ArticleDOI

Graphite: A distributed parallel simulator for multicores

TL;DR: This paper introduces the Graphite open-source distributed parallel multicore simulator infrastructure and demonstrates that Graphite can simulate target architectures containing over 1000 cores on ten 8-core servers with near linear speedup.
Proceedings ArticleDOI

CoRAM: an in-fabric memory architecture for FPGA-based computing

TL;DR: A new FPGA memory architecture called Connected RAM (CoRAM) is proposed to serve as a portable bridge between distributed computation kernels and external memory interfaces, improving performance and efficiency as well as an application's portability and scalability.
Proceedings ArticleDOI

Interval simulation: Raising the level of abstraction in architectural simulation

TL;DR: In this paper, the authors propose interval simulation, which takes a completely different approach: it raises the level of abstraction and replaces the core-level cycle-accurate simulation model with a mechanistic analytical model.
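As a rough illustration of what such a mechanistic analytical model computes (the notation here is mine, not the cited paper's): the core's execution time is estimated as a base dispatch term plus penalties for the miss events that interrupt smooth instruction flow,

    C ≈ N / D + Σ_e penalty(e)

where N is the number of dynamically executed instructions, D is the dispatch width, and e ranges over branch mispredictions and cache and TLB misses. The branch predictors and memory hierarchy are still simulated; only the core pipeline's cycle-accurate model is replaced by this analytical estimate.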
Proceedings ArticleDOI

FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators

TL;DR: FAST is a simulation methodology that produces simulators that are orders of magnitude faster than comparable simulators, are cycle-accurate, and can model an entire x86 system running unmodified applications and operating systems.
Proceedings ArticleDOI

HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing

TL;DR: This paper describes the scaling techniques that allow HAsim to model a shared-memory multicore system, including detailed core pipelines, a cache hierarchy, and an on-chip network, on a single FPGA, and compares the time-multiplexed approach to a direct implementation.
References
Proceedings ArticleDOI

The SPLASH-2 programs: characterization and methodological considerations

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understanding them well, including computational load balance, communication-to-computation ratio and traffic needs, important working-set sizes, and issues related to spatial locality.
Book ChapterDOI

Ptolemy: a framework for simulating and prototyping heterogeneous systems

TL;DR: Ptolemy is an environment for the simulation and prototyping of heterogeneous systems; it uses object-oriented software technology to model each subsystem in a natural and efficient manner and to integrate these subsystems into a whole.
Journal ArticleDOI

DSC: scheduling parallel tasks on an unbounded number of processors

TL;DR: The dominant sequence clustering (DSC) algorithm is a low-complexity heuristic for scheduling parallel tasks on an unbounded number of completely connected processors; it guarantees performance within a factor of two of the optimum for general coarse-grain DAGs.
Book

Embedded Multiprocessors: Scheduling and Synchronization

TL;DR: This work presents architectures and design methodologies for parallel systems in embedded DSP applications, and describes unique techniques for optimizing communication and synchronization.

Parallel and distributed simulation of discrete event systems

TL;DR: In the context of conservative LP simulation (Chandy/Misra/Bryant), deadlock avoidance and deadlock detection/recovery strategies, conservative time windows, and the carrier null-message protocol are presented.
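To make the conservative null-message idea above concrete, here is a minimal sketch; the Lp and InputChannel types, the lookahead value, and the channel clocks are illustrative assumptions rather than code from this reference. A logical process (LP) may only process events with timestamps up to the minimum clock of its input channels, and when it cannot advance it sends a null message promising to emit nothing earlier than its own clock plus its lookahead, which lets its neighbours advance and prevents deadlock.

    // Hedged sketch of the core null-message rule for one logical process (LP).
    // Types, values, and the lookahead are invented for illustration.
    #include <algorithm>
    #include <cstdio>
    #include <limits>
    #include <vector>

    struct InputChannel {
        double clock = 0.0;  // timestamp of the last (possibly null) message received
    };

    struct Lp {
        double clock = 0.0;      // local simulation time
        double lookahead = 1.0;  // minimum delay this LP adds to any output
        std::vector<InputChannel> inputs;

        // Largest timestamp up to which local events can safely be processed.
        double safe_horizon() const {
            double h = std::numeric_limits<double>::infinity();
            for (const auto& in : inputs) h = std::min(h, in.clock);
            return h;
        }

        // Lower-bound timestamp carried by an outgoing null message.
        double null_message_time() const { return clock + lookahead; }
    };

    int main() {
        Lp lp;
        lp.inputs = {{3.0}, {5.0}};  // channel clocks from two upstream LPs
        lp.clock  = 2.0;
        std::printf("safe to process events up to t = %.1f\n", lp.safe_horizon());
        std::printf("outgoing null message timestamp = %.1f\n", lp.null_message_time());
        return 0;
    }

Deadlock detection/recovery and conservative time windows, also covered by this reference, are alternative ways of bounding how far each LP may safely advance.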