Open Access Journal Article (DOI)

An Evaluation of High-Level Mechanistic Core Models

TLDR
This article explores, analyzes, and compares the accuracy and simulation speed of high-abstraction core models, a potential solution to slow cycle-level simulation, and introduces the instruction-window centric (IW-centric) core model, a new mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail.
Abstract
Large core counts and complex cache hierarchies are increasing the burden placed on commonly used simulation and modeling techniques. Although analytical models provide fast results, they do not apply to complex, many-core shared-memory systems. In contrast, detailed cycle-level simulation can be accurate but also tends to be slow, which limits the number of configurations that can be evaluated. A middle ground is needed that provides for fast simulation of complex many-core processors while still providing accurate results. In this article, we explore, analyze, and compare the accuracy and simulation speed of high-abstraction core models as a potential solution to slow cycle-level simulation. We describe a number of enhancements to interval simulation to improve its accuracy while maintaining simulation speed. In addition, we introduce the instruction-window centric (IW-centric) core model, a new mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail. We also show that using accurate core models like these is important for memory subsystem studies, and that simple, naive models, like a one-IPC core model, can lead to misleading and incorrect results and conclusions in practical design studies. Validation against real hardware shows good accuracy, with an average single-core error of 11.1% and a maximum of 18.8% for the IW-centric model with a 1.5× slowdown compared to interval simulation.
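The abstract's warning about naive one-IPC models can be illustrated with a toy calculation. This sketch contrasts a one-cycle-per-instruction estimate with an interval-style estimate that charges a stall penalty per long-latency miss event; all parameters (dispatch width, miss count, miss penalty) are made-up illustrative numbers, not values from the paper:

```python
# Toy comparison: one-IPC core model vs. an interval-style estimate.
# A one-IPC model charges exactly one cycle per instruction; an
# interval-style model assumes steady-state issue at the dispatch width
# and adds a penalty for each long-latency miss event.

def one_ipc_cycles(n_instructions):
    """Naive model: every instruction retires in exactly one cycle."""
    return n_instructions

def interval_cycles(n_instructions, dispatch_width, miss_events, miss_penalty):
    """Interval-style estimate: base dispatch time plus per-miss stalls."""
    return n_instructions / dispatch_width + miss_events * miss_penalty

n = 1_000_000
naive = one_ipc_cycles(n)
interval = interval_cycles(n, dispatch_width=4, miss_events=5_000, miss_penalty=200)
print(naive, interval)  # the two estimates diverge once misses dominate
```

With these assumed numbers the two models disagree by 25%, and the gap grows with the miss rate, which is why memory-subsystem conclusions drawn from a one-IPC model can be misleading.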



Citations
Journal ArticleDOI

Estimation of energy consumption in machine learning

TL;DR: Energy consumption has been widely studied in the computer architecture field for decades, and while the adoption of energy as a metric in machine learning is emerging, the majority of research is still ...
Proceedings ArticleDOI

Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads

TL;DR: An in-depth data-type-aware characterization of graph processing workloads on a simulated multi-core architecture finds that the load-load dependency chains involving different application data types form the primary bottleneck in achieving a high memory-level parallelism.
Journal ArticleDOI

Energy efficiency of VM consolidation in IaaS clouds

TL;DR: This work proposes TRP-FS, a DVFS-based heuristic for consolidating virtual clusters on physical servers to save energy while guaranteeing job SLAs, and derives the most efficient frequency that minimizes energy consumption as well as the upper bound of energy savings achievable through DVFS techniques.
Proceedings ArticleDOI

The load slice core microarchitecture

TL;DR: The Load Slice Core extends the efficient in-order, stall-on-use core with a second in-order pipeline that enables memory accesses and address-generating instructions to bypass stalled instructions in the main pipeline, thus providing an alternative direction for future many-core processors.
Proceedings ArticleDOI

DeSC: decoupled supply-compute communication management for heterogeneous architectures

TL;DR: This work proposes Decoupled Supply-Compute (DeSC) as a way to attack memory bottlenecks automatically while maintaining good portability and low complexity, updating and expanding on Decoupled Access/Execute approaches with increased specialization and automatic compiler support.
References
Journal ArticleDOI

Pin: building customized program analysis tools with dynamic instrumentation

TL;DR: The goals are to provide easy-to-use, portable, transparent, and efficient instrumentation; to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Proceedings ArticleDOI

The SPLASH-2 programs: characterization and methodological considerations

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understanding them well, including computational load balance, communication-to-computation ratio and traffic needs, important working-set sizes, and issues related to spatial locality.
Journal ArticleDOI

A Proof for the Queuing Formula: L = λW

TL;DR: In this paper, it was shown that if the three means are finite and the corresponding stochastic processes strictly stationary, and if the arrival process is metrically transitive with nonzero mean, then L = λW.
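As a numeric illustration of the queuing formula L = λW, the following sketch simulates a single-server FIFO queue (assumed Poisson arrivals and exponential service; the rates are arbitrary, chosen so the queue is stable) and recovers Little's law from the measured mean sojourn time:

```python
import random

# Sanity check of Little's law L = lambda * W on a simulated
# single-server FIFO queue (M/M/1-style, assumed parameters).
random.seed(1)
arrival_rate, service_rate = 0.5, 1.0   # lambda < mu, so the queue is stable
n_jobs = 200_000

t = 0.0            # current arrival time
server_free = 0.0  # time at which the server next becomes idle
sojourns = []      # time each job spends in the system (wait + service)
for _ in range(n_jobs):
    t += random.expovariate(arrival_rate)   # next Poisson arrival
    start = max(t, server_free)             # wait if the server is busy
    server_free = start + random.expovariate(service_rate)
    sojourns.append(server_free - t)

W = sum(sojourns) / n_jobs   # mean time in system
L = arrival_rate * W         # Little's law: mean number in system
print(round(W, 2), round(L, 2))
```

For an M/M/1 queue with these rates the theoretical values are W = 1/(μ − λ) = 2 and L = ρ/(1 − ρ) = 1, so the simulated estimates should land close to those.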
Proceedings ArticleDOI

Automatically characterizing large scale program behavior

TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large scale behavior of several programs, and develops a set of algorithms based on clustering capable of analyzing this behavior.
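A minimal sketch of the Basic Block Vector (BBV) idea behind this phase-analysis work: each execution interval is summarized by how often every basic block ran, and intervals whose normalized vectors are close exhibit similar behavior. The vectors below are made-up illustrative data, and Manhattan distance is one commonly used similarity measure (the actual work clusters the vectors):

```python
# Sketch of Basic Block Vector (BBV) similarity for phase detection.
# Each vector counts executions of four hypothetical basic blocks
# during one interval; vectors are normalized before comparison.

def normalize(bbv):
    total = sum(bbv)
    return [c / total for c in bbv]

def manhattan(a, b):
    """Manhattan distance between normalized BBVs: 0 means an identical
    block mix, 2 means completely disjoint sets of basic blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

interval_a = normalize([900, 50, 50, 0])    # dominated by block 0
interval_b = normalize([880, 60, 60, 0])    # similar phase
interval_c = normalize([0, 10, 10, 980])    # different phase
print(manhattan(interval_a, interval_b))    # small distance: same phase
print(manhattan(interval_a, interval_c))    # large distance: phase change
```

Clustering intervals by this distance is what lets a few representative simulation points stand in for a program's full behavior.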