An Evaluation of High-Level Mechanistic Core Models
Reads0
Chats0
TLDR
This article explores, analyze, and compares the accuracy and simulation speed of high-abstraction core models, a potential solution to slow cycle-level simulation, and introduces the instruction-window centric (IW-centric) core model, a new mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail.Abstract:
Large core counts and complex cache hierarchies are increasing the burden placed on commonly used simulation and modeling techniques. Although analytical models provide fast results, they do not apply to complex, many-core shared-memory systems. In contrast, detailed cycle-level simulation can be accurate but also tends to be slow, which limits the number of configurations that can be evaluated. A middle ground is needed that provides for fast simulation of complex many-core processors while still providing accurate results. In this article, we explore, analyze, and compare the accuracy and simulation speed of high-abstraction core models as a potential solution to slow cycle-level simulation. We describe a number of enhancements to interval simulation to improve its accuracy while maintaining simulation speed. In addition, we introduce the instruction-window centric (IW-centric) core model, a new mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail. We also show that using accurate core models like these are important for memory subsystem studies, and that simple, naive models, like a one-IPC core model, can lead to misleading and incorrect results and conclusions in practical design studies. Validation against real hardware shows good accuracy, with an average single-core error of 11.1p and a maximum of 18.8p for the IW-centric model with a 1.5× slowdown compared to interval simulation.read more
Citations
More filters
Journal ArticleDOI
Estimation of energy consumption in machine learning
TL;DR: Energy consumption has been widely studied in the computer architecture field for decades, and while the adoption of energy as a metric in machine learning is emerging, the majority of research is stil ...
Proceedings ArticleDOI
Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads
TL;DR: An in-depth data-type-aware characterization of graph processing workloads on a simulated multi-core architecture finds that the load-load dependency chains involving different application data types form the primary bottleneck in achieving a high memory-level parallelism.
Journal ArticleDOI
Energy efficiency of VM consolidation in IaaS clouds
TL;DR: This work proposes a DVFS-based heuristic TRP-FS to consolidate virtual clusters on physical servers to save energy while guarantee job SLAs and proves the most efficient frequency that minimizes the energy consumption, and the upper bound of energy saving through DVFS techniques.
Proceedings ArticleDOI
The load slice core microarchitecture
TL;DR: The Load Slice Core extends the efficient in-order, stall-on-use core with a second in- order pipeline that enables memory accesses and address-generating instructions to bypass stalled instructions in the main pipeline, thus providing an alternative direction for future many-core processors.
Proceedings ArticleDOI
DeSC: decoupled supply-compute communication management for heterogeneous architectures
TL;DR: This work proposes Decoupled Supply-Compute (DeSC) as a way to attack memory bottlenecks automatically, while maintaining good portability and low complexity, and updates and expands onDecoupled Access Execute approaches with increased specialization and automatic compiler support.
References
More filters
Journal ArticleDOI
The gem5 simulator
Nathan Binkert,Bradford M. Beckmann,Gabriel Black,Steven K. Reinhardt,Ali G. Saidi,Arkaprava Basu,Joel Hestness,Derek R. Hower,Tushar Krishna,Somayeh Sardashti,Rathijit Sen,Korey Sewell,Muhammad Shoaib,Nilay Vaish,Mark D. Hill,Darien Wood +15 more
TL;DR: The high level of collaboration on the gem5 project, combined with the previous success of the component parts and a liberal BSD-like license, make gem5 a valuable full-system simulation tool.
Journal ArticleDOI
Pin: building customized program analysis tools with dynamic instrumentation
Chi-Keung Luk,Robert Cohn,Robert Muth,Harish Patil,Artur Klauser,Geoff Lowney,Steven Wallace,Vijay Janapa Reddi,Kim Hazelwood +8 more
TL;DR: The goals are to provide easy-to-use, portable, transparent, and efficient instrumentation, and to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Proceedings ArticleDOI
The SPLASH-2 programs: characterization and methodological considerations
TL;DR: This paper quantitatively characterize the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well, including the computational load balance, communication to computation ratio and traffic needs, important working set sizes, and issues related to spatial locality.
Journal ArticleDOI
A Proof for the Queuing Formula: L = λW
TL;DR: In this paper, it was shown that if the three means are finite and the corresponding stochastic processes strictly stationary, and if the arrival process is metrically transitive with nonzero mean, then L = λW.
Proceedings ArticleDOI
Automatically characterizing large scale program behavior
TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large scale behavior of several programs, and develops a set of algorithms based on clustering capable of analyzing this behavior.