Open Access Journal Article (DOI)

An Evaluation of High-Level Mechanistic Core Models

TLDR
This article explores, analyzes, and compares the accuracy and simulation speed of high-abstraction core models, a potential solution to slow cycle-level simulation, and introduces the instruction-window centric (IW-centric) core model, a new mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail.
Abstract
Large core counts and complex cache hierarchies are increasing the burden placed on commonly used simulation and modeling techniques. Although analytical models provide fast results, they do not apply to complex, many-core shared-memory systems. In contrast, detailed cycle-level simulation can be accurate but also tends to be slow, which limits the number of configurations that can be evaluated. A middle ground is needed that provides for fast simulation of complex many-core processors while still providing accurate results. In this article, we explore, analyze, and compare the accuracy and simulation speed of high-abstraction core models as a potential solution to slow cycle-level simulation. We describe a number of enhancements to interval simulation to improve its accuracy while maintaining simulation speed. In addition, we introduce the instruction-window centric (IW-centric) core model, a new mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail. We also show that using accurate core models like these is important for memory subsystem studies, and that simple, naive models, like a one-IPC core model, can lead to misleading and incorrect results and conclusions in practical design studies. Validation against real hardware shows good accuracy, with an average single-core error of 11.1% and a maximum of 18.8% for the IW-centric model with a 1.5× slowdown compared to interval simulation.
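The abstract's warning about naive one-IPC models can be illustrated with a toy calculation. This sketch contrasts a one-cycle-per-instruction estimate with an interval-style estimate that charges a stall penalty per long-latency miss event; all parameters (dispatch width, miss count, miss penalty) are made-up illustrative numbers, not values from the paper:

```python
# Toy comparison: one-IPC core model vs. an interval-style estimate.
# A one-IPC model charges exactly one cycle per instruction; an
# interval-style model assumes steady-state issue at the dispatch width
# and adds a penalty for each long-latency miss event.

def one_ipc_cycles(n_instructions):
    """Naive model: every instruction retires in exactly one cycle."""
    return n_instructions

def interval_cycles(n_instructions, dispatch_width, miss_events, miss_penalty):
    """Interval-style estimate: base dispatch time plus per-miss stalls."""
    return n_instructions / dispatch_width + miss_events * miss_penalty

n = 1_000_000
naive = one_ipc_cycles(n)
interval = interval_cycles(n, dispatch_width=4, miss_events=5_000, miss_penalty=200)
print(naive, interval)  # the two estimates diverge once misses dominate
```

With these assumed numbers the two models disagree by 25%, and the gap grows with the miss rate, which is why memory-subsystem conclusions drawn from a one-IPC model can be misleading.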



Citations
Journal ArticleDOI

Estimation of energy consumption in machine learning

TL;DR: Energy consumption has been widely studied in the computer architecture field for decades, and while the adoption of energy as a metric in machine learning is emerging, the majority of research is still ...
Proceedings ArticleDOI

Analysis and Optimization of the Memory Hierarchy for Graph Processing Workloads

TL;DR: An in-depth data-type-aware characterization of graph processing workloads on a simulated multi-core architecture finds that the load-load dependency chains involving different application data types form the primary bottleneck in achieving a high memory-level parallelism.
Journal ArticleDOI

Energy efficiency of VM consolidation in IaaS clouds

TL;DR: This work proposes TRP-FS, a DVFS-based heuristic for consolidating virtual clusters on physical servers to save energy while guaranteeing job SLAs, and derives the most efficient frequency that minimizes energy consumption as well as the upper bound of energy savings achievable through DVFS techniques.
Proceedings ArticleDOI

The load slice core microarchitecture

TL;DR: The Load Slice Core extends the efficient in-order, stall-on-use core with a second in-order pipeline that enables memory accesses and address-generating instructions to bypass stalled instructions in the main pipeline, thus providing an alternative direction for future many-core processors.
Proceedings ArticleDOI

DeSC: decoupled supply-compute communication management for heterogeneous architectures

TL;DR: This work proposes Decoupled Supply-Compute (DeSC) as a way to attack memory bottlenecks automatically while maintaining good portability and low complexity, updating and expanding on Decoupled Access/Execute approaches with increased specialization and automatic compiler support.
References
Journal ArticleDOI

Pin: building customized program analysis tools with dynamic instrumentation

TL;DR: The goals are to provide easy-to-use, portable, transparent, and efficient instrumentation; to illustrate Pin's versatility, two Pintools in daily use to analyze production software are described.
Proceedings ArticleDOI

The SPLASH-2 programs: characterization and methodological considerations

TL;DR: This paper quantitatively characterizes the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understanding them well, including computational load balance, communication-to-computation ratio and traffic needs, important working-set sizes, and issues related to spatial locality.
Journal ArticleDOI

A Proof for the Queuing Formula: L = λW

TL;DR: In this paper, it was shown that if the three means are finite and the corresponding stochastic processes strictly stationary, and if the arrival process is metrically transitive with nonzero mean, then L = λW.
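As a numeric illustration of the queuing formula L = λW, the following sketch simulates a single-server FIFO queue (assumed Poisson arrivals and exponential service; the rates are arbitrary, chosen so the queue is stable) and recovers Little's law from the measured mean sojourn time:

```python
import random

# Sanity check of Little's law L = lambda * W on a simulated
# single-server FIFO queue (M/M/1-style, assumed parameters).
random.seed(1)
arrival_rate, service_rate = 0.5, 1.0   # lambda < mu, so the queue is stable
n_jobs = 200_000

t = 0.0            # current arrival time
server_free = 0.0  # time at which the server next becomes idle
sojourns = []      # time each job spends in the system (wait + service)
for _ in range(n_jobs):
    t += random.expovariate(arrival_rate)   # next Poisson arrival
    start = max(t, server_free)             # wait if the server is busy
    server_free = start + random.expovariate(service_rate)
    sojourns.append(server_free - t)

W = sum(sojourns) / n_jobs   # mean time in system
L = arrival_rate * W         # Little's law: mean number in system
print(round(W, 2), round(L, 2))
```

For an M/M/1 queue with these rates the theoretical values are W = 1/(μ − λ) = 2 and L = ρ/(1 − ρ) = 1, so the simulated estimates should land close to those.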
Proceedings ArticleDOI

Automatically characterizing large scale program behavior

TL;DR: This work quantifies the effectiveness of Basic Block Vectors in capturing program behavior across several different architectural metrics, explores the large scale behavior of several programs, and develops a set of algorithms based on clustering capable of analyzing this behavior.
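A minimal sketch of the Basic Block Vector (BBV) idea behind this phase-analysis work: each execution interval is summarized by how often every basic block ran, and intervals whose normalized vectors are close exhibit similar behavior. The vectors below are made-up illustrative data, and Manhattan distance is one commonly used similarity measure (the actual work clusters the vectors):

```python
# Sketch of Basic Block Vector (BBV) similarity for phase detection.
# Each vector counts executions of four hypothetical basic blocks
# during one interval; vectors are normalized before comparison.

def normalize(bbv):
    total = sum(bbv)
    return [c / total for c in bbv]

def manhattan(a, b):
    """Manhattan distance between normalized BBVs: 0 means an identical
    block mix, 2 means completely disjoint sets of basic blocks."""
    return sum(abs(x - y) for x, y in zip(a, b))

interval_a = normalize([900, 50, 50, 0])    # dominated by block 0
interval_b = normalize([880, 60, 60, 0])    # similar phase
interval_c = normalize([0, 10, 10, 980])    # different phase
print(manhattan(interval_a, interval_b))    # small distance: same phase
print(manhattan(interval_a, interval_c))    # large distance: phase change
```

Clustering intervals by this distance is what lets a few representative simulation points stand in for a program's full behavior.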