Journal Article

Understanding some simple processor-performance limits

Philip G. Emma
1 May 1997
Vol. 41, Issue 3, pp. 215-232
TLDR
This paper shows that cycles per instruction (CPI) is a simple dot product of event frequencies and event penalties, and that it is far more intuitive than its more popular cousin, instructions per cycle (IPC).
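The "dot product" view of CPI can be made concrete with a few lines of code. This is a minimal sketch, not the paper's own model. The event names, frequencies (events per instruction) and penalties (cycles per event) are hypothetical placeholders chosen only to show the arithmetic. CPI is the sum of frequency × penalty over all event types, and IPC is its reciprocal.

```python
# Minimal sketch: CPI as the dot product of event frequencies and event penalties.
# The event names and every number below are hypothetical placeholders.

# (event name, events per instruction, cycles charged per event)
EVENTS = [
    ("execute",           1.000,   1.0),  # assumed 1 cycle of inherent work per instruction
    ("taken branch",      0.150,   2.0),  # assumed branch penalty
    ("L1 miss, L2 hit",   0.030,  10.0),  # assumed L2 access penalty
    ("L2 miss to memory", 0.005, 100.0),  # assumed memory latency
]


def cpi(events):
    """Cycles per instruction: sum of frequency * penalty over all event types."""
    return sum(freq * penalty for _, freq, penalty in events)


def ipc(events):
    """Instructions per cycle is just the reciprocal of CPI."""
    return 1.0 / cpi(events)


if __name__ == "__main__":
    for name, freq, penalty in EVENTS:
        print(f"{name:18s} contributes {freq * penalty:.3f} CPI")
    print(f"total CPI = {cpi(EVENTS):.3f}, IPC = {ipc(EVENTS):.3f}")
```

With these placeholder numbers, CPI is about 2.1. Each term adds independently. Eliminating memory misses therefore removes exactly that event's 0.5-cycle term, so CPI drops from 2.1 to 1.6. The same change in IPC (roughly 0.476 to 0.625) has no comparable per-event breakdown, and this difference is the intuition the paper attributes to CPI.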
Abstract
To understand processor performance, it is essential to use intuitive metrics and to be familiar with a few aspects of a simple scalar pipeline before attempting to understand more complex structures. This paper shows that cycles per instruction (CPI) is a simple dot product of event frequencies and event penalties, and that it is far more intuitive than its more popular cousin, instructions per cycle (IPC). CPI is separable into three components that account for the inherent work, the pipeline, and the memory hierarchy, respectively. Each of these components is a fixed upper limit, or "hard bound," for the equivalent superscalar component. In the last decade, the memory-hierarchy component has become the dominant one of the three, and in the next decade, queueing at the memory data bus will become a very significant part of it. In reaction to this trend, bus protocols will evolve. This paper provides a general sketch of those protocols. An underlying theme in this paper is that power constraints have been a driving force in computer architecture since the first computers were built fifty years ago. In CMOS technology, power constraints will shape future microarchitecture in a positive and surprising way. Specifically, a resurgence of the RISC approach is expected in high-performance design, which will cause client and server microarchitectures to converge.
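The three-way split of CPI, and the "hard bound" relation between scalar and superscalar components, can be expressed as a small data structure. This is a hedged sketch based on one reading of the abstract. That reading is that each scalar component (inherent work, pipeline, memory hierarchy) is an upper bound on the corresponding superscalar component. The class name `CpiBreakdown`, the helper `within_hard_bound`, and all numeric values are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class CpiBreakdown:
    """Hypothetical three-component CPI split (cycles per instruction)."""
    inherent: float  # the work itself; a scalar pipeline cannot go below 1.0
    pipeline: float  # cycles lost to pipeline events (e.g., branches, interlocks)
    memory: float    # cycles lost to the memory hierarchy (misses, bus queueing)

    @property
    def total(self) -> float:
        # The components are additive, so total CPI is their sum.
        return self.inherent + self.pipeline + self.memory

    def shares(self) -> dict[str, float]:
        # Fraction of total CPI contributed by each component.
        t = self.total
        return {
            "inherent": self.inherent / t,
            "pipeline": self.pipeline / t,
            "memory": self.memory / t,
        }


def within_hard_bound(scalar: CpiBreakdown, superscalar: CpiBreakdown) -> bool:
    """Assumed reading of the abstract: every superscalar component is at most
    the corresponding scalar component."""
    return (superscalar.inherent <= scalar.inherent
            and superscalar.pipeline <= scalar.pipeline
            and superscalar.memory <= scalar.memory)


if __name__ == "__main__":
    # Placeholder numbers, chosen so the memory component dominates.
    scalar = CpiBreakdown(inherent=1.0, pipeline=0.6, memory=1.4)
    wide = CpiBreakdown(inherent=0.4, pipeline=0.5, memory=1.3)
    print(scalar.total, scalar.shares())
    print("superscalar within scalar bounds:", within_hard_bound(scalar, wide))
```

In this placeholder example, widening the machine shrinks the inherent term the most but leaves the memory term almost untouched. The memory hierarchy's share of CPI therefore grows, which mirrors the trend the abstract describes.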


Citations
Journal Article

Scheduling heterogeneous multi-cores through Performance Impact Estimation (PIE)

TL;DR: This paper proposes Performance Impact Estimation (PIE) as a mechanism to predict which workload-to-core mapping is likely to provide the best performance and shows that it requires limited hardware support and can improve system performance by an average of 5.5% over recent state-of-the-art scheduling proposals and by 8.7% over a sampling-based scheduling policy.
Journal Article

An Evaluation of High-Level Mechanistic Core Models

TL;DR: This article explores, analyzes, and compares the accuracy and simulation speed of high-abstraction core models, a potential solution to slow cycle-level simulation. It also introduces the instruction-window centric (IW-centric) core model, a new mechanistic core model that bridges the gap between interval simulation and cycle-accurate simulation by enabling high-speed simulations with higher levels of detail.
Proceedings Article

Construction and use of linear regression models for processor performance analysis

TL;DR: This work builds linear regression models that relate processor performance to micro-architectural parameters, using simulation-based experiments. It obtains good approximate models through an iterative process in which Akaike's information criterion is used to extract a good linear model from a small set of simulations.
Journal Article

A mechanistic performance model for superscalar out-of-order processors

TL;DR: The mechanistic model provides several advantages over prior modeling approaches, and, when estimating performance, it differs from detailed simulation of a 4-wide out-of-order processor by an average of 7%.
Proceedings Article

Interval simulation: Raising the level of abstraction in architectural simulation

TL;DR: In this paper, the authors propose interval simulation, which takes a completely different approach: interval simulation raises the level of abstraction and replaces the core-level cycle-accurate simulation model by a mechanistic analytical model.