Understanding some simple processor-performance limits

doi:10.1147/RD.413.0215


Philip G. Emma

01 May 1997

IBM Journal of Research and Development, Vol. 41, Iss. 3, pp. 215-232


Abstract:

To understand processor performance, it is essential to use metrics that are intuitive, and it is essential to be familiar with a few aspects of a simple scalar pipeline before attempting to understand more complex structures. This paper shows that cycles per instruction (CPI) is a simple dot product of event frequencies and event penalties, and that it is far more intuitive than its more popular cousin, instructions per cycle (IPC). CPI is separable into three components that account for the inherent work, the pipeline, and the memory hierarchy, respectively. Each of these components is a fixed upper limit, or “hard bound,” for the equivalent superscalar component. In the last decade, the memory-hierarchy component has become the dominant one of the three, and in the next decade, queueing at the memory data bus will become a very significant part of it. In reaction to this trend, an evolution in bus protocols will ensue; this paper provides a general sketch of those protocols. An underlying theme of this paper is that power constraints have been a driving force in computer architecture since the first computers were built fifty years ago. In CMOS technology, power constraints will shape future microarchitecture in a positive and surprising way. Specifically, a resurgence of the RISC approach is expected in high-performance design, which will cause the client and server microarchitectures to converge.
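The abstract's central claim, that CPI is a dot product of event frequencies (events per instruction) and event penalties (cycles per event), can be sketched in a few lines of Python. This is an illustrative sketch only: the event names, rates, and penalties below are hypothetical placeholders, not figures from the paper. The grouping into inherent-work, pipeline, and memory-hierarchy components follows the abstract's three-way split, and the `Event` type and `cpi` helper are names introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One performance event (hypothetical illustration, not from the paper)."""
    name: str
    frequency: float  # events per instruction
    penalty: float    # cycles per event


def cpi(events):
    """CPI contribution = sum of frequency * penalty, i.e. a dot product."""
    return sum(e.frequency * e.penalty for e in events)


# All numbers below are HYPOTHETICAL placeholders chosen to make the arithmetic easy.
# Assumption: a simple scalar pipeline spends at least one cycle per instruction.
inherent_work = [Event("execute", 1.0, 1.0)]
pipeline = [
    Event("branch mispredict", 0.05, 4.0),
    Event("load-use stall", 0.10, 1.0),
]
memory = [
    Event("L1 miss", 0.02, 10.0),
    Event("L2 miss", 0.005, 100.0),
]

# Each component is its own dot product; the total CPI is simply their sum.
components = {
    "inherent": cpi(inherent_work),
    "pipeline": cpi(pipeline),
    "memory": cpi(memory),
}
total_cpi = sum(components.values())
ipc = 1.0 / total_cpi  # IPC is the reciprocal of CPI

for name, value in components.items():
    print(f"{name:>8}: {value:.2f} cycles per instruction")
print(f"total CPI = {total_cpi:.2f}, IPC = {ipc:.2f}")
```

Because CPI is a sum, each component's contribution can be read off directly (here the hypothetical memory component adds 0.7 cycles per instruction, out of a total of 2.0). The corresponding IPC of 0.5 does not decompose into per-component terms that add up in the same way, which is one way to see why the paper calls CPI the more intuitive metric.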


