Journal ArticleDOI

Experimental evaluation of on-chip microprocessor cache memories

Mark D. Hill, +1 more
Vol. 12, Iss. 3, pp. 158-166
TLDR
This paper uses trace-driven simulation to study design tradeoffs for small (on-chip) caches, and finds that general-purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well.
Abstract
Advances in integrated circuit density are permitting the implementation on a single chip of functions and performance enhancements beyond those of a basic processor. One performance enhancement of proven value is a cache memory; placing a cache on the processor chip can reduce both mean memory access time and bus traffic. In this paper we use trace-driven simulation to study design tradeoffs for small (on-chip) caches. Miss ratio and traffic ratio (bus traffic) are the metrics for cache performance. Particular attention is paid to sub-block caches (also known as sector caches), in which address tags are associated with blocks, each of which contains multiple sub-blocks; sub-blocks are the transfer unit. Using traces from two 16-bit architectures (Z8000, PDP-11) and two 32-bit architectures (VAX-11, System/370), we find that general-purpose caches of 64 bytes (net size) are marginally useful in some cases, while 1024-byte caches perform fairly well; typical miss and traffic ratios for a 1024-byte (net size) cache, 4-way set-associative with 8-byte blocks, are: PDP-11: .039, .156; Z8000: .015, .060; VAX-11: .080, .160; Sys/370: .244, .489. (These figures are based on traces of user programs, and the performance obtained in practice is likely to be worse.) The use of sub-blocks allows tradeoffs between miss ratio and traffic ratio for a given cache size. Load forward is quite useful. Extensive simulation results are presented.
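The sub-block organization described in the abstract lends itself to a compact trace-driven simulation. The sketch below is illustrative rather than the paper's simulator: it assumes LRU replacement and a read-only address trace, uses the abstract's 1024-byte, 4-way, 8-byte-block configuration with an assumed 4-byte sub-block, and omits writes and the load-forward policy. The class and variable names are invented for this example.

```python
# Minimal sketch of a sub-block (sector) cache simulator, assuming LRU
# replacement and a read-only trace. One address tag per block; one valid
# bit per sub-block; the sub-block is the transfer unit.
from collections import OrderedDict

class SubBlockCache:
    def __init__(self, sets=32, ways=4, block=8, subblock=4):
        # 32 sets x 4 ways x 8-byte blocks = 1024 bytes net (as in the abstract);
        # the 4-byte sub-block size is an assumption for this example.
        self.sets, self.ways = sets, ways
        self.block, self.subblock = block, subblock
        # per set: LRU-ordered dict of {tag: set of valid sub-block indices}
        self.lines = [OrderedDict() for _ in range(sets)]
        self.refs = self.misses = self.fetched = 0  # fetched counts sub-blocks

    def access(self, addr):
        self.refs += 1
        block_no = addr // self.block
        index = block_no % self.sets
        tag = block_no // self.sets
        sub = (addr % self.block) // self.subblock
        line = self.lines[index]
        if tag in line:
            line.move_to_end(tag)            # refresh LRU position
            if sub in line[tag]:
                return                       # hit: tag match, sub-block valid
            line[tag].add(sub)               # sub-block miss: fetch one sub-block
        else:
            if len(line) >= self.ways:
                line.popitem(last=False)     # evict the LRU block
            line[tag] = {sub}                # allocate tag; only this sub-block valid
        self.misses += 1
        self.fetched += 1

cache = SubBlockCache()
for a in [0, 4, 8, 0, 4, 1024, 0]:           # toy address trace
    cache.access(a)
miss_ratio = cache.misses / cache.refs
# Traffic ratio approximated as sub-blocks fetched per reference, i.e. bytes
# fetched by the cache over the bytes a cacheless processor would fetch
# (assumed: one sub-block-sized word per reference).
traffic_ratio = cache.fetched / cache.refs
print(miss_ratio, traffic_ratio)
```

A sub-block miss fetches only the missing sub-block rather than the whole block, which is what lets a designer trade miss ratio against traffic ratio at a fixed cache size.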



Citations
Proceedings ArticleDOI

Memory Bandwidth Limitations of Future Microprocessors

TL;DR: It is predicted that off-chip accesses will be so expensive that all system memory will reside on one or more processor chips, and pin bandwidth limitations will make more complex on-chip caches cost-effective.
Journal ArticleDOI

An analytical cache model

TL;DR: An analytical cache model is developed that gives miss rates for a given trace as a function of cache size, degree of associativity, block size, subblock size, multiprogramming level, task switch interval, and observation interval.
Journal ArticleDOI

Trace-driven memory simulation: a survey

TL;DR: A survey and analysis of trace-driven memory simulation tools can be found in this article, where the authors discuss the strengths and weaknesses of different approaches and show that no single method is best when all criteria, including accuracy, speed, memory, flexibility, portability, expense, and ease of use, are considered.
Journal ArticleDOI

A class of compatible cache consistency protocols and their support by the IEEE futurebus

TL;DR: This paper defines a class of compatible consistency protocols supported by the current IEEE Futurebus design, referred to as the MOESI class of protocols, which has the property that any system component can select (dynamically) any action permitted by any protocol in the class, and be assured that consistency is maintained throughout the system.
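Since the summary names the MOESI class, a minimal sketch of the five states (Modified, Owned, Exclusive, Shared, Invalid) and one typical transition set may help. The transitions below are a common textbook formulation, assumed here for illustration; they are not the paper's exact protocol definitions, which cover a whole family of compatible variants.

```python
# One common MOESI state machine for a single cache line, as a transition
# table. Events seen by one cache: local 'read'/'write' and snooped
# 'bus_read'/'bus_write'. Assumed formulation, not taken from the paper.
TRANSITIONS = {
    ("I", "read"): "S",        # fetch; a protocol may enter "E" if no sharers
    ("I", "write"): "M",       # read-for-ownership, invalidate other copies
    ("S", "read"): "S",
    ("S", "write"): "M",       # upgrade, invalidate other copies
    ("E", "read"): "E",
    ("E", "write"): "M",       # silent upgrade, no bus traffic needed
    ("M", "read"): "M",
    ("M", "write"): "M",
    ("O", "read"): "O",
    ("O", "write"): "M",       # invalidate other copies
    ("M", "bus_read"): "O",    # supply dirty data, retain ownership
    ("O", "bus_read"): "O",    # owner keeps supplying data to requesters
    ("E", "bus_read"): "S",
    ("S", "bus_read"): "S",
    ("I", "bus_read"): "I",
    # any snooped write/invalidation forces the local copy to Invalid
    **{(s, "bus_write"): "I" for s in "MOESI"},
}

def step(state: str, event: str) -> str:
    return TRANSITIONS[(state, event)]

# Example: a line written locally, then read by another processor, then rewritten
s = "I"
for e in ("write", "bus_read", "write"):
    s = step(s, e)
    print(e, "->", s)   # write -> M, bus_read -> O, write -> M
```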
Proceedings ArticleDOI

Efficiently enabling conventional block sizes for very large die-stacked DRAM caches

TL;DR: Die-stacking technology enables multiple layers of DRAM to be integrated with multicore processors; because the stacked DRAM's capacity is insufficient to serve as all of main memory, a promising use for it is as a large cache.
References
Journal ArticleDOI

Cache Memories

TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.
Journal ArticleDOI

Evaluation techniques for storage hierarchies

TL;DR: A new and efficient method is presented for determining, in one pass of an address trace, performance measures for a large class of demand-paged, multilevel storage systems utilizing a variety of mapping schemes and replacement algorithms.
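The one-pass technique summarized here is the classic stack-simulation idea: for LRU, a single pass over the trace yields hit counts for every fully-associative cache size at once, via stack distances. The sketch below uses the naive linear stack search for clarity (the paper's methods are more general and more efficient); the function name is illustrative.

```python
# Minimal sketch of one-pass LRU stack simulation: record the stack distance
# of each reference, then read off the miss ratio for any capacity.
def stack_distances(trace):
    stack = []                           # LRU stack: most recent block first
    hist = {}                            # stack distance -> number of hits
    for block in trace:
        if block in stack:
            d = stack.index(block)       # distance from the top of the stack
            hist[d] = hist.get(d, 0) + 1
            stack.pop(d)
        stack.insert(0, block)           # referenced block moves to the top
    return hist, len(trace)

# Miss ratio for a cache of c blocks = 1 - (hits at distance < c) / references
hist, refs = stack_distances([1, 2, 3, 1, 2, 4, 1])   # toy block trace
for c in (1, 2, 4):
    hits = sum(n for d, n in hist.items() if d < c)
    print(c, 1 - hits / refs)
```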
Proceedings ArticleDOI

Using cache memory to reduce processor-memory traffic

TL;DR: It is demonstrated that a cache exploiting primarily temporal locality (look-behind) can indeed greatly reduce traffic to memory, and an elegant solution to the cache coherency problem is introduced.