Open Access Dissertation
Split array and scalar data caches: a comprehensive study of data cache organization
Krishna M. Kavi, Afrin Naz +1 more
TL;DR: A split data cache architecture is proposed that groups memory accesses as scalar or array references according to their inherent locality and maps each group to a dedicated cache partition, reducing the area and power consumed by cache memories while retaining performance gains.

Abstract:
Existing cache organizations suffer from an inability to distinguish different types of locality: they non-selectively cache all data rather than attempting to take special advantage of the locality type. This causes unnecessary movement of data among the levels of the memory hierarchy and increases the miss ratio. In this dissertation I propose a split data cache architecture that groups memory accesses as scalar or array references according to their inherent locality and subsequently maps each group to a dedicated cache partition. In this system, because scalar and array references no longer negatively affect each other, cache interference is diminished, delivering better performance. Further improvement is achieved by the introduction of a victim cache, prefetching, data flattening, and reconfigurability to tune the array and scalar caches for specific applications.
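The core mechanism described above can be illustrated with a minimal simulation sketch. This is an assumed design for illustration only, not the dissertation's implementation: each access arrives pre-tagged as a scalar or array reference (in a real system the classification would come from the compiler or access-pattern hints), and the two streams are routed to separate direct-mapped partitions so they cannot evict each other's lines. The partition sizes and line size below are arbitrary.

```python
# Illustrative sketch of a split data cache (assumed design, not the
# dissertation's implementation): scalar and array reference streams are
# routed to separate direct-mapped partitions, eliminating cross-stream
# conflict misses.

class DirectMappedCache:
    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines   # one tag slot per cache line
        self.hits = 0
        self.misses = 0

    def access(self, addr):
        block = addr // self.line_size   # which memory block this address maps to
        idx = block % self.num_lines     # direct-mapped index
        if self.tags[idx] == block:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[idx] = block       # fill the line on a miss


class SplitDataCache:
    """Route each access to the scalar or array partition by its tag."""
    def __init__(self):
        self.scalar = DirectMappedCache(num_lines=16, line_size=32)  # small partition
        self.array = DirectMappedCache(num_lines=64, line_size=32)   # larger partition

    def access(self, addr, is_array):
        (self.array if is_array else self.scalar).access(addr)


# Usage: a streaming array walk interleaved with a hot scalar variable.
# In a unified cache, the stream could evict the scalar's line; here the
# scalar partition stays undisturbed after its one compulsory miss.
cache = SplitDataCache()
for i in range(1024):
    cache.access(0x10000 + 4 * i, is_array=True)   # sequential array stream
    cache.access(0x200, is_array=False)            # same scalar every iteration
print(cache.scalar.misses, cache.array.misses)     # prints: 1 128
```

The array stream touches 4096 bytes (128 lines of 32 bytes), so it incurs exactly one compulsory miss per line; the scalar sees one compulsory miss and then hits on every subsequent access, which is the interference-free behavior the split organization is meant to preserve.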
The most significant contribution of my work is the introduction of a novel cache architecture for embedded microprocessor platforms. My proposed cache architecture uses reconfigurability coupled with split data caches to reduce the area and power consumed by cache memories while retaining performance gains. My results show excellent reductions in both memory size and memory access times, translating into reduced power consumption. Because miss rates at the L1 caches drop sharply, further power reduction is achieved by partially or completely shutting down the L2 data or L2 instruction caches. The savings in cache size resulting from these designs can be used for other processor activities, including instruction and data prefetching and branch-prediction buffers. The potential benefits of such techniques for embedded applications have been evaluated in my work.
I also explore how my cache organization performs for non-numeric data structures. I propose a novel idea called “data flattening,” a profile-based memory allocation technique to compress sparsely scattered pointer data into regular, contiguous memory locations, and explore the potential of my proposed split cache organization for data treated with the data flattening method.
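The data-flattening idea can be sketched as follows. This is a hedged illustration under assumed details (the dissertation's actual technique is profile-driven and operates at allocation time): nodes of a pointer-linked structure are copied into contiguous slots in traversal order, so that a subsequent walk generates sequential, array-like addresses instead of scattered ones. All names here are illustrative.

```python
# Illustrative sketch of data flattening (assumed details): relocate the
# nodes of a linked list into one contiguous buffer in traversal order,
# turning pointer-chasing accesses into a sequential, array-like stream.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None   # scattered heap pointer in the original layout


def build_list(values):
    """Build a singly linked list holding `values` in order."""
    head = None
    for v in reversed(values):
        n = Node(v)
        n.next = head
        head = n
    return head


def flatten(head):
    """Copy node payloads into a contiguous buffer in the order a
    traversal touches them; successor links become implicit (slot i is
    followed by slot i+1), so a walk is a pure sequential scan."""
    flat = []
    node = head
    while node is not None:
        flat.append(node.value)
        node = node.next
    return flat


head = build_list([3, 1, 4, 1, 5])
print(flatten(head))   # prints: [3, 1, 4, 1, 5]
```

After flattening, the structure exhibits the spatial locality of an array and so becomes a good candidate for the array partition of the split cache, which is the connection the dissertation explores.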
Citations
Proceedings Article
Superoptimized Memory Subsystems for Streaming Applications
TL;DR: This work shows that it is possible to automatically generate a superoptimized memory subsystem that can be deployed on an FPGA such that it performs better than a general-purpose memory subsystem.
Journal Article
A power efficient cache structure for embedded processors based on the dual cache structure
TL;DR: The cooperative cache system is adopted as the cache structure for the CalmRISC-32 embedded processor that is going to be manufactured by Samsung Electronics Co. with 0.25µm technology.
Proceedings Article
Superoptimization of memory subsystems
TL;DR: Drawing motivation from the superoptimization of instruction sequences, which successfully finds unusually clever instruction sequences for programs, it is shown that it is possible to discover unusual memory subsystems that provide performance improvements over a typical memory subsystem.
Book Chapter
Superoptimizing Memory Subsystems for Multiple Objectives
TL;DR: This work considers the automatic determination of application-specific memory subsystems via superoptimization, with the goals of reducing memory access time and of minimizing writes.
References
Book
Computer Architecture: A Quantitative Approach
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Proceedings Article
MiBench: A free, commercially representative embedded benchmark suite
Matthew R. Guthaus, Jeff Ringenberg, Daniel J. Ernst, Todd Austin, Trevor Mudge, Richard B. Brown +5 more
TL;DR: A new version of SimpleScalar that has been adapted to the ARM instruction set is used to characterize the performance of the benchmarks using configurations similar to current and next generation embedded processors.
Journal Article
The SimpleScalar tool set, version 2.0
Doug Burger, Todd Austin +1 more
TL;DR: This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors.
Journal Article
The Ubiquitous B-Tree
TL;DR: The major variations of the B-tree are discussed, especially the B+-tree, contrasting the merits and costs of each implementation and illustrating a general-purpose access method that uses a B-tree.
Journal Article
Cache Memories
TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.