Open Access Dissertation

Split array and scalar data caches: a comprehensive study of data cache organization

TLDR
A split data cache architecture is proposed that will group memory accesses as scalar or array references according to their inherent locality and will subsequently map each group to a dedicated cache partition, to reduce area and power consumed by cache memories while retaining performance gains.
Abstract
Existing cache organizations suffer from an inability to distinguish among different types of locality, caching all data indiscriminately rather than taking advantage of the locality type. This causes unnecessary movement of data among the levels of the memory hierarchy and increases the miss ratio. In this dissertation I propose a split data cache architecture that groups memory accesses into scalar or array references according to their inherent locality and maps each group to a dedicated cache partition. Because scalar and array references no longer negatively affect each other in this system, cache interference is diminished, delivering better performance. Further improvement is achieved by introducing a victim cache, prefetching, data flattening, and reconfigurability to tune the array and scalar caches for specific applications. The most significant contribution of my work is a novel cache architecture for embedded microprocessor platforms. My proposed cache architecture couples reconfigurability with split data caches to reduce the area and power consumed by cache memories while retaining performance gains. My results show substantial reductions in both memory size and memory access time, which translate into reduced power consumption. Because miss rates at the L1 caches drop sharply, further power reduction is achieved by partially or completely shutting down the L2 data or L2 instruction caches. The cache capacity saved by these designs can be devoted to other processor activities, including instruction and data prefetching and branch-prediction buffers. The potential benefits of such techniques for embedded applications have been evaluated in my work. I also explore how my cache organization performs for non-numeric data structures.
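The partitioning idea above can be illustrated with a minimal sketch. This is not the dissertation's implementation; the cache parameters and the access trace are hypothetical, and it only shows why routing streaming array references away from reused scalars avoids cache interference.

```python
# Illustrative sketch (assumed parameters, not the dissertation's design):
# route each memory reference to a scalar or an array partition and count
# misses, so streaming array data cannot evict frequently reused scalars.

class DirectMappedCache:
    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # one block tag per cache line
        self.misses = 0

    def access(self, address):
        block = address // self.line_size
        index = block % self.num_lines
        if self.tags[index] != block:   # miss: fill the line
            self.tags[index] = block
            self.misses += 1

class SplitDataCache:
    """Scalar and array references go to dedicated partitions."""
    def __init__(self):
        self.scalar = DirectMappedCache(num_lines=8, line_size=16)
        self.array = DirectMappedCache(num_lines=8, line_size=64)

    def access(self, address, is_array):
        (self.array if is_array else self.scalar).access(address)

# Hypothetical trace: a loop that streams an array while reusing two scalars.
cache = SplitDataCache()
for i in range(256):
    cache.access(0x1000, is_array=False)         # scalar loop counter
    cache.access(0x1010, is_array=False)         # scalar accumulator
    cache.access(0x8000 + 4 * i, is_array=True)  # streaming array element

print(cache.scalar.misses)  # scalars stay resident after their first touch
print(cache.array.misses)   # array misses only on each new 64-byte block
```

In a unified direct-mapped cache the streaming array could repeatedly evict the two scalars; with the split organization the scalars miss only on first touch, which is the interference reduction the abstract describes.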
I also propose a novel technique called "data flattening," a profile-based memory allocation method that compacts sparsely scattered pointer data into regular, contiguous memory locations, and I explore the potential of my proposed split cache organization for data treated with the data flattening method.
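A hedged sketch of the flattening idea, under the assumption that a profiled traversal order is available: walk pointer-linked nodes in their observed access order and pack their payloads into one contiguous array, trading pointer chasing for sequential, spatially local accesses. All names here are illustrative.

```python
# Hypothetical sketch of profile-based "data flattening": copy sparsely
# scattered linked-list nodes into a contiguous array so that subsequent
# traversals enjoy spatial locality. Not the dissertation's actual code.

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

def build_scattered_list(values):
    """Build a linked list whose nodes are individually allocated
    (and thus potentially scattered across the heap)."""
    head = tail = None
    for v in values:
        node = Node(v)
        if tail is None:
            head = node
        else:
            tail.next = node
        tail = node
    return head

def flatten(head):
    """Profile-guided compaction: walk the list in its observed access
    order and pack the payloads into one contiguous array."""
    flat = []
    node = head
    while node is not None:
        flat.append(node.value)
        node = node.next
    return flat  # contiguous storage; an index replaces each pointer chase

head = build_scattered_list([3, 1, 4, 1, 5])
print(flatten(head))  # -> [3, 1, 4, 1, 5]
```

After compaction, consecutive elements share cache lines, so the flattened data behaves like an array reference stream and can be served by the array partition of the split cache.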


Citations
Proceedings Article

Superoptimized Memory Subsystems for Streaming Applications

TL;DR: This work shows that it is possible to automatically generate a superoptimized memory subsystem that can be deployed on an FPGA and that performs better than a general-purpose memory subsystem.
Journal Article

A power efficient cache structure for embedded processors based on the dual cache structure

TL;DR: The cooperative cache system is adopted as the cache structure for the CalmRISC-32 embedded processor, which is to be manufactured by Samsung Electronics Co. in 0.25µm technology.
Proceedings Article

Superoptimization of memory subsystems

TL;DR: Drawing motivation from the superoptimization of instruction sequences, which finds unusually clever instruction sequences for programs, it is shown that it is possible to discover unusual memory subsystems that outperform a typical memory subsystem.
Book Chapter

Superoptimizing Memory Subsystems for Multiple Objectives

TL;DR: This work considers the automatic determination of application-specific memory subsystems via superoptimization, with the goals of reducing memory access time and minimizing writes.
References
Book

Computer Architecture: A Quantitative Approach

TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Proceedings Article

MiBench: A free, commercially representative embedded benchmark suite

TL;DR: A new version of SimpleScalar, adapted to the ARM instruction set, is used to characterize the performance of the benchmarks under configurations similar to current and next-generation embedded processors.
Journal Article

The SimpleScalar tool set, version 2.0

TL;DR: This document describes release 2.0 of the SimpleScalar tool set, a suite of free, publicly available simulation tools that offer both detailed and high-performance simulation of modern microprocessors.
Journal Article

The Ubiquitous B-Tree

TL;DR: The major variations of the B-tree are discussed, especially the B+-tree, contrasting the merits and costs of each implementation and illustrating a general-purpose access method that uses a B-tree.
Journal Article

Cache Memories

TL;DR: Specific aspects of cache memories investigated include: the cache fetch algorithm (demand versus prefetch), the placement and replacement algorithms, line size, store-through versus copy-back updating of main memory, cold-start versus warm-start miss ratios, multicache consistency, the effect of input/output through the cache, the behavior of split data/instruction caches, and cache size.