Journal Article

Cache-Oblivious Algorithms

TL;DR
It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.
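The summary's second claim compares LRU replacement against the ideal cache's optimal (offline) replacement. A minimal Python sketch, assuming a trace of block ids and a fully associative cache whose size is counted in blocks, simulates both policies: `lru_misses` evicts the least recently used block, and `opt_misses` uses Belady's farthest-next-use rule, the standard offline optimal policy. The function names, the random trace, and the sizes are illustrative and not taken from the article.

```python
import random
from collections import OrderedDict


def lru_misses(trace, capacity):
    """Count misses of a fully associative LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)  # hit: mark as most recently used
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)  # evict the least recently used block
            cache[block] = True
    return misses


def opt_misses(trace, capacity):
    """Count misses of Belady's offline optimal policy: on a miss with a full
    cache, evict the block whose next use lies farthest in the future."""
    n = len(trace)
    next_use = [n] * n  # n stands for "never used again"
    last_seen = {}
    for i in range(n - 1, -1, -1):
        next_use[i] = last_seen.get(trace[i], n)
        last_seen[trace[i]] = i
    cache = {}  # block -> position of its next access
    misses = 0
    for i, block in enumerate(trace):
        if block not in cache:
            misses += 1
            if len(cache) >= capacity:
                victim = max(cache, key=cache.get)
                del cache[victim]
        cache[block] = next_use[i]  # refresh next-use position on hit or miss
    return misses


random.seed(0)
trace = [random.randrange(64) for _ in range(5000)]  # hypothetical block trace
M = 16
lru_2m = lru_misses(trace, 2 * M)
opt_m = opt_misses(trace, M)
print(lru_2m, opt_m)
# Resource-augmentation check: LRU with 2M blocks vs. OPT with M blocks.
print(lru_2m <= 2 * opt_m + M)
```

The final comparison is a competitive bound of the Sleator and Tarjan type, in which doubling the LRU cache keeps its misses within about twice those of the optimal policy plus a small additive term; the precise statement the article proves should be taken from the paper itself.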
Abstract
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $M$ and cache-line length $B$ where $M = \Omega(B^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/B)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/B)(1 + \log_M n))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(1 + (mn + np + mp)/B + mnp/(B\sqrt{M}))$ cache faults.

We introduce an “ideal-cache” model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We offer empirical evidence that cache-oblivious algorithms perform well in practice.
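The transpose and multiplication bounds above rest on divide-and-conquer recursions that halve the largest dimension until a subproblem fits in cache, without ever referring to the cache size or line length. A minimal Python sketch in that spirit, over lists of lists, follows. The function names and the `base` cutoff (a small constant meant only to cut recursion overhead, not a tuned cache parameter) are assumptions, and Python itself will not exhibit the cache effects, which only show up with contiguous arrays in a compiled language.

```python
def transpose(A, B, r0, r1, c0, c1, base=16):
    """Set B[j][i] = A[i][j] for rows r0..r1-1 and columns c0..c1-1 of A,
    recursively halving whichever dimension is longer."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        for i in range(r0, r1):
            for j in range(c0, c1):
                B[j][i] = A[i][j]
    elif rows >= cols:
        mid = r0 + rows // 2
        transpose(A, B, r0, mid, c0, c1, base)
        transpose(A, B, mid, r1, c0, c1, base)
    else:
        mid = c0 + cols // 2
        transpose(A, B, r0, r1, c0, mid, base)
        transpose(A, B, r0, r1, mid, c1, base)


def matmul_add(A, B, C, i0, i1, k0, k1, j0, j1, base=16):
    """C[i][j] += sum over k of A[i][k] * B[k][j] on the given index ranges,
    recursively halving the largest of the three dimensions."""
    m, n, p = i1 - i0, k1 - k0, j1 - j0
    if m <= base and n <= base and p <= base:
        for i in range(i0, i1):
            Ci, Ai = C[i], A[i]
            for k in range(k0, k1):
                a, Bk = Ai[k], B[k]
                for j in range(j0, j1):
                    Ci[j] += a * Bk[j]
    elif m >= n and m >= p:  # split the rows of A and C
        mid = i0 + m // 2
        matmul_add(A, B, C, i0, mid, k0, k1, j0, j1, base)
        matmul_add(A, B, C, mid, i1, k0, k1, j0, j1, base)
    elif n >= p:  # split the shared dimension; both halves add into C
        mid = k0 + n // 2
        matmul_add(A, B, C, i0, i1, k0, mid, j0, j1, base)
        matmul_add(A, B, C, i0, i1, mid, k1, j0, j1, base)
    else:  # split the columns of B and C
        mid = j0 + p // 2
        matmul_add(A, B, C, i0, i1, k0, k1, j0, mid, base)
        matmul_add(A, B, C, i0, i1, k0, k1, mid, j1, base)


def transposed(A):
    m, n = len(A), len(A[0]) if A else 0
    B = [[None] * m for _ in range(n)]
    transpose(A, B, 0, m, 0, n)
    return B


def matmul(A, B):
    m, n, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(m)]
    matmul_add(A, B, C, 0, m, 0, n, 0, p)
    return C


A = [[1, 2, 3], [4, 5, 6]]
print(transposed(A))             # [[1, 4], [2, 5], [3, 6]]
print(matmul(A, transposed(A)))  # [[14, 32], [32, 77]]
```

Splitting the largest dimension keeps subproblems roughly square, which is what lets a single recursion adapt to every cache level at once.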

Citations
Book Chapter

Compressed Cache-Oblivious String B-tree

TL;DR: This paper addresses a few variants of the well-known prefix-search problem in a dictionary of strings, and provides solutions for the cache-oblivious model that improve on the best known results.
Journal Article

Cache-Oblivious Buffer Heap and Cache-Efficient Computation of Shortest Paths in Graphs

TL;DR: This paper introduces the notion of a slim data structure, which captures the situation in which only a limited portion of the cache is available to the data structure for retaining data between operations, and shows that a buffer heap automatically adapts to such an environment, supporting all operations in $O(1/\lambda + (1/B)\log_2(N/\lambda))$ amortized block transfers each when the size of the slim cache is $\lambda$.
Proceedings Article

Closing the Gap Between Cache-oblivious and Cache-adaptive Analysis

TL;DR: This paper closes the gap between cache-oblivious and cache-adaptive analysis by showing how to perform a smoothed analysis of cache-adaptive algorithms via random reshuffling of memory fluctuations; the results suggest that cache-obliviousness is a solid foundation for achieving cache-adaptivity when the memory profile is not overly tailored to the algorithm's structure.

Cache-Adaptive Analysis

TL;DR: In this paper, the authors present techniques for designing and analyzing algorithms in a cache-adaptive setting, where the RAM available to the algorithm changes over time, and give a simple recipe for determining whether common divide-and-conquer algorithms are optimally cache adaptive.
Book Chapter

Optimizing matrix multiplication with a classifier learning system

TL;DR: This paper uses a classifier learning system to generate high-performance libraries for matrix-matrix multiplication; the resulting library generator produces matrix multiplication routines that use recursive layouts and several levels of tiling.
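A recursive layout stores a matrix so that each quadrant, each sub-quadrant, and so on occupies a contiguous range of memory. One common choice is the Z-order (Morton) layout; assuming that layout (the cited paper may use others), the sketch below computes an element's Morton index by interleaving the bits of its row and column, then copies a square matrix whose side is a power of two into that order. `morton_index` and `to_morton` are hypothetical names.

```python
def morton_index(i, j):
    """Z-order index of element (i, j): bit b of column j moves to bit 2b,
    and bit b of row i moves to bit 2b + 1."""
    z = 0
    b = 0
    while (i >> b) or (j >> b):
        z |= ((j >> b) & 1) << (2 * b)
        z |= ((i >> b) & 1) << (2 * b + 1)
        b += 1
    return z


def to_morton(A):
    """Copy an n x n matrix (n a power of two) into a flat list in Z-order."""
    n = len(A)
    if n & (n - 1):
        raise ValueError("side length must be a power of two")
    flat = [None] * (n * n)
    for i in range(n):
        for j in range(n):
            flat[morton_index(i, j)] = A[i][j]
    return flat


A = [[4 * r + c for c in range(4)] for r in range(4)]  # values 0..15, row-major
print(to_morton(A))  # [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

Each 2 × 2 quadrant of the 4 × 4 example lands in four consecutive slots, and for larger powers of two the same holds at every level of the quadrant recursion, so a recursive multiplication reads each subproblem from contiguous memory.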