scispace - formally typeset
Proceedings ArticleDOI

Cache-oblivious algorithms

Reads0
Chats0
TLDR
It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.
Abstract
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT, and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size Z and cache-line length L where Z=/spl Omega/(L/sup 2/) the number of cache misses for an m/spl times/n matrix transpose is /spl Theta/(1+mn/L). The number of cache misses for either an n-point FFT or the sorting of n numbers is /spl Theta/(1+(n/L)(1+log/sub Z/n)). We also give an /spl Theta/(mnp)-work algorithm to multiply an m/spl times/n matrix by an n/spl times/p matrix that incurs /spl Theta/(1+(mn+np+mp)/L+mnp/L/spl radic/Z) cache faults. We introduce an "ideal-cache" model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model. Can be simulated efficiently by LRU replacement. We also provide preliminary empirical results on the effectiveness of cache-oblivious algorithms in practice.

read more

Citations
More filters
Journal ArticleDOI

The Design and Implementation of FFTW3

TL;DR: It is shown that such an approach can yield an implementation of the discrete Fourier transform that is competitive with hand-optimized libraries, and the software structure that makes the current FFTW3 version flexible and adaptive is described.
Journal ArticleDOI

External memory algorithms and data structures: dealing with massive data

TL;DR: The state of the art in the design and analysis of external memory algorithms and data structures, where the goal is to exploit locality in order to reduce the I/O costs is surveyed.
Proceedings ArticleDOI

X-Stream: edge-centric graph processing using streaming partitions

TL;DR: X-Stream is novel in using an edge-centric rather than a vertex-centric implementation of this model, and streaming completely unordered edge lists rather than performing random access, and competes favorably with existing systems for graph processing.
Proceedings ArticleDOI

Sequoia: programming the memory hierarchy

TL;DR: This work has implemented a complete programming system, including a compiler and runtime systems for cell processor-based blade systems and distributed memory clusters, and demonstrates efficient performance running Sequoia programs on both of these platforms.
Book ChapterDOI

Simple linear work suffix array construction

TL;DR: The skew algorithm for suffix array construction over integer alphabets that can be implemented to run in linear time using integer sorting as its only nontrivial subroutine is introduced.
References
More filters
Book

Matrix computations

Gene H. Golub
Book

Introduction to Algorithms

TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Journal ArticleDOI

An algorithm for the machine calculation of complex Fourier series

TL;DR: Good generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series, applicable to certain problems in which one must multiply an N-vector by an N X N matrix which can be factored into m sparse matrices.
Book

Computer Architecture: A Quantitative Approach

TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.
Book

The Design and Analysis of Computer Algorithms

TL;DR: This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers use of lists, push-down stacks, queues, trees, and graphs.