Journal ArticleDOI
Cache-Oblivious Algorithms
TLDR
It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.Abstract:
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size M and cache-line length B where M = Ω(B2), the number of cache misses for an m × n matrix transpose is Θ(1 + mn/B). The number of cache misses for either an n-point FFT or the sorting of n numbers is Θ(1 + (n/B)(1 + logM n)). We also give a Θ(mnp)-work algorithm to multiply an m × n matrix by an n × p matrix that incurs Θ(1 + (mn + np + mp)/B + mnp/B√M) cache faults.We introduce an “ideal-cache” model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We offer empirical evidence that cache-oblivious algorithms perform well in practice.read more
Citations
More filters
Proceedings ArticleDOI
The pochoir stencil compiler
TL;DR: The Pochoir stencil compiler allows a programmer to write a simple specification of a stencil in a domain-specific stencil language embedded in C++ which the Pochir compiler then translates into high-performing Cilk code that employs an efficient parallel cache-oblivious algorithm.
Proceedings ArticleDOI
Implicit and explicit optimizations for stencil computations
TL;DR: Several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as the heterogeneous multicore design of the Cell processor are examined, including both an implicit cache oblivious approach and a cache-aware algorithm blocked to match the cache structure.
Journal ArticleDOI
Communication lower bounds and optimal algorithms for numerical linear algebra
TL;DR: This paper describes lower bounds on communication in linear algebra, and presents lower bounds for Strassen-like algorithms, and for iterative methods, in particular Krylov subspace methods applied to sparse matrices.
Cache-Oblivious Algorithms and Data Structures
TL;DR: A recent body of work has developed cache-oblivious algorithms and data structures that perform as well or nearly as well as standard external-memory structures which require knowledge of the cache/memory size and block transfer size.
Proceedings ArticleDOI
Cache-oblivious string B-trees
TL;DR: This paper presents a cache-oblivious string B-tree (COSB-tree) data structure that is efficient in all these ways: searches asymptotically optimally and inserts and deletes nearly optimally, and maintains an index whose size is proportional to the front-compressed size of the dictionary.
References
More filters
Journal ArticleDOI
Amortized efficiency of list update and paging rules
TL;DR: This article shows that move-to-front is within a constant factor of optimum among a wide class of list maintenance rules, and analyzes the amortized complexity of LRU, showing that its efficiency differs from that of the off-line paging rule by a factor that depends on the size of fast memory.
Proceedings ArticleDOI
FFTW: an adaptive software architecture for the FFT
Matteo Frigo,Steven G. Johnson +1 more
TL;DR: An adaptive FFT program that tunes the computation automatically for any particular hardware, and tests show that FFTW's self-optimizing approach usually yields significantly better performance than all other publicly available software.
Book
Sorting and Searching
TL;DR: The first revision of this third volume is a survey of classical computer techniques for sorting and searching that extends the treatment of data structures to consider both large and small databases and internal and external memories.
Journal ArticleDOI
A study of replacement algorithms for a virtual-storage computer
TL;DR: One of the basic limitations of a digital computer is the size of its available memory; an approach that permits the programmer to use a sufficiently large address range can accomplish this objective, assuming that means are provided for automatic execution of the memory-overlay functions.
Book
Computer Architecture: A Quantitative Approach, 2nd Edition
TL;DR: A quantitative approach to computer architecture a quantitative approach 5th edition computer architecture quantitative approach solution manual computer Architecture quantitative approach solutions manual computer architecture an quantitative approach 3rd editionComputer architecture, fifth edition.