Journal ArticleDOI
Cache-Oblivious Algorithms
Reads0
Chats0
TLDR
It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.Abstract:
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size M and cache-line length B where M = Ω(B2), the number of cache misses for an m × n matrix transpose is Θ(1 + mn/B). The number of cache misses for either an n-point FFT or the sorting of n numbers is Θ(1 + (n/B)(1 + logM n)). We also give a Θ(mnp)-work algorithm to multiply an m × n matrix by an n × p matrix that incurs Θ(1 + (mn + np + mp)/B + mnp/B√M) cache faults.We introduce an “ideal-cache” model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We offer empirical evidence that cache-oblivious algorithms perform well in practice.read more
Citations
More filters
Proceedings ArticleDOI
High-performance data structures for de novo assembly of genomes: cache oblivious generic programming
TL;DR: This work proposes a novel set of cache oblivious generic data structures for serial, multithreaded and distributed processing of high-throughput sequencing data for the creation of de Bruijn or k-mer graphs towards their usage in de novo assembly and related HTS data analytics problems.
Journal ArticleDOI
Cache simulation for irregular memory traffic on multi-core CPUs: Case study on performance models for sparse matrix–vector multiplication
TL;DR: A method for estimating volumes of data traffic caused by irregular, parallel computations on multi-core CPUs with memory hierarchies containing both private and shared caches is presented and a performance model based on these estimates that applies to bandwidth-limited computations is described.
Exploiting locality and parallelism with hierarchically tiled arrays
TL;DR: A set of tile operations is shown which leads to a natural and easy implementation of different algorithms in parallel and in sequential with higher clarity and smaller size and it is shown that the HTA codes needs less programming effort with a negligible effect on performance.
Posted Content
Balanced Partitioning of Several Cache-Oblivious Algorithms
Yuan Tang,Weiguo Gao +1 more
TL;DR: This work proposes a novel way of partitioning a cache-oblivious algorithm to achieve perfect strong scaling on an arbitrary number, even a prime number, of processors within a certain range in a shared-memory setting and provides an almost exact solution to the open problem on parallelizing Strassen.
Journal ArticleDOI
Resilient Dynamic Programming
TL;DR: This work obtains the first resilient algorithms for a broad range of dynamic programming problems, devising a general framework that can be applied to both iterative and recursive implementations.
References
More filters
Book
Introduction to Algorithms
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Journal ArticleDOI
An algorithm for the machine calculation of complex Fourier series
J.W. Cooley,John W. Tukey +1 more
TL;DR: Good generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series, applicable to certain problems in which one must multiply an N-vector by an N X N matrix which can be factored into m sparse matrices.
Book
Computer Architecture: A Quantitative Approach
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.