Journal Article
Cache-Oblivious Algorithms
TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.
Abstract:
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size M and cache-line length B where M = Ω(B²), the number of cache misses for an m × n matrix transpose is Θ(1 + mn/B). The number of cache misses for either an n-point FFT or the sorting of n numbers is Θ(1 + (n/B)(1 + log_M n)). We also give a Θ(mnp)-work algorithm to multiply an m × n matrix by an n × p matrix that incurs Θ(1 + (mn + np + mp)/B + mnp/(B√M)) cache faults.
We introduce an “ideal-cache” model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We offer empirical evidence that cache-oblivious algorithms perform well in practice.
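To make the idea of a cache-oblivious transpose concrete, the following is a minimal Python sketch of a recursive divide-and-conquer transpose that halves the larger dimension until the subproblem is small. The function names `transpose` and `transpose_rec` and the `base` cutoff are assumptions introduced for illustration and do not come from the article; `base` only limits recursion overhead and is not tuned to M or B (with `base=1` the recursion runs all the way down). Python lists of lists are not laid out contiguously in memory, so this shows the recursion structure only, not the cache behavior itself.

```python
def transpose_rec(a, b, r0, r1, c0, c1, base):
    """Write the transpose of the block a[r0:r1][c0:c1] into b (b[j][i] = a[i][j])."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: transpose it directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the larger dimension (rows) in half and recurse on both halves.
        mid = r0 + rows // 2
        transpose_rec(a, b, r0, mid, c0, c1, base)
        transpose_rec(a, b, mid, r1, c0, c1, base)
    else:
        # Split the larger dimension (columns) in half and recurse on both halves.
        mid = c0 + cols // 2
        transpose_rec(a, b, r0, r1, c0, mid, base)
        transpose_rec(a, b, r0, r1, mid, c1, base)


def transpose(a, base=16):
    """Return the n x m transpose of an m x n matrix given as a list of rows."""
    if base < 1:
        raise ValueError("base must be at least 1")
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    transpose_rec(a, b, 0, m, 0, n, base)
    return b
```

For example, `transpose([[1, 2, 3], [4, 5, 6]], base=1)` yields a 3 × 2 matrix whose rows are the columns of the input. No hardware parameter appears in the recursion; once a subproblem fits in some level of cache, the remaining work on it stays within that level.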
Citations
Journal Article
Algorithmic cache of sorted tables for feature selection: Speeding up methods based on consistency and information theory measures
TL;DR: This paper proposes the concept of an algorithmic cache, which stores sorted tables to speed up access to example information; it reduces computation time and is competitive with hash table structures.
Proceedings Article
Deriving parametric multi-way recursive divide-and-conquer dynamic programming algorithms using polyhedral compilers
Mohammad Mahdi Javanmard, Zafar Ahmad, Martin Kong, Louis-Noël Pouchet, Rezaul Chowdhury, Robert W. Harrison
TL;DR: A novel framework to automatically derive highly efficient parametric multi-way recursive divide-and-conquer algorithms for a class of dynamic programming (DP) problems, where the value of R can be changed on the fly for every level of recursion.
Proceedings Article
Cache-oblivious scheduling of streaming pipelines
Kunal Agrawal, Jeremy T. Fineman
TL;DR: This recursive algorithm is not parameterized by cache size, yet it achieves the asymptotically minimum number of cache misses with constant factor memory augmentation.
Proceedings Article
Performance and Power Characteristics of Matrix Multiplication Algorithms on Multicore and Shared Memory Machines
TL;DR: This paper presents a study of the performance, cache behavior, and energy efficiency of multiple parallel matrix multiplication algorithms on a multicore desktop computer and a medium-size shared memory machine, both considered reference node sizes for building medium- and large-scale computational clusters for high performance computing used in industry and national laboratories.
Journal Article
NUMA-aware multicore Matrix Multiplication
TL;DR: It is shown how proper memory mapping and scheduling can tune an existing matrix multiplication implementation and reduce the number of cache misses; the gained speedup is used as a novel figure of merit to measure the quality of the method.
References
Book
Introduction to Algorithms
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Journal Article
An algorithm for the machine calculation of complex Fourier series
J.W. Cooley, John W. Tukey
TL;DR: Good generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series, applicable to certain problems in which one must multiply an N-vector by an N X N matrix which can be factored into m sparse matrices.
Book
Computer Architecture: A Quantitative Approach
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.