Journal Article
Cache-Oblivious Algorithms
TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels, and that the ideal-cache model's assumption of optimal replacement can be simulated efficiently by LRU replacement.
Abstract:
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $M$ and cache-line length $B$ where $M = \Omega(B^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/B)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/B)(1 + \log_M n))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(1 + (mn + np + mp)/B + mnp/(B\sqrt{M}))$ cache faults.

We introduce an “ideal-cache” model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We offer empirical evidence that cache-oblivious algorithms perform well in practice.
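To make the transpose result more concrete, here is a minimal Python sketch of a divide-and-conquer transpose whose code never consults $M$ or $B$. The recursion strategy (always halving the longer dimension), the function names `transpose` and `transposed`, and the `base` cutoff are illustrative assumptions, not details taken from this page. Python lists of lists also lack the contiguous memory layout that the $\Theta(1 + mn/B)$ analysis assumes, so the sketch shows the recursion structure rather than measured cache behavior.

```python
def transpose(a, b, r0, r1, c0, c1, base=16):
    """Write the transpose of the block a[r0:r1][c0:c1] into b, so b[j][i] = a[i][j].

    Hypothetical sketch: the longer dimension is halved recursively, so
    sub-blocks eventually fit in every cache level without the code ever
    knowing the cache size M or the line length B.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows * cols <= base:
        # Small block: copy directly. `base` only amortizes call overhead;
        # it is a fixed constant, not a tuned hardware parameter.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Tall block: split the rows in half.
        mid = r0 + rows // 2
        transpose(a, b, r0, mid, c0, c1, base)
        transpose(a, b, mid, r1, c0, c1, base)
    else:
        # Wide block: split the columns in half.
        mid = c0 + cols // 2
        transpose(a, b, r0, r1, c0, mid, base)
        transpose(a, b, r0, r1, mid, c1, base)


def transposed(a):
    """Return a new n x m list-of-lists holding the transpose of the m x n matrix a."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    transpose(a, b, 0, m, 0, n)
    return b
```

For example, `transposed([[1, 2, 3], [4, 5, 6]])` returns `[[1, 4], [2, 5], [3, 6]]`. Splitting whichever dimension is longer keeps sub-blocks roughly square, which is the property a cache-oblivious analysis of this kind typically relies on once blocks shrink below the cache size.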
Citations
Proceedings Article
Cache-Adaptive Exploration: Experimental Results and Scan-Hiding for Adaptivity
TL;DR: The authors introduce scan hiding, a technique for converting a class of non-cache-adaptive algorithms with linear scans into optimally cache-adaptive variants, and demonstrate it with a concrete example of scan hiding applied to Strassen's algorithm.
Dissertation
Vers des noyaux de calcul intensif pérennes (Towards sustainable compute-intensive kernels)
TL;DR: The MultiTarget Parallel Skeleton (MTPS) is a multilevel parallel skeleton that enables the construction of multi-target code for Legolas++, a multi-target implementation of Legolas.
Book
Presburger Arithmetic and its use in verification
TL;DR: Covers Presburger arithmetic and its use in verification, noting that taking advantage of available computing power becomes critical now that every computer has gone multicore.
Journal Article
Multiple pattern matching for network security applications: Acceleration through vectorization
Charalampos Stylianopoulos, Magnus Almgren, Olaf Landsiedel, Marina Papatriantafilou
TL;DR: The paper presents efficient algorithmic designs that achieve good cache locality and use modern vectorization techniques to exploit data parallelism within each core; the authors complement these algorithms with an analytical model that predicts their performance and can be used to evaluate alternative designs easily.
Proceedings Article
Improved space bounds for cache-oblivious range reporting
Peyman Afshani, Norbert Zeh
TL;DR: Gives improved bounds on the size of cache-oblivious range reporting data structures that achieve the optimal query bound of $O(\log_B N + K/B)$ block transfers, and proves a lower bound on the space such structures require, improving on a recent lower bound for the same problem.
References
Book
Introduction to Algorithms
TL;DR: The updated new edition of the classic Introduction to Algorithms is intended primarily for use in undergraduate or graduate courses in algorithms or data structures and presents a rich variety of algorithms and covers them in considerable depth while making their design and analysis accessible to all levels of readers.
Journal Article
An algorithm for the machine calculation of complex Fourier series
J. W. Cooley, John W. Tukey
TL;DR: Good generalized these methods and gave elegant algorithms for which one class of applications is the calculation of Fourier series, applicable to certain problems in which one must multiply an N-vector by an N × N matrix that can be factored into m sparse matrices.
Book
Computer Architecture: A Quantitative Approach
TL;DR: This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today.