Journal Article

Cache-Oblivious Algorithms

TL;DR: It is proved that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement.
Abstract
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms, these algorithms are cache oblivious: no variables dependent on hardware parameters, such as cache size and cache-line length, need to be tuned to achieve optimality. Nevertheless, these algorithms use an optimal amount of work and move data optimally among multiple levels of cache. For a cache with size $M$ and cache-line length $B$ where $M = \Omega(B^2)$, the number of cache misses for an $m \times n$ matrix transpose is $\Theta(1 + mn/B)$. The number of cache misses for either an $n$-point FFT or the sorting of $n$ numbers is $\Theta(1 + (n/B)(1 + \log_M n))$. We also give a $\Theta(mnp)$-work algorithm to multiply an $m \times n$ matrix by an $n \times p$ matrix that incurs $\Theta(1 + (mn + np + mp)/B + mnp/(B\sqrt{M}))$ cache faults. We introduce an “ideal-cache” model to analyze our algorithms. We prove that an optimal cache-oblivious algorithm designed for two levels of memory is also optimal for multiple levels and that the assumption of optimal replacement in the ideal-cache model can be simulated efficiently by LRU replacement. We offer empirical evidence that cache-oblivious algorithms perform well in practice.
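
To illustrate what "cache oblivious" means for the matrix transpose mentioned in the abstract, here is a minimal sketch of a divide-and-conquer transpose that recursively halves the larger dimension, so sub-blocks eventually fit in any cache without the code knowing the cache size or line length. The function names `co_transpose` and `transpose` and the `base` cutoff are assumptions made for this illustration, not taken from the article; `base` is a fixed constant that only limits recursion overhead and is not tuned to hardware.

```python
def co_transpose(a, b, r0, r1, c0, c1, base=8):
    """Write the transpose of the block a[r0:r1][c0:c1] into b.

    a is an m x n list of lists, b is an n x m list of lists.
    `base` is an illustrative constant cutoff (assumption), not a
    hardware-dependent tuning parameter.
    """
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: transpose directly.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the longer dimension (rows) in half and recurse.
        mid = r0 + rows // 2
        co_transpose(a, b, r0, mid, c0, c1, base)
        co_transpose(a, b, mid, r1, c0, c1, base)
    else:
        # Split the longer dimension (columns) in half and recurse.
        mid = c0 + cols // 2
        co_transpose(a, b, r0, r1, c0, mid, base)
        co_transpose(a, b, r0, r1, mid, c1, base)


def transpose(a):
    """Return the transpose of a rectangular list-of-lists matrix."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    co_transpose(a, b, 0, m, 0, n)
    return b
```

For example, `transpose([[1, 2, 3], [4, 5, 6]])` yields a 3 × 2 matrix. Pure Python lists will not show the cache behavior the article analyzes; the sketch only shows the recursive structure.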


Citations
Proceedings Article

CilkSpec: optimistic concurrency for Cilk

TL;DR: A speculation-based approach to alleviate the concurrency constraints imposed by such recursive parallel programs by designing a runtime infrastructure that supports speculative execution and a predictor to accurately learn and identify opportunities to relax extraneous concurrency constraints.
Journal Article

Ideal and Predictable Hit Ratio for Matrix Transposition in Data Caches

TL;DR: The analytical hit/miss assessment enables the use of a data cache for matrix transposition in real-time systems, since the number of misses in the worst case is bounded and the energy consumption and pollution to other computations are reduced.
Proceedings Article

Improving the Space-Time Efficiency of Matrix Multiplication Algorithms

TL;DR: This study gives cache-oblivious parallel algorithms with sublinear time and optimal work, space, and cache complexity for both general matrix multiplication on a semiring and Strassen-like fast algorithms on a ring.

Machine learning techniques for code generation and optimization

TL;DR: Algorithms generated using the approach presented in this thesis are quite effective at taking into account the complex interactions between architectural and input data characteristics, and the resulting code performs significantly better than conventional sorting implementations and the code generated by the earlier study.
Proceedings Article

Dynamically generating FFT code on mobile devices

TL;DR: The results of benchmarks on Apple A4, A6, Nvidia Tegra3 and Samsung Exynos4 based devices show that disabling dynamic code generation in FFTS decreases performance by as much as 25%, depending on the device and the parameters of the transform.