Journal Article

Improving Memory Hierarchy Performance for Irregular Applications Using Data and Computation Reorderings

TL;DR: The authors investigate the impact of reordering on data reuse at different levels in the memory hierarchy and introduce a new architecture-independent multi-level blocking strategy for irregular applications.
Abstract
The performance of irregular applications on modern computer systems is hurt by the wide gap between CPU and memory speeds because these applications typically under-utilize multi-level memory hierarchies, which help hide this gap. This paper investigates using data and computation reorderings to improve memory hierarchy utilization for irregular applications. We evaluate the impact of reordering on data reuse at different levels in the memory hierarchy. We focus on coordinated data and computation reordering based on space-filling curves and we introduce a new architecture-independent multi-level blocking strategy for irregular applications. For two particle codes we studied, the most effective reorderings reduced overall execution time by a factor of two and four, respectively. Preliminary experience with a scatter benchmark derived from a large unstructured mesh application showed that careful data and computation ordering reduced primary cache misses by a factor of two compared to a random ordering.
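
To make the idea of coordinated data and computation reordering concrete, here is a minimal sketch. It assumes a 2D particle code with non-negative coordinates and uses a Morton (Z-order) curve as the space-filling curve; the abstract does not say which curve or data layout the authors use, so the curve choice and every name below (`interleave_bits`, `morton_permutation`, `reorder_pairs`, `cell_size`) are illustrative assumptions rather than the paper's implementation.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Build a Morton (Z-order) key by interleaving the low bits of x and y."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x supplies the even bit positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y supplies the odd bit positions
    return key


def morton_permutation(positions, cell_size: float, bits: int = 16):
    """Data reordering: particle indices sorted by the key of their grid cell.

    Assumes non-negative coordinates (hypothetical setup for this sketch).
    """
    def key(i):
        px, py = positions[i]
        return interleave_bits(int(px // cell_size), int(py // cell_size), bits)

    return sorted(range(len(positions)), key=key)


def reorder(values, perm):
    """Apply the permutation to any per-particle array."""
    return [values[i] for i in perm]


def reorder_pairs(pairs, perm):
    """Computation reordering: renumber interaction pairs to the new particle
    indices, then sort them so the loop visits memory in the new data order."""
    new_index = {old: new for new, old in enumerate(perm)}
    renumbered = (
        (min(new_index[a], new_index[b]), max(new_index[a], new_index[b]))
        for a, b in pairs
    )
    return sorted(renumbered)


positions = [(3.5, 3.5), (0.5, 0.5), (1.5, 0.5), (0.5, 1.5)]
perm = morton_permutation(positions, cell_size=1.0)
sorted_positions = reorder(positions, perm)
sorted_pairs = reorder_pairs([(0, 1), (2, 3)], perm)
```

After sorting, particles that are close in space tend to be close in memory, and iterating over `sorted_pairs` touches them in roughly that order, which is the kind of reuse the reorderings are meant to expose to the cache.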


Citations
Book

Data-Intensive Text Processing with MapReduce

TL;DR: This half-day tutorial introduces participants to data-intensive text processing with the MapReduce programming model using the open-source Hadoop implementation, with a focus on scalability and the tradeoffs associated with distributed processing of large datasets.
Journal Article

Predicting whole-program locality through reuse distance analysis

TL;DR: In this paper, the authors show that profiling can also predict program locality for inputs other than profiled ones, where locality is defined by the distance of data reuse, which can reveal global patterns not apparent in short distance reuses or local control flow.
Journal Article

High-throughput sequence alignment using Graphics Processing Units

TL;DR: MUMmerGPU is a low cost, ultra-fast sequence alignment program designed to handle the increasing volume of data produced by new, high-throughput sequencing technologies, and demonstrates that even memory-intensive applications can run significantly faster on the relatively low-cost GPU than on the CPU.
Proceedings Article

Locality phase prediction

TL;DR: Compared with existing methods based on program code and execution intervals, locality phase prediction is unique because it uses locality profiles, and it marks phase boundaries in program code.
Journal Article

Program locality analysis using reuse distance

TL;DR: Two techniques are presented, among the first to enable quantitative analysis of whole-program locality in general sequential code, that predict how the locality of a program changes with its input.
References
Journal Article

CHARMM: A program for macromolecular energy, minimization, and dynamics calculations

TL;DR: CHARMM (Chemistry at Harvard Macromolecular Mechanics) is a computer program that uses empirical energy functions to model macromolecular systems. It can read or model-build structures, energy-minimize them by first- or second-derivative techniques, perform a normal mode or molecular dynamics simulation, and analyze the structural, equilibrium, and dynamic properties determined in these calculations.
Proceedings Article

Reducing the bandwidth of sparse symmetric matrices

E. Cuthill, +1 more
TL;DR: A direct method of obtaining an automatic nodal numbering scheme to ensure that the corresponding coefficient matrix will have a narrow bandwidth is presented.