Kaushik Datta
Researcher at University of California, Berkeley
Publications - 16
Citations - 1530
Kaushik Datta is an academic researcher from the University of California, Berkeley. The author has contributed to research on topics including cache optimization and stencil computations. The author has an h-index of 12 and has co-authored 16 publications receiving 1497 citations.
Papers
Proceedings ArticleDOI
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David A. Patterson, John Shalf, Katherine Yelick, and others
TL;DR: This work explores multicore stencil (nearest-neighbor) computations --- a class of algorithms at the heart of many structured grid codes, including PDE solvers. It develops a number of effective optimization strategies and builds an auto-tuning environment that searches over these strategies to minimize runtime while maximizing performance portability.
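To make the kernel class concrete: a nearest-neighbor stencil updates each grid point from its immediate neighbors on a structured grid. The sketch below is a minimal illustrative 2D 5-point Jacobi sweep, not code from the paper; the function name, grid representation, and coefficients are assumptions chosen for clarity.

```python
def jacobi_sweep(grid):
    """One Jacobi sweep of a 2D 5-point (nearest-neighbor) stencil.

    Each interior point becomes the average of its four neighbors;
    boundary values are held fixed. `grid` is a square list of lists.
    """
    n = len(grid)
    out = [row[:] for row in grid]  # copy so the boundary is preserved
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out
```

In a real PDE solver this sweep is repeated over many time steps, which is what makes memory traffic, and hence the optimizations the paper studies, the dominant cost.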
Journal ArticleDOI
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
TL;DR: Results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations, and represent one of the most extensive analyses of stencil optimizations and performance modeling to date.
Proceedings ArticleDOI
Productivity and performance using partitioned global address space languages
Katherine Yelick, Dan Bonachea, Wei-Yu Chen, Phillip Colella, Kaushik Datta, Jason Duell, Susan L. Graham, Paul Hargrove, Paul N. Hilfinger, Parry Husbands, Costin Iancu, Amir Kamil, Rajesh Nishtala, Jimmy Su, Michael Welcome, Tong Wen, and others
TL;DR: Two related projects, Titanium and UPC, combine compiler, runtime, and application efforts to demonstrate some of the performance and productivity advantages of these languages.
Proceedings ArticleDOI
Implicit and explicit optimizations for stencil computations
TL;DR: Several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as the heterogeneous multicore design of the Cell processor are examined, including both an implicit cache oblivious approach and a cache-aware algorithm blocked to match the cache structure.
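Cache-aware blocking, one of the techniques examined above, restructures the stencil loops so that updates proceed tile by tile, improving data reuse in cache. The sketch below is an illustrative blocked variant of a 5-point sweep, assuming a square list-of-lists grid and a hypothetical `block` parameter; it is not code from the paper, and the block size would normally be matched to the target cache.

```python
def jacobi_sweep_blocked(grid, block=2):
    """One blocked 5-point stencil sweep over the interior of `grid`.

    The outer (ii, jj) loops walk over tiles of size `block` x `block`;
    the inner loops update the points within each tile, so each tile's
    working set can stay resident in cache.
    """
    n = len(grid)
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, block):
        for jj in range(1, n - 1, block):
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, n - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out
```

The blocked sweep computes exactly the same values as the straightforward loop nest; only the traversal order, and therefore the cache behavior, changes. A cache-oblivious approach achieves a similar effect by recursive subdivision instead of an explicit block size.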
Proceedings Article
A case for machine learning to optimize multicore performance
TL;DR: Machine learning offers an even more promising opportunity for multicore auto-tuning than its successes to date in the systems literature suggest, and state-of-the-art machine learning techniques are applied to explore the tuning space more intelligently.