Kaushik Datta
Researcher at University of California, Berkeley
Publications - 16
Citations - 1530
Kaushik Datta is an academic researcher from the University of California, Berkeley. The author has contributed to research on topics including cache optimization and stencil computations. The author has an h-index of 12 and has co-authored 16 publications receiving 1497 citations.
Papers
Proceedings ArticleDOI
Stencil computation optimization and auto-tuning on state-of-the-art multicore architectures
Kaushik Datta, Mark Murphy, Vasily Volkov, Samuel Williams, Jonathan Carter, Leonid Oliker, David A. Patterson, John Shalf, Katherine Yelick, and others
TL;DR: This work explores multicore stencil (nearest-neighbor) computations --- a class of algorithms at the heart of many structured grid codes, including PDE solvers. It develops a number of effective optimization strategies and builds an auto-tuning environment that searches over these strategies to minimize runtime while maximizing performance portability.
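To make the kernel class concrete: a nearest-neighbor stencil updates each grid point from its immediate neighbors on a structured grid. The sketch below is a minimal illustrative 2D 5-point Jacobi sweep, not code from the paper; the function name, grid representation, and coefficients are assumptions chosen for clarity.

```python
def jacobi_sweep(grid):
    """One Jacobi sweep of a 2D 5-point (nearest-neighbor) stencil.

    Each interior point becomes the average of its four neighbors;
    boundary values are held fixed. `grid` is a square list of lists.
    """
    n = len(grid)
    out = [row[:] for row in grid]  # copy so the boundary is preserved
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                + grid[i][j - 1] + grid[i][j + 1])
    return out
```

In a real PDE solver this sweep is repeated over many time steps, which is what makes memory traffic, and hence the optimizations the paper studies, the dominant cost.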
Journal ArticleDOI
Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors
TL;DR: Results demonstrate that recent trends in memory system organization have reduced the efficacy of traditional cache-blocking optimizations, and represent one of the most extensive analyses of stencil optimizations and performance modeling to date.
Proceedings ArticleDOI
Productivity and performance using partitioned global address space languages
Katherine Yelick, Dan Bonachea, Wei-Yu Chen, Phillip Colella, Kaushik Datta, Jason Duell, Susan L. Graham, Paul Hargrove, Paul N. Hilfinger, Parry Husbands, Costin Iancu, Amir Kamil, Rajesh Nishtala, Jimmy Su, Michael Welcome, Tong Wen, and others
TL;DR: Two related projects, Titanium and UPC, combine compiler, runtime, and application efforts to demonstrate some of the performance and productivity advantages of these languages.
Proceedings ArticleDOI
Implicit and explicit optimizations for stencil computations
TL;DR: Several optimizations on both the conventional cache-based memory systems of the Itanium 2, Opteron, and Power5, as well as the heterogeneous multicore design of the Cell processor are examined, including both an implicit cache oblivious approach and a cache-aware algorithm blocked to match the cache structure.
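Cache-aware blocking, one of the techniques examined above, restructures the stencil loops so that updates proceed tile by tile, improving data reuse in cache. The sketch below is an illustrative blocked variant of a 5-point sweep, assuming a square list-of-lists grid and a hypothetical `block` parameter; it is not code from the paper, and the block size would normally be matched to the target cache.

```python
def jacobi_sweep_blocked(grid, block=2):
    """One blocked 5-point stencil sweep over the interior of `grid`.

    The outer (ii, jj) loops walk over tiles of size `block` x `block`;
    the inner loops update the points within each tile, so each tile's
    working set can stay resident in cache.
    """
    n = len(grid)
    out = [row[:] for row in grid]
    for ii in range(1, n - 1, block):
        for jj in range(1, n - 1, block):
            for i in range(ii, min(ii + block, n - 1)):
                for j in range(jj, min(jj + block, n - 1)):
                    out[i][j] = 0.25 * (grid[i - 1][j] + grid[i + 1][j]
                                        + grid[i][j - 1] + grid[i][j + 1])
    return out
```

The blocked sweep computes exactly the same values as the straightforward loop nest; only the traversal order, and therefore the cache behavior, changes. A cache-oblivious approach achieves a similar effect by recursive subdivision instead of an explicit block size.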
Proceedings Article
A case for machine learning to optimize multicore performance
TL;DR: Machine learning offers an even more promising opportunity for multicore auto-tuning than its successes to date in the systems literature suggest, and state-of-the-art machine learning techniques are applied to explore the tuning space more intelligently.