Todd C. Mowry
Researcher at Carnegie Mellon University
Publications - 117
Citations - 9806
Todd C. Mowry is an academic researcher at Carnegie Mellon University. The author has contributed to research on the topics Cache and Compiler. The author has an h-index of 49 and has co-authored 113 publications that have received 9137 citations. Previous affiliations of Todd C. Mowry include the University of Toronto and Stanford University.
Papers
Report
Compiler and Hardware Support for Automatic Instruction Prefetching: A Cooperative Approach
Todd C. Mowry, Chi-Keung Luk
TL;DR: The paper proposes a novel compiler algorithm that automatically inserts instruction-prefetch instructions into the executable so that the targets of control transfers are prefetched far enough in advance, resulting in speedups ranging from 9.4% to 18.5% over the original execution time on an out-of-order superscalar processor.
Book Chapter
A framework for accelerating bottlenecks in GPU execution with assist warps
Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Saugata Ghose, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita R. Das, Mahmut Kandemir, Todd C. Mowry, Onur Mutlu
TL;DR: Modern graphics processing units (GPUs) are well provisioned to support the concurrent execution of thousands of threads, but different bottlenecks during execution and heterogeneous application requirements create imbalances in utilization of resources in the cores.
Proceedings Article
A programmable, energy-minimal dataflow compiler and architecture
Graham Gobieski, Souradip Ghosh, Marijn J. H. Heule, Todd C. Mowry, Tony Nowatzki, Nathan Beckmann, Brandon Lucia
TL;DR: RipTide is a co-designed compiler and CGRA architecture that achieves both high programmability and extreme energy efficiency, eliminating an Amdahl efficiency bottleneck.
Journal Article
Understanding why correlation profiling improves the predictability of data cache misses in nonnumeric applications
Todd C. Mowry, Chi-Keung Luk
TL;DR: A new profiling technique, correlation profiling, is proposed and evaluated; it helps predict which dynamic instances of a static memory reference will hit or miss in the cache. The paper demonstrates that software prefetching can achieve better performance on a modern superscalar processor when directed by correlation profiling rather than by summary profiling information.
Redesigning database systems in light of CPU cache prefetching
TL;DR: This thesis investigates a different approach: reducing the impact of cache misses through a technique called cache prefetching, and presents a novel algorithm, Inspector Joins, that exploits the free information obtained from one pass of the hash join algorithm to improve the performance of a later pass.