Michael Bauer
Researcher at Nvidia
Publications - 23
Citations - 1277
Michael Bauer is a researcher at Nvidia. The author has contributed to research on runtime systems and programming paradigms. The author has an h-index of 14 and has co-authored 23 publications receiving 1099 citations. Previous affiliations of Michael Bauer include Stanford University.
Papers
Proceedings Article
Legion: expressing locality and independence with logical regions
TL;DR: A runtime system that dynamically extracts parallelism from Legion programs, using a distributed, parallel scheduling algorithm that identifies both independent tasks and nested parallelism.
Proceedings Article
CudaDMA: optimizing GPU memory bandwidth via warp specialization
TL;DR: This work proposes an approach for programming GPUs with tightly-coupled specialized DMA warps for performing memory transfers between on-chip and off-chip memories, and presents an extensible API, CudaDMA, that encapsulates synchronization and common sequential and strided data transfer patterns.
Proceedings Article
Regent: a high-productivity programming language for HPC with logical regions
TL;DR: An optimizing compiler is presented that translates Regent programs into efficient implementations for Legion, an asynchronous task-based model, and it is demonstrated that Regent achieves performance comparable to hand-tuned Legion.
Journal Article
Dissecting the Molecular Relationship Among Various Cardiogenic Progenitor Cells
Devaveena Dey, Leng Han, Michael Bauer, Fumihiro Sanada, Angelos Oikonomopoulos, Toru Hosoda, Kazumasa Unno, Patrícia de Almeida, Annarosa Leri, Joseph C. Wu +15 more
TL;DR: This study indicates that cardiac c-kit+ cells represent the most primitive population in the rodent heart, and will serve as a useful reference to determine the molecular identity of progenitors used in future preclinical and clinical studies.
Proceedings Article
Realm: an event-based low-level runtime for distributed memory architectures
TL;DR: An implementation of Realm is described that relies on a novel generational event data structure for efficiently handling large numbers of events in a distributed address space and demonstrates that Realm confers considerable latency hiding to clients, attaining significant speedups over traditional bulk-synchronous and independently optimized MPI codes.