Michael Bauer

Researcher at Nvidia

Publications - 23
Citations - 1277

Michael Bauer is an academic researcher from Nvidia. The author has contributed to research in the topics Runtime system and Programming paradigm. The author has an h-index of 14 and has co-authored 23 publications receiving 1099 citations. Previous affiliations of Michael Bauer include Stanford University.

Papers
Proceedings Article

Legion: expressing locality and independence with logical regions

TL;DR: A runtime system that dynamically extracts parallelism from Legion programs, using a distributed, parallel scheduling algorithm that identifies both independent tasks and nested parallelism.
Proceedings Article

CudaDMA: optimizing GPU memory bandwidth via warp specialization

TL;DR: This work proposes an approach for programming GPUs with tightly-coupled specialized DMA warps for performing memory transfers between on-chip and off-chip memories, and presents an extensible API, CudaDMA, that encapsulates synchronization and common sequential and strided data transfer patterns.
Proceedings Article

Regent: a high-productivity programming language for HPC with logical regions

TL;DR: An optimizing compiler is presented that translates Regent programs into efficient implementations for Legion, an asynchronous task-based model, and it is demonstrated that Regent achieves performance comparable to hand-tuned Legion.
Journal Article

Dissecting the Molecular Relationship Among Various Cardiogenic Progenitor Cells

TL;DR: This study indicates that cardiac ckit+ cells represent the most primitive population in the rodent heart, and will serve as a useful reference to determine the molecular identity of progenitors used in future preclinical and clinical studies.
Proceedings Article

Realm: an event-based low-level runtime for distributed memory architectures

TL;DR: An implementation of Realm is described that relies on a novel generational event data structure for efficiently handling large numbers of events in a distributed address space and demonstrates that Realm confers considerable latency hiding to clients, attaining significant speedups over traditional bulk-synchronous and independently optimized MPI codes.