
Showing papers by "Michael Garland published in 2022"


Journal ArticleDOI
TL;DR: It is shown that the BaM infrastructure software running on GPUs can identify and communicate fine-grain accesses at a rate high enough to fully utilize the underlying storage devices. Even with consumer-grade SSDs, a BaM system can deliver application performance competitive with a much more expensive DRAM-only solution, and the reduction in I/O amplification yields significant performance benefits.
Abstract: Accelerators like Graphics Processing Units (GPUs) have been increasingly deployed in modern data centers because of their compute capabilities and memory bandwidth. These accelerators have traditionally relied on the “application host code” and the OS running on the CPU to orchestrate their accesses to the data storage devices. CPU orchestration of storage data accesses works well for classic GPU applications, like dense neural network training, where data access patterns are predefined, regular, dense, and independent of the data values, enabling the CPU to partition the storage data into coarse-grain chunks and coordinate the storage device accesses and data transfers to the accelerators. Unfortunately, such a CPU-centric strategy causes excessive CPU-GPU synchronization overhead and/or I/O traffic amplification, diminishing the effective storage bandwidth for emerging applications with fine-grain data-dependent access patterns like graph and data analytics, recommender systems, and graph neural networks. In this work, we make a case for enabling GPUs to orchestrate high-throughput, fine-grain accesses into NVMe Solid State Drives (SSDs) in a new system architecture called BaM.
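To make the I/O traffic amplification argument concrete, here is a back-of-envelope model, a minimal sketch in which the item size, chunk size, and hit-rate parameter are illustrative assumptions rather than figures from the paper. It compares bytes moved when sparse, data-dependent accesses are served by CPU-managed coarse-grain chunks versus GPU-initiated fine-grain reads:

```python
# Hypothetical back-of-envelope model of I/O traffic amplification.
# All sizes and parameters below are assumptions for illustration only.

ITEM_BYTES = 4096          # one fine-grain record (e.g., a block of a neighbor list)
CHUNK_BYTES = 2 * 1024**2  # coarse-grain chunk a CPU-centric scheme transfers

def traffic(accessed_items: int, items_hit_per_chunk: float) -> tuple[int, int]:
    """Bytes moved under CPU-centric chunking vs. GPU-initiated fine-grain I/O."""
    fine_grain = accessed_items * ITEM_BYTES
    # CPU-centric: every useful item drags a whole chunk across the interconnect,
    # amortized only by how many useful items happen to share a chunk.
    chunks_needed = accessed_items / items_hit_per_chunk
    coarse_grain = int(chunks_needed * CHUNK_BYTES)
    return coarse_grain, fine_grain

coarse, fine = traffic(accessed_items=1_000_000, items_hit_per_chunk=4.0)
print(f"CPU-centric: {coarse/1e9:.1f} GB, GPU-initiated: {fine/1e9:.1f} GB, "
      f"amplification: {coarse/fine:.0f}x")
```

Under these assumed numbers the coarse-grain path moves roughly 128x more data than the fine-grain path; the actual ratio depends entirely on workload sparsity and chunk sizing.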

6 citations


Proceedings ArticleDOI
09 Mar 2022
TL;DR: BaM as presented in this paper is a system architecture for GPU-initiated storage access. It features a fine-grained software cache that coalesces data storage requests while minimizing I/O traffic amplification, targeting applications whose fine-grained, data-dependent access patterns make CPU-initiated storage access unsuitable.
Abstract: Graphics Processing Units (GPUs) have traditionally relied on the host CPU to initiate access to the data storage. This approach is well-suited for GPU applications with known data access patterns that enable partitioning of their dataset to be processed in a pipelined fashion in the GPU. However, emerging applications such as graph and data analytics, recommender systems, or graph neural networks require fine-grained, data-dependent access to storage. CPU initiation of storage access is unsuitable for these applications due to high CPU-GPU synchronization overheads, I/O traffic amplification, and long CPU processing latencies. GPU-initiated storage removes these overheads from the storage control path and thus can potentially support these applications at much higher speeds. However, there has been no system architecture and software stack that enables efficient GPU-initiated storage access. This work presents a novel system architecture, BaM, that fills this gap. BaM features a fine-grained software cache to coalesce data storage requests while minimizing I/O traffic amplification. This software cache communicates with the storage system via high-throughput queues that enable the massive number of concurrent threads in modern GPUs to make I/O requests at a high rate to fully utilize the storage devices and the system interconnect. Experimental results show that BaM delivers 1.0x and 1.49x end-to-end speedup for BFS and CC graph analytics benchmarks while reducing hardware costs by up to 21.7x over accessing the graph data from the host memory. Furthermore, BaM speeds up data-analytics workloads by 5.3x over CPU-initiated storage access on the same hardware.
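The request-coalescing behavior of such a software cache can be sketched briefly. The following sequential Python is a hypothetical stand-in for the massively concurrent GPU threads the abstract describes; the class and method names are invented for illustration and do not come from BaM's actual implementation:

```python
# Minimal sketch of request coalescing in a fine-grained software cache.
# Sequential Python stands in for massively parallel GPU threads; all names
# are hypothetical.

import collections

class CoalescingCache:
    def __init__(self):
        self.cache = {}                                 # block_id -> data
        self.inflight = collections.defaultdict(list)   # block_id -> waiting threads
        self.io_requests = 0

    def read(self, thread_id: int, block_id: int) -> None:
        if block_id in self.cache:
            return                                      # hit: no I/O needed
        if self.inflight[block_id]:
            self.inflight[block_id].append(thread_id)   # coalesce with pending request
            return
        self.inflight[block_id].append(thread_id)
        self.io_requests += 1                           # one storage request for all waiters

    def complete(self, block_id: int, data: bytes) -> None:
        self.cache[block_id] = data
        self.inflight.pop(block_id, None)               # wake all coalesced waiters

cache = CoalescingCache()
for tid, blk in enumerate([7, 7, 7, 12, 7, 12]):        # 6 thread reads, 2 distinct blocks
    cache.read(tid, blk)
print(cache.io_requests)                                # -> 2
```

The point of the sketch is the counter: many thread-level reads of the same block collapse into a single storage request, which is how a fine-grained cache keeps I/O traffic amplification low.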

2 citations


Journal ArticleDOI
TL;DR: A novel system named PLANER is introduced that takes an existing Transformer-based network and a user-defined latency target and produces an optimized, sparsely-activated version of the original network that tries to meet the latency target while maintaining baseline accuracy.
Abstract: Transformer-based neural networks have achieved state-of-the-art task performance in a number of machine learning domains including natural language processing and computer vision. To further improve their accuracy, recent work has explored the integration of dynamic behavior into these networks in the form of mixture-of-expert (MoE) layers. In this paper, we explore the introduction of MoE layers to optimize a different metric: inference latency. We introduce a novel system named PLANER that takes an existing Transformer-based network and a user-defined latency target and produces an optimized, sparsely-activated version of the original network that tries to meet the latency target while maintaining baseline accuracy. We evaluate PLANER on two real-world language modeling tasks using the Transformer-XL network and achieve inference latency reductions of over 2x at iso-accuracy.
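The abstract does not detail PLANER's optimization procedure, but its stated goal can be illustrated with a hypothetical greedy sketch: swap dense feed-forward layers for sparsely-activated top-1 MoE layers until an estimated latency meets the user-defined target. The per-layer latency constants below are assumptions for illustration only, not measurements from the paper:

```python
# Hypothetical sketch of latency-targeted sparsification in the spirit of
# PLANER's stated goal; the actual search procedure may differ entirely.

DENSE_MS = 1.0   # assumed per-layer dense FFN latency
MOE_MS = 0.4     # assumed per-layer top-1 MoE latency (fewer active parameters)

def plan(num_layers: int, target_ms: float) -> list[str]:
    """Greedily convert dense layers to MoE until estimated latency meets target."""
    config = ["dense"] * num_layers
    latency = num_layers * DENSE_MS
    for i in range(num_layers):
        if latency <= target_ms:
            break
        config[i] = "moe"                 # sparsely activate this layer
        latency -= DENSE_MS - MOE_MS
    return config

cfg = plan(num_layers=16, target_ms=10.0)
print(cfg.count("moe"), "layers converted")   # -> 10
```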

1 citation


Journal ArticleDOI
TL;DR: Brown as discussed by the authors challenged utilities and industries to develop 6,000 megawatts (MW) of electricity in California during the 1980s through cogeneration, and also announced that the state would take the lead by beginning immediately to develop 400 MW of cogeneration at state facilities.
Abstract: At the June 3, 1980, meeting of the Governor’s Cogeneration Task Force, Governor Edmund G. Brown Jr. challenged utilities and industries to develop 6,000 megawatts (MW) of electricity in California during the 1980s through cogeneration. The Governor also announced that the state would take the lead by beginning immediately to develop 400 MW of cogeneration at state facilities. As a first step, the Governor requested the Department of General Services, the Office of Appropriate Technology, and the Department of Water Resources to prepare a blueprint for developing this capacity. The Governor called for identification of feasible cogeneration projects that can be implemented without delay; establishment of an overall timetable for additional planning, feasibility studies, design, and construction; and a discussion of potential sources of funds. Since the Governor’s announcement, the California Energy Commission, the Department of General Services, the Office of Appropriate Technology, the University of California, and the state university and colleges, with the cooperation of the Departments of Developmental Services, Mental Health, Corrections, and Health Services, have initiated, continued work on, or completed feasibility studies or engineering design work for state facilities totaling more than 177 MW of cogeneration capacity. The Department of Water Resources has conducted preliminary site investigations at thirteen additional state facilities.