Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

A framework is presented that enables applications executing within virtual machines to transparently share one or more GPUs. Even when contention is high, the consolidation algorithm is effective in improving throughput, and the runtime overhead of the framework is low.

Abstract:

Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, GPU instances are now being offered by cloud providers. The use of GPUs in a cloud environment, however, is still at an early stage, and the challenge of making the GPU a truly shared resource in the cloud has not yet been addressed. This paper presents a framework to enable applications executing within virtual machines to transparently share one or more GPUs. Our contributions are twofold: we extend open source GPU virtualization software to include efficient GPU sharing, and we propose solutions to the conceptual problem of GPU kernel consolidation. In particular, we introduce a method for computing the affinity score between two or more kernels, which provides an indication of potential performance improvements upon kernel consolidation. In addition, we explore molding as a means to achieve efficient GPU sharing even for kernels with high or conflicting resource requirements. We use these concepts to develop an algorithm to efficiently map a set of kernels on a pair of GPUs. We extensively evaluate our framework using eight popular GPU kernels and two Fermi GPUs. We find that even when contention is high our consolidation algorithm is effective in improving the throughput, and that the runtime overhead of our framework is low.
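
The abstract does not give the affinity formula or the mapping algorithm, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes each kernel is summarized by two hypothetical numbers (the fraction of a GPU's compute throughput and of its memory bandwidth that the kernel uses when run alone), defines a toy affinity score that drops below 1.0 once a consolidated group oversubscribes either resource, and searches all splits of the kernels across two GPUs. The names `Kernel`, `affinity`, and `map_to_two_gpus` are illustrative, and molding is not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    compute: float    # assumed fraction of compute throughput used when run alone
    bandwidth: float  # assumed fraction of memory bandwidth used when run alone


def affinity(group):
    """Toy affinity score in (0, 1]; 1.0 means no resource is oversubscribed.

    This is an illustrative stand-in, not the score defined in the paper.
    """
    compute = sum(k.compute for k in group)
    bandwidth = sum(k.bandwidth for k in group)
    # The most contended resource bounds the expected slowdown.
    return 1.0 / max(1.0, compute, bandwidth)


def map_to_two_gpus(kernels):
    """Try every split of the kernels over two GPUs.

    Returns (score, gpu0, gpu1), where score is the lower of the two
    per-GPU affinity scores for the best split found.
    """
    n = len(kernels)
    best = None
    # The last kernel always stays on GPU 1, which skips mirrored splits.
    for mask in range(1 << max(n - 1, 0)):
        gpu0 = [k for i, k in enumerate(kernels) if (mask >> i) & 1]
        gpu1 = [k for i, k in enumerate(kernels) if not (mask >> i) & 1]
        score = min(affinity(gpu0), affinity(gpu1))
        if best is None or score > best[0]:
            best = (score, gpu0, gpu1)
    return best


# Hypothetical workload: two compute-bound and two bandwidth-bound kernels.
kernels = [
    Kernel("A", compute=0.8, bandwidth=0.2),
    Kernel("B", compute=0.7, bandwidth=0.1),
    Kernel("C", compute=0.2, bandwidth=0.8),
    Kernel("D", compute=0.1, bandwidth=0.7),
]
score, gpu0, gpu1 = map_to_two_gpus(kernels)
print(score, [k.name for k in gpu0], [k.name for k in gpu1])
```

In this toy model, kernels with complementary demands score well together, while two compute-bound kernels on the same GPU score poorly. The exhaustive search is exponential in the number of kernels and is only practical for small sets such as the eight kernels mentioned above.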

Citations

Proceedings Article

Multi-tenancy on GPGPU-based servers

Dipanjan Sengupta, Raghavendra Belapure, Karsten Schwan
Georgia Institute of Technology

TL;DR: 'Rain', a system-level abstraction for GPU "hyperthreading" that makes it possible to efficiently utilize GPUs without compromising fairness among multiple tenant applications, is proposed.

Book Chapter

A general-purpose virtualization service for HPC on cloud computing: an application to GPUs

Raffaele Montella, Giuseppe Coviello, Giulio Giunta, Giuliano Laccetti, Florin Isaila, Javier Garcia Blas
Applied Science Private University; Complutense University of Madrid

TL;DR: The generic virtualization service GVirtuS (Generic Virtualization Service), a framework for development of split-drivers for cloud virtualization solutions, focuses on GPU virtualization.

Proceedings Article

Energy-Aware Workload Consolidation on GPU

Dong Li, Surendra Byna, Srimat Chakradhar
Oak Ridge National Laboratory; Lawrence Berkeley National Laboratory; Princeton University

TL;DR: A novel runtime framework is developed that dynamically consolidates instances from different workloads from multiple user processes into a single GPU workload and provides 2X to 22X energy benefit over a multicore CPU.

Proceedings Article

COSMIC: middleware for high performance and reliable multiprocessing on xeon phi coprocessors

Srihari Cadambi, Giuseppe Coviello, Cheng-Hong Li, Rajat Phull, Kunal Rao, Murugan Sankaradass, Srimat Chakradhar
Princeton University

TL;DR: A new user-level middleware called COSMIC is proposed that improves the performance and reliability of multiprocessing on coprocessors like the Xeon Phi. It increases multiprocessing reliability by exploiting programmer-specified per-process coprocessor memory requirements to completely avoid memory oversubscription and crashes.

Journal Article

A Cloud Gaming System Based on User-Level Virtualization and Its Resource Scheduling

Youhui Zhang, Peng Qu, Jiang Cihang, Weimin Zheng
Tsinghua University

01 May 2016

IEEE Transactions on Parallel and Distri...

TL;DR: This paper proposes GCloud, a GPU/CPU hybrid cluster for cloud gaming based on user-level virtualization technology, and presents a performance model to analyze server capacity and games' resource consumption, which categorizes games into two types: CPU-critical and memory-io-critical.

References

Proceedings Article

The cost of doing science on the cloud: the Montage example

Ewa Deelman, Gurmeet Singh, Miron Livny, Bruce Berriman, John C. Good
University of Wisconsin-Madison; California Institute of Technology

TL;DR: Using the Amazon cloud fee structure and a real-life astronomy application, the cost-performance tradeoffs of different execution and resource provisioning plans are studied, and it is shown that by provisioning the right amount of storage and compute resources, cost can be significantly reduced with no significant impact on application performance.

Proceedings Article

Qilin: exploiting parallelism on heterogeneous multiprocessors with adaptive mapping

Chi-Keung Luk, Sunpyo Hong, Hyesoon Kim
Intel; Georgia Institute of Technology

TL;DR: Adaptive mapping, a fully automatic technique for mapping computations to processing elements on a CPU+GPU machine, is proposed. By judiciously distributing work over the CPU and GPU, it achieves on average a 25% reduction in execution time and a 20% reduction in energy consumption relative to static mappings for a set of important computation benchmarks.

Proceedings Article

Automated control of multiple virtualized resources

Pradeep Padala, Kai-Yuan Hou, Kang G. Shin, Xiaoyun Zhu, Mustafa Uysal, Zhikui Wang, Sharad Singhal, Arif Merchant
University of Michigan; VMware; Hewlett-Packard

TL;DR: Experimental evaluation with RUBiS and TPC-W benchmarks along with production-trace-driven workloads indicates that AutoControl can detect and mitigate CPU and disk I/O bottlenecks that occur over time and across multiple nodes by allocating each resource accordingly.

Proceedings Article

Cost-benefit analysis of Cloud Computing versus desktop grids

Derrick Kondo, Bahman Javadi, Paul Malecot, Franck Cappello, Dustin Anderson
French Institute for Research in Computer Science and Automation; University of California, Berkeley

TL;DR: This work compares and contrasts the performance and monetary cost-benefits of clouds for desktop grid applications ranging in computational size and storage, and examines performance measurements and monetary expenses of real desktop grids and the Amazon Elastic Compute Cloud.

Proceedings Article

Accelerator: using data parallelism to program GPUs for general-purpose uses

David Tarditi, Sidd Puri, Jose M. Oglesby
Microsoft

TL;DR: This work describes Accelerator, a system that uses data parallelism to program GPUs for general-purpose uses instead of C, and compares the performance of Accelerator versions of the benchmarks against hand-written pixel shaders.

IEEE Transactions on Computers

Related Papers (5)

GViM: GPU-accelerated virtual machines

A GPGPU transparent virtualization component for high performance computing clouds

rCUDA: Reducing the number of GPU-based accelerators in high performance clusters

Rodinia: A benchmark suite for heterogeneous computing

vCUDA: GPU-Accelerated High-Performance Computing in Virtual Machines