Proceedings Article

Supporting GPU sharing in cloud environments with a transparent runtime consolidation framework

TLDR
This paper presents a framework that lets applications executing within virtual machines transparently share one or more GPUs. Even when contention is high, its consolidation algorithm improves throughput, and the framework's runtime overhead is low.
Abstract
Driven by the emergence of GPUs as a major player in high performance computing and the rapidly growing popularity of cloud environments, cloud providers now offer GPU instances. The use of GPUs in the cloud, however, is still in its early stages, and the challenge of making the GPU a truly shared resource in the cloud has not yet been addressed. This paper presents a framework that enables applications executing within virtual machines to transparently share one or more GPUs. Our contributions are twofold: we extend open-source GPU virtualization software to support efficient GPU sharing, and we propose solutions to the conceptual problem of GPU kernel consolidation. In particular, we introduce a method for computing the affinity score between two or more kernels, which indicates the potential performance improvement from consolidating them. In addition, we explore molding as a means to achieve efficient GPU sharing even for kernels with high or conflicting resource requirements. We use these concepts to develop an algorithm that efficiently maps a set of kernels onto a pair of GPUs. We extensively evaluate our framework using eight popular GPU kernels and two Fermi GPUs. We find that, even when contention is high, our consolidation algorithm is effective in improving throughput, and that the runtime overhead of our framework is low.
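The abstract does not spell out how the affinity score or the two-GPU mapping are computed. Purely as an illustration of the idea, the Python sketch below assumes a toy model: each kernel is described by two hypothetical utilization fractions (`compute` and `memory`), every kernel takes one time unit when run alone, and co-running kernels finish when the most oversubscribed resource has served all of their demand. Under that assumption, a group's affinity is its predicted speedup over back-to-back execution. The highest-affinity pairs are consolidated greedily, and the resulting groups are spread over two GPUs with a longest-processing-time-first heuristic. All names (`Kernel`, `affinity`, `pair_kernels`, `map_to_gpus`, `min_gain`) are invented for this example. This is not the paper's scoring function or mapping algorithm, and molding is not modeled.

```python
"""Illustrative sketch only: a toy affinity score and greedy mapping of
kernels onto two GPUs. The metric and heuristics are assumptions, not the
paper's published algorithm."""
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Kernel:
    name: str
    compute: float  # assumed: fraction of peak ALU throughput used when run alone (0..1)
    memory: float   # assumed: fraction of peak memory bandwidth used when run alone (0..1)


def consolidated_time(group):
    """Toy model: each kernel takes 1 time unit alone; co-running kernels
    finish when the most oversubscribed resource has served all demand,
    and never sooner than the longest single kernel (1 unit)."""
    peak = max(sum(k.compute for k in group), sum(k.memory for k in group))
    return max(1.0, peak)


def affinity(group):
    """Predicted speedup of running `group` together vs. back-to-back."""
    return len(group) / consolidated_time(group)


def pair_kernels(kernels, min_gain=1.05):
    """Greedily consolidate the highest-affinity pair until no pair helps."""
    remaining = list(kernels)
    groups = []
    while len(remaining) >= 2:
        best = max(combinations(remaining, 2), key=affinity)
        if affinity(best) < min_gain:
            break  # no remaining pair is predicted to improve throughput
        groups.append(best)
        for k in best:
            remaining.remove(k)
    groups.extend((k,) for k in remaining)  # leftovers run alone
    return groups


def map_to_gpus(groups, num_gpus=2):
    """Longest-processing-time-first: place each group on the least-loaded GPU."""
    loads = [0.0] * num_gpus
    placement = [[] for _ in range(num_gpus)]
    for group in sorted(groups, key=consolidated_time, reverse=True):
        gpu = loads.index(min(loads))
        placement[gpu].append(tuple(k.name for k in group))
        loads[gpu] += consolidated_time(group)
    return placement, loads


if __name__ == "__main__":
    kernels = [
        Kernel("A", compute=0.9, memory=0.1),
        Kernel("B", compute=0.1, memory=0.8),
        Kernel("C", compute=0.7, memory=0.3),
        Kernel("D", compute=0.4, memory=0.6),
    ]
    placement, loads = map_to_gpus(pair_kernels(kernels))
    print(placement, loads)
```

With these four example kernels, the compute-heavy `A` is paired with the memory-heavy `B`, and `C` with `D`, and each pair lands on its own GPU. Two kernels that saturate the same resource get an affinity of 1.0 and stay unpaired, which is the intuition behind scoring kernels before consolidating them.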


Citations
Proceedings Article

Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks

TL;DR: This paper defines Planaria, a microarchitectural capability that can dynamically fission (break) into multiple smaller yet full-fledged DNN engines at runtime. This enables multiple DNN inference services to be spatially co-located on the same hardware, offering simultaneous multi-tenant DNN acceleration.
Journal Article

VGRIS: Virtualized GPU Resource Isolation and Scheduling in Cloud Gaming

TL;DR: VGRIS, a resource management framework for virtualized GPU resource isolation and scheduling in cloud gaming, is proposed and experimental results show that VGRIS can effectively schedule GPU resources among various workloads.
Proceedings Article

A virtual memory based runtime to support multi-tenancy in clusters with GPUs

TL;DR: This paper proposes a runtime system that provides abstraction and sharing of GPUs, while allowing isolation of concurrent applications, and a central component of this runtime is a memory manager that provides a virtual memory abstraction to the applications.
Book

Enabling Real-Time Mobile Cloud Computing through Emerging Technologies

Tolga Soyata
Proceedings Article

pvFPGA: accessing an FPGA-based hardware accelerator in a paravirtualized environment

TL;DR: pvFPGA is the first system design solution for virtualizing an FPGA-based hardware accelerator on the x86 platform; each unprivileged domain allocates a shared data pool for both user-kernel and inter-domain data transfer.