Proceedings ArticleDOI

GPUShare: Fair-Sharing Middleware for GPU Clouds

TLDR
GPUShare is presented, a software-based mechanism that can yield a kernel before all of its threads have run, thus giving finer control over the time slice for which the GPU is allocated to a process and improving fair GPU sharing across tenants.
Abstract
Many new cloud-focused applications such as deep learning and graph analytics have started to rely on the high computing throughput of GPUs, but cloud providers cannot currently support fine-grained time-sharing on GPUs to enable multi-tenancy for these types of applications. Currently, scheduling is performed by the GPU driver in combination with a hardware thread dispatcher to maximize utilization. However, when multiple applications with contrasting kernel running times and high utilization of the GPU need to be co-located, this approach unduly favors one or more of the applications at the expense of others. This paper presents GPUShare, a middleware solution for GPU fair sharing among high-utilization, long-running applications. It begins by analyzing the scenarios under which the current driver-based multi-process scheduling fails, noting that such scenarios are quite common. It then describes a software-based mechanism that can yield a kernel before all of its threads have run, thus giving finer control over the time slice for which the GPU is allocated to a process. In controlling time slices on the GPU by yielding kernels, GPUShare improves fair GPU sharing across tenants and outperforms the CUDA driver by up to 45% for two tenants and by up to 89% for more than two tenants, while incurring a maximum overhead of only 12%. Additional improvements are obtained from having a central scheduler that further smooths out disparities across tenants' GPU shares, improving fair sharing by up to 92% for two tenants and by up to 76% for more than two tenants.
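The paper's implementation is not shown here, but the yielding idea the abstract describes can be sketched in miniature. In this hypothetical Python sketch (names and structure are illustrative, not GPUShare's actual code), a long-running kernel processes work in fixed chunks and the middleware caps how many chunks run per time slice instead of waiting for every thread block to finish; on a real GPU the "chunk" would be a batch of thread blocks and the cap a yield flag raised from the host.

```python
# Minimal sketch (hypothetical names, not the paper's code) of kernel
# yielding: a long-running kernel runs in chunks, and a central scheduler
# grants each tenant a bounded slice of chunks before yielding the GPU.

class YieldableKernel:
    def __init__(self, name, total_chunks):
        self.name = name
        self.remaining = total_chunks

    def run(self, max_chunks):
        """Run until done or until the slice (max_chunks) is exhausted."""
        ran = 0
        while self.remaining > 0 and ran < max_chunks:
            self.remaining -= 1  # process one chunk of thread blocks
            ran += 1
        return ran

def round_robin(kernels, slice_chunks):
    """Central scheduler: grant each tenant an equal slice until all finish."""
    schedule = []
    pending = list(kernels)
    while pending:
        for k in list(pending):
            k.run(slice_chunks)
            schedule.append(k.name)
            if k.remaining == 0:
                pending.remove(k)
    return schedule
```

With tenants of unequal kernel lengths, both appear in alternation early in the schedule rather than the longer kernel monopolizing the device, which is the fairness property the abstract measures.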


Citations
Proceedings ArticleDOI

CuMF_SGD: Parallelized Stochastic Gradient Descent for Matrix Factorization on GPUs

TL;DR: This paper first designs high-performance GPU computation kernels that accelerate individual SGD updates by exploiting model parallelism, then designs efficient schemes that parallelize SGD updates by exploiting data parallelism, and scales cuMF_SGD to large data sets that cannot fit into one GPU's memory.
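The per-rating SGD update that cuMF_SGD parallelizes can be sketched serially. This is an illustrative Python/NumPy sketch (function names and hyperparameters are assumptions, not the paper's kernels): it factorizes a ratings matrix as R ≈ P·Qᵀ, and the key property enabling data parallelism is that each update touches only one row of P and one row of Q.

```python
import numpy as np

# Illustrative serial version of the SGD matrix-factorization update
# that cuMF_SGD batches across GPU threads (hypothetical sketch).

def sgd_mf(ratings, n_users, n_items, k=8, lr=0.05, reg=0.02, epochs=50, seed=0):
    rng = np.random.default_rng(seed)
    P = 0.1 * rng.standard_normal((n_users, k))
    Q = 0.1 * rng.standard_normal((n_items, k))
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            # Each (u, i, r) update touches only row u of P and row i of Q,
            # which is what makes lock-free data parallelism feasible.
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

def rmse(ratings, P, Q):
    errs = [(r - P[u] @ Q[i]) ** 2 for u, i, r in ratings]
    return float(np.sqrt(np.mean(errs)))
```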
Proceedings ArticleDOI

Dynamic application reconfiguration on heterogeneous hardware

TL;DR: Through TornadoVM, a virtual machine capable of reconfiguring applications, at runtime, for hardware acceleration based on the currently available hardware resources, this paper introduces a new level of compilation in which applications can benefit from heterogeneous hardware.
Proceedings ArticleDOI

Wheel: Accelerating CNNs with Distributed GPUs via Hybrid Parallelism and Alternate Strategy

TL;DR: Wheel first partitions the layers of a CNN into two kinds of modules: convolutional module and fully-connected module, and deploys them following the proposed hybrid parallelism, which reduces the transmitted data and fully using GPUs simultaneously.
Proceedings ArticleDOI

GLoop: an event-driven runtime for consolidating GPGPU applications

TL;DR: GLoop is presented, a software runtime that enables consolidating GPGPU apps, including GPU eaters. GLoop offers an event-driven programming model, which allows GLoop-based apps to inherit the GPU eaters' high functionality while proportionally scheduling them on a shared GPU in an isolated manner.
References
Proceedings Article

The PageRank Citation Ranking : Bringing Order to the Web

TL;DR: This paper describes PageRank, a method for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
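The computation the paper describes is, at its core, power iteration on a damped link matrix. A minimal pure-Python sketch (variable names are illustrative; real implementations use sparse matrices for web-scale graphs):

```python
# Power-iteration sketch of PageRank: repeatedly redistribute each page's
# rank along its outgoing links, damped by d, until the ranks stabilize.

def pagerank(links, d=0.85, iters=100):
    """links: dict mapping node -> list of outgoing neighbors."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - d) / n for v in nodes}
        for v, outs in links.items():
            if outs:
                share = d * rank[v] / len(outs)
                for w in outs:
                    new[w] += share
            else:  # dangling node: spread its rank uniformly
                for w in nodes:
                    new[w] += d * rank[v] / n
        rank = new
    return rank
```

The ranks always sum to 1, and pages with many inbound links from well-ranked pages come out on top.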
Posted Content

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe as discussed by the authors is a BSD-licensed C++ library with Python and MATLAB bindings for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Proceedings ArticleDOI

Caffe: Convolutional Architecture for Fast Feature Embedding

TL;DR: Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Journal ArticleDOI

Scheduling Algorithms for Multiprogramming in a Hard-Real-Time Environment

TL;DR: The problem of multiprogram scheduling on a single processor is studied from the viewpoint of the characteristics peculiar to the program functions that need guaranteed service and it is shown that an optimum fixed priority scheduler possesses an upper bound to processor utilization.
Book

Scheduling algorithms for multiprogramming in a hard real-time environment

TL;DR: In this paper, the problem of multiprogram scheduling on a single processor is studied from the viewpoint of the characteristics peculiar to the program functions that need guaranteed service, and it is shown that an optimum fixed priority scheduler possesses an upper bound to processor utilization which may be as low as 70 percent for large task sets.
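The "as low as 70 percent" figure is the well-known Liu & Layland rate-monotonic bound: n periodic tasks are schedulable under fixed priorities whenever total utilization does not exceed n(2^(1/n) − 1), which decreases toward ln 2 ≈ 0.693 as n grows. A one-function sketch (not code from the paper):

```python
import math

# Liu & Layland least upper bound on processor utilization for n periodic
# tasks under rate-monotonic fixed-priority scheduling: n * (2^(1/n) - 1).
# For n = 1 the bound is 1.0; as n -> infinity it falls to ln 2 ~ 0.693,
# the "as low as 70 percent" figure quoted in the summary above.

def rm_bound(n):
    """Least upper bound on utilization for n periodic tasks."""
    return n * (2 ** (1.0 / n) - 1)
```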