Proceedings ArticleDOI

Scheduling multi-tenant cloud workloads on accelerator-based systems

TL;DR
The Strings scheduler realizes the vision of a dynamic model in which GPUs are treated as first-class schedulable entities, by decomposing the GPU scheduling problem into a combination of load balancing and per-device scheduling.
Abstract
Accelerator-based systems are rapidly becoming platforms of choice for high-end cloud services. There is therefore a need to move from the current model, in which high-performance applications explicitly and programmatically select the GPU devices on which to run, to a dynamic model in which GPUs are treated as first-class schedulable entities. The Strings scheduler realizes this vision by decomposing the GPU scheduling problem into a combination of load balancing and per-device scheduling. (i) Device-level scheduling efficiently uses all of a GPU's hardware resources, including its computational and data movement engines, and (ii) load balancing goes beyond obtaining high throughput to ensure fairness by prioritizing GPU requests that have attained the least service. With these methods, Strings achieves improvements in system throughput and fairness of up to 8.70x and 13%, respectively, compared to the CUDA runtime.
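The decomposition described in the abstract can be illustrated with a small sketch. This is not the Strings implementation: the function name `schedule`, the per-request `cost` values, and the tie-breaking rules are assumptions made for illustration only. The sketch combines a least-attained-service policy (the tenant that has received the least service so far goes next) with a simple load balancer (each request goes to the device with the least accumulated work).

```python
from collections import deque


def schedule(requests, device_names):
    """Hypothetical sketch: dispatch requests by least attained service.

    requests: list of (tenant, cost) pairs in arrival order, where cost is
              an assumed estimate of the service a request consumes.
    device_names: list of device identifiers (illustrative, e.g. "gpu0").
    Returns the dispatch order as a list of (tenant, device, cost).
    """
    # One FIFO queue of pending request costs per tenant.
    queues = {}
    for tenant, cost in requests:
        queues.setdefault(tenant, deque()).append(cost)

    attained = {tenant: 0.0 for tenant in queues}  # service received so far
    load = {device: 0.0 for device in device_names}  # work assigned so far
    order = []

    while any(queues.values()):
        # Fairness: pick the tenant with pending work and least attained service.
        tenant = min((t for t in queues if queues[t]), key=lambda t: attained[t])
        cost = queues[tenant].popleft()
        # Load balancing: send the request to the least-loaded device.
        device = min(device_names, key=lambda d: load[d])
        attained[tenant] += cost
        load[device] += cost
        order.append((tenant, device, cost))
    return order


if __name__ == "__main__":
    demo = [("A", 4), ("A", 4), ("B", 1), ("B", 1), ("B", 1)]
    for entry in schedule(demo, ["gpu0", "gpu1"]):
        print(entry)
```

In this toy run, tenant B's short requests are served before tenant A's second long request, because B has attained less service after A's first dispatch. A real scheduler would also have to account for overlapping compute and data movement on each device, which this sketch ignores.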


Citations
Proceedings ArticleDOI

IVM: a task-based shared memory programming model and runtime system to enable uniform access to CPU-GPU clusters

TL;DR: This work proposes a programming framework called Inter-node Virtual Memory (IVM) that provides the programmer with a uniform view of compute resources and memory spaces within a CPU-GPU cluster, and a mechanism to easily incorporate load balancing within the application.
Proceedings ArticleDOI

A systems perspective on GPU computing: a tribute to Karsten Schwan

TL;DR: His legacy of key research contributions in general-purpose GPU computing includes novel scheduling and resource management abstractions, runtime specialization, and novel data management techniques to support scalable, distributed GPU frameworks.
Proceedings ArticleDOI

Improving Application Concurrency on GPUs by Managing Implicit and Explicit Synchronizations

TL;DR: This work designs runtime mechanisms to bypass synchronizations in applications, along with a memory management scheme that can be integrated with these synchronization avoidance mechanisms to improve GPU utilization and system throughput, and integrates these mechanisms into a recently proposed GPU virtualization runtime named Sync-Free GPU (SF-GPU).
Dissertation

Enhancing manageability of execution and data for GPGPU computing

TL;DR: Symphony focuses on abstracting scheduling control at the granularity of thread blocks of application kernels, and uses the software supervisor approach where a supervisor kernel cooperates with the hardware thread block scheduler to schedule application kernels on the SMs.
Patent

Automatic localization of acceleration in edge computing environments

TL;DR: An estimated time (and cost or other identifiable or estimable considerations) to execute a function at local acceleration circuitry or at a remote acceleration resource is identified, and the local or remote option is selected based on the estimated time and other appropriate factors in relation to a service level agreement.
References
Posted Content

A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems

TL;DR: A quantitative measure called the Index of Fairness, applicable to any resource sharing or allocation problem, is proposed. It is independent of the amount of the resource, and it always lies between 0 and 1; this boundedness aids intuitive understanding of the fairness index.
Journal ArticleDOI

StarPU: a unified platform for task scheduling on heterogeneous multicore architectures

TL;DR: StarPU is a runtime system that provides a high-level unified execution model, giving numerical kernel designers a convenient way to generate parallel tasks over heterogeneous hardware and to easily develop and tune powerful scheduling algorithms.

Proceedings ArticleDOI

On micro-kernel construction

TL;DR: Contrary to the widespread belief that μ-kernel based systems are inherently inefficient and inflexible, it is shown and supported by documentary evidence that the inefficiency and inflexibility of current μ-kernels is not inherited from the basic idea but mostly from overloading the kernel and/or from improper implementation.
Journal ArticleDOI

Symbiotic jobscheduling for a simultaneous multithreaded processor

TL;DR: It is demonstrated that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler, and that a small sample of the possible schedules is sufficient to identify a good schedule quickly.