Proceedings Article
Scheduling multi-tenant cloud workloads on accelerator-based systems
Dipanjan Sengupta, Anshuman Goswami, Karsten Schwan, Krishna Pallavi +3 more
- pp 513-524
TL;DR: The Strings scheduler realizes the vision of a dynamic model where GPUs are treated as first class schedulable entities by decomposing the GPU scheduling problem into a combination of load balancing and per-device scheduling.
Abstract:
Accelerator-based systems are rapidly becoming platforms of choice for high-end cloud services. There is therefore a need to move from the current model, in which high-performance applications explicitly and programmatically select the GPU devices on which to run, to a dynamic model in which GPUs are treated as first-class schedulable entities. The Strings scheduler realizes this vision by decomposing the GPU scheduling problem into a combination of load balancing and per-device scheduling: (i) device-level scheduling efficiently uses all of a GPU's hardware resources, including its computational and data movement engines, and (ii) load balancing goes beyond obtaining high throughput to ensure fairness by prioritizing the GPU requests that have attained the least service. With these methods, Strings achieves improvements in system throughput and fairness of up to 8.70x and 13%, respectively, compared to the CUDA runtime.
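The two-level decomposition described in the abstract can be illustrated with a minimal sketch. It is not the Strings implementation: the function name `dispatch`, the per-tenant FIFO queues, the use of request duration as the unit of "service", and the least-loaded device choice are all assumptions made for illustration. Only the idea of prioritizing the request whose tenant has attained the least service comes from the abstract.

```python
from collections import deque


def dispatch(requests, num_devices):
    """Toy dispatcher (hypothetical, not the Strings code).

    requests: dict mapping tenant name -> list of request durations,
              served in FIFO order per tenant.
    Returns (schedule, attained, device_load).
    """
    attained = {tenant: 0.0 for tenant in requests}      # service received so far
    queues = {tenant: deque(d) for tenant, d in requests.items()}
    device_load = [0.0] * num_devices                    # work assigned per GPU
    schedule = []

    while any(queues.values()):
        pending = [t for t in queues if queues[t]]
        # Fairness: pick the tenant with the least attained service
        # (ties broken by name so the result is deterministic).
        tenant = min(pending, key=lambda t: (attained[t], t))
        # Load balancing (assumed policy): pick the least-loaded device.
        device = min(range(num_devices), key=lambda d: device_load[d])

        cost = queues[tenant].popleft()
        attained[tenant] += cost
        device_load[device] += cost
        schedule.append((tenant, device, cost))

    return schedule, attained, device_load


if __name__ == "__main__":
    plan, attained, load = dispatch({"a": [4.0, 4.0], "b": [1.0, 1.0, 1.0]}, 2)
    for tenant, device, cost in plan:
        print(f"tenant={tenant} device={device} cost={cost}")
```

In this toy run, tenant `b` is served three times in a row after tenant `a`'s first long request, because `b` has attained less service. A real scheduler would also account for per-device engines (compute and data movement), which this sketch omits.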
Citations
Proceedings Article
IVM: a task-based shared memory programming model and runtime system to enable uniform access to CPU-GPU clusters
TL;DR: This work proposes a programming framework called Inter-node Virtual Memory (IVM) that provides the programmer with a uniform view of compute resources and memory spaces within a CPU-GPU cluster, and a mechanism to easily incorporate load balancing within the application.
Proceedings Article
A systems perspective on GPU computing: a tribute to Karsten Schwan
TL;DR: Karsten Schwan's legacy of key research contributions in general-purpose GPU computing includes novel scheduling and resource management abstractions, runtime specialization, and novel data management techniques to support scalable, distributed GPU frameworks.
Proceedings Article
Improving Application Concurrency on GPUs by Managing Implicit and Explicit Synchronizations
TL;DR: This work designs runtime mechanisms to bypass synchronizations in applications, along with a memory management scheme that can be integrated with these synchronization avoidance mechanisms to improve GPU utilization and system throughput, and integrates these mechanisms into a recently proposed GPU virtualization runtime named Sync-Free GPU (SF-GPU).
Dissertation
Enhancing manageability of execution and data for GPGPU computing
TL;DR: Symphony focuses on abstracting scheduling control at the granularity of thread blocks of application kernels, and uses the software supervisor approach where a supervisor kernel cooperates with the hardware thread block scheduler to schedule application kernels on the SMs.
Patent
Automatic localization of acceleration in edge computing environments
TL;DR: An estimated time (and cost or other identifiable or estimable considerations) to execute a function at each respective location is identified, and the use of the local acceleration circuitry or the remote acceleration resource is then selected based on the estimated time and other appropriate factors in relation to a service level agreement.
References
Posted Content
A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems
Raj Jain, Dah Ming Chiu, W. Hawe +2 more
TL;DR: A quantitative measure called the Index of Fairness is proposed that is applicable to any resource sharing or allocation problem and is independent of the amount of the resource; its boundedness between 0 and 1 aids intuitive understanding of the fairness index.
Journal Article
StarPU: a unified platform for task scheduling on heterogeneous multicore architectures
TL;DR: StarPU is a runtime system that offers a high-level unified execution model, giving numerical kernel designers a convenient way to generate parallel tasks over heterogeneous hardware and to easily develop and tune powerful scheduling algorithms.
Proceedings Article
On micro-kernel construction
TL;DR: Contrary to this belief, it is shown and supported by documentary evidence that the inefficiency and inflexibility of current μ-kernels are not inherited from the basic idea but stem mostly from overloading the kernel and/or from improper implementation.
Journal Article
Symbiotic jobscheduling for a simultaneous multithreaded processor
Allan Snavely, Dean M. Tullsen +1 more
TL;DR: It is demonstrated that performance on a hardware multithreaded processor is sensitive to the set of jobs that are coscheduled by the operating system jobscheduler, and that a small sample of the possible schedules is sufficient to identify a good schedule quickly.
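The fairness index by Jain et al. listed in the references above has a simple closed form, J(x) = (Σ xᵢ)² / (n · Σ xᵢ²), which the short sketch below computes. Whether Strings reports its fairness numbers with exactly this metric is an assumption here, since the abstract does not say; the function name and the handling of an all-zero allocation are also illustrative choices.

```python
def jain_fairness_index(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Equals 1.0 when every user receives the same share and 1/n when a
    single user receives everything.
    """
    xs = list(allocations)
    if not xs:
        raise ValueError("allocations must not be empty")
    total = sum(xs)
    squares = sum(x * x for x in xs)
    if squares == 0:
        # Assumption: nobody received anything, so treat it as equal shares.
        return 1.0
    return (total * total) / (len(xs) * squares)


if __name__ == "__main__":
    print(jain_fairness_index([1.0, 1.0, 1.0, 1.0]))  # equal shares
    print(jain_fairness_index([1.0, 0.0, 0.0, 0.0]))  # one user gets everything
```

Because the index depends only on the ratios between allocations, scaling every allocation by the same factor leaves it unchanged, which matches the "independent of the amount of the resource" property noted in the entry.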