Proceedings ArticleDOI
Mystic: Predictive Scheduling for GPU Based Cloud Servers Using Machine Learning
Yash Ukidave, Xiangyu Li, David Kaeli
pp. 353-362
TL;DR: Mystic, an interference-aware scheduler for efficient co-execution of applications on GPU-based clusters and cloud servers, is presented. It identifies the similarities between new applications and already-executing applications, and guides the scheduler to minimize interference and improve system throughput.
Abstract: GPUs have become the primary choice of accelerator for high-end data centers and cloud servers, which can host thousands of disparate applications. With the growing demand for GPUs on clusters, there arises a need for efficient co-execution of applications on the same accelerator device. However, resource contention among co-executing applications causes interference, which degrades execution performance, impacts the QoS requirements of applications, and lowers overall system throughput. While previous work has proposed techniques for detecting interference, existing solutions are either developed for CPU clusters or use static profiling approaches, which can be computationally intensive and do not scale well. We present Mystic, an interference-aware scheduler for efficient co-execution of applications on GPU-based clusters and cloud servers. The most important feature of Mystic is its use of learning-based analytical models for detecting interference between applications. We leverage a collaborative filtering framework to characterize an incoming application with respect to the interference it may cause when co-executing with other applications while sharing GPU resources. Mystic identifies the similarities between new applications and the executing applications, and guides the scheduler to minimize interference and improve system throughput. We train the learning model with 42 CUDA applications, and consider a separate set of 55 diverse, real-world GPU applications for evaluation. Mystic is evaluated on a live GPU cluster with 32 NVIDIA GPUs. Our framework achieves performance guarantees for 90.3% of the evaluated applications. Compared with state-of-the-art interference-oblivious schedulers, Mystic improves system throughput by 27.5% on average, and achieves a 16.3% average improvement in GPU utilization.
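The collaborative-filtering idea in the abstract can be viewed, very loosely, as low-rank completion of a partially observed app-to-app interference matrix: measured slowdowns for some co-run pairs are used to predict the slowdowns of unseen pairs. The sketch below is an illustrative reconstruction, not Mystic's actual model; the function name, factor dimensions, and all hyperparameters are assumptions:

```python
import numpy as np

def predict_interference(observed, n_factors=4, lr=0.01, reg=0.1, epochs=1000, seed=0):
    """Fill in missing entries of a pairwise app-interference matrix.

    observed: 2-D array where observed[i, j] is the measured slowdown of
    app i when co-run with app j, and np.nan marks unobserved pairs.
    Returns a dense matrix of predicted slowdowns via SGD-based
    matrix factorization.
    """
    rng = np.random.default_rng(seed)
    n = observed.shape[0]
    P = 0.1 * rng.standard_normal((n, n_factors))  # row ("victim") factors
    Q = 0.1 * rng.standard_normal((n, n_factors))  # column ("aggressor") factors
    mask = ~np.isnan(observed)
    for _ in range(epochs):
        for i, j in zip(*np.where(mask)):
            err = observed[i, j] - P[i] @ Q[j]
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * P[i] - reg * Q[j])
    return P @ Q.T
```

Once the missing entries are predicted, a scheduler can co-locate the incoming application with the partner whose predicted mutual slowdown is smallest.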
Citations
Proceedings ArticleDOI
Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks
Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmendra Reddy Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung Kim, Cliff Young, Hadi Esmaeilzadeh
TL;DR: This paper defines Planaria, a microarchitectural capability that can dynamically fission (break) into multiple smaller yet full-fledged DNN engines at runtime, enabling spatial co-location of multiple DNN inference services on the same hardware and offering simultaneous multi-tenant DNN acceleration.
Proceedings ArticleDOI
Quality of Service Support for Fine-Grained Sharing on GPUs
TL;DR: This work proposes QoS mechanisms for a fine-grained form of GPU sharing that can provide control over the progress of kernels on a per cycle basis and the amount of thread-level parallelism of each kernel.
Proceedings ArticleDOI
BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services
Anirban Bhattacharjee, Ajay Chhokra, Zhuangwei Kang, Hongyang Sun, Aniruddha Gokhale, Gabor Karsai
TL;DR: This work presents a distributed and scalable deep-learning prediction serving system called Barista, and proposes an intelligent agent to allocate and manage the compute resources by horizontal and vertical scaling to maintain the required prediction latency.
Proceedings ArticleDOI
Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management
TL;DR: This work proposes new application-aware TLP management techniques for a multi-application execution environment such that all co-scheduled applications can make good and judicious use of all the shared resources, and proposes an application-level utility metric, called effective bandwidth, which accounts for two runtime metrics: attained DRAM bandwidth and cache miss rates.
Journal ArticleDOI
Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems
TL;DR: In this article, an interference-aware and prediction-based resource manager for DL systems is proposed, which proactively predicts GPU utilization of heterogeneous DL jobs extrapolated from the DL model's computation graph features, removing the need for online profiling and isolated reserved GPUs.
References
Journal ArticleDOI
Matrix Factorization Techniques for Recommender Systems
TL;DR: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
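The biased matrix-factorization model from this line of work predicts a rating as a global mean plus a user bias, an item bias, and a dot product of latent factors. A minimal transcription of that prediction rule, with toy numbers and a hypothetical function name:

```python
import numpy as np

def predict_rating(mu, b_u, b_i, p_u, q_i):
    """Biased matrix-factorization prediction: global mean + user bias
    + item bias + dot product of user and item latent-factor vectors."""
    return mu + b_u + b_i + float(np.dot(p_u, q_i))

# Toy example with made-up biases and factors:
# 3.5 + 0.2 - 0.1 + (0.5*1.0 + 1.0*0.5) = 4.6
r_hat = predict_rating(3.5, 0.2, -0.1, np.array([0.5, 1.0]), np.array([1.0, 0.5]))
```

The biases capture systematic effects (a generous user, a popular item) so the latent factors only have to model the residual interaction.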
Journal Article
Industry Report: Amazon.com Recommendations: Item-to-Item Collaborative Filtering.
TL;DR: This work compares three common approaches to the recommendation problem (traditional collaborative filtering, cluster models, and search-based methods) with the authors' own algorithm, called item-to-item collaborative filtering.
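Item-to-item collaborative filtering can be sketched as two steps: precompute similarities between item purchase vectors (here, cosine similarity over a binary user-by-item matrix), then score candidate items by their aggregate similarity to what the user already owns. This is a toy sketch under those assumptions, not Amazon's production algorithm:

```python
import numpy as np

def item_similarities(purchases):
    """purchases: user x item 0/1 matrix. Returns item x item cosine similarities."""
    norms = np.linalg.norm(purchases, axis=0)
    norms[norms == 0] = 1.0          # avoid dividing by zero for unsold items
    normalized = purchases / norms
    return normalized.T @ normalized

def recommend(purchases, user, k=2):
    """Top-k items for `user`, scored by similarity to items already owned."""
    sims = item_similarities(purchases)
    np.fill_diagonal(sims, 0.0)          # an item is not evidence for itself
    scores = sims @ purchases[user]      # sum similarity to each owned item
    scores[purchases[user] > 0] = -np.inf  # never re-recommend owned items
    return np.argsort(scores)[::-1][:k]
```

Because the similarity table depends only on the catalog, it can be built offline, which is why this approach scales with the number of items rather than the number of customers.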
Posted Content
A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems
Raj Jain, Dah Ming Chiu, W. Hawe
TL;DR: A quantitative measure called the Index of Fairness is proposed, applicable to any resource sharing or allocation problem; it is independent of the amount of the resource, and its boundedness aids intuitive understanding of the fairness index.
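Jain's fairness index has a simple closed form, J(x) = (Σ x_i)² / (n · Σ x_i²), bounded between 1/n (one user gets everything) and 1 (perfectly equal shares). A direct transcription:

```python
def jain_fairness(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).
    Returns 1.0 for a perfectly equal allocation and 1/n when a
    single user receives the entire resource."""
    n = len(allocations)
    s = sum(allocations)
    sq = sum(x * x for x in allocations)
    return (s * s) / (n * sq)

jain_fairness([10, 10, 10, 10])  # equal shares   -> 1.0
jain_fairness([40, 0, 0, 0])     # one user only  -> 0.25 (= 1/n)
```

Because the index is a ratio of sums of the same degree, scaling every allocation by a constant leaves it unchanged, which is the "independent of the amount of the resource" property noted above.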
Journal ArticleDOI
Cluster ensembles --- a knowledge reuse framework for combining multiple partitions
Alexander Strehl, Joydeep Ghosh
TL;DR: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions).
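One simple consensus heuristic in this spirit is evidence accumulation over a co-association matrix: count how often each pair of objects lands in the same cluster across the input partitionings, then merge pairs that agree in a majority of them. This is a simpler relative of the consensus functions proposed in the paper, not a reimplementation of them; the threshold and names are assumptions:

```python
import numpy as np

def consensus_labels(partitions, threshold=0.5):
    """Combine several labelings of the same n objects into one clustering.

    partitions: list of label sequences, each of length n. Uses only the
    labels, never the original features, as in the knowledge-reuse setting.
    """
    n = len(partitions[0])
    # Co-association matrix: fraction of partitions placing i and j together.
    co = np.zeros((n, n))
    for labels in partitions:
        a = np.asarray(labels)
        co += (a[:, None] == a[None, :])
    co /= len(partitions)

    # Union-find over pairs that co-occur in a majority of partitions.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if co[i, j] > threshold:
                parent[find(j)] = find(i)

    roots = [find(i) for i in range(n)]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

Note that the combiner never looks at the objects' features, only at the input labelings, which is exactly the knowledge-reuse setting the paper formalizes.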