Proceedings ArticleDOI

Mystic: Predictive Scheduling for GPU Based Cloud Servers Using Machine Learning

TL;DR
Mystic, an interference-aware scheduler for efficient co-execution of applications on GPU-based clusters and cloud servers, is presented; it identifies the similarities between incoming applications and those already executing, and guides the scheduler to minimize interference and improve system throughput.
Abstract
GPUs have become the primary choice of accelerators for high-end data centers and cloud servers, which can host thousands of disparate applications. With the growing demand for GPUs on clusters, there arises a need for efficient co-execution of applications on the same accelerator device. However, resource contention among co-executing applications causes interference, which degrades execution performance, impacts the QoS requirements of applications, and lowers overall system throughput. While previous work has proposed techniques for detecting interference, the existing solutions are either developed for CPU clusters or use static profiling approaches, which can be computationally intensive and do not scale well. We present Mystic, an interference-aware scheduler for efficient co-execution of applications on GPU-based clusters and cloud servers. The most important feature of Mystic is the use of learning-based analytical models for detecting interference between applications. We leverage a collaborative filtering framework to characterize an incoming application with respect to the interference it may cause when co-executing with other applications while sharing GPU resources. Mystic identifies the similarities between new applications and the executing applications, and guides the scheduler to minimize the interference and improve system throughput. We train the learning model with 42 CUDA applications and consider a separate set of 55 diverse, real-world GPU applications for evaluation. Mystic is evaluated on a live GPU cluster with 32 NVIDIA GPUs. Our framework achieves performance guarantees for 90.3% of the evaluated applications. Compared with state-of-the-art interference-oblivious schedulers, Mystic improves system throughput by 27.5% on average and achieves a 16.3% average improvement in GPU utilization.
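As a rough illustration of the collaborative filtering idea described above, the sketch below fits a low-rank model to a partially observed application-pair slowdown matrix and uses the completed matrix to pick a co-runner. It is a minimal hypothetical Python example, not Mystic's actual implementation; the matrix S, the observed pairs, the rank, and the learning-rate settings are all made-up assumptions.

    # Hypothetical sketch: predict pairwise co-execution slowdown with a
    # low-rank (collaborative filtering) model; illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)

    def factorize(S, observed, rank=4, lr=0.01, reg=0.05, epochs=200):
        """Factor a partially observed slowdown matrix S ~= U @ V.T by SGD."""
        n, m = S.shape
        U = 0.1 * rng.standard_normal((n, rank))
        V = 0.1 * rng.standard_normal((m, rank))
        pairs = np.argwhere(observed)
        for _ in range(epochs):
            rng.shuffle(pairs)
            for i, j in pairs:
                err = S[i, j] - U[i] @ V[j]
                ui = U[i].copy()
                U[i] += lr * (err * V[j] - reg * U[i])
                V[j] += lr * (err * ui - reg * V[j])
        return U, V

    # Toy data: 6 applications, slowdown measured for a few co-run pairs only.
    S = np.zeros((6, 6))
    observed = np.zeros((6, 6), dtype=bool)
    for i, j, s in [(0, 1, 1.8), (0, 2, 1.1), (1, 3, 2.3), (2, 4, 1.2), (3, 5, 2.0)]:
        S[i, j] = S[j, i] = s
        observed[i, j] = observed[j, i] = True

    U, V = factorize(S, observed)
    predicted = U @ V.T                      # estimated slowdown for every pairing
    masked = np.where(np.eye(6, dtype=bool), np.inf, predicted)
    print("least-interfering partner for app 0:", int(np.argmin(masked[0])))

A scheduler built on such a model would observe a few co-executions of a new application, complete its row of the slowdown matrix via the learned latent factors, and place it with the co-runner whose predicted interference is lowest.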


Citations
Proceedings ArticleDOI

Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks

TL;DR: This paper defines Planaria, a microarchitectural capability that dynamically fissions (breaks) a DNN accelerator into multiple smaller yet full-fledged DNN engines at runtime, enabling multiple DNN inference services to be spatially co-located on the same hardware for simultaneous multi-tenant DNN acceleration.
Proceedings ArticleDOI

Quality of Service Support for Fine-Grained Sharing on GPUs

TL;DR: This work proposes QoS mechanisms for a fine-grained form of GPU sharing that provide control over the progress of kernels on a per-cycle basis and over the amount of thread-level parallelism allotted to each kernel.
Proceedings ArticleDOI

BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services

TL;DR: This work presents Barista, a distributed and scalable serverless serving system for deep-learning prediction, and proposes an intelligent agent that allocates and manages compute resources through horizontal and vertical scaling to maintain the required prediction latency.
Proceedings ArticleDOI

Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management

TL;DR: This work proposes application-aware TLP (thread-level parallelism) management techniques for multi-application execution environments so that all co-scheduled applications make judicious use of the shared resources, and introduces an application-level utility metric, called effective bandwidth, that accounts for two runtime metrics: attained DRAM bandwidth and cache miss rates.
Journal ArticleDOI

Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems

TL;DR: This article proposes Horus, an interference-aware and prediction-based resource manager for deep learning (DL) systems that proactively predicts the GPU utilization of heterogeneous DL jobs from features of each DL model's computation graph, removing the need for online profiling and isolated reserved GPUs.
References
Journal ArticleDOI

Matrix Factorization Techniques for Recommender Systems

TL;DR: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
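For reference, the basic biased matrix factorization model surveyed in that article predicts a rating roughly as

    \hat{r}_{ui} = \mu + b_u + b_i + q_i^{\top} p_u

and learns its parameters by minimizing the regularized squared error over the set K of known ratings,

    \min_{p,q,b} \sum_{(u,i) \in K} \left( r_{ui} - \mu - b_u - b_i - q_i^{\top} p_u \right)^2 + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 + b_u^2 + b_i^2 \right),

where \mu is the global average rating, b_u and b_i are user and item biases, and p_u, q_i are latent factor vectors.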
Journal Article

Industry Report: Amazon.com Recommendations: Item-to-Item Collaborative Filtering.

TL;DR: This work compares the authors' algorithm, item-to-item collaborative filtering, with three common approaches to the recommendation problem: traditional collaborative filtering, cluster models, and search-based methods.
Posted Content

A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems

TL;DR: A quantitative measure called the Index of Fairness is proposed; it is applicable to any resource sharing or allocation problem, is independent of the amount of the resource, and is bounded, which aids intuitive understanding of the fairness index.
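The measure proposed in that report is commonly known as Jain's fairness index; for allocations x_1, ..., x_n it is

    J(x_1, \dots, x_n) = \frac{\left( \sum_{i=1}^{n} x_i \right)^2}{n \sum_{i=1}^{n} x_i^2},

which is bounded between 1/n (one user receives everything) and 1 (all allocations equal), independent of the scale of the resource.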
Journal ArticleDOI

Cluster ensembles - a knowledge reuse framework for combining multiple partitions

TL;DR: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions).
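As a hedged illustration of the consensus idea, the Python sketch below combines several partitionings through a pairwise co-association matrix and re-clusters it; this mirrors the similarity-based flavor of consensus functions such as CSPA rather than reproducing the paper's exact algorithms, and the toy partitions and parameters are assumptions.

    # Hypothetical consensus-clustering sketch: combine partitionings via a
    # co-association matrix, then re-cluster that matrix.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import squareform

    partitions = np.array([          # three clusterings of the same 6 objects
        [0, 0, 1, 1, 2, 2],
        [0, 0, 0, 1, 1, 1],
        [0, 1, 1, 1, 2, 2],
    ])

    n = partitions.shape[1]
    co = np.zeros((n, n))
    for labels in partitions:        # fraction of partitions grouping i with j
        co += labels[:, None] == labels[None, :]
    co /= len(partitions)

    # Treat (1 - co-association) as a distance and cut an average-link tree.
    dist = squareform(1.0 - co, checks=False)
    consensus = fcluster(linkage(dist, method="average"), t=3, criterion="maxclust")
    print(consensus)                 # consolidated cluster labels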
Journal ArticleDOI

Amazon.com recommendations: item-to-item collaborative filtering

TL;DR: Item-to-item collaborative filtering, as described in this paper, is a popular recommendation algorithm for e-commerce Web sites that scales independently of both the number of customers and the number of items in the product catalog.
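A generic sketch of the item-to-item idea (not Amazon's production system): compute item-item cosine similarities from a user-item matrix once, then score a user's unseen items by their similarity to the items that user already has. The toy data below is made up.

    # Generic item-to-item similarity sketch over a toy user-item matrix.
    import numpy as np

    R = np.array([                   # rows: users, columns: items (purchase counts)
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 1, 1],
    ], dtype=float)

    norms = np.linalg.norm(R, axis=0)
    sim = (R.T @ R) / np.outer(norms, norms)   # item-item cosine similarity
    np.fill_diagonal(sim, 0.0)

    # Recommend for user 0: score unseen items by similarity to owned items.
    scores = sim @ R[0]
    scores[R[0] > 0] = -np.inf
    print("recommend item", int(np.argmax(scores)))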