Proceedings ArticleDOI
Mystic: Predictive Scheduling for GPU Based Cloud Servers Using Machine Learning
Yash Ukidave, Xiangyu Li, David Kaeli
pp. 353-362
TL;DR: Mystic, an interference-aware scheduler for efficient co-execution of applications on GPU-based clusters and cloud servers, is presented. It identifies the similarities between new applications and already-executing applications, and guides the scheduler to minimize interference and improve system throughput.
Abstract: GPUs have become the primary choice of accelerator for high-end data centers and cloud servers, which can host thousands of disparate applications. With the growing demand for GPUs on clusters, there arises a need for efficient co-execution of applications on the same accelerator device. However, resource contention among co-executing applications causes interference, which degrades execution performance, impacts the QoS requirements of applications, and lowers overall system throughput. While previous work has proposed techniques for detecting interference, existing solutions are either developed for CPU clusters or use static profiling approaches, which can be computationally intensive and do not scale well. We present Mystic, an interference-aware scheduler for efficient co-execution of applications on GPU-based clusters and cloud servers. The most important feature of Mystic is its use of learning-based analytical models for detecting interference between applications. We leverage a collaborative filtering framework to characterize an incoming application with respect to the interference it may cause when co-executing with other applications while sharing GPU resources. Mystic identifies the similarities between new applications and the executing applications, and guides the scheduler to minimize interference and improve system throughput. We train the learning model with 42 CUDA applications, and consider a separate set of 55 diverse, real-world GPU applications for evaluation. Mystic is evaluated on a live GPU cluster with 32 NVIDIA GPUs. Our framework achieves performance guarantees for 90.3% of the evaluated applications. Compared with state-of-the-art interference-oblivious schedulers, Mystic improves system throughput by 27.5% on average, and achieves a 16.3% average improvement in GPU utilization.
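The collaborative-filtering idea in the abstract can be viewed, very loosely, as low-rank completion of a partially observed app-to-app interference matrix: measured slowdowns for some co-run pairs are used to predict the slowdowns of unseen pairs. The sketch below is an illustrative reconstruction, not Mystic's actual model; the function name, factor dimensions, and all hyperparameters are assumptions:

```python
import numpy as np

def predict_interference(observed, n_factors=4, lr=0.01, reg=0.1, epochs=1000, seed=0):
    """Fill in missing entries of a pairwise app-interference matrix.

    observed: 2-D array where observed[i, j] is the measured slowdown of
    app i when co-run with app j, and np.nan marks unobserved pairs.
    Returns a dense matrix of predicted slowdowns via SGD-based
    matrix factorization.
    """
    rng = np.random.default_rng(seed)
    n = observed.shape[0]
    P = 0.1 * rng.standard_normal((n, n_factors))  # row ("victim") factors
    Q = 0.1 * rng.standard_normal((n, n_factors))  # column ("aggressor") factors
    mask = ~np.isnan(observed)
    for _ in range(epochs):
        for i, j in zip(*np.where(mask)):
            err = observed[i, j] - P[i] @ Q[j]
            P[i] += lr * (err * Q[j] - reg * P[i])
            Q[j] += lr * (err * P[i] - reg * Q[j])
    return P @ Q.T
```

Once the missing entries are predicted, a scheduler can co-locate the incoming application with the partner whose predicted mutual slowdown is smallest.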
Citations
Proceedings ArticleDOI
Planaria: Dynamic Architecture Fission for Spatial Multi-Tenant Acceleration of Deep Neural Networks
Soroush Ghodrati, Byung Hoon Ahn, Joon Kyung Kim, Sean Kinzer, Brahmendra Reddy Yatham, Navateja Alla, Hardik Sharma, Mohammad Alian, Eiman Ebrahimi, Nam Sung Kim, Cliff Young, Hadi Esmaeilzadeh
TL;DR: This paper defines Planaria, a microarchitectural capability that can dynamically fission (break) into multiple smaller yet full-fledged DNN engines at runtime, enabling spatial co-location of multiple DNN inference services on the same hardware and offering simultaneous multi-tenant DNN acceleration.
Proceedings ArticleDOI
Quality of Service Support for Fine-Grained Sharing on GPUs
TL;DR: This work proposes QoS mechanisms for a fine-grained form of GPU sharing that can provide control over the progress of kernels on a per cycle basis and the amount of thread-level parallelism of each kernel.
Proceedings ArticleDOI
BARISTA: Efficient and Scalable Serverless Serving System for Deep Learning Prediction Services
Anirban Bhattacharjee, Ajay Chhokra, Zhuangwei Kang, Hongyang Sun, Aniruddha Gokhale, Gabor Karsai
TL;DR: This work presents a distributed and scalable deep-learning prediction serving system called Barista, and proposes an intelligent agent to allocate and manage the compute resources by horizontal and vertical scaling to maintain the required prediction latency.
Proceedings ArticleDOI
Efficient and Fair Multi-programming in GPUs via Effective Bandwidth Management
TL;DR: This work proposes new application-aware TLP management techniques for a multi-application execution environment such that all co-scheduled applications can make good and judicious use of all the shared resources, and proposes an application-level utility metric, called effective bandwidth, which accounts for two runtime metrics: attained DRAM bandwidth and cache miss rates.
Journal ArticleDOI
Horus: Interference-Aware and Prediction-Based Scheduling in Deep Learning Systems
TL;DR: In this article, an interference-aware and prediction-based resource manager for DL systems is proposed, which proactively predicts GPU utilization of heterogeneous DL jobs extrapolated from the DL model's computation graph features, removing the need for online profiling and isolated reserved GPUs.
References
Journal ArticleDOI
Matrix Factorization Techniques for Recommender Systems
TL;DR: As the Netflix Prize competition has demonstrated, matrix factorization models are superior to classic nearest neighbor techniques for producing product recommendations, allowing the incorporation of additional information such as implicit feedback, temporal effects, and confidence levels.
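The biased matrix-factorization model from this line of work predicts a rating as a global mean plus a user bias, an item bias, and a dot product of latent factors. A minimal transcription of that prediction rule, with toy numbers and a hypothetical function name:

```python
import numpy as np

def predict_rating(mu, b_u, b_i, p_u, q_i):
    """Biased matrix-factorization prediction: global mean + user bias
    + item bias + dot product of user and item latent-factor vectors."""
    return mu + b_u + b_i + float(np.dot(p_u, q_i))

# Toy example with made-up biases and factors:
# 3.5 + 0.2 - 0.1 + (0.5*1.0 + 1.0*0.5) = 4.6
r_hat = predict_rating(3.5, 0.2, -0.1, np.array([0.5, 1.0]), np.array([1.0, 0.5]))
```

The biases capture systematic effects (a generous user, a popular item) so the latent factors only have to model the residual interaction.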
Journal Article
Industry Report: Amazon.com Recommendations: Item-to-Item Collaborative Filtering.
TL;DR: This work compares three common approaches to the recommendation problem (traditional collaborative filtering, cluster models, and search-based methods) with the authors' own algorithm, called item-to-item collaborative filtering.
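Item-to-item collaborative filtering can be sketched as two steps: precompute similarities between item purchase vectors (here, cosine similarity over a binary user-by-item matrix), then score candidate items by their aggregate similarity to what the user already owns. This is a toy sketch under those assumptions, not Amazon's production algorithm:

```python
import numpy as np

def item_similarities(purchases):
    """purchases: user x item 0/1 matrix. Returns item x item cosine similarities."""
    norms = np.linalg.norm(purchases, axis=0)
    norms[norms == 0] = 1.0          # avoid dividing by zero for unsold items
    normalized = purchases / norms
    return normalized.T @ normalized

def recommend(purchases, user, k=2):
    """Top-k items for `user`, scored by similarity to items already owned."""
    sims = item_similarities(purchases)
    np.fill_diagonal(sims, 0.0)          # an item is not evidence for itself
    scores = sims @ purchases[user]      # sum similarity to each owned item
    scores[purchases[user] > 0] = -np.inf  # never re-recommend owned items
    return np.argsort(scores)[::-1][:k]
```

Because the similarity table depends only on the catalog, it can be built offline, which is why this approach scales with the number of items rather than the number of customers.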
Posted Content
A Quantitative Measure Of Fairness And Discrimination For Resource Allocation In Shared Computer Systems
Raj Jain, Dah Ming Chiu, W. Hawe
TL;DR: A quantitative measure called the Index of Fairness is proposed, applicable to any resource sharing or allocation problem; it is independent of the amount of the resource, and its boundedness aids intuitive understanding of the fairness index.
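Jain's fairness index has a simple closed form, J(x) = (Σ x_i)² / (n · Σ x_i²), bounded between 1/n (one user gets everything) and 1 (perfectly equal shares). A direct transcription:

```python
def jain_fairness(allocations):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).
    Returns 1.0 for a perfectly equal allocation and 1/n when a
    single user receives the entire resource."""
    n = len(allocations)
    s = sum(allocations)
    sq = sum(x * x for x in allocations)
    return (s * s) / (n * sq)

jain_fairness([10, 10, 10, 10])  # equal shares   -> 1.0
jain_fairness([40, 0, 0, 0])     # one user only  -> 0.25 (= 1/n)
```

Because the index is a ratio of sums of the same degree, scaling every allocation by a constant leaves it unchanged, which is the "independent of the amount of the resource" property noted above.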
Journal ArticleDOI
Cluster ensembles --- a knowledge reuse framework for combining multiple partitions
Alexander Strehl, Joydeep Ghosh
TL;DR: This paper introduces the problem of combining multiple partitionings of a set of objects into a single consolidated clustering without accessing the features or algorithms that determined these partitionings and proposes three effective and efficient techniques for obtaining high-quality combiners (consensus functions).
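One simple consensus heuristic in this spirit is evidence accumulation over a co-association matrix: count how often each pair of objects lands in the same cluster across the input partitionings, then merge pairs that agree in a majority of them. This is a simpler relative of the consensus functions proposed in the paper, not a reimplementation of them; the threshold and names are assumptions:

```python
import numpy as np

def consensus_labels(partitions, threshold=0.5):
    """Combine several labelings of the same n objects into one clustering.

    partitions: list of label sequences, each of length n. Uses only the
    labels, never the original features, as in the knowledge-reuse setting.
    """
    n = len(partitions[0])
    # Co-association matrix: fraction of partitions placing i and j together.
    co = np.zeros((n, n))
    for labels in partitions:
        a = np.asarray(labels)
        co += (a[:, None] == a[None, :])
    co /= len(partitions)

    # Union-find over pairs that co-occur in a majority of partitions.
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if co[i, j] > threshold:
                parent[find(j)] = find(i)

    roots = [find(i) for i in range(n)]
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

Note that the combiner never looks at the objects' features, only at the input labelings, which is exactly the knowledge-reuse setting the paper formalizes.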