Book Chapter
Core Algorithms of the Maui Scheduler
David B. Jackson, Quinn Snell, Mark J. Clement
- pp 87-102
TL;DR: This paper focuses on three areas of Maui scheduling: backfill, job prioritization, and fairshare. It briefly discusses the goals of each component, the issues and corresponding design decisions, and the algorithms enabling the Maui policies.
Abstract: The Maui scheduler has received wide acceptance in the HPC community as a highly configurable and effective batch scheduler. It is currently in use on hundreds of SP, O2K, and Linux cluster systems throughout the world, including a high percentage of the largest and most cutting-edge research sites. While the algorithms used within Maui have proven themselves effective, nothing has been published to date documenting these algorithms or the configurable aspects they support. This paper focuses on three areas of Maui scheduling: backfill, job prioritization, and fairshare. It briefly discusses the goals of each component, the issues and corresponding design decisions, and the algorithms enabling the Maui policies. It also covers the configurable aspects of each algorithm and the impact of various parameter selections.
Citations
Proceedings Article
Apache Hadoop YARN: yet another resource negotiator
Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler
TL;DR: Summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.
Journal Article
Distributed computing in practice: the Condor experience
TL;DR: Provides the history and philosophy of the Condor project and describes how it has interacted with other projects and evolved along with the field of distributed computing.
Book Chapter
SLURM: Simple Linux Utility for Resource Management
TL;DR: This paper describes a new cluster resource management system called Simple Linux Utility for Resource Management (SLURM), which is designed to be flexible and fault-tolerant and can be ported to other clusters of different size and architecture with minimal effort.
Proceedings Article
Large-scale cluster management at Google with Borg
TL;DR: A summary of the Borg system architecture and features, important design decisions, a quantitative analysis of some of its policy decisions, and a qualitative examination of lessons learned from a decade of operational experience with it are presented.
Proceedings Article
Omega: flexible, scalable schedulers for large compute clusters
TL;DR: This work presents a novel approach that uses parallelism, shared state, and lock-free optimistic concurrency control to address increasing scale and the need for rapid response to changing requirements, both of which are hard to meet with monolithic cluster scheduler architectures.
References
Book Chapter
Job Scheduling Under the Portable Batch System
TL;DR: The typical batch queuing system schedules jobs for execution by a set of queue controls, which limits the set of scheduling policies available to a site.
Proceedings Article
Utilization and predictability in scheduling the IBM SP2 with backfilling
Dror G. Feitelson, A.M. Weil
TL;DR: A more conservative approach is shown, in which small jobs move ahead only if they do not delay any job in the queue, which produces essentially the same benefits in terms of utilization as the EASY scheduler.
Book Chapter
The EASY - LoadLeveler API Project
TL;DR: A LoadLeveler API is developed that allows external schedulers like EASY to control the starting and stopping of jobs through LoadLeveler, and helps address Cornell's difficulties with the current scheduling algorithm.
Book Chapter
The Performance Impact of Advance Reservation Meta-scheduling
TL;DR: This research quantifies the impact of advance reservations and outlines the algorithms that must be used to schedule metajobs. It indicates that advance reservations can improve the response time for metajobs while not significantly impacting overall system performance.