Journal Article
Classification Framework of MapReduce Scheduling Algorithms
TL;DR: This paper presents a comprehensive and structured survey of MapReduce scheduling algorithms using a novel multidimensional classification framework, and identifies open issues and directions for future research.
Abstract:
A MapReduce scheduling algorithm plays a critical role in managing large clusters of hardware nodes and meeting multiple quality requirements by controlling the order and distribution of user, job, and task execution. A comprehensive and structured survey of the scheduling algorithms proposed so far is presented here using a novel multidimensional classification framework. The dimensions are (i) meeting quality requirements, (ii) scheduling entities, and (iii) adapting to dynamic environments; each dimension has its own taxonomy. An empirical evaluation framework for these algorithms is recommended. This survey identifies various open issues and directions for future research.
Citations
Journal Article
Architecting Time-Critical Big-Data Systems
TL;DR: This paper deals with the definition of a time-critical big-data system from the point of view of requirements, analyzing the specific characteristics of some popular big-data applications, proposing an architecture, and offering initial performance patterns that connect application costs with infrastructure performance.
Journal Article
MapReduce Scheduling for Deadline-Constrained Jobs in Heterogeneous Cloud Computing Systems
TL;DR: Bipartite graph modelling is used to propose a new MapReduce scheduler, BGMRS, which can obtain the optimal solution of the deadline-constrained scheduling problem by transforming it into a well-known graph problem: minimum weighted bipartite matching.
Journal Article
A data locality based scheduler to enhance MapReduce performance in heterogeneous environments
TL;DR: The experimental results show that the proposed scheduler enhances MapReduce performance in heterogeneous environments and improves data locality across different parameters compared with the Hadoop default scheduler, the Matchmaking scheduler, and the Delay scheduler.
Journal Article
MapReduce scheduling algorithms: a review
Ibrahim Abaker Targio Hashem, Nor Badrul Anuar, Mohsen Marjani, Ejaz Ahmed, Haruna Chiroma, Ahmad Firdaus, Muhamad Taufik Abdullah, Faiz Alotaibi, Waleed Kamaleldin Mahmoud Ali, Ibrar Yaqoob, Abdullah Gani
TL;DR: This study analyzes scheduling in MapReduce along two aspects, taxonomy and performance evaluation; it can serve as a benchmark for expert researchers proposing a novel MapReduce scheduling algorithm and as a starting point for novice researchers.
References
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Book
Hadoop: The Definitive Guide
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Proceedings Article
Apache Hadoop YARN: yet another resource negotiator
Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, Bikas Saha, Carlo Curino, Owen O'Malley, Sanjay Radia, Benjamin Reed, Eric Baldeschwieler
TL;DR: This paper summarizes the design, development, and current state of deployment of YARN, the next generation of Hadoop's compute platform, which decouples the programming model from the resource management infrastructure and delegates many scheduling functions to per-application components.
Book
Scheduling Algorithms
TL;DR: Besides scheduling problems for single and parallel machines and shop scheduling problems, this book covers advanced models involving due dates, sequence-dependent changeover times, and batching.