Proceedings Article
Scheduling in mapreduce-like systems for fast completion time
Hyunseok Chang,Murali Kodialam,Ramana Rao Kompella,T. V. Lakshman,Myungjin Lee,Sarit Mukherjee +5 more
- pp 3074-3082
TLDR
This paper devises various online and offline algorithms to arrive at a good ordering of jobs that minimizes the overall job completion times, and proposes approximation algorithms that work within a factor of 3 of the optimal.
Abstract:
Large-scale data processing needs of enterprises today are primarily met with distributed and parallel computing in data centers. MapReduce has emerged as an important programming model for these environments. Since today's data centers run many MapReduce jobs in parallel, it is important to find a good scheduling algorithm that can optimize the completion times of these jobs. While several recent papers focused on optimizing the scheduler, there exists very little theoretical understanding of the scheduling problem in the context of MapReduce. In this paper, we seek to address this problem by first presenting a simplified abstraction of the MapReduce scheduling problem, and then formulating the scheduling problem as an optimization problem. We devise various online and offline algorithms to arrive at a good ordering of jobs to minimize the overall job completion times. Since optimal solutions are hard to compute (NP-hard), we propose approximation algorithms that work within a factor of 3 of the optimal. Using simulations, we also compare our online algorithm with standard scheduling strategies such as FIFO and Shortest Job First, and show that our algorithm consistently outperforms these across different job distributions.
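As a toy illustration of why job ordering affects completion time, the sketch below compares the two baseline strategies named in the abstract, FIFO and Shortest Job First, on a single machine with all jobs available at time zero. This is not the paper's algorithm or its MapReduce model (which accounts for separate map and reduce phases); the function names and job durations are assumptions chosen for illustration only.

```python
from itertools import accumulate


def total_completion_time(durations):
    """Sum of completion times when jobs run back to back in the given order."""
    # accumulate() yields the finish time of each job in sequence.
    return sum(accumulate(durations))


def fifo_order(durations):
    """FIFO: run jobs in arrival order (here, the order of the input list)."""
    return list(durations)


def sjf_order(durations):
    """Shortest Job First: run the shortest jobs first."""
    return sorted(durations)


# Hypothetical job durations (arbitrary time units), listed in arrival order.
jobs = [8, 1, 3]

fifo_total = total_completion_time(fifo_order(jobs))  # 8 + 9 + 12 = 29
sjf_total = total_completion_time(sjf_order(jobs))    # 1 + 4 + 12 = 17

print(f"FIFO total completion time: {fifo_total}")
print(f"SJF total completion time:  {sjf_total}")
```

In this simplified single-machine setting, running short jobs first lowers the sum of completion times because fewer jobs wait behind a long one. The setting studied in the paper is harder (the abstract states it is NP-hard), which is why the authors turn to approximation algorithms.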
Citations
Journal Article
Energy-Aware Scheduling of MapReduce Jobs for Big Data Applications
TL;DR: This paper proposes two heuristic algorithms, called energy-aware MapReduce scheduling algorithms (EMRSA-I and EMRSA-II), that find the assignments of map and reduce tasks to the machine slots in order to minimize the energy consumed when executing the application.
Proceedings Article
Joint scheduling of processing and Shuffle phases in MapReduce systems
TL;DR: This paper considers the problem of jointly scheduling all three phases of the MapReduce process with a view of understanding the theoretical complexity of the joint scheduling and working towards practical heuristics for scheduling the tasks.
Patent
Resource aware scheduling in a distributed computing environment
Xiaoqiao Meng,Jian Tan,Li Zhang +2 more
TL;DR: In this article, the authors present a system and methods for resource aware scheduling of processes in a distributed computing environment and present a comparison of the current reward value and the prospective reward value.
Journal Article
From the Cloud to the Atmosphere: Running MapReduce across Data Centers
TL;DR: G-MR is introduced, a system for executing sequences of MapReduce jobs on geo-distributed data sets, which implements the optimization framework, and evaluations show that using G-MR significantly improves processing time and cost for geo-distributed data sets.
Journal Article
Budget-Driven Scheduling Algorithms for Batches of MapReduce Jobs in Heterogeneous Clouds
Yang Wang,Wei Shi +1 more
TL;DR: Two greedy algorithms, called Global Greedy Budget and Gradual Refinement, are developed and shown to distribute the budget cost-effectively for performance optimization of MapReduce workflows.
References
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean,Sanjay Ghemawat +1 more
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Proceedings Article
Improving MapReduce performance in heterogeneous environments
TL;DR: A new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
Proceedings Article
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
TL;DR: This work proposes a simple algorithm called delay scheduling, which achieves nearly optimal data locality in a variety of workloads and can increase throughput by up to 2x while preserving fairness.
Proceedings Article
Quincy: fair scheduling for distributed computing clusters
TL;DR: It is argued that data-intensive computation benefits from a fine-grain resource sharing model that differs from the coarser semi-static resource allocations implemented by most existing cluster computing architectures.