Open Access
Jumbo: Beyond MapReduce for Workload Balancing
Sven Groot, Masaru Kitsuregawa +1 more
TLDR
This paper introduces Jumbo, a distributed data processing platform that goes beyond MapReduce and works towards solving the load balancing issues.
Abstract:
Over the past decade, several frameworks such as Google MapReduce have been developed that allow data processing at unprecedented scale thanks to their high scalability and fault tolerance. However, these systems pose both new and existing challenges for workload balancing that have not yet been fully explored. The MapReduce model in particular has some inherent limitations when it comes to workload balancing. In this paper, we introduce Jumbo, a distributed data processing platform that allows us to go beyond MapReduce and work towards solving the load balancing issues.
Citations
Journal Article
An improved partitioning mechanism for optimizing massive data analysis using MapReduce
TL;DR: A partitioning algorithm that improves load balancing and memory consumption is proposed, built on an improved sampling algorithm and partitioner; experiments show that it is faster, more memory efficient, and more accurate than the current implementation.
Proceedings Article
MARLA: MapReduce for Heterogeneous Clusters
TL;DR: This paper addresses the problems that existing MapReduce implementations have with cluster heterogeneity, and presents MARLA, a MapReduce framework capable of performing well not only in homogeneous settings, but also when the cluster exhibits heterogeneous properties.
Journal Article
A study on using uncertain time series matching algorithms for MapReduce applications
Nikzad Babaii Rizvandi, Javid Taheri, R. Moraveji, Albert Y. Zomaya
TL;DR: The authors study the CPU utilization time patterns of several MapReduce applications and store these patterns, along with their statistical information, in a reference database that can later be used to tune system parameters so that future unknown applications execute efficiently.
Proceedings Article
On Using Pattern Matching Algorithms in MapReduce Applications
TL;DR: This paper studies the CPU utilization time patterns of several MapReduce applications to evaluate the hypothesis that system parameters tuned for one application can be reused when executing similar applications; the results show that the approach is effective on pseudo-distributed MapReduce platforms.
Journal Article
An Adaptive and Memory Efficient Sampling Mechanism for Partitioning in MapReduce
TL;DR: An adaptive, trie-based sampling mechanism (ATrie) for total order partitioning is proposed that reduces memory consumption during partitioning; experiments show the proposed mechanism is more adaptive and more memory efficient than previous implementations.
References
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal Article
MapReduce: simplified data processing on large clusters
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Journal Article
The Google file system
TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Proceedings Article
Dryad: distributed data-parallel programs from sequential building blocks
TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Proceedings Article
Improving MapReduce performance in heterogeneous environments
TL;DR: A new scheduling algorithm, Longest Approximate Time to End (LATE), is proposed that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.