Open Access

Jumbo: Beyond MapReduce for Workload Balancing

TL;DR
This paper introduces Jumbo, a distributed data processing platform that allows us to go beyond MapReduce and work towards solving its load-balancing issues.
Abstract
Over the past decade, several frameworks such as Google MapReduce have been developed that allow data processing at unprecedented scale thanks to their high scalability and fault tolerance. However, these systems pose both new and existing challenges for workload balancing that have not yet been fully explored. The MapReduce model in particular has some inherent limitations when it comes to workload balancing. In this paper, we introduce Jumbo, a distributed data processing platform that allows us to go beyond MapReduce and work towards solving these load-balancing issues.
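To make the abstract's point about MapReduce's inherent load-balancing limitations concrete, the following is a small illustrative sketch, not taken from the paper. It assumes a default-style hash partitioner (reducer = hash(key) mod R); `zlib.crc32` stands in for the hash function, and the helper names and the skewed word-count input are hypothetical.

```python
import zlib

def hash_partition(key: str, num_reducers: int) -> int:
    # Default-style partitioning: hash(key) mod R. crc32 is an assumed stand-in
    # hash, chosen so results are reproducible (Python's str hash is salted per run).
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def reducer_loads(records, num_reducers):
    # Count how many intermediate (key, value) records each reducer receives.
    loads = [0] * num_reducers
    for key, _value in records:
        loads[hash_partition(key, num_reducers)] += 1
    return loads

# Hypothetical skewed word-count input: one very frequent key plus 100 rare keys.
records = [("the", 1)] * 900 + [(f"word{i}", 1) for i in range(100)]
loads = reducer_loads(records, num_reducers=4)
mean = sum(loads) / len(loads)
print(loads, "max/mean =", round(max(loads) / mean, 2))
```

Whatever hash function is used, all 900 records for `the` reach the same reducer, so the busiest reducer receives at least 3.6 times the mean load. No choice of partition function can split that key, because the model delivers all values of a key to a single reduce invocation.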



Citations
Journal Article

An improved partitioning mechanism for optimizing massive data analysis using MapReduce

TL;DR: An improved partitioning algorithm, built on an improved sampling algorithm and partitioner, is proposed to improve load balancing and reduce memory consumption; experiments show that it is faster, more memory-efficient, and more accurate than the current implementation.
Proceedings Article

MARLA: MapReduce for Heterogeneous Clusters

TL;DR: This paper addresses the problems that cluster heterogeneity causes for existing MapReduce implementations, and presents MARLA, a MapReduce framework that performs well not only in homogeneous settings but also when the cluster exhibits heterogeneous properties.
Journal Article

A study on using uncertain time series matching algorithms for MapReduce applications

TL;DR: In this paper, the authors study CPU utilization time patterns of several MapReduce applications and save the patterns, along with their statistical information, in a reference database; the database is later used to tweak system parameters so that future unknown applications execute efficiently.
Proceedings Article

On Using Pattern Matching Algorithms in MapReduce Applications

TL;DR: This paper studies CPU utilization time patterns of several MapReduce applications to evaluate the hypothesis that these patterns can be used to tweak system parameters when executing similar applications; results show the effectiveness of the approach on pseudo-distributed MapReduce platforms.
Journal Article

An Adaptive and Memory Efficient Sampling Mechanism for Partitioning in MapReduce

TL;DR: An adaptive, trie-based sampling mechanism for total order partitioning (ATrie) is proposed that reduces memory consumption during partitioning; experiments show that it is more adaptive and more memory-efficient than previous implementations.
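Two of the citing papers above build on sampling-based total order partitioning. The following is a minimal, generic sketch of that idea (sample the keys, pick evenly spaced quantiles of the sorted sample as split points, then route each key by range), similar in spirit to Hadoop's `TotalOrderPartitioner`. It is not the ATrie mechanism or the improved algorithm from either paper; the helper names, the uniform synthetic keys, and the sample size of 500 are assumptions for illustration.

```python
import bisect
import random

def choose_split_points(sample, num_partitions):
    # Sort the sample and take evenly spaced quantiles as partition boundaries.
    ordered = sorted(sample)
    step = len(ordered) / num_partitions
    return [ordered[int(step * i)] for i in range(1, num_partitions)]

def range_partition(key, split_points):
    # Partition i receives keys in [split_points[i-1], split_points[i]),
    # so every key in partition i sorts before every key in partition i + 1.
    return bisect.bisect_right(split_points, key)

# Hypothetical input: zero-padded numeric strings, so string order matches numeric order.
rng = random.Random(42)
keys = [f"{rng.randrange(100_000):06d}" for _ in range(20_000)]
sample = rng.sample(keys, 500)
splits = choose_split_points(sample, num_partitions=4)

loads = [0] * 4
for k in keys:
    loads[range_partition(k, splits)] += 1
print("split points:", splits, "loads:", loads)
```

Because the split points follow the sampled key distribution, partitions stay roughly equal in size while their concatenation is globally sorted. As with hashing, however, all records sharing one key still land in a single partition.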
References
Journal Article

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on large clusters of commodity machines and is highly scalable.
Journal Article

MapReduce: simplified data processing on large clusters

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Journal Article

The Google file system

TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.
Proceedings Article

Dryad: distributed data-parallel programs from sequential building blocks

TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.
Proceedings Article

Improving MapReduce performance in heterogeneous environments

TL;DR: This paper proposes a new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.
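The LATE entry above is named after its core heuristic: speculatively re-execute the running task expected to finish last. The following is a minimal sketch of that estimate, assuming the progress-rate formulation (rate = progress score / elapsed time, time left = (1 - progress) / rate); LATE's slow-task threshold, speculative cap, and fast-node checks are omitted. `TaskStatus`, `pick_speculation_candidate`, and the task values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class TaskStatus:
    task_id: str
    progress: float  # progress score in [0, 1]
    elapsed: float   # seconds the task has been running

def estimated_time_left(t: TaskStatus) -> float:
    # Treat tasks with no measurable progress yet as infinitely far from the end.
    if t.elapsed <= 0 or t.progress <= 0:
        return float("inf")
    rate = t.progress / t.elapsed        # progress per second
    return (1.0 - t.progress) / rate     # approximate seconds until completion

def pick_speculation_candidate(tasks):
    # Choose the unfinished task with the longest approximate time to end.
    running = [t for t in tasks if t.progress < 1.0]
    if not running:
        return None
    return max(running, key=estimated_time_left)

tasks = [
    TaskStatus("a", progress=0.9, elapsed=90.0),
    TaskStatus("b", progress=0.2, elapsed=100.0),
    TaskStatus("c", progress=0.5, elapsed=20.0),
    TaskStatus("d", progress=0.1, elapsed=5.0),
]
print(pick_speculation_candidate(tasks).task_id)
```

Task `d` has the lowest progress score, but it started recently and is advancing quickly (about 45 s left), whereas `b` is progressing slowly (about 400 s left). Ranking by estimated time to end rather than by raw progress is what lets the heuristic target genuine stragglers.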