Jumbo: Beyond MapReduce for Workload Balancing

Open Access

TLDR

This paper introduces Jumbo, a distributed data processing platform that goes beyond MapReduce and works towards solving its load balancing issues.

Abstract:

Over the past decade, several frameworks such as Google MapReduce have been developed that allow data processing at unprecedented scale thanks to their high scalability and fault tolerance. However, these systems pose both new and existing challenges for workload balancing that have not yet been fully explored. The MapReduce model in particular has some inherent limitations when it comes to workload balancing. In this paper, we introduce Jumbo, a distributed data processing platform that allows us to go beyond MapReduce and work towards solving these load balancing issues.

Citations

Journal Article

An improved partitioning mechanism for optimizing massive data analysis using MapReduce

Kenn Slagter, +3 more

- 01 Oct 2013 -

The Journal of Supercomputing

TL;DR: This paper proposes a partitioning algorithm that improves load balancing and memory consumption by means of an improved sampling algorithm and partitioner; experiments show that the proposed algorithm is faster, more memory efficient, and more accurate than the current implementation.

Proceedings Article

MARLA: MapReduce for Heterogeneous Clusters

Zacharia Fadika, +3 more

TL;DR: This paper addresses the problems that existing MapReduce implementations have with cluster heterogeneity, and presents MARLA, a MapReduce framework capable of performing well not only in homogeneous settings but also when the cluster exhibits heterogeneous properties.

Journal Article

A study on using uncertain time series matching algorithms for MapReduce applications

Nikzad Babaii Rizvandi, +5 more

- 25 Aug 2013 -

Concurrency and Computation: Practice an...

TL;DR: In this paper, the authors study CPU utilization time patterns of several MapReduce applications and save the patterns along with their statistical information in a reference database to be later used to tweak system parameters to efficiently execute future unknown applications.

Proceedings Article

On Using Pattern Matching Algorithms in MapReduce Applications

Nikzad Babaii Rizvandi, +2 more

TL;DR: This paper studies CPU utilization time patterns of several MapReduce applications to evaluate a hypothesis about tweaking system parameters when executing similar applications; the results show the approach is effective on pseudo-distributed MapReduce platforms.

Journal Article

An Adaptive and Memory Efficient Sampling Mechanism for Partitioning in MapReduce

Kenn Slagter, +2 more

- 01 Jun 2015 -

International Journal of Parallel Progra...

TL;DR: This paper proposes ATrie, an adaptive trie-based sampling mechanism for total order partitioning that can reduce memory consumption; experiments show the proposed mechanism is more adaptive and more memory efficient than previous implementations.

References

Journal Article

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.

Journal Article

MapReduce: simplified data processing on large clusters

Jeffrey Dean, +1 more

- 01 Jan 2008 -

Communications of The ACM

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.

Journal Article

The Google file system

Sanjay Ghemawat, +2 more

TL;DR: This paper presents file system interface extensions designed to support distributed applications, discusses many aspects of the design, and reports measurements from both micro-benchmarks and real world use.

Proceedings Article

Dryad: distributed data-parallel programs from sequential building blocks

Michael Isard, +4 more

TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.

Proceedings Article

Improving MapReduce performance in heterogeneous environments

Matei Zaharia, +4 more

TL;DR: This paper proposes a new scheduling algorithm, Longest Approximate Time to End (LATE), that is highly robust to heterogeneity and can improve Hadoop response times by a factor of 2 in clusters of 200 virtual machines on EC2.

