Load balancing in MapReduce on homogeneous and heterogeneous clusters: an in-depth review

doi:10.1504/IJCNDS.2015.070969

Mohammad Javad Kargar and one co-author

International Journal of Communication N..., Vol. 15, Iss. 2, pp. 149-168, 1 August 2015

TLDR

This paper examines the effect of two key factors, data locality and data skew, on load balancing in homogeneous and heterogeneous Hadoop MapReduce clusters.

Abstract:

A number of programming models have been proposed in recent years to process big data. Among them, MapReduce is the best-known programming model in cloud computing environments; it offers many advantages, but it also poses several challenges. Poor load balancing is considered one of the most significant drawbacks of MapReduce: when no appropriate balancing mechanism is in place, it increases application runtime and consequently reduces efficiency. Data locality and data skew are known as the two main factors that determine load balance, yet load balance also depends strongly on whether the computing cluster is homogeneous or heterogeneous. This paper examines the effect of these two factors, data locality and data skew, on homogeneous and heterogeneous clusters. In addition, it reviews a number of recent studies on improving load balancing in Hadoop MapReduce. Finally, all the surveyed studies are compared to highlight their differences in load balancing method, optimisation phase, cluster type and main challenges.
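The abstract names data skew and cluster heterogeneity as forces behind load imbalance. The toy Python model below is not taken from the paper; it is a minimal sketch of both effects under stated assumptions: every intermediate record costs one unit of work, a node of speed s processes s units per time step, a phase ends when its slowest task ends, and data locality is not modelled. `hash_partition` only mimics the idea of Hadoop's default hash partitioner, using CRC32 so results are reproducible. `reducer_loads`, `imbalance`, `makespan` and `greedy_assign` are hypothetical helpers, and the greedy placement is a generic list-scheduling heuristic, not a method from any of the surveyed studies.

```python
"""Toy model of MapReduce load imbalance (illustrative assumptions only).

Assumed: one record = one unit of work, a node with speed s does s units
per time step, and a phase finishes when its slowest task finishes.
"""
import zlib
from typing import List, Sequence


def hash_partition(key: str, num_reducers: int) -> int:
    # Stand-in for Hadoop's default HashPartitioner, which computes
    # (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. CRC32 is used
    # here only so that the same key maps to the same reducer on every run.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys: Sequence[str], num_reducers: int) -> List[int]:
    # Count how many intermediate records each reducer receives.
    loads = [0] * num_reducers
    for key in keys:
        loads[hash_partition(key, num_reducers)] += 1
    return loads


def imbalance(loads: Sequence[float]) -> float:
    # Max-to-mean ratio of the loads: 1.0 means perfectly balanced.
    mean = sum(loads) / len(loads)
    return max(loads) / mean


def makespan(loads: Sequence[float], speeds: Sequence[float]) -> float:
    # Task i runs on a node of speed speeds[i]; the phase ends
    # when the slowest task ends.
    return max(load / speed for load, speed in zip(loads, speeds))


def greedy_assign(work: Sequence[float], speeds: Sequence[float]) -> List[float]:
    # Hypothetical capacity-aware placement: take work items largest first
    # and give each one to the node that would finish it earliest.
    # Returns the finish time of every node.
    finish = [0.0] * len(speeds)
    for w in sorted(work, reverse=True):
        best = min(range(len(speeds)), key=lambda n: finish[n] + w / speeds[n])
        finish[best] += w / speeds[best]
    return finish


if __name__ == "__main__":
    # Data skew: one "hot" key carries 600 of the 1,000 records.
    keys = ["hot"] * 600 + [f"k{i}" for i in range(400)]
    loads = reducer_loads(keys, 4)
    print("reducer loads:", loads, "imbalance:", round(imbalance(loads), 2))

    # Heterogeneity: equal-sized tasks on equal vs unequal nodes.
    even = [250, 250, 250, 250]
    print("homogeneous:", makespan(even, [1, 1, 1, 1]))    # 250.0
    print("heterogeneous:", makespan(even, [2, 2, 1, 1]))  # 250.0

    # Same 1,000 units split into 20 chunks and placed by node speed.
    print("capacity-aware:", max(greedy_assign([50] * 20, [2, 2, 1, 1])))  # 175.0
```

With a single hot key, one reducer receives at least 600 of the 1,000 records however the remaining keys hash, so a better hash function alone cannot remove skew. In the heterogeneous case, equal-sized tasks still take 250 time units because the two slow nodes dominate, while splitting the work into smaller chunks and placing them by node speed brings the makespan to 175, close to the 1000/6 ≈ 167 lower bound for a total capacity of 6 units per step.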
