Journal ArticleDOI

Load balancing in MapReduce on homogeneous and heterogeneous clusters: an in-depth review

TLDR
This paper examines the effect of two key factors, data locality and data skew, on homogeneous and heterogeneous clusters in Hadoop MapReduce.
Abstract
Numerous programming models have been proposed in recent years to process big data. MapReduce is the best known of these in cloud computing environments and offers many advantages, yet it still faces several challenges. Load balancing is considered one of its most significant weaknesses: without an appropriate mechanism, load imbalance increases application runtime and reduces efficiency. Data locality and data skew are known as the two main factors that determine load balance, but load balance also depends heavily on whether the computational cluster is homogeneous or heterogeneous. This paper examines the effect of these two factors, data locality and data skew, on homogeneous and heterogeneous clusters. It also reviews a number of recent studies on load balancing improvements in Hadoop MapReduce. Finally, all of the investigated studies are compared to highlight the differences among load balancing methods, the optimisation phase, the type of cluster and the main challenges.
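The link between data skew and load imbalance can be illustrated with a minimal sketch. It is not taken from the paper: the function names (`partition`, `reducer_loads`, `imbalance`), the CRC32-based partitioner, and the toy key sets are all assumptions chosen for illustration, standing in for a generic hash partitioner that assigns intermediate keys to reducers.

```python
import zlib


def partition(key: str, num_reducers: int) -> int:
    """Hypothetical hash partitioner: map a key to a reducer index.

    CRC32 is used only because it is deterministic across runs; it is an
    assumption of this sketch, not the partitioner of any real framework.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer_loads(keys, num_reducers):
    """Count how many records each reducer receives."""
    loads = [0] * num_reducers
    for key in keys:
        loads[partition(key, num_reducers)] += 1
    return loads


def imbalance(loads):
    """Ratio of the heaviest reducer load to the mean load (1.0 = balanced)."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean if mean else 0.0


# Toy inputs (assumed): one "hot" key dominates the skewed data set.
skewed_keys = ["hot"] * 90 + [f"k{i}" for i in range(10)]
uniform_keys = [f"k{i}" for i in range(100)]

if __name__ == "__main__":
    for name, keys in (("skewed", skewed_keys), ("uniform", uniform_keys)):
        loads = reducer_loads(keys, 4)
        print(name, loads, round(imbalance(loads), 2))
```

Because every record with the same key goes to the same reducer, the skewed input places at least 90 of its 100 records on a single reducer regardless of the hash function, so that reducer determines the job's completion time.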


Citations
Proceedings ArticleDOI

Performance evaluation and analysis of load balancing algorithms in cloud computing environments

TL;DR: An analytical comparison for the combinations of VM load balancing algorithms and different broker policies for cloud computing systems is presented and the best possible combinations are specified.
Journal ArticleDOI

MapReduce Data Skewness Handling: A Systematic Literature Review

TL;DR: This review concludes that important parameters have not been considered in MapReduce data skewness handling approaches.
Proceedings ArticleDOI

Improving MapReduce Load Balancing in Hadoop

TL;DR: A load balancing mechanism to mitigate the negative effect of data skew on the performance of MapReduce is proposed and results demonstrate that this mechanism can effectively reduce the data transmission cost through the network to each reducer.
Posted ContentDOI

Performance Evaluation of Dynamic Load Balancing Algorithms in Cloud Computation

TL;DR: In this article, the authors investigate and evaluate combinations of virtual machine dynamic load balancing algorithms and different load advisor policies. The methods are evaluated by simulation on the CloudAnalyst simulator, and the final results are reported for different cloud computing environment parameters.
References
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This paper presents the implementation of MapReduce, a programming model and an associated implementation for processing and generating large data sets that runs on a large cluster of commodity machines and is highly scalable.
Journal ArticleDOI

MapReduce: simplified data processing on large clusters

TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Proceedings ArticleDOI

The Hadoop Distributed File System

TL;DR: The architecture of HDFS is described and experience using HDFS to manage 25 petabytes of enterprise data at Yahoo! is reported on.
Book

Hadoop: The Definitive Guide

Tom White
TL;DR: This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters.
Proceedings ArticleDOI

Dryad: distributed data-parallel programs from sequential building blocks

TL;DR: The Dryad execution engine handles all the difficult problems of creating a large distributed, concurrent application: scheduling the use of computers and their CPUs, recovering from communication or computer failures, and transporting data between vertices.