Journal ArticleDOI

Mining on Big Data Using Hadoop MapReduce Model

01 Nov 2017 - Vol. 263, Iss. 4, pp. 042007
TL;DR: Experiments reveal that the proposed approach reduces network and computing loads by eliminating redundant transactions on Hadoop nodes, and that it considerably outperforms the other models.
Abstract: Conventional parallel algorithms for mining frequent itemsets try to balance the load by distributing similar data evenly among nodes. This paper examines that process by analyzing a critical performance drawback of common parallel frequent-itemset mining algorithms: given a large dataset, the data partitioning strategies in existing solutions suffer high communication and mining overhead caused by redundant transactions transmitted among computing nodes. We address this drawback by developing a data partitioning approach on Hadoop using the MapReduce programming model, with the overall goal of speeding up parallel frequent-itemset mining on Hadoop clusters. By combining a similarity metric with the locality-sensitive hashing technique, the approach places highly similar transactions into the same data partition to improve locality without creating an excessive number of redundant transactions. We implement the approach on a 34-node Hadoop cluster, driven by a range of datasets created by the IBM Quest market-basket synthetic data generator. Experiments reveal that the approach reduces network and computing loads by eliminating redundant transactions on Hadoop nodes, and that it considerably outperforms the other models.
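The paper publishes no code, but the core idea of the abstract (routing highly similar transactions to the same partition via a similarity metric and locality-sensitive hashing) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation; the MinHash signature length, band width, bucket-merging rule, and toy transactions are all assumptions.

```python
import random
from collections import defaultdict

# Sketch of LSH-based transaction partitioning (not the authors' code).
# Each transaction is a set of item ids; transactions whose MinHash signatures
# collide in at least one band land in the same partition, so highly similar
# transactions tend to be mined on the same Hadoop node.

NUM_HASHES = 16      # assumed signature length
BAND_SIZE = 4        # assumed rows per LSH band
PRIME = 2_147_483_647

random.seed(42)
HASH_PARAMS = [(random.randrange(1, PRIME), random.randrange(0, PRIME))
               for _ in range(NUM_HASHES)]

def minhash_signature(transaction):
    """Compute a MinHash signature for a set of integer item ids."""
    return tuple(min((a * item + b) % PRIME for item in transaction)
                 for a, b in HASH_PARAMS)

def partition_transactions(transactions):
    """Group transactions whose signatures agree on at least one band."""
    buckets = defaultdict(list)
    for tid, items in enumerate(transactions):
        sig = minhash_signature(items)
        for band_start in range(0, NUM_HASHES, BAND_SIZE):
            band_key = (band_start, sig[band_start:band_start + BAND_SIZE])
            buckets[band_key].append(tid)
    # Greedy assignment: the first bucket a transaction lands in wins,
    # so every transaction ends up in exactly one partition.
    assigned = {}
    partitions = defaultdict(list)
    for band_key, tids in buckets.items():
        for tid in tids:
            if tid not in assigned:
                assigned[tid] = band_key
                partitions[band_key].append(tid)
    return list(partitions.values())

if __name__ == "__main__":
    toy = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]  # assumed toy data
    print(partition_transactions(toy))
```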
Citations
Journal ArticleDOI
TL;DR: The present study uses a principal component analysis-based deep neural network model with the Grey Wolf Optimization (GWO) algorithm to classify the extracted features of a diabetic retinopathy dataset and shows that the proposed model offers better performance than traditional machine learning algorithms.
Abstract: Diabetic retinopathy is a prominent cause of blindness among elderly people and has become a global medical problem over the last few decades. There are several scientific and medical approaches to screen for and detect this disease, but most detection is done using retinal fundus imaging. The present study uses a principal component analysis-based deep neural network model with the Grey Wolf Optimization (GWO) algorithm to classify the extracted features of a diabetic retinopathy dataset. The use of GWO enables the selection of optimal parameters for training the DNN model. The steps involved in this paper include standardization of the diabetic retinopathy dataset using a StandardScaler normalization method, followed by dimensionality reduction using PCA, then selection of optimal hyperparameters by GWO, and finally training of the dataset using a DNN model. The proposed model is evaluated on the performance measures of accuracy, recall, sensitivity, and specificity. The model is further compared with traditional machine learning algorithms: support vector machine (SVM), Naive Bayes classifier, decision tree, and XGBoost. The results show that the proposed model offers better performance than the aforementioned algorithms.
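As a rough illustration of the pipeline described above (standardization, PCA-based dimensionality reduction, then a neural-network classifier), the following scikit-learn sketch wires the stages together. The Grey Wolf Optimization step is not reproduced here: the hidden-layer sizes and learning rate are placeholders standing in for GWO-selected values, and the synthetic dataset is an assumption, not the study's data.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the diabetic retinopathy feature set (assumption).
X, y = make_classification(n_samples=1000, n_features=19, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Standardize -> PCA -> DNN; the hyperparameters below are placeholders for
# the values GWO would select in the cited study.
model = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=10)),
    ("dnn", MLPClassifier(hidden_layer_sizes=(64, 32),
                          learning_rate_init=1e-3,
                          max_iter=500,
                          random_state=0)),
])
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```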

151 citations

Journal ArticleDOI
TL;DR: This work establishes FPM using an extended version of the MapReduce framework in a Hadoop environment, performs preprocessing to remove data redundancy, and proposes AP clustering, which generates effective clusters from the given dataset.
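The TL;DR describes clustering the dataset with Affinity Propagation (AP) before frequent-pattern mining. Below is a minimal sketch of that preprocessing step, assuming transactions are encoded as binary item-presence vectors and using scikit-learn's AffinityPropagation; the encoding, parameters, and toy data are assumptions, not the cited work's code.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation

# Toy transactions encoded as binary item-presence vectors (assumption).
transactions = np.array([
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
])

# Affinity Propagation picks exemplars itself, so the number of clusters does
# not have to be fixed in advance; each resulting cluster could then be handed
# to an independent frequent-pattern mining job.
ap = AffinityPropagation(random_state=0)
labels = ap.fit_predict(transactions)
for cluster_id in np.unique(labels):
    print("cluster", cluster_id, "-> transactions", np.where(labels == cluster_id)[0])
```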

18 citations

Proceedings ArticleDOI
01 Nov 2019
TL;DR: The results show that Apache Pig is more efficient and systematic than Apache Hive, providing quick results in less time.
Abstract: Big Data has been seen as a revolution driven by technological advancement over the last few years. The process of examining massive, gigantic, heterogeneous, and multiplex datasets that change very often is called Big Data analytics. Decision making by extracting information from complex and multi-structured data is not possible using traditional means. Two key elements of the Hadoop ecosystem, Apache Hive and Apache Pig, are among the most systematic and cost-effective options for processing ECG Big Data. Apache Pig and Apache Hive are open-source initiatives for examining huge datasets in a high-level language. In this research, a performance analysis of various ECG Big Data datasets is carried out with Apache Hive and Apache Pig. Different parameters of the ECG Big Data are observed, and the results show that Apache Pig is more efficient and systematic, providing quick results in less time compared to Apache Hive.

4 citations

01 Jan 2018
TL;DR: The major rise in data collection and storage has raised the need for much more powerful data analysis tools, and models must be constantly updated to handle data velocity and newly incoming data.
Abstract: Data discrimination refers to the mapping or classification of a class to some predefined group or class. The major rise in data collection and storage has raised the need for much more powerful data analysis tools. The data collected in huge databases needs to be handled effectively and efficiently. Important and highly critical decisions are often made not on the basis of the information-rich data stored in databases but on a decision maker's intuition, merely because of the absence of tools capable of extracting valuable knowledge from vast amounts of data. Current expert systems depend on users to manually input knowledge into knowledge bases, a process that is often time-consuming, expensive, and biased. A further problem with data mining algorithms is their inability to deal with non-static and unbalanced data, so models need to be constantly updated to handle data velocity and newly incoming data.

1 citation


Cites background from "Mining on Big Data Using Hadoop Map..."

  • ...Pros and cons of FP-Growth The pros and cons related to FP-Growth algorithm are mentioned as under [10]....

    [...]

References
Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
06 Dec 2004
TL;DR: This paper presents MapReduce, a programming model and an associated implementation for processing and generating large data sets; the implementation runs on a large cluster of commodity machines and is highly scalable.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.
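The map/reduce contract described in this abstract is easy to see in a toy word-count example: a map function emits intermediate key/value pairs, a shuffle groups them by key, and a reduce function merges the values for each key. The sketch below is a single-process simulation of the programming model, not Hadoop's or Google's runtime.

```python
from collections import defaultdict
from itertools import chain

def map_fn(_, line):
    """Map: emit (word, 1) for every word in an input line."""
    for word in line.split():
        yield word.lower(), 1

def reduce_fn(word, counts):
    """Reduce: merge all intermediate values associated with one key."""
    return word, sum(counts)

def mapreduce(records, map_fn, reduce_fn):
    # Shuffle phase: group intermediate pairs by key.
    grouped = defaultdict(list)
    for key, value in chain.from_iterable(map_fn(k, v) for k, v in records):
        grouped[key].append(value)
    return [reduce_fn(key, values) for key, values in sorted(grouped.items())]

if __name__ == "__main__":
    lines = enumerate(["the quick brown fox", "the lazy dog", "the fox"])
    print(mapreduce(lines, map_fn, reduce_fn))
```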

20,309 citations

Journal ArticleDOI
Jeffrey Dean, Sanjay Ghemawat
TL;DR: This presentation explains how the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks.
Abstract: MapReduce is a programming model and an associated implementation for processing and generating large datasets that is amenable to a broad variety of real-world tasks. Users specify the computation in terms of a map and a reduce function, and the underlying runtime system automatically parallelizes the computation across large-scale clusters of machines, handles machine failures, and schedules inter-machine communication to make efficient use of the network and disks. Programmers find the system easy to use: more than ten thousand distinct MapReduce programs have been implemented internally at Google over the past four years, and an average of one hundred thousand MapReduce jobs are executed on Google's clusters every day, processing a total of more than twenty petabytes of data per day.

17,663 citations

Journal ArticleDOI
01 Sep 2010
TL;DR: Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions up to 30%.
Abstract: We present Schism, a novel workload-aware approach for database partitioning and replication designed to improve scalability of shared-nothing distributed databases. Because distributed transactions are expensive in OLTP settings (a fact we demonstrate through a series of experiments), our partitioner attempts to minimize the number of distributed transactions, while producing balanced partitions. Schism consists of two phases: i) a workload-driven, graph-based replication/partitioning phase and ii) an explanation and validation phase. The first phase creates a graph with a node per tuple (or group of tuples) and edges between nodes accessed by the same transaction, and then uses a graph partitioner to split the graph into k balanced partitions that minimize the number of cross-partition transactions. The second phase exploits machine learning techniques to find a predicate-based explanation of the partitioning strategy (i.e., a set of range predicates that represent the same replication/partitioning scheme produced by the partitioner). The strengths of Schism are: i) independence from the schema layout, ii) effectiveness on n-to-n relations, typical in social network databases, iii) a unified and fine-grained approach to replication and partitioning. We implemented and tested a prototype of Schism on a wide spectrum of test cases, ranging from classical OLTP workloads (e.g., TPC-C and TPC-E), to more complex scenarios derived from social network websites (e.g., Epinions.com), whose schema contains multiple n-to-n relationships, which are known to be hard to partition. Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions up to 30%.
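The graph-based first phase can be illustrated with a small sketch: build a graph with one node per tuple, weight each edge by how often the two tuples are accessed by the same transaction, and hand the graph to a partitioner. Here networkx's Kernighan-Lin bisection stands in for the k-way partitioner Schism would use (k = 2 only), and the toy workload is an assumption; this is not the Schism codebase.

```python
from itertools import combinations

import networkx as nx
from networkx.algorithms.community import kernighan_lin_bisection

# Toy workload: each transaction lists the tuple ids it accesses (assumption).
workload = [
    ["a", "b"], ["a", "b"], ["a", "c"],
    ["d", "e"], ["d", "e", "f"], ["e", "f"],
]

# One node per tuple; edge weight = number of transactions touching both tuples.
graph = nx.Graph()
for txn in workload:
    for u, v in combinations(sorted(set(txn)), 2):
        if graph.has_edge(u, v):
            graph[u][v]["weight"] += 1
        else:
            graph.add_edge(u, v, weight=1)

# Balanced 2-way split that tries to minimize the weight of cut edges,
# i.e. the number of distributed transactions in this toy model.
part_a, part_b = kernighan_lin_bisection(graph, weight="weight", seed=0)
print(sorted(part_a), sorted(part_b))
```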

602 citations

Journal ArticleDOI
TL;DR: The author surveys the state of the art in parallel and distributed association-rule-mining algorithms and uncovers the field's challenges and open research problems.
Abstract: The author surveys the state of the art in parallel and distributed association-rule-mining algorithms and uncovers the field's challenges and open research problems. This survey can serve as a reference for both researchers and practitioners.

510 citations

Proceedings ArticleDOI
Haoyuan Li, Yi Wang, Dong Zhang, Ming Zhang, Edward Y. Chang
23 Oct 2008
TL;DR: Through an empirical study on a large dataset of 802,939 Web pages and 1,021,107 tags, it is demonstrated that PFP achieves virtually linear speedup and is promising for supporting query recommendation for search engines.
Abstract: Frequent itemset mining (FIM) is a useful tool for discovering frequently co-occurring items. Since its inception, a number of significant FIM algorithms have been developed to speed up mining performance. Unfortunately, when the dataset size is huge, both the memory use and computational cost can still be prohibitively expensive. In this work, we propose to parallelize the FP-Growth algorithm (we call our parallel algorithm PFP) on distributed machines. PFP partitions computation in such a way that each machine executes an independent group of mining tasks. Such partitioning eliminates computational dependencies between machines, and thereby communication between them. Through an empirical study on a large dataset of 802,939 Web pages and 1,021,107 tags, we demonstrate that PFP can achieve virtually linear speedup. Besides scalability, the empirical study demonstrates that PFP is promising for supporting query recommendation for search engines.
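The step that removes inter-machine dependencies in PFP is the group-dependent projection of transactions: frequent items are divided into groups, and each transaction is re-emitted at most once per group as a prefix ending at that group's last item, so each group's shard can then be mined by an independent FP-Growth instance. Below is a minimal sketch of that mapper-side projection under stated assumptions (toy item ranking, round-robin group assignment, and toy transactions); the local FP-Growth mining of each shard is omitted.

```python
from collections import defaultdict

# Toy frequency-ranked item list and group assignment (assumptions).
RANKED_ITEMS = ["f", "c", "a", "b", "m", "p"]          # most frequent first
RANK = {item: i for i, item in enumerate(RANKED_ITEMS)}
NUM_GROUPS = 2
GROUP = {item: RANK[item] % NUM_GROUPS for item in RANKED_ITEMS}

def project(transaction):
    """Mapper-side projection: emit (group_id, prefix) pairs for one transaction."""
    # Keep only ranked (frequent) items, sorted by descending frequency.
    items = sorted((i for i in set(transaction) if i in RANK), key=RANK.get)
    emitted = set()
    # Scan from the least frequent item toward the front; emit each group once.
    for pos in range(len(items) - 1, -1, -1):
        gid = GROUP[items[pos]]
        if gid not in emitted:
            emitted.add(gid)
            yield gid, items[: pos + 1]

def build_shards(transactions):
    """Reducer-side grouping: collect each group's projected transactions."""
    shards = defaultdict(list)
    for txn in transactions:
        for gid, prefix in project(txn):
            shards[gid].append(prefix)
    return shards   # each shard would be mined by a local FP-Growth (omitted)

if __name__ == "__main__":
    txns = [["f", "a", "c", "m", "p"], ["f", "c", "a", "b", "m"], ["f", "b"]]
    for gid, shard in sorted(build_shards(txns).items()):
        print(gid, shard)
```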

472 citations