Open Access Journal Article

Mining on Big Data Using Hadoop MapReduce Model

G Salman Ahmed, +1 more
- Vol. 263, Iss: 4, pp 042007
TLDR
Experiments show that the proposed Hadoop-based approach reduces network and computing load by eliminating redundant transactions on Hadoop nodes, and that it considerably outperforms the other models.
Abstract
Traditional parallel algorithms for mining frequent itemsets aim to balance load by distributing data evenly among nodes. This paper reviews that process by analyzing a critical performance problem of common parallel frequent itemset mining algorithms. Given a large dataset, the data partitioning strategies in existing solutions suffer high communication and mining overhead caused by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning approach on Hadoop using the MapReduce programming model. The overall goal of the approach is to improve the performance of parallel frequent itemset mining on Hadoop clusters. By incorporating a similarity metric and the locality-sensitive hashing technique, the approach places highly similar transactions into the same data partition to improve locality without creating an excessive number of redundant transactions. We implement the approach on a 34-node Hadoop cluster, driven by a wide range of datasets created by the IBM Quest Market-Basket Synthetic Data Generator. Experimental results show that the approach reduces network and computing load by eliminating redundant transactions on Hadoop nodes, and that it considerably outperforms the other models.
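The partitioning idea in the abstract can be illustrated with a small sketch. The abstract does not say which similarity metric or which locality-sensitive hashing family is used, so the Jaccard similarity and MinHash signatures below are assumptions, and the function names (jaccard, minhash_signature, partition_transactions) are hypothetical. The sketch runs on a single machine and only shows how similar transactions can be routed to the same partition without duplicating any of them; it is not the paper's implementation.

```python
import hashlib
from collections import defaultdict


def jaccard(a, b):
    """Jaccard similarity of two transactions (assumed metric, not from the paper)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def _stable_hash(value, seed):
    """Deterministic 64-bit hash of a value under a given seed."""
    data = f"{seed}:{value}".encode("utf-8")
    return int.from_bytes(hashlib.md5(data).digest()[:8], "big")


def minhash_signature(transaction, num_hashes=4):
    """MinHash signature: the minimum hash of the items under each seed.

    Transactions with a high Jaccard similarity are likely to share
    signature components. An empty transaction gets an all-zero signature.
    """
    items = set(transaction)
    if not items:
        return tuple(0 for _ in range(num_hashes))
    return tuple(
        min(_stable_hash(item, seed) for item in items)
        for seed in range(num_hashes)
    )


def partition_transactions(transactions, num_partitions=2, num_hashes=4):
    """Assign each transaction to exactly one partition keyed by its signature.

    Each transaction is stored once, so no redundant copies are created.
    """
    partitions = defaultdict(list)
    for transaction in transactions:
        signature = minhash_signature(transaction, num_hashes)
        bucket = _stable_hash(signature, "bucket") % num_partitions
        partitions[bucket].append(transaction)
    return dict(partitions)


if __name__ == "__main__":
    baskets = [
        ["bread", "milk"],
        ["milk", "bread"],
        ["bread", "milk", "eggs"],
        ["beer", "chips"],
    ]
    for bucket, members in sorted(partition_transactions(baskets, 3).items()):
        print(bucket, members)
```

In a MapReduce setting, the bucket number would plausibly serve as the key emitted by the map phase, so that each reducer mines one partition. Transactions with identical item sets always share a bucket here; transactions that are merely similar do so only with some probability, which real LSH schemes raise by banding several signatures together.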
