Proceedings Article

An evaluation model for Cloud-based Data mining Systems with Hadoop

TLDR
This article reviews the existing cloud data mining technologies described in relevant works to derive an overall framework. Based on the references, it finds that different algorithms perform better under different circumstances, and that no single algorithm fits all mining tasks.
Abstract
The traditional approach to data mining is expensive, slow, and inefficient for big data. This motivates the use of cloud technology, which can discover knowledge from a huge database at a very high rate. Implementing the system on Hadoop makes processing more efficient because of its underlying parallelism and data locality. The aim is a system, based on a literature review, that reflects the most efficient cloud data mining technology. The system should be able to mine big data and have broader applicability while addressing the problems of existing mining technologies. To this end, the existing technologies described in relevant works were used to derive the overall framework, and related works were reviewed for a better understanding of existing cloud data mining technology. According to the references, some algorithms perform better than others under particular circumstances. Scalability, parallelism, and cost-effectiveness play a significant role in making the system more efficient. The data locality feature of Hadoop provides the greatest optimization in the mining process. Data mining is not a single task, and no one algorithm fits all mining procedures. The assumptions and circumstances of a data mining task determine its accuracy and overall performance. Data types and tasks are constantly changing, which indicates the need for dynamic data mining algorithms and techniques.

References
Proceedings Article

A density-based algorithm for discovering clusters in large spatial Databases with Noise

TL;DR: DBSCAN, a clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape, is presented; it requires only one input parameter and supports the user in determining an appropriate value for it.
Journal Article

A Survey of Parallel Sequential Pattern Mining

TL;DR: An in-depth survey of the current status of parallel sequential pattern mining (PSPM) is provided, including a detailed categorization of traditional serial SPM approaches and state-of-the-art PSPM.
Journal Article

A survey of data partitioning and sampling methods to support big data analysis

TL;DR: It is believed that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.
Journal Article

Performance and energy efficiency of big data applications in cloud environments

TL;DR: An energy efficiency evaluation of Hadoop on physical and virtual clusters in different configurations and a discussion on the implications of using cloud environments for big data analyses are presented.
Journal Article

Parallel and distributed clustering framework for big spatial data mining

TL;DR: A Dynamic Parallel and Distributed Clustering (DPDC) approach is proposed that can analyse Big Data within a reasonable response time and produce accurate results by using existing computing and storage infrastructure, such as cloud computing.