An evaluation model for Cloud-based Data mining Systems with Hadoop

doi:10.1109/CITISIA50690.2020.9371799

Proceedings Article

Anil Limbu, +1 more

TLDR

This article reviews the existing technologies described in relevant works to arrive at an overall framework. According to the reviewed works, some algorithms perform better than others in a given circumstance, and no one algorithm fits all data mining tasks.

Abstract:

The traditional approach to data mining is expensive, slow, and inefficient for big data. This motivates the use of cloud technology, which can discover knowledge from very large databases at high speed. Hadoop makes the processing more efficient because of its underlying parallelism and data locality. The aim is to define, on the basis of a literature review, a system that reflects the most efficient cloud data mining technology. The system should be capable of mining big data and be broadly applicable while addressing the problems of existing mining technologies. To that end, the existing technologies described in relevant works were combined into the overall framework, and related works were reviewed for a better understanding of existing cloud data mining technology. According to the reviewed works, some algorithms perform better than others in a given circumstance. Scalability, parallelism, and cost-effectiveness play a significant role in making the system more efficient. The data locality feature of Hadoop provides the greatest optimization in the mining process. Data mining is not a single task, and no one algorithm fits all mining tasks. The assumptions and circumstances of a mining task determine its accuracy and overall performance. Data types and tasks are always changing, which indicates the need for dynamic data mining algorithms and techniques.
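The parallelism and data locality mentioned in the abstract can be illustrated with a small map/reduce-style sketch. This is not the paper's system and does not use the Hadoop API: it is a plain-Python illustration in which a thread pool stands in for cluster nodes, each "node" counts items only in its own partition (the locality idea), and the partial counts are then merged. The function names (`map_partition`, `merge_counts`, `item_support`) and the sample transactions are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_partition(transactions):
    """Map step: count items using only the data held in this partition."""
    counts = Counter()
    for transaction in transactions:
        # Count each item at most once per transaction (its support).
        counts.update(set(transaction))
    return counts


def merge_counts(left, right):
    """Reduce step: combine the partial counts of two partitions."""
    return left + right


def item_support(partitions, max_workers=4):
    """Count item support over all partitions, processing them in parallel."""
    # The thread pool is only a stand-in for separate cluster nodes.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(map_partition, partitions))
    return reduce(merge_counts, partials, Counter())


# Hypothetical data: two partitions, as if stored on two different nodes.
partitions = [
    [["milk", "bread"], ["milk", "eggs"]],
    [["bread", "eggs"], ["milk", "bread"]],
]
support = item_support(partitions)
print(dict(support))
```

Only the small per-partition counters are moved to the merge step, not the raw transactions. That is the reason locality-aware processing can reduce data transfer in a real cluster.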

References

Proceedings Article

A density-based algorithm for discovering clusters in large spatial Databases with Noise

Martin Ester, +3 more

TL;DR: Presents DBSCAN, a new clustering algorithm that relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape. It requires only one input parameter and supports the user in determining an appropriate value for it.

Journal Article

A Survey of Parallel Sequential Pattern Mining

Wensheng Gan, +4 more

- 07 Jun 2019 -

ACM Transactions on Knowledge Discovery ...

TL;DR: An in-depth survey of the current status of parallel SPM (PSPM), including a detailed categorization of traditional serial SPM approaches and state-of-the-art PSPM.

Journal Article

A survey of data partitioning and sampling methods to support big data analysis

Mohammad Sultan Mahmud, +4 more

TL;DR: It is believed that data partitioning and sampling should be considered together to build approximate cluster computing frameworks that are reliable in both the computational and statistical respects.

Journal Article

Performance and energy efficiency of big data applications in cloud environments

Eugen Feller, +2 more

- 01 May 2015 -

Journal of Parallel and Distributed Comp...

TL;DR: An energy efficiency evaluation of Hadoop on physical and virtual clusters in different configurations and a discussion on the implications of using cloud environments for big data analyses are presented.

Journal Article

Parallel and distributed clustering framework for big spatial data mining

Malika Bendechache, +2 more

- 16 Mar 2019 -

International Journal of Parallel, Emerg...

TL;DR: A Dynamic Parallel and Distributed Clustering (DPDC) approach that can analyse Big Data within a reasonable response time and produce accurate results by using existing computing and storage infrastructure, such as cloud computing.
