An evaluation of Hadoop cluster efficiency in document clustering using parallel K-means

doi:10.1109/ICCS1.2017.8325954

Proceedings Article

Tanvir Habib Sardar, +2 more

TLDR

This study designs and evaluates a parallel k-means algorithm using the MapReduce programming model and compares it with sequential k-means on document datasets of varying size. The results show that the proposed parallel k-means outperforms sequential k-means when clustering documents.

Abstract:

Clustering is one of the most significant data mining techniques. With the digitalization and globalization of every workspace, large datasets are being generated rapidly. Clustering such large datasets is a challenge for traditional sequential clustering algorithms, which require long execution times. Distributed parallel architectures and algorithms therefore help meet the performance and scalability requirements of clustering large datasets. In this study, we design and evaluate a parallel k-means algorithm using the MapReduce programming model and compare its results with sequential k-means on document datasets of varying size. The results demonstrate that the proposed k-means achieves higher performance than sequential k-means when clustering documents.
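The abstract does not give the algorithm itself, so the following is only a minimal single-machine sketch of how one k-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points assigned to each centroid). The function names `map_phase`, `reduce_phase`, and `kmeans_mapreduce` are hypothetical, the points stand in for document feature vectors (an assumption, since the feature representation is not stated here), and no Hadoop API is used.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_phase(points, centroids):
    """Mapper (illustrative): emit (nearest centroid index, (point, 1))."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: dist(p, centroids[i]))
        yield idx, (p, 1)


def reduce_phase(pairs, centroids):
    """Reducer (illustrative): average the points emitted for each centroid."""
    sums = {}
    counts = defaultdict(int)
    for idx, (p, n) in pairs:
        if idx in sums:
            sums[idx] = [a + b for a, b in zip(sums[idx], p)]
        else:
            sums[idx] = list(p)
        counts[idx] += n
    new_centroids = []
    for i, old in enumerate(centroids):
        if counts[i]:
            new_centroids.append(tuple(s / counts[i] for s in sums[i]))
        else:
            # Assumption: an empty cluster keeps its previous centroid.
            new_centroids.append(tuple(old))
    return new_centroids


def kmeans_mapreduce(points, centroids, max_iter=20, tol=1e-9):
    """Repeat map + reduce until centroids move by at most tol."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        new_centroids = reduce_phase(map_phase(points, centroids), centroids)
        shift = max(dist(a, b) for a, b in zip(centroids, new_centroids))
        centroids = new_centroids
        if shift <= tol:
            break
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    print(kmeans_mapreduce(pts, [(0, 0), (10, 0)]))
```

In a distributed setting the mapper would run on separate partitions of the dataset and the reducer would receive the pairs grouped by centroid index; here both run in one process purely to show the data flow.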

Citations

Journal Article

Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering

Zahid Ansari, +2 more

- 01 Apr 2019 -

Journal of The Institution of Engineers ...

TL;DR: This work proposes a parallel version of traditional K-means that runs on the Hadoop distributed framework; the experimental results show that the proposed K-means algorithm outperforms traditional K-means when clustering large volumes of data.

Journal Article

An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

Tanvir Habib Sardar, +1 more

- 01 Dec 2020 -

Journal of The Institution of Engineers ...

TL;DR: The results show that the proposed algorithm is more efficient than traditional K-Means for document datasets of all sizes, and that it works more efficiently when the dataset and the Hadoop cluster are large.

Journal Article

MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering

Tanvir Habib Sardar, +1 more

- 19 Jul 2021 -

Journal of The Institution of Engineers ...

TL;DR: In this paper, a MapReduce-based fuzzy C-means algorithm for big document data clustering is proposed and evaluated extensively on document datasets of different sizes, executed over Hadoop clusters of different sizes.

Journal Article

A Dataset Schema for Cooperative Learning from Demonstration in Multi-robots Systems

Marco A. C. Simões, +2 more

- 03 Dec 2019 -

arXiv: Robotics

TL;DR: In this paper, the authors propose a new dataset schema to support learning the coordinated behavior in MASs from demonstration, which is validated in a Multi-Robot System (MRS) organizing a collection of new cooperative plans recommendations from the demonstration by domain experts.

Journal Article

A Dataset Schema for Cooperative Learning from Demonstration in Multi-robot Systems

Marco A. C. Simões, +2 more

- 01 Sep 2020 -

Journal of Intelligent and Robotic Syste...

TL;DR: A new dataset schema is proposed to support learning the coordinated behavior in MASs from demonstration and is validated in a Multi-Robot System (MRS) organizing a collection of new cooperative plans recommendations from the demonstration by domain experts.

References

Journal Article

Survey of clustering algorithms

Rui Xu, +1 more

- 01 May 2005 -

IEEE Transactions on Neural Networks

TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications in some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts, are illustrated.

Journal Article

A survey on clustering algorithms for wireless sensor networks

Ameer Ahmed Abbasi, +1 more

- 01 Oct 2007 -

Computer Communications

TL;DR: A taxonomy and general classification of published clustering schemes for WSNs is presented, highlighting their objectives, features, and complexity, and comparing these clustering algorithms on metrics such as convergence rate, cluster stability, cluster overlapping, location-awareness, and support for node mobility.

A survey of ontology evaluation techniques

Janez Brank, +1 more

TL;DR: A survey of the state of the art in ontology evaluation is presented, typically in order to determine which of several ontologies would best suit a particular purpose.

Book Chapter

A Survey of Text Clustering Algorithms

Charu C. Aggarwal, +1 more

TL;DR: This chapter will study the key challenges of the clustering problem, as it applies to the text domain, and discuss the key methods used for text clustering, and their relative advantages.

Journal Article

Improved K-mean Clustering Algorithm for Prediction Analysis using Classification Technique in Data Mining

Arpit Bansal, +2 more

- 17 Jan 2017 -

International Journal of Computer Applic...

TL;DR: An improvement to the k-means clustering algorithm is proposed that defines the number of clusters automatically and assigns un-clustered points to the required cluster, which should improve accuracy and reduce clustering time when the cluster assignments are used to predict cancer.