Proceedings ArticleDOI
An evaluation of Hadoop cluster efficiency in document clustering using parallel K-means
TLDR
This study designs and experimentally evaluates a parallel k-means algorithm using the MapReduce programming model and compares it with sequential k-means on document datasets of varying size. It demonstrates that the proposed k-means achieves higher performance than sequential k-means when clustering documents.
Abstract:
Clustering is one of the significant data mining techniques. Due to the digitalization and globalization of every workspace, large datasets are being generated rapidly. Clustering such large datasets is a challenge for traditional sequential clustering algorithms because it requires long execution times. Distributed parallel architectures and algorithms are thus helpful in meeting the performance and scalability requirements of clustering large datasets. In this study, we design and experimentally evaluate a parallel k-means algorithm using the MapReduce programming model and compare its results with sequential k-means for clustering document datasets of varying size. The results demonstrate that the proposed k-means achieves higher performance than sequential k-means when clustering documents.
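As a rough illustration of how k-means can be phrased in MapReduce terms, the following is a minimal single-process sketch in Python. It is not the authors' Hadoop implementation: the function names (`map_assign`, `reduce_update`, `kmeans_iteration`, `mapreduce_style_kmeans`), the Euclidean distance, and the convergence tolerance are all assumptions made for illustration, and documents are assumed to have already been converted to numeric vectors.

```python
from collections import defaultdict
from math import dist


def map_assign(point, centroids):
    """Map step: emit (index of the nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_update(values):
    """Reduce step: average all points that were assigned to one centroid."""
    total = None
    count = 0
    for point, n in values:
        if total is None:
            total = list(point)
        else:
            total = [a + b for a, b in zip(total, point)]
        count += n
    return tuple(s / count for s in total)


def kmeans_iteration(points, centroids):
    """One map, shuffle, reduce round over all points."""
    groups = defaultdict(list)  # stands in for the shuffle phase
    for p in points:
        key, value = map_assign(p, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [
        reduce_update(groups[i]) if i in groups else centroids[i]
        for i in range(len(centroids))
    ]


def mapreduce_style_kmeans(points, centroids, max_iters=20, tol=1e-9):
    """Repeat rounds until no centroid moves more than tol (assumed stopping rule)."""
    for _ in range(max_iters):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(mapreduce_style_kmeans(pts, [(0, 0), (10, 10)]))
```

In a real Hadoop job the map calls would run on separate data splits and the grouping by centroid index would be done by the framework's shuffle. Here a dictionary plays that role so the data flow can be followed in one process.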
Citations
Journal ArticleDOI
Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering
TL;DR: This work proposes a parallel version of traditional K-means that runs on the Hadoop distributed framework; the experimental results show that the proposed K-means algorithm outperforms traditional K-means when clustering large volumes of data.
Journal ArticleDOI
An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm
Tanvir Habib Sardar,Zahid Ansari +1 more
TL;DR: The results show that the proposed algorithm is more efficient than traditional K-Means for clustering document datasets of all sizes, and that it works more efficiently when the dataset and the Hadoop cluster are large.
Journal ArticleDOI
MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering
Tanvir Habib Sardar,Zahid Ansari +1 more
TL;DR: In this paper, a MapReduce-based fuzzy C-means algorithm for clustering big document data is proposed; it is tested extensively on document datasets of different sizes and executed on Hadoop clusters of different sizes.
Journal ArticleDOI
A Dataset Schema for Cooperative Learning from Demonstration in Multi-robots Systems
TL;DR: In this paper, the authors propose a new dataset schema to support learning the coordinated behavior in MASs from demonstration, which is validated in a Multi-Robot System (MRS) organizing a collection of new cooperative plans recommendations from the demonstration by domain experts.
Journal ArticleDOI
A Dataset Schema for Cooperative Learning from Demonstration in Multi-robot Systems
TL;DR: A new dataset schema is proposed to support learning the coordinated behavior in MASs from demonstration and is validated in a Multi-Robot System (MRS) organizing a collection of new cooperative plans recommendations from the demonstration by domain experts.
References
Journal ArticleDOI
Survey of clustering algorithms
Rui Xu,Donald C. Wunsch +1 more
TL;DR: Clustering algorithms for data sets appearing in statistics, computer science, and machine learning are surveyed, and their applications to some benchmark data sets, the traveling salesman problem, and bioinformatics, a new field attracting intensive efforts, are illustrated.
Journal ArticleDOI
A survey on clustering algorithms for wireless sensor networks
TL;DR: A taxonomy and general classification of published clustering schemes for WSNs is presented, highlighting their objectives, features, and complexity, and comparing these clustering algorithms on metrics such as convergence rate, cluster stability, cluster overlapping, location-awareness, and support for node mobility.
A survey of ontology evaluation techniques
Janez Brank,Marko Grobelnik +1 more
TL;DR: A survey of the state of the art in ontology evaluation is presented, typically in order to determine which of several ontologies would best suit a particular purpose.
Book ChapterDOI
A Survey of Text Clustering Algorithms
TL;DR: This chapter will study the key challenges of the clustering problem, as it applies to the text domain, and discuss the key methods used for text clustering, and their relative advantages.
Journal ArticleDOI
Improved K-mean Clustering Algorithm for Prediction Analysis using Classification Technique in Data Mining
TL;DR: An improvement to the k-means clustering algorithm is proposed that defines the number of clusters automatically and assigns un-clustered points to the required cluster, which is expected to improve accuracy and reduce clustering time when predicting cancer.