Proceedings Article

An evaluation of Hadoop cluster efficiency in document clustering using parallel K-means

TLDR
This study designs and evaluates a parallel k-means algorithm using the MapReduce programming model, compares it with sequential k-means on document datasets of varying size, and demonstrates that the proposed k-means outperforms sequential k-means when clustering documents.
Abstract
Clustering is one of the most significant data mining techniques. With the digitalization and globalization of every workspace, large datasets are being generated rapidly. Clustering such datasets is a challenge for traditional sequential clustering algorithms because of the long execution time they require. Distributed parallel architectures and algorithms therefore help to meet the performance and scalability requirements of clustering large datasets. In this study, we design and evaluate a parallel k-means algorithm using the MapReduce programming model and compare its results with sequential k-means for clustering document datasets of varying size. The results demonstrate that the proposed parallel k-means achieves higher performance than sequential k-means when clustering documents.
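
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how one k-means iteration is commonly split into a map step (assign each point to its nearest centroid) and a reduce step (average the points of each cluster). It runs in a single process rather than on Hadoop, and every name in it (`map_assign`, `reduce_mean`, `kmeans_iteration`, `mapreduce_style_kmeans`) is hypothetical. It also assumes documents have already been converted to numeric vectors (for example term-weight vectors), which the abstract does not specify.

```python
from collections import defaultdict


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_assign(point, centroids):
    """Map step (assumed form): emit (nearest centroid index, (point, 1))."""
    nearest = min(
        range(len(centroids)),
        key=lambda i: squared_distance(point, centroids[i]),
    )
    return nearest, (list(point), 1)


def reduce_mean(values):
    """Reduce step (assumed form): average all points emitted for one cluster."""
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for vector, n in values:
        for d in range(dim):
            total[d] += vector[d]
        count += n
    return [t / count for t in total]


def kmeans_iteration(points, centroids):
    """One map -> group-by-key -> reduce pass, simulated in a single process."""
    groups = defaultdict(list)
    for point in points:
        key, value = map_assign(point, centroids)
        groups[key].append(value)
    # A centroid that received no points keeps its previous position.
    return [
        reduce_mean(groups[i]) if i in groups else list(centroids[i])
        for i in range(len(centroids))
    ]


def mapreduce_style_kmeans(points, initial_centroids, max_iterations=20):
    """Repeat the iteration until centroids stop changing or the cap is hit."""
    centroids = [list(c) for c in initial_centroids]
    for _ in range(max_iterations):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    toy_points = [[0, 0], [0, 2], [10, 10], [10, 12]]
    print(mapreduce_style_kmeans(toy_points, [[0, 0], [10, 10]]))
```

In an actual MapReduce job the map calls would run in parallel over partitions of the dataset and the framework would perform the grouping by key; the loop over iterations would typically be driven by launching one job per iteration.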


Citations
Journal Article

Data Categorization Using Hadoop MapReduce-Based Parallel K-Means Clustering

TL;DR: This work proposes a parallel version of traditional K-means that runs on the Hadoop distributed framework; the experimental results show that the proposed K-means algorithm outperforms traditional K-means when clustering large volumes of data.
Journal Article

An Analysis of Distributed Document Clustering Using MapReduce Based K-Means Algorithm

TL;DR: The results show that the proposed algorithm is more efficient than traditional K-Means for document datasets of all sizes, and that it works more efficiently when the dataset and the Hadoop cluster are large.
Journal Article

MapReduce-based Fuzzy C-means Algorithm for Distributed Document Clustering

TL;DR: In this paper, a MapReduce-based fuzzy C-means algorithm for big document data clustering is proposed; it is evaluated extensively on document datasets of different sizes and executed over Hadoop clusters of different sizes.
Journal Article

A Dataset Schema for Cooperative Learning from Demonstration in Multi-robots Systems

TL;DR: In this paper, the authors propose a new dataset schema to support learning the coordinated behavior in MASs from demonstration, which is validated in a Multi-Robot System (MRS) organizing a collection of new cooperative plans recommendations from the demonstration by domain experts.
Journal Article

A Dataset Schema for Cooperative Learning from Demonstration in Multi-robot Systems

TL;DR: A new dataset schema is proposed to support learning the coordinated behavior in MASs from demonstration and is validated in a Multi-Robot System (MRS) organizing a collection of new cooperative plans recommendations from the demonstration by domain experts.