Open Access Journal Article

A Survey on Text Mining in Clustering

S. Logeswari, +1 more
01 Jan 2011 - Vol. 2, Iss: 1, pp 111-116
TLDR
This paper focuses on the various techniques used to cluster text documents based on keywords, phrases and concepts, and covers the performance measures used to evaluate the quality of clusters.
Abstract
Text mining has important applications in data mining and information retrieval. One of its important tasks is document clustering. Many existing document clustering techniques use the bag-of-words model to represent the content of a document. This model is effective for grouping related documents only when those documents share a large proportion of lexically equivalent terms. Synonymy between related documents is ignored, which reduces the effectiveness of applications that use a standard full-text document representation. This paper focuses on the various techniques used to cluster text documents based on keywords, phrases and concepts. It also covers the performance measures used to evaluate the quality of clusters.

Keywords: Document Clustering, Latent Semantic Indexing, Vector Space Model, tf-idf, precision, recall, F-measure.
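The bag-of-words limitation described in the abstract can be illustrated with a minimal sketch. This is not code from the paper: the helper names `bag_of_words` and `cosine_similarity` and the three toy documents are assumptions chosen for illustration, and raw term counts stand in for the tf-idf weights a real system would more likely use.

```python
from collections import Counter
import math


def bag_of_words(text):
    """Map a document to term counts; word order is discarded."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    # Only terms present in both documents contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy documents: doc_a and doc_b are related in meaning
# but share no terms; doc_a and doc_c share two of three terms.
doc_a = bag_of_words("physician treats illness")
doc_b = bag_of_words("doctor cures disease")
doc_c = bag_of_words("physician treats patients")

sim_synonyms = cosine_similarity(doc_a, doc_b)  # no lexical overlap
sim_overlap = cosine_similarity(doc_a, doc_c)   # two shared terms
```

Under these assumptions the two synonymous documents receive a similarity of zero because they have no term in common, while the lexically overlapping pair scores well above zero. This is the gap that phrase-based and concept-based representations aim to close.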

