Open Access | Journal Article
A Survey on Text Mining in Clustering
TLDR
This paper focuses on the various techniques used to cluster text documents based on keywords, phrases, and concepts, and includes the performance measures used to evaluate the quality of clusters.
Abstract:
Text mining has important applications in data mining and information retrieval, and document clustering is one of its central tasks. Many existing document clustering techniques use the bag-of-words model to represent the content of a document. This model is effective for grouping related documents only when those documents share a large proportion of lexically equivalent terms; synonymy between related documents is ignored, which reduces the effectiveness of applications that rely on a standard full-text document representation. This paper focuses on the various techniques used to cluster text documents based on keywords, phrases, and concepts. It also covers the performance measures used to evaluate the quality of clusters.
Keywords: Document Clustering, Latent Semantic Indexing, Vector Space Model, tf-idf, precision, recall, F-measure.
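The limitation of the bag-of-words model described in the abstract can be illustrated with a minimal sketch. This is not taken from the paper: the tokenizer (lowercase, split on whitespace), the raw term-count weighting, the helper names `bag_of_words` and `cosine_similarity`, and the three example sentences are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Build a term-count vector (hypothetical helper: lowercase, whitespace split)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    shared_terms = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared_terms)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Illustrative documents (assumed, not from the paper).
doc_car = bag_of_words("the car engine failed")
doc_auto = bag_of_words("automobile motor breakdown")  # same topic, no shared words
doc_car2 = bag_of_words("the car engine stalled")      # same topic, shared words

sim_synonyms = cosine_similarity(doc_car, doc_auto)
sim_lexical = cosine_similarity(doc_car, doc_car2)

print(f"synonym pair: {sim_synonyms:.2f}")
print(f"lexical pair: {sim_lexical:.2f}")
```

In this sketch the two documents that describe the same event with synonyms share no terms, so their similarity is zero, while the pair that overlaps lexically scores high. A clustering algorithm working on these vectors would therefore tend to separate the synonym pair, which is the weakness that phrase-based and concept-based representations aim to address.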