scispace - formally typeset
Journal ArticleDOI

Recent trends in hierarchic document clustering: a critical review

Peter Willett
- 01 Aug 1988 - 
- Vol. 24, Iss: 5, pp 577-597
Reads0
Chats0
TLDR
Algorithms that can be used to allow the implementation of hierarchic agglomerative clustering methods for document retrieval, and experimental evidence suggests that nearest neighbor clusters provide a reasonably efficient and effective means of including interdocument similarity information in document retrieval systems.
Citations
More filters
Book

Foundations of Statistical Natural Language Processing

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Journal ArticleDOI

Machine learning in automated text categorization

TL;DR: This survey discusses the main approaches to text categorization that fall within the machine learning paradigm and discusses in detail issues pertaining to three different problems, namely, document representation, classifier construction, and classifier evaluation.
Patent

System for generation of user profiles for a system for customized electronic identification of desirable objects

TL;DR: In this article, the authors proposed a system that automatically constructs a target profile for each target object in the electronic media based on the frequency with which each word appears in an article relative to its overall frequency of use in all articles, as well as a "target profile interest summary" for each user.
Journal Article

Data Cleaning: Problems and Current Approaches.

TL;DR: This work classifies data quality problems that are addressed by data cleaning and provides an overview of the main solution approaches and discusses current tool support for data cleaning.
Journal ArticleDOI

Scatter/Gather: a cluster-based approach to browsing large document collections

TL;DR: A document browsing technique that employs docum-ent clustering as its primary operation is presented and a fast (linear time) clustering algorithm is presented that provides a powerful new access paradigm.
References
More filters
Book

Introduction to Modern Information Retrieval

TL;DR: Reading is a need and a hobby at once and this condition is the on that will make you feel that you must read.
Book

Cluster Analysis

TL;DR: This fourth edition of the highly successful Cluster Analysis represents a thorough revision of the third edition and covers new and developing areas such as classification likelihood and neural networks for clustering.
Book

Clustering Algorithms

Book

Relevance weighting of search terms

TL;DR: This paper examines statistical techniques for exploiting relevance information to weight search terms using information about the distribution of index terms in documents in general and shows that specific weighted search methods are implied by a general probabilistic theory of retrieval.
Journal ArticleDOI

Relevance weighting of search terms

TL;DR: In this article, a series of relevance weighting functions is derived and is justified by theoretical considerations, in particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval.
Related Papers (5)