A statistical interpretation of term specificity and its application in retrieval

doi:10.1108/EB026526

Journal Article

Karen Sparck Jones

- 01 Jan 1972 -

Journal of Documentation

- Vol. 28, Iss: 1, pp 11-21

Abstract:

The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing in particular that frequently‐occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
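
The weighting idea in the abstract can be illustrated with a minimal sketch. It assumes the logarithmic form log(N / n) that is commonly used for collection frequency weighting, where N is the number of documents and n is the number of documents containing the term; this is not necessarily the exact formula used in the paper. The function names and the toy collection are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def collection_frequency_weights(documents):
    """Map each term to log(N / n).

    N is the number of documents and n is the number of documents
    that contain the term (assumed weighting form, see lead-in).
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for terms in documents:
        doc_freq.update(set(terms))  # count each term once per document
    return {term: math.log(n_docs / n) for term, n in doc_freq.items()}


def match_score(query_terms, doc_terms, weights):
    """Sum the weights of the distinct query terms that occur in the document."""
    shared = set(query_terms) & set(doc_terms)
    return sum(weights.get(term, 0.0) for term in shared)


# Hypothetical toy collection: each document is a list of index terms.
documents = [
    ["retrieval", "index", "term"],
    ["retrieval", "term", "weight"],
    ["retrieval", "specificity"],
    ["retrieval", "index"],
]

weights = collection_frequency_weights(documents)
query = ["retrieval", "specificity"]
scores = [match_score(query, doc, weights) for doc in documents]
```

In this toy collection "retrieval" occurs in every document and so receives weight zero, while "specificity" occurs in only one document and dominates the score. Every document matches the query on "retrieval", but only the third document is ranked above the rest.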

Citations

Journal Article

Term Weighting Approaches in Automatic Text Retrieval

Gerard Salton, +1 more

- 01 Aug 1988 -

Information Processing and Management

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.

Journal Article

A vector space model for automatic indexing

Gerard Salton, +2 more

- 01 Nov 1975 -

Communications of The ACM

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.

Book

Semi-Supervised Learning

Olivier Chapelle, +2 more

TL;DR: Semi-supervised learning (SSL) is the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no labeled data are given).

Journal Article

A survey of collaborative filtering techniques

Xiaoyuan Su, +1 more

- 01 Jan 2009 -

Advances in Artificial Intelligence

TL;DR: From basic techniques to the state of the art, this paper attempts to present a comprehensive survey of CF techniques, which can serve as a roadmap for research and practice in this area.

Proceedings Article

Character-level convolutional networks for text classification

Xiang Zhang, +2 more

TL;DR: This paper explores the use of character-level convolutional networks (ConvNets) for text classification and compares them with traditional models such as bag of words, n-grams, and their TF-IDF variants.

References

Book

Automatic information organization and retrieval

Gerard Salton

TL;DR: A book entitled Automatic Information Organization and Retrieval.

Journal Article

Computer Evaluation of Indexing and Text Processing

Gerard Salton, +1 more

- 01 Jan 1968 -

Journal of the ACM

TL;DR: Automatic indexing methods are evaluated, and design criteria for modern information systems are derived.

Book