Journal Article

A statistical interpretation of term specificity and its application in retrieval

Karen Sparck Jones
01 Jan 1972, Vol. 60, Iss: 1, pp 493-502
Abstract
The exhaustivity of document descriptions and the specificity of index terms are usually regarded as independent. It is suggested that specificity should be interpreted statistically, as a function of term use rather than of term meaning. The effects on retrieval of variations in term specificity are examined, experiments with three test collections showing in particular that frequently‐occurring terms are required for good overall performance. It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms. Results for the test collections show that considerable improvements in performance are obtained with this very simple procedure.
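The weighting idea in the abstract can be illustrated with a minimal sketch. The abstract does not spell out a formula, so the code below assumes the common logarithmic form log(N / n), where N is the number of documents in the collection and n is the number of documents containing the term. The function names `collection_frequency_weights` and `match_score`, and the toy collection, are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def collection_frequency_weights(documents):
    """Map each term to log(N / n).

    N is the number of documents; n is the number of documents that
    contain the term. The log form is an assumption, not taken from
    the abstract.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for terms in documents:
        doc_freq.update(set(terms))  # count each term once per document
    return {term: math.log(n_docs / n) for term, n in doc_freq.items()}


def match_score(query_terms, doc_terms, weights):
    """Sum the weights of the query terms that also occur in the document."""
    shared = set(query_terms) & set(doc_terms)
    return sum(weights.get(term, 0.0) for term in shared)


# Toy collection (hypothetical): each document is a list of index terms.
documents = [
    ["retrieval", "index", "term"],
    ["retrieval", "term", "weighting"],
    ["retrieval", "specificity"],
    ["retrieval", "index"],
]

weights = collection_frequency_weights(documents)
query = ["retrieval", "specificity"]
scores = [match_score(query, doc, weights) for doc in documents]
```

Under this assumed form, a term that occurs in every document ("retrieval" here) receives weight zero, so only the match on the rarer term "specificity" separates the third document from the rest.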


Citations
Journal Article

Term Weighting Approaches in Automatic Text Retrieval

TL;DR: This paper summarizes the insights gained in automatic term weighting, and provides baseline single term indexing models with which other more elaborate content analysis procedures can be compared.
Journal Article

A vector space model for automatic indexing

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
Book

Semi-Supervised Learning

TL;DR: Semi-supervised learning (SSL) is the middle ground between supervised learning (in which all training examples are labeled) and unsupervised learning (in which no labels are given).
Journal Article

A survey of collaborative filtering techniques

TL;DR: From basic techniques to the state of the art, this paper attempts to present a comprehensive survey of collaborative filtering (CF) techniques, which can serve as a roadmap for research and practice in this area.
Proceedings Article

Character-level convolutional networks for text classification

TL;DR: This paper explores character-level convolutional networks (ConvNets) for text classification and compares them with traditional models such as bag of words, n-grams, and their TF-IDF variants.