Journal Article

Document representation in probabilistic models of information retrieval

TLDR
This article describes how retrieval models which use either independence or dependence assumptions can be extended to include document representatives containing term significance weights and indicates that search strategies based on models modified in this way can further improve the effectiveness of document retrieval systems.
Abstract
Probabilistic models of retrieval have provided insights into the document retrieval process and contain the basis for very effective search strategies. A major limitation of these models is that they assume that documents are represented by binary index terms. In many cases the index terms will be assigned weights, such as within-document frequency weights, which are derived from the content of the documents by the indexing process. These weights, which are referred to here as term significance weights, indicate the relative importance of the terms in individual documents. This article describes how retrieval models which use either independence or dependence assumptions can be extended to include document representatives containing term significance weights. Comparison with other research indicates that search strategies based on models modified in this way can further improve the effectiveness of document retrieval systems.
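The contrast the abstract draws between binary index terms and term significance weights can be made concrete with a small sketch. Under binary indexing, a document's score is the sum of the weights of the query terms it contains. With significance weights, each matching term contributes in proportion to its importance in that document. The sketch assumes the standard binary independence log-odds term weight. Purely for illustration, it uses within-document frequency divided by the document's maximum frequency as the significance weight and multiplies that into the term weight. This is not the article's own derivation. All function names and counts are hypothetical.

```python
from __future__ import annotations

import math


def bim_term_weight(p: float, q: float) -> float:
    """Binary independence log-odds weight for one term.

    p: estimated P(term present | relevant)
    q: estimated P(term present | non-relevant)
    """
    return math.log((p * (1 - q)) / (q * (1 - p)))


def binary_score(doc_terms: set[str], query: dict[str, float]) -> float:
    """Binary indexing: a term either represents the document or it does not."""
    return sum(w for term, w in query.items() if term in doc_terms)


def normalized_tf(tf: dict[str, int]) -> dict[str, float]:
    """Hypothetical significance weight: within-document frequency / max frequency."""
    top = max(tf.values())
    return {term: f / top for term, f in tf.items()}


def significance_score(doc_sig: dict[str, float], query: dict[str, float]) -> float:
    """Scale each query term's weight by its significance in the document
    (a term absent from the document contributes 0)."""
    return sum(w * doc_sig.get(term, 0.0) for term, w in query.items())


query = {
    "retrieval": bim_term_weight(0.8, 0.2),  # log 16
    "model": bim_term_weight(0.5, 0.25),     # log 3
}
doc_tf = {"retrieval": 4, "model": 1, "index": 2}  # made-up counts

binary = binary_score(set(doc_tf), query)
weighted = significance_score(normalized_tf(doc_tf), query)
print(round(binary, 2))    # 3.87
print(round(weighted, 2))  # 3.05
```

With these invented numbers, the binary score treats "model" (one occurrence) exactly like "retrieval" (four occurrences). The weighted score discounts the less significant term.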


Citations
Journal Article

The probability ranking principle in IR

TL;DR: In this article, it is shown that the principle can be justified under certain assumptions, but that in cases where these assumptions do not hold, the principle is not valid, and the major problem appears to lie in the way the principle considers each document independently of the rest.
Journal Article

Query evaluation: strategies and optimizations

TL;DR: Several optimization techniques that can reduce evaluation costs are described. Simulation results are presented that compare the performance of these techniques when evaluating natural language queries against a collection of full-text legal materials.
Book

Statistical Language Models for Information Retrieval

TL;DR: A great deal of recent work has shown that statistical language models not only achieve superior empirical performance, but also facilitate parameter tuning and provide a more principled way for modeling various kinds of complex and non-traditional retrieval problems.
Journal Article

A probabilistic learning approach for document indexing

TL;DR: A method for probabilistic document indexing is described that uses relevance feedback data collected from a set of queries. The method is based on three new concepts. It allows the integration of new text analysis and knowledge-based methods, as well as the consideration of document structures or different types of terms.
Journal Article

Probabilistic and genetic algorithms in document retrieval

TL;DR: Competing document descriptions are associated with a document and altered over time by a genetic algorithm according to the queries used and relevance judgments made during retrieval.
References
Journal Article

A statistical interpretation of term specificity and its application in retrieval

TL;DR: It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms.
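Weighting by collection frequency can be illustrated with the common modern form of inverse document frequency, log(N / n_t). Here N is the number of documents and n_t is the number of documents containing term t. The cited paper's exact weighting function may differ, so treat this as an assumed formulation. The function name and toy collection are hypothetical.

```python
from __future__ import annotations

import math
from collections import Counter


def idf_weights(docs: list[set[str]]) -> dict[str, float]:
    """Collection-frequency weight log(N / n_t) for every term in the collection."""
    n_docs = len(docs)
    # n_t: number of documents in which each term occurs at least once
    doc_freq = Counter(term for doc in docs for term in doc)
    return {term: math.log(n_docs / n) for term, n in doc_freq.items()}


# Toy collection, invented for illustration.
docs = [
    {"retrieval", "model"},
    {"retrieval", "index"},
    {"retrieval", "probabilistic"},
    {"retrieval", "genetic"},
]
idf = idf_weights(docs)
print(idf["retrieval"])         # 0.0: the term occurs in every document
print(round(idf["model"], 3))   # 1.386: the term occurs in one document
```

A match on the specific term "model" contributes far more than a match on "retrieval", which occurs in every document and so does not discriminate between them.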
Journal Article

The automatic creation of literature abstracts

TL;DR: In the exploratory research described, the complete text of an article in machine-readable form is scanned by an IBM 704 data-processing machine and analyzed in accordance with a standard program.
Book

Relevance weighting of search terms

TL;DR: This paper examines statistical techniques that weight search terms by exploiting relevance information together with information about the distribution of index terms in documents in general. It shows that specific weighted search methods are implied by a general probabilistic theory of retrieval.
Journal Article

Relevance weighting of search terms

TL;DR: In this article, a series of relevance weighting functions is derived and justified by theoretical considerations; in particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval.
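Relevance weighting of a search term is usually presented today in the form sketched here. N is the collection size, n is the number of documents containing the term, R is the number of known relevant documents, and r is the number of those relevant documents that contain the term. Adding 0.5 to each count is a common estimation correction that avoids zeros. Treat this as a sketch of the widely used formulation, not a reproduction of the paper's full series of functions. The function name and counts are hypothetical.

```python
import math


def rsj_weight(N: int, n: int, R: int, r: int) -> float:
    """Relevance weight for one term, with 0.5 added to each cell of the
    2x2 relevant/non-relevant by present/absent contingency table.

    N: documents in the collection
    n: documents containing the term
    R: known relevant documents
    r: known relevant documents containing the term
    """
    rel_odds = (r + 0.5) / (R - r + 0.5)                # odds of the term in relevant docs
    nonrel_odds = (n - r + 0.5) / (N - n - R + r + 0.5)  # odds in non-relevant docs
    return math.log(rel_odds / nonrel_odds)


# Invented counts: 1000 documents, 50 contain the term, 10 are known relevant.
strong = rsj_weight(N=1000, n=50, R=10, r=6)
weak = rsj_weight(N=1000, n=50, R=10, r=2)
print(strong, weak)
```

The more concentrated a term is among the known relevant documents, the larger its weight.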
Journal Article

The probability ranking principle in IR

TL;DR: In this article, it is shown that the principle can be justified under certain assumptions, but that in cases where these assumptions do not hold, the principle is not valid, and the major problem appears to lie in the way the principle considers each document independently of the rest.