Journal ArticleDOI
Document representation in probabilistic models of information retrieval
TL;DR: This article describes how retrieval models which use either independence or dependence assumptions can be extended to include document representatives containing term significance weights and indicates that search strategies based on models modified in this way can further improve the effectiveness of document retrieval systems.
Abstract:
Probabilistic models of retrieval have provided insights into the document retrieval process and contain the basis for very effective search strategies. A major limitation of these models is that they assume that documents are represented by binary index terms. In many cases the index terms will be assigned weights, such as within-document frequency weights, which are derived from the content of the documents by the indexing process. These weights, which are referred to here as term significance weights, indicate the relative importance of the terms in individual documents. This article describes how retrieval models which use either independence or dependence assumptions can be extended to include document representatives containing term significance weights. Comparison with other research indicates that search strategies based on models modified in this way can further improve the effectiveness of document retrieval systems.
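The contrast the abstract draws between binary index terms and term significance weights can be illustrated with a minimal sketch. It is not the article's model: it assumes max-normalized within-document frequency as the term significance weight and substitutes a simple IDF-style collection weight for the relevance weights a probabilistic model would estimate. The function names `term_significance`, `collection_weight`, and `score` are hypothetical.

```python
import math
from collections import Counter


def term_significance(doc_tokens):
    """Assumed significance weight: within-document frequency divided by
    the frequency of the document's most common term (range 0..1)."""
    counts = Counter(doc_tokens)
    if not counts:
        return {}
    max_tf = max(counts.values())
    return {term: count / max_tf for term, count in counts.items()}


def collection_weight(term, docs):
    """Assumed stand-in for a relevance weight: log(N / n), where n is the
    number of documents containing the term. Returns 0.0 for unseen terms."""
    n = sum(1 for doc in docs if term in doc)
    return math.log(len(docs) / n) if n else 0.0


def score(query, doc, docs, use_significance=True):
    """Sum the collection weights of matching query terms. With
    use_significance=False every matching term counts as 1 (binary
    representation); otherwise each match is scaled by its significance."""
    sig = term_significance(doc)
    total = 0.0
    for term in set(query):
        if term in sig:
            factor = sig[term] if use_significance else 1.0
            total += factor * collection_weight(term, docs)
    return total


docs = [
    ["probabilistic", "retrieval", "retrieval", "model"],
    ["retrieval", "model", "index"],
    ["genetic", "algorithm"],
]
query = ["probabilistic", "retrieval"]

for i, doc in enumerate(docs):
    binary = score(query, doc, docs, use_significance=False)
    weighted = score(query, doc, docs, use_significance=True)
    print(f"doc {i}: binary={binary:.4f} weighted={weighted:.4f}")
```

In the binary case a term either matches or does not; in the weighted case a term that occurs rarely within a document contributes less to that document's score than one that dominates it.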
Citations
Journal ArticleDOI
The probability ranking principle in IR
TL;DR: In this article, it is shown that the principle can be justified under certain assumptions, but that in cases where these assumptions do not hold, the principle is not valid, and the major problem appears to lie in the way the principle considers each document independently of the rest.
Journal ArticleDOI
Query evaluation: strategies and optimizations
Howard R. Turtle, James Flood
TL;DR: Several optimization techniques that can be used to reduce evaluation costs and simulation results are presented to compare the performance of these optimization techniques when evaluating natural language queries with a collection of full text legal materials.
Book
Statistical Language Models for Information Retrieval
TL;DR: A great deal of recent work has shown that statistical language models not only achieve superior empirical performance, but also facilitate parameter tuning and provide a more principled way for modeling various kinds of complex and non-traditional retrieval problems.
Journal ArticleDOI
A probabilistic learning approach for document indexing
Norbert Fuhr, Chris Buckley
TL;DR: A method for probabilistic document indexing using relevance feedback data that has been collected from a set of queries based on three new concepts, which allows the integration of new text analysis and knowledge-based methods in this approach as well as the consideration of document structures or different types of terms.
Journal ArticleDOI
Probabilistic and genetic algorithms in document retrieval
TL;DR: Competing document descriptions are associated with a document and altered over time by a genetic algorithm according to the queries used and relevance judgments made during retrieval.
References
Journal ArticleDOI
A statistical interpretation of term specificity and its application in retrieval
TL;DR: It is argued that terms should be weighted according to collection frequency, so that matches on less frequent, more specific, terms are of greater value than matches on frequent terms.
Journal ArticleDOI
The automatic creation of literature abstracts
TL;DR: In the exploratory research described, the complete text of an article in machine-readable form is scanned by an IBM 704 data-processing machine and analyzed in accordance with a standard program.
Book
Relevance weighting of search terms
TL;DR: This paper examines statistical techniques for exploiting relevance information to weight search terms using information about the distribution of index terms in documents in general and shows that specific weighted search methods are implied by a general probabilistic theory of retrieval.
Journal ArticleDOI
Relevance weighting of search terms
TL;DR: In this article, a series of relevance weighting functions is derived and is justified by theoretical considerations, in particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval.
Journal ArticleDOI
The probability ranking principle in IR
TL;DR: In this article, it is shown that the principle can be justified under certain assumptions, but that in cases where these assumptions do not hold, the principle is not valid, and the major problem appears to lie in the way the principle considers each document independently of the rest.