Open Access Proceedings Article

Okapi at TREC

TLDR
Much of the work involved investigating plausible methods of applying Okapi-style weighting to phrases; expansion using terms from the top documents retrieved by a pilot search on the topic terms was also used.
Abstract
City submitted two runs each for the automatic ad hoc, very large collection, automatic routing and Chinese tracks, and took part in the interactive and filtering tracks. The method used was expansion using terms from the top documents retrieved by a pilot search on the topic terms. Additional runs seem to show that we would have done better without expansion. Two runs using the method of city96al were also submitted for the Very Large Collection track. The training database and its relevant documents were partitioned into three parts. Working on a pool of terms extracted from the relevant documents for one partition, an iterative procedure added or removed terms and/or varied their weights. After each change in query content or term weights, a score was calculated by using the current query to search a second portion of the training database and evaluating the results against the corresponding set of relevant documents. Methods were compared by evaluating queries predictively against the third training partition. Queries from different methods were then merged and the results evaluated in the same way. Two runs were submitted, one based on character searching and the other on words or phrases. Much of the work involved investigating plausible methods of applying Okapi-style weighting to phrases.
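For readers who want the mechanics behind the abstract, the sketch below illustrates Okapi-style BM25 term weighting and blind expansion from the top documents of a pilot search. It is a minimal illustration, not City's actual code: the corpus representation (lists of tokens), the k1 and b values, and the use of the Robertson offer weight (r·w) to select expansion terms are all assumptions.

```python
# Minimal sketch (not City's implementation) of Okapi BM25 weighting plus
# blind expansion from the top documents of a pilot search on the topic terms.
import math
from collections import Counter

def bm25_score(query_terms, doc, docs, k1=1.2, b=0.75):
    """Score one document (a list of tokens) against a bag of query terms."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for t in query_terms:
        n = sum(1 for d in docs if t in d)         # document frequency
        if n == 0:
            continue
        idf = math.log((N - n + 0.5) / (n + 0.5))  # RSJ-style idf
        norm = tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
        score += idf * norm
    return score

def expand_query(topic_terms, docs, top_r=10, n_new_terms=20):
    """Pilot search on the topic terms, then add terms from the top documents."""
    # Naive full-collection rescoring; fine for a sketch, slow at TREC scale.
    ranked = sorted(docs, key=lambda d: bm25_score(topic_terms, d, docs), reverse=True)
    top_docs = ranked[:top_r]
    N, R = len(docs), len(top_docs)
    candidates = {}
    for t in set(t for d in top_docs for t in d):
        r = sum(1 for d in top_docs if t in d)
        n = sum(1 for d in docs if t in d)
        w = math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                     ((n - r + 0.5) * (R - r + 0.5)))  # RSJ relevance weight
        candidates[t] = r * w                           # offer weight
    new_terms = sorted(candidates, key=candidates.get, reverse=True)[:n_new_terms]
    return list(topic_terms) + new_terms
```

Calling expand_query(topic_terms, docs) returns the original topic terms followed by the highest-offer-weight terms drawn from the pilot run's top documents.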



Citations
Book

Learning to Rank for Information Retrieval

TL;DR: Three major approaches to learning to rank are introduced, i.e., the pointwise, pairwise, and listwise approaches, the relationship between the loss functions used in these approaches and the widely-used IR evaluation measures are analyzed, and the performance of these approaches on the LETOR benchmark datasets is evaluated.
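As an informal illustration of the pairwise approach named in this summary (not code from the book), the fragment below scores documents with a linear function and penalises every pair ranked in the wrong order; the feature layout, the linear model, and the hinge margin are assumptions.

```python
# Pairwise learning-to-rank idea: a more relevant document should score
# higher than a less relevant one for the same query.
import numpy as np

def pairwise_hinge_loss(w, features, relevance, margin=1.0):
    """Sum of hinge penalties over all misordered document pairs for one query."""
    scores = features @ w                       # linear scoring function
    loss = 0.0
    for i in range(len(relevance)):
        for j in range(len(relevance)):
            if relevance[i] > relevance[j]:     # i should outrank j
                loss += max(0.0, margin - (scores[i] - scores[j]))
    return loss
```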
Book

The Probabilistic Relevance Framework

TL;DR: This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F.
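A rough sketch of the BM25F idea mentioned here, under the simplifying assumption of one b parameter shared across fields (the framework allows a per-field b_f): per-field term frequencies are length-normalised, scaled by field weights, and pooled into a single pseudo-frequency before one BM25-style saturation. The field names, weights, and parameter values below are illustrative only.

```python
# BM25F-style field-weighted term score for a single term in a single document.
import math

def bm25f_term_weight(tf_per_field, field_len, avg_field_len,
                      field_weight, idf, k1=1.2, b=0.75):
    pseudo_tf = 0.0
    for f, tf in tf_per_field.items():
        norm = 1 - b + b * field_len[f] / avg_field_len[f]   # per-field length norm
        pseudo_tf += field_weight[f] * tf / norm
    return idf * pseudo_tf / (k1 + pseudo_tf)                # single saturation

# Example: a term appearing once in the title and twice in the body.
w = bm25f_term_weight(
    tf_per_field={"title": 1, "body": 2},
    field_len={"title": 8, "body": 300},
    avg_field_len={"title": 10, "body": 400},
    field_weight={"title": 3.0, "body": 1.0},
    idf=2.0,
)
```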
Proceedings Article

From Word Embeddings To Document Distances

TL;DR: It is demonstrated on eight real world document classification data sets, in comparison with seven state-of-the-art baselines, that the Word Mover's Distance metric leads to unprecedented low k-nearest neighbor document classification error rates.
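Exact Word Mover's Distance requires an optimal-transport solver, but the paper's relaxed lower bound is easy to state: each word's mass flows entirely to its nearest word (by embedding distance) in the other document. The sketch below assumes documents are already given as normalised bag-of-words weights plus word-embedding matrices.

```python
# Relaxed Word Mover's Distance lower bound between two documents.
import numpy as np

def relaxed_wmd(weights_a, vecs_a, weights_b, vecs_b):
    """Lower bound on WMD; weights_* sum to 1, vecs_* are (n_words, dim) arrays."""
    # pairwise Euclidean distances between the two documents' word vectors
    dist = np.linalg.norm(vecs_a[:, None, :] - vecs_b[None, :, :], axis=2)
    cost_ab = np.sum(weights_a * dist.min(axis=1))   # A's words to nearest in B
    cost_ba = np.sum(weights_b * dist.min(axis=0))   # B's words to nearest in A
    return max(cost_ab, cost_ba)                     # tighter of the two bounds
```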
Journal Article

Relevance-Based Language Models

TL;DR: This work proposes a novel technique for estimating a relevance model with no training data and demonstrates that it can produce highly accurate relevance models, addressing important notions of synonymy and polysemy.
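A hedged sketch of estimating a relevance model with no training data, in the spirit of this line of work: each top-ranked pilot document contributes its smoothed word distribution, weighted by how well it explains the query. The Dirichlet smoothing and the value of mu are assumptions, not the authors' exact estimator.

```python
# Relevance-model estimation from the top documents of a pilot search.
from collections import Counter

def relevance_model(query, top_docs, collection_counts, collection_len, mu=2000):
    def p_w_given_d(w, counts, dlen):
        # Dirichlet-smoothed document language model
        return (counts[w] + mu * collection_counts.get(w, 0) / collection_len) / (dlen + mu)

    rm = Counter()
    for doc in top_docs:                       # each doc is a list of tokens
        counts, dlen = Counter(doc), len(doc)
        q_lik = 1.0
        for q in query:                        # P(Q|D): query likelihood
            q_lik *= p_w_given_d(q, counts, dlen)
        for w in counts:                       # spread that weight over the doc's words
            rm[w] += p_w_given_d(w, counts, dlen) * q_lik
    total = sum(rm.values()) or 1.0
    return {w: p / total for w, p in rm.items()}
```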
Journal Article

A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval

TL;DR: This paper examines the sensitivity of retrieval performance to the smoothing parameters and compares several popular smoothing methods on different test collections.
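Two of the smoothing methods typically compared in this line of work, written out for a single term; the interpolation weight and Dirichlet prior below are illustrative values, not the paper's tuned settings.

```python
# Smoothed P(term | document) under two common smoothing schemes.
def jelinek_mercer(tf, doc_len, p_collection, lam=0.7):
    """Linear interpolation of document and collection language models."""
    return (1 - lam) * (tf / doc_len) + lam * p_collection

def dirichlet(tf, doc_len, p_collection, mu=2000):
    """Bayesian smoothing with a Dirichlet prior scaled by mu."""
    return (tf + mu * p_collection) / (doc_len + mu)
```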
References
Book

Relevance weighting of search terms

TL;DR: This paper examines statistical techniques for exploiting relevance information to weight search terms using information about the distribution of index terms in documents in general and shows that specific weighted search methods are implied by a general probabilistic theory of retrieval.
Journal Article

Relevance weighting of search terms

TL;DR: In this article, a series of relevance weighting functions is derived and is justified by theoretical considerations, in particular, it is shown that specific weighted search methods are implied by a general probabilistic theory of retrieval.
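The best-known member of this family of weights (the so-called F4 formula with the usual 0.5 point correction) can be written directly from the contingency counts; the toy numbers in the example are purely illustrative.

```python
# RSJ relevance weight for one term, given relevance-judgement counts.
import math

def relevance_weight(N, R, n, r):
    """N docs in total, R relevant; n contain the term, r of those are relevant."""
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

# e.g. a term occurring in 4 of 10 known relevant documents and 50 of 10000 overall
w = relevance_weight(N=10000, R=10, n=50, r=4)
```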
Proceedings Article

Probabilistic models of indexing and searching

TL;DR: There is a considerable body of related work by Salton, Yu and associates on automatic indexing using within-document frequencies of terms.
Journal Article

The use of term position devices in ranked output experiments

TL;DR: The use of term proximity devices is proposed here by analogy with Boolean techniques and seven algorithms are devised to incorporate the ideas of sentence matching, proximate terms, term order specification and term distance computations.
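One simple term-distance device of the kind evaluated here is the length of the shortest document span covering all query terms; the sketch below computes it, and folding it into a ranking score (for example as a bonus for small spans) is an assumption rather than the paper's exact algorithm.

```python
# Minimal term-proximity device: shortest window covering all query terms.
def min_cover_window(doc_tokens, query_terms):
    """Length of the shortest span containing all query terms, or None if absent."""
    query = set(query_terms)
    positions = [i for i, t in enumerate(doc_tokens) if t in query]
    best = None
    for start_idx, start in enumerate(positions):
        seen = set()
        for pos in positions[start_idx:]:
            seen.add(doc_tokens[pos])
            if seen == query:                  # all query terms covered
                span = pos - start + 1
                best = span if best is None else min(best, span)
                break
    return best
```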
Journal Article

An evaluation of automatic query expansion in an online library catalogue

TL;DR: An automatic query expansion (AQE) facility in an online catalogue was evaluated in an operational library setting and found that contrary to previous results, AQE was beneficial in a substantial number of searches.