Open Access Book

The Probabilistic Relevance Framework

TLDR
This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F.
Abstract
The Probabilistic Relevance Framework (PRF) is a formal framework for document retrieval, grounded in work done in the 1970s and 1980s, which led to the development of one of the most successful text-retrieval algorithms, BM25. In recent years, research in the PRF has yielded new retrieval models capable of taking into account document meta-data (especially structure and link-graph information). Again, this has led to one of the most successful Web-search and corporate-search algorithms, BM25F. This work presents the PRF from a conceptual point of view, describing the probabilistic modelling assumptions behind the framework and the different ranking algorithms that result from its application: the binary independence model, relevance feedback models, BM25 and BM25F. It also discusses the relation between the PRF and other statistical models for IR, and covers some related topics, such as the use of non-textual features, and parameter optimisation for models with free parameters.
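For orientation, BM25 ranks a document by summing, over the query terms that appear in it, an inverse-document-frequency weight multiplied by a saturating, length-normalised term-frequency component. The sketch below is a minimal textbook formulation rather than code from this book; the default parameter values k1 = 1.2 and b = 0.75, the variable names, and the +1 smoothing inside the logarithm are conventional illustrative choices.

    import math
    from collections import Counter

    def bm25_score(query_terms, doc_terms, doc_freq, num_docs, avgdl, k1=1.2, b=0.75):
        """Score one document against a query with a textbook BM25 formulation.

        query_terms: list of query tokens
        doc_terms:   list of tokens in the document being scored
        doc_freq:    dict mapping each term to the number of documents containing it
        num_docs:    total number of documents in the collection
        avgdl:       average document length (in tokens) across the collection
        """
        tf = Counter(doc_terms)
        dl = len(doc_terms)
        score = 0.0
        for term in set(query_terms):
            if term not in tf:
                continue
            # Smoothed inverse document frequency of the term
            n = doc_freq.get(term, 0)
            idf = math.log((num_docs - n + 0.5) / (n + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalisation
            freq = tf[term]
            score += idf * (freq * (k1 + 1)) / (freq + k1 * (1.0 - b + b * dl / avgdl))
        return score

BM25F, also described in this work, combines term frequencies across weighted document fields (for example title, body, and anchor text) into a single pseudo-frequency before applying the same saturation, rather than scoring each field separately.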



Citations

Book

Pattern Recognition and Machine Learning

TL;DR: Probability distributions and linear models for regression and classification are given in this book, along with a discussion of combining models in the context of machine learning and classification.
Proceedings Article

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset.

TL;DR: MS MARCO is a large-scale dataset for reading comprehension and question answering in which all questions are sampled from real, anonymized user queries, and the context passages from which answers are derived are extracted from real web documents using the most advanced version of the Bing search engine.
Posted Content

Dense Passage Retrieval for Open-Domain Question Answering

TL;DR: This work shows that retrieval can be practically implemented using dense representations alone, where embeddings are learned from a small number of questions and passages by a simple dual-encoder framework.
Posted Content

MS MARCO: A Human Generated MAchine Reading COmprehension Dataset

TL;DR: This new dataset aims to overcome a number of well-known weaknesses of previous publicly available datasets for the same task of reading comprehension and question answering, and is the most comprehensive real-world dataset of its kind in both quantity and quality.
Proceedings Article (DOI)

Dense Passage Retrieval for Open-Domain Question Answering

TL;DR: In this paper, a simple dual-encoder framework is proposed to learn dense representations from a small number of questions and passages; the resulting retriever greatly outperforms a strong Lucene-BM25 system.
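The two Dense Passage Retrieval entries above rank passages by similarity between learned question and passage embeddings rather than by term matching. The fragment below is a minimal illustrative sketch of the dot-product scoring step only, assuming the embeddings have already been produced by some pair of encoders; it is not code from the cited work.

    import numpy as np

    def dense_scores(query_vec, passage_matrix):
        # Dot-product relevance scores between one query embedding and a
        # matrix of passage embeddings (one row per passage).
        return passage_matrix @ query_vec

    # Hypothetical usage with random stand-ins for encoder outputs.
    rng = np.random.default_rng(0)
    query_vec = rng.standard_normal(768)                # encoded question
    passage_matrix = rng.standard_normal((1000, 768))   # encoded passages
    top10 = np.argsort(-dense_scores(query_vec, passage_matrix))[:10]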
References
Journal Article (DOI)

Latent Dirichlet Allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Book

Pattern Recognition and Machine Learning

TL;DR: Probability Distributions, Linear Models for Regression, Linear Models for Classification, Neural Networks, Graphical Models, Mixture Models and EM, Sampling Methods, Continuous Latent Variables, and Sequential Data are studied.