Topic

Document retrieval

About: Document retrieval is a research topic. Over the lifetime, 6821 publications have been published within this topic receiving 214383 citations.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Passage-level evidence in document retrieval

[...]

James P. Callan¹•Institutions (1)

University of Massachusetts Amherst¹

01 Aug 1994

TL;DR: The increasing lengths of documents in full-text collections encourages renewed interest in the ranking and retrieval of document passages, but questions about how passages are defined, how they can be ranked efficiently, and what is their proper role in long, structured documents are raised.

...read moreread less

Abstract: The increasing lengths of documents in full-text collections encourages renewed interest in the ranking and retrieval of document passages. Past research showed that evidence from passages can improve retrieval results, but it also raised questions about how passages are defined, how they can be ranked efficiently, and what is their proper role in long, structured documents.

...read moreread less

544 citations

Journal Article•DOI•

The use of hierarchic clustering in information retrieval

[...]

N. Jardine¹, C. J. Van Rijsbergen¹•Institutions (1)

University of Cambridge¹

01 Dec 1971-Information Storage and Retrieval

TL;DR: It is shown that cluster-based retrieval strategies can be devised which are as effective as linear associative retrieval strategies and much more efficient.

...read moreread less

535 citations

Using Language Models for Information Retrieval

[...]

Djoerd Hiemstra

19 Jan 2001

TL;DR: Experimental results on three standard tasks show that the language model-based algorithms work as well as, or better than, today's top-performing retrieval algorithms.

...read moreread less

Abstract: Because of the world wide web, information retrieval systems are now used by millions of untrained users all over the world. The search engines that perform the information retrieval tasks, often retrieve thousands of potentially interesting documents to a query. The documents should be ranked in decreasing order of relevance in order to be useful to the user. This book describes a mathematical model of information retrieval based on the use of statistical language models. The approach uses simple document-based unigram models to compute for each document the probability that it generates the query. This probability is used to rank the documents. The study makes the following research contributions. * The development of a model that integrates term weighting, relevance feedback and structured queries. * The development of a model that supports multiple representations of a request or information need by integrating a statistical translation model. * The development of a model that supports multiple representations of a document, for instance by allowing proximity searches or searches for terms from a particular record field (e.g. a search for terms from the title). * A mathematical interpretation of stop word removal and stemming. * A mathematical interpretation of operators for mandatory terms, wildcards and synonyms. * A practical comparison of a language model-based retrieval system with similar systems that are based on well-established models and term weighting algorithms in a controlled experiment. * The application of the model to cross-language information retrieval and adaptive information filtering, and the evaluation of two prototype systems in a controlled experiment. Experimental results on three standard tasks show that the language model-based algorithms work as well as, or better than, today's top-performing retrieval algorithms. The standard tasks investigated are ad-hoc retrieval (when there are no previously retrieved documents to guide the search), retrospective relevance weighting (find the optimum model for a given set of relevant documents), and ad-hoc retrieval using manually formulated Boolean queries. The application to cross-language retrieval and adaptive filtering shows the practical use of respectively structured queries, and relevance feedback.

...read moreread less

528 citations

Book•

Information Retrieval: Implementing and Evaluating Search Engines

[...]

Stefan Büttcher, Charles L. A. Clarke¹, Gordon V. Cormack•Institutions (1)

Massachusetts Institute of Technology¹

23 Jul 2010

TL;DR: Information Retrieval offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation, and is a valuable reference for professionals in computer science, computer engineering, and software engineering.

...read moreread less

Abstract: Information retrieval is the foundation for modern search engines. This text offers an introduction to the core topics underlying modern search technologies, including algorithms, data structures, indexing, retrieval, and evaluation. The emphasis is on implementation and experimentation; each chapter includes exercises and suggestions for student projects. Wumpusa multiuser open-source information retrieval system developed by one of the authors and available onlineprovides model implementations and a basis for student work. The modular structure of the book allows instructors to use it in a variety of graduate-level courses, including courses taught from a database systems perspective, traditional information retrieval courses with a focus on IR theory, and courses covering the basics of Web retrieval. After an introduction to the basics of information retrieval, the text covers three major topic areasindexing, retrieval, and evaluationin self-contained parts. The final part of the book draws on and extends the general material in the earlier parts, treating such specific applications as parallel search engines, Web search, and XML retrieval. End-of-chapter references point to further reading; exercises range from pencil and paper problems to substantial programming projects. In addition to its classroom use, Information Retrieval will be a valuable reference for professionals in computer science, computer engineering, and software engineering.

...read moreread less

523 citations

Proceedings Article•

Exponential Family Harmoniums with an Application to Information Retrieval

[...]

Max Welling¹, Michal Rosen-Zvi¹, Geoffrey E. Hinton²•Institutions (2)

University of California, Irvine¹, University of Toronto²

01 Dec 2004

TL;DR: An alternative two-layer model based on exponential family distributions and the semantics of undirected models is proposed, which performs well on document retrieval tasks and provides an elegant solution to searching with keywords.

...read moreread less

Abstract: Directed graphical models with one layer of observed random variables and one or more layers of hidden random variables have been the dominant modelling paradigm in many research fields. Although this approach has met with considerable success, the causal semantics of these models can make it difficult to infer the posterior distribution over the hidden variables. In this paper we propose an alternative two-layer model based on exponential family distributions and the semantics of undirected models. Inference in these "exponential family harmoniums" is fast while learning is performed by minimizing contrastive divergence. A member of this family is then studied as an alternative probabilistic model for latent semantic indexing. In experiments it is shown that they perform well on document retrieval tasks and provide an elegant solution to searching with keywords.

...read moreread less

520 citations

Collapse

Network Information

Performance

Metrics

6,866

Papers

224,605

Citations

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111

Document retrieval

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics