Topic

Document retrieval

About: Document retrieval is a research topic. Over the lifetime, 6821 publications have been published within this topic receiving 214383 citations.

...read moreread less

Papers published on a yearly basis

1 / 2

Papers

PDF

Open Access

More filters

Proceedings Article•DOI•

Guided Transformer: Leveraging Multiple External Sources for Representation Learning in Conversational Search

[...]

Helia Hashemi¹, Hamed Zamani², W. Bruce Croft¹•Institutions (2)

University of Massachusetts Amherst¹, Microsoft²

25 Jul 2020

TL;DR: This paper enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights each term in the conversation to implement the proposed representation learning model for two downstream tasks in conversational search; document retrieval and next clarifying question selection.

...read moreread less

Abstract: Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights each term in the conversation. We evaluate this Guided Transformer model in a conversational search scenario that includes clarifying questions. In our experiments, we use two separate external sources, including the top retrieved documents and a set of different possible clarifying questions for the query. We implement the proposed representation learning model for two downstream tasks in conversational search; document retrieval and next clarifying question selection. Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.

...read moreread less

45 citations

An object-oriented architecture for text retrieval.

[...]

Douglas R. Cutting, Jan O. Pedersen, Per Christian Halvorsen

01 Jan 1991

TL;DR: A software implementation architecture for text retrieval systems that facilitates functional modularization, a mix-and-match combination of module implementations and a deenition of inter-module protocols is presented.

...read moreread less

Abstract: For almost all aspects of information access systems it is still the case that their optimal composition and functionality is hotly debated. Moreover, diierent application scenarios put diierent demands on individual components. It is therefore of the essence to be able to quickly build systems that permit exploration of diierent designs and implementation strategies. This paper presents a software implementation architecture for text retrieval systems that facilitates (a) functional modularization (b) mix-and-match combination of module implementations and (c) deenition of inter-module protocols. We show how an object-oriented approach easily accommodates this type of architecture. The design principles are exempliied by code examples in Common Lisp. Taken together these code examples constitute an operational retrieval system. The design principles and protocols implemented have also been instantiated in a large scale retrieval prototype in our research laboratory.

...read moreread less

45 citations

Proceedings Article•

A Cross-Language Approach to Historic Document Retrieval

[...]

Jaap Kamps, Marijn Koolen, Frans Adriaans, Maarten de Rijke

01 Jan 2007

TL;DR: HDR is defined as the retrieval of relevant historic documents for a modern query to treat the historic and modern languages as different languages, and use cross-language information retrieval (CLIR) techniques to translate one language into the other.

...read moreread less

Abstract: Our cultural heritage, as preserved in libraries, archives and museums, is made up of documents written many centuries ago. Large-scale digitization initiatives, like DigiCULT, make these documents available to non-expert users through digital libraries and vertical search engines. For a user, querying a historic document collection may be a disappointing experience. Natural languages evolve over time, changing in pronunciation and spelling, and new words are introduced continuously, while older words may disappear out of everyday use. For these reasons, queries involving modern words may not be very effective for retrieving documents that contain many historic terms. Although reading a 300-year-old document might not be problematic because the words are still recognizable, the changes in vocabulary and spelling can make it difficult to use a search engine to find relevant documents. To illustrate this, consider the following example from our collection of 17th century Dutch law texts. Looking for information on the tasks of a lawyer (modern Dutch: {it advocaat}) in these texts, the modern spelling will not lead you to documents containing the 17th century Dutch spelling variant {it advocaet}. Since spelling rules were not introduced until the 19th century, 17th century Dutch spelling is inconsistent. Being based mainly on pronunciation, words were often spelled in several different variants, which poses a problem for standard retrieval engines. We therefore define Historic Document Retrieval (HDR) as the retrieval of relevant historic documents for a modern query. Our approach to this problem is to treat the historic and modern languages as different languages, and use cross-language information retrieval (CLIR) techniques to translate one language into the other.

...read moreread less

45 citations

Proceedings Article•

Semi-supervised ensemble ranking

[...]

Steven C. H. Hoi¹, Rong Jin²•Institutions (2)

Nanyang Technological University¹, Michigan State University²

13 Jul 2008

TL;DR: The proposed SSER algorithm is formulated as an SVM-like quadratic program (QP), and therefore can be solved efficiently by taking advantage of optimization techniques that were widely used in existing SVM solvers.

...read moreread less

Abstract: Ranking plays a central role in many Web search and information retrieval applications. Ensemble ranking, sometimes called meta-search, aims to improve the retrieval performance by combining the outputs from multiple ranking algorithms. Many ensemble ranking approaches employ supervised learning techniques to learn appropriate weights for combining multiple rankers. The main shortcoming with these approaches is that the learned weights for ranking algorithms are query independent. This is suboptimal since a ranking algorithm could perform well for certain queries but poorly for others. In this paper, we propose a novel semi-supervised ensemble ranking (SSER) algorithm that learns query-dependent weights when combining multiple rankers in document retrieval. The proposed SSER algorithm is formulated as an SVM-like quadratic program (QP), and therefore can be solved efficiently by taking advantage of optimization techniques that were widely used in existing SVM solvers. We evaluated the proposed technique on a standard document retrieval testbed and observed encouraging results by comparing to a number of state-of-the-art techniques.

...read moreread less

45 citations

Book Chapter•DOI•

Improving Retrieval Effectiveness by Reranking Documents Based on Controlled Vocabulary

[...]

Jaap Kamps¹•Institutions (1)

University of Amsterdam¹

05 Apr 2004

TL;DR: This work explores a new feedback technique that reranks the set of initially retrieved documents based on the controlled vocabulary terms assigned to the documents, and significantly improves retrieval effectiveness in domain-specific collections.

...read moreread less

Abstract: There is a common availability of classification terms in online text collections and digital libraries, such as manually assigned keywords or key-phrases from a controlled vocabulary in scientific collections. Our goal is to explore the use of additional classification information for improving retrieval effectiveness. Earlier research explored the effect of adding classification terms to user queries, leading to little or no improvement. We explore a new feedback technique that reranks the set of initially retrieved documents based on the controlled vocabulary terms assigned to the documents. Since we do not want to rely on the availability of special dictionaries or thesauri, we compute the meaning of controlled vocabulary terms based on their occurrence in the collection. Our reranking strategy significantly improves retrieval effectiveness in domain-specific collections. Experimental evaluation is done on the German GIRT and French Amaryllis collections, using the test-suite of the Cross-Language Evaluation Forum (CLEF).

...read moreread less

45 citations

Collapse

Network Information

Performance

Metrics

6,866

Papers

224,605

Citations

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111

Document retrieval

Papers published on a yearly basis

Papers

Trending Questions (10)

Network Information

Related Topics (5)

Performance

Metrics