Topic
Document retrieval
About: Document retrieval is a research topic. Over its lifetime, 6821 publications have appeared within this topic, receiving 214383 citations.
Papers
25 Jul 2020
TL;DR: This paper enriches the representations learned by Transformer networks with a novel attention mechanism that uses external information sources to weight each term in the conversation, and applies the resulting model to two downstream tasks in conversational search: document retrieval and next clarifying question selection.
Abstract: Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, especially conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have been studied recently but the accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we enrich the representations learned by Transformer networks using a novel attention mechanism from external information sources that weights each term in the conversation. We evaluate this Guided Transformer model in a conversational search scenario that includes clarifying questions. In our experiments, we use two separate external sources, including the top retrieved documents and a set of different possible clarifying questions for the query. We implement the proposed representation learning model for two downstream tasks in conversational search: document retrieval and next clarifying question selection. Our experiments use a public dataset for search clarification and demonstrate significant improvements compared to competitive baselines.
45 citations
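To make the idea of weighting each conversation term by an external source concrete, here is a minimal sketch of softmax attention weights computed from scaled dot products between term vectors and a single external-source vector. It is not the Guided Transformer architecture from the paper; the function names, the toy two-dimensional vectors, and the use of one external vector are all assumptions made for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def guided_term_weights(term_vectors, external_vector):
    """Weight each conversation term by its similarity to an external source.

    term_vectors: one vector per conversation term (hypothetical embeddings).
    external_vector: a vector summarising an external source, e.g. a
        retrieved document (an assumption for this sketch).
    Returns one weight per term; the weights sum to 1.
    """
    scale = math.sqrt(len(external_vector))
    logits = [
        sum(a * b for a, b in zip(vec, external_vector)) / scale
        for vec in term_vectors
    ]
    return softmax(logits)


# Toy usage: the first term points in the same direction as the external source.
terms = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
weights = guided_term_weights(terms, [1.0, 0.0])
```

In this toy case the term most aligned with the external vector receives the largest weight and the orthogonal term the smallest.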
01 Jan 1991
TL;DR: A software implementation architecture for text retrieval systems that facilitates functional modularization, a mix-and-match combination of module implementations and a definition of inter-module protocols is presented.
Abstract: For almost all aspects of information access systems it is still the case that their optimal composition and functionality is hotly debated. Moreover, different application scenarios put different demands on individual components. It is therefore of the essence to be able to quickly build systems that permit exploration of different designs and implementation strategies. This paper presents a software implementation architecture for text retrieval systems that facilitates (a) functional modularization (b) mix-and-match combination of module implementations and (c) definition of inter-module protocols. We show how an object-oriented approach easily accommodates this type of architecture. The design principles are exemplified by code examples in Common Lisp. Taken together these code examples constitute an operational retrieval system. The design principles and protocols implemented have also been instantiated in a large scale retrieval prototype in our research laboratory.
45 citations
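The mix-and-match idea in the abstract above can be sketched as small interchangeable modules behind explicit protocols. The paper's own examples are in Common Lisp; the Python below is only an illustration, and every class and method name in it (`Tokenizer`, `Scorer`, `RetrievalSystem`, and so on) is hypothetical rather than taken from the paper.

```python
from __future__ import annotations

from collections import Counter
from typing import Protocol


class Tokenizer(Protocol):
    """Inter-module protocol: turn raw text into terms."""
    def tokenize(self, text: str) -> list[str]: ...


class Scorer(Protocol):
    """Inter-module protocol: score a document against a query."""
    def score(self, query_terms: list[str], doc_terms: list[str]) -> float: ...


class WhitespaceTokenizer:
    def tokenize(self, text: str) -> list[str]:
        return text.lower().split()


class OverlapScorer:
    """Counts how many query terms occur at least once in the document."""
    def score(self, query_terms, doc_terms):
        present = set(doc_terms)
        return float(sum(1 for term in query_terms if term in present))


class TermFrequencyScorer:
    """Sums the document frequency of every query term."""
    def score(self, query_terms, doc_terms):
        counts = Counter(doc_terms)
        return float(sum(counts[term] for term in query_terms))


class RetrievalSystem:
    """Composes any tokenizer with any scorer that satisfies the protocols."""

    def __init__(self, tokenizer: Tokenizer, scorer: Scorer):
        self.tokenizer = tokenizer
        self.scorer = scorer
        self.docs: dict[str, list[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = self.tokenizer.tokenize(text)

    def search(self, query: str) -> list[tuple[str, float]]:
        terms = self.tokenizer.tokenize(query)
        scored = [(d, self.scorer.score(terms, t)) for d, t in self.docs.items()]
        hits = [pair for pair in scored if pair[1] > 0]
        # Highest score first; ties broken by document id for determinism.
        return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


def build(scorer: Scorer) -> RetrievalSystem:
    system = RetrievalSystem(WhitespaceTokenizer(), scorer)
    system.add("a", "text retrieval retrieval")
    system.add("b", "text systems")
    system.add("c", "lisp")
    return system


tf_results = build(TermFrequencyScorer()).search("retrieval text")
overlap_results = build(OverlapScorer()).search("retrieval text")
```

Swapping `TermFrequencyScorer` for `OverlapScorer` changes the scores without touching the rest of the system, which is the kind of design exploration the abstract describes.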
01 Jan 2007
TL;DR: Historic Document Retrieval (HDR) is defined as the retrieval of relevant historic documents for a modern query. The approach treats the historic and modern languages as different languages and uses cross-language information retrieval (CLIR) techniques to translate one language into the other.
Abstract: Our cultural heritage, as preserved in libraries, archives and
museums, is made up of documents written many centuries ago.
Large-scale digitization initiatives, like DigiCULT, make these
documents available to non-expert users through digital
libraries and vertical search engines.
For a user, querying a historic document collection may be a
disappointing experience. Natural languages evolve over time, changing
in pronunciation and spelling, and new words are introduced
continuously, while older words may disappear out of everyday use. For
these reasons, queries involving modern words may not be very
effective for retrieving documents that contain many historic terms.
Although reading a 300-year-old document might not be problematic
because the words are still recognizable, the changes in vocabulary
and spelling can make it difficult to use a search engine to find
relevant documents. To illustrate this, consider the following example
from our collection of 17th century Dutch law texts. Looking for
information on the tasks of a lawyer (modern Dutch: advocaat) in
these texts, the modern spelling will not lead you to documents
containing the 17th century Dutch spelling variant advocaet.
Since spelling rules were not introduced until the 19th century, 17th
century Dutch spelling is inconsistent. Being based mainly on
pronunciation, words were often spelled in several different variants,
which poses a problem for standard retrieval engines.
We therefore define Historic Document Retrieval (HDR) as the retrieval
of relevant historic documents for a modern query. Our approach to
this problem is to treat the historic and modern languages as
different languages, and use cross-language information retrieval
(CLIR) techniques to translate one language into the other.
45 citations
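The advocaat/advocaet mismatch described in the abstract above can be illustrated with a very small query-expansion sketch: modern query terms are rewritten into candidate historic spellings before matching. This is a rule-based stand-in for the translation step, not the CLIR method of the paper. The single rewrite rule ("aa" to "ae") is an assumption generalised from the one example in the abstract, and the toy documents and function names are made up for illustration.

```python
def expand_query_term(term, rules):
    """Return the modern term plus variants produced by each rewrite rule.

    rules: list of (modern_substring, historic_substring) pairs.
    These are hypothetical rules, not a linguistic resource.
    """
    variants = {term}
    for modern, historic in rules:
        if modern in term:
            variants.add(term.replace(modern, historic))
    return variants


def search(term, documents, rules):
    """Return ids of documents containing the term or any rewritten variant."""
    variants = expand_query_term(term.lower(), rules)
    return [
        doc_id
        for doc_id, text in documents.items()
        if variants & set(text.lower().split())
    ]


# Assumed rule, derived only from the advocaat -> advocaet example.
RULES = [("aa", "ae")]

# Toy documents: placeholder strings, not real 17th century text.
DOCS = {
    "d1": "advocaet",
    "d2": "notaris",
}

variants = expand_query_term("advocaat", RULES)
plain_hits = search("advocaat", DOCS, rules=[])
expanded_hits = search("advocaat", DOCS, RULES)
```

Without rewrite rules the modern query finds nothing in the toy collection; with the assumed rule it reaches the document that uses the historic spelling.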
13 Jul 2008
TL;DR: The proposed SSER algorithm is formulated as an SVM-like quadratic program (QP), and therefore can be solved efficiently by taking advantage of optimization techniques that were widely used in existing SVM solvers.
Abstract: Ranking plays a central role in many Web search and information retrieval applications. Ensemble ranking, sometimes called meta-search, aims to improve the retrieval performance by combining the outputs from multiple ranking algorithms. Many ensemble ranking approaches employ supervised learning techniques to learn appropriate weights for combining multiple rankers. The main shortcoming with these approaches is that the learned weights for ranking algorithms are query independent. This is suboptimal since a ranking algorithm could perform well for certain queries but poorly for others. In this paper, we propose a novel semi-supervised ensemble ranking (SSER) algorithm that learns query-dependent weights when combining multiple rankers in document retrieval. The proposed SSER algorithm is formulated as an SVM-like quadratic program (QP), and therefore can be solved efficiently by taking advantage of optimization techniques that were widely used in existing SVM solvers. We evaluated the proposed technique on a standard document retrieval testbed and observed encouraging results by comparing to a number of state-of-the-art techniques.
45 citations
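The difference between query-independent and query-dependent weights in the abstract above can be shown with a short sketch of weighted score fusion. It only illustrates how per-query weights change the combined ranking; it does not implement the SSER quadratic program, and the weights here are supplied by hand rather than learned. All names and numbers are assumptions chosen for illustration.

```python
def combine(scores_by_ranker, weights):
    """Fuse per-ranker document scores with a weighted sum.

    scores_by_ranker: {ranker_name: {doc_id: score}}
    weights: {ranker_name: weight} for ONE query (hypothetical values).
    Returns doc ids ordered by combined score, best first.
    """
    combined = {}
    for ranker, doc_scores in scores_by_ranker.items():
        weight = weights[ranker]
        for doc_id, score in doc_scores.items():
            combined[doc_id] = combined.get(doc_id, 0.0) + weight * score
    # Ties are broken by document id so the output is deterministic.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Two toy rankers that disagree about which document is best.
scores = {
    "ranker_a": {"d1": 1.0, "d2": 0.0},
    "ranker_b": {"d1": 0.0, "d2": 1.0},
}

# Hand-picked weights standing in for learned query-dependent weights.
ranking_q1 = combine(scores, {"ranker_a": 0.8, "ranker_b": 0.2})
ranking_q2 = combine(scores, {"ranker_a": 0.3, "ranker_b": 0.7})
```

With a single fixed weight vector both queries would receive the same ordering; letting the weights vary per query is what allows the ensemble to favour whichever ranker suits that query.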
05 Apr 2004
TL;DR: This work explores a new feedback technique that reranks the set of initially retrieved documents based on the controlled vocabulary terms assigned to the documents, and significantly improves retrieval effectiveness in domain-specific collections.
Abstract: There is a common availability of classification terms in online text collections and digital libraries, such as manually assigned keywords or key-phrases from a controlled vocabulary in scientific collections. Our goal is to explore the use of additional classification information for improving retrieval effectiveness. Earlier research explored the effect of adding classification terms to user queries, leading to little or no improvement. We explore a new feedback technique that reranks the set of initially retrieved documents based on the controlled vocabulary terms assigned to the documents. Since we do not want to rely on the availability of special dictionaries or thesauri, we compute the meaning of controlled vocabulary terms based on their occurrence in the collection. Our reranking strategy significantly improves retrieval effectiveness in domain-specific collections. Experimental evaluation is done on the German GIRT and French Amaryllis collections, using the test-suite of the Cross-Language Evaluation Forum (CLEF).
45 citations
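The reranking idea in the abstract above can be sketched as a simple feedback loop: collect the controlled vocabulary terms of the top-ranked documents, then boost every retrieved document that shares those terms. This is a deliberately simplified stand-in. The paper derives the meaning of controlled vocabulary terms from their occurrence in the collection, which this sketch does not attempt, and the parameters `top_k` and `alpha`, the scoring formula, and the toy data are all assumptions.

```python
from collections import Counter


def rerank(initial, doc_terms, top_k=2, alpha=0.5):
    """Rerank an initial result list using controlled vocabulary terms.

    initial: list of (doc_id, score), best first.
    doc_terms: {doc_id: [controlled vocabulary terms]}.
    top_k: how many top documents supply feedback terms (assumed value).
    alpha: how strongly the feedback bonus counts (assumed value).
    """
    feedback = Counter()
    for doc_id, _ in initial[:top_k]:
        feedback.update(doc_terms.get(doc_id, ()))
    total = sum(feedback.values()) or 1

    rescored = []
    for doc_id, score in initial:
        terms = set(doc_terms.get(doc_id, ()))
        # Share of the feedback term mass that this document's terms cover.
        bonus = sum(feedback[t] for t in terms) / total
        rescored.append((doc_id, score + alpha * bonus))
    return sorted(rescored, key=lambda pair: (-pair[1], pair[0]))


# Toy initial ranking and made-up controlled vocabulary assignments.
initial = [("d1", 1.0), ("d2", 0.9), ("d3", 0.8), ("d4", 0.79)]
doc_terms = {
    "d1": ["economics"],
    "d2": ["economics", "labour"],
    "d3": ["sports"],
    "d4": ["economics", "labour"],
}

reranked_ids = [doc_id for doc_id, _ in rerank(initial, doc_terms)]
```

In the toy data, the document labelled only "sports" drops to the bottom because it shares no controlled vocabulary terms with the top results, while "d4" moves above it.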