Topic
Document retrieval
About: Document retrieval is a research topic. Over the lifetime, 6821 publications have been published within this topic receiving 214383 citations.
Papers
01 Oct 1996
TL;DR: The 1994 Text Retrieval Conference was co-sponsored by the National Institute of Standards and Technology (NIST) and the Advanced Research Projects Agency (ARPA).
Abstract: From the Publisher:
Held in Gaithersburg, MD, November 2-4, 1994. The conference was co-sponsored by the National Institute of Standards and Technology (NIST) and the Advanced Research Projects Agency (ARPA), and was attended by 150 people involved in the 32 participating groups. Evaluates new technologies in text retrieval. Includes 34 papers on indexing structures, fragmentation schemes, probabilistic retrieval, latent semantic indexing, interactive document retrieval, and much more. Numerous graphs, tables, and charts.
76 citations
01 Jul 2000
TL;DR: In this article, the effects of out-of-vocabulary (OOV) items in spoken document retrieval (SDR) were investigated and the use of a parallel corpus for query and document expansion was found to be especially beneficial.
Abstract: The effects of out-of-vocabulary (OOV) items in spoken document retrieval (SDR) are investigated. Several sets of transcriptions were created for the TREC-8 SDR task using a speech recognition system varying the vocabulary sizes and OOV rates, and the relative retrieval performance measured. The effects of OOV terms on a simple baseline IR system and on more sophisticated retrieval systems are described. The use of a parallel corpus for query and document expansion is found to be especially beneficial, and with this data set, good retrieval performance can be achieved even for fairly high OOV rates.
76 citations
TL;DR: This survey paper presents a critical study of different document layout analysis techniques and discusses comprehensively the different phases of the DLA algorithms based on a general framework that is formed as an outcome of reviewing the research in the field.
Abstract: Document layout analysis (DLA) is a preprocessing step of document understanding systems. It is responsible for detecting and annotating the physical structure of documents. DLA has several important applications such as document retrieval, content categorization, text recognition, and the like. The objective of DLA is to ease the subsequent analysis/recognition phases by identifying the document-homogeneous blocks and by determining their relationships. The DLA pipeline consists of several phases that can vary among DLA methods, depending on the documents’ layouts and final analysis objectives. In this regard, a universal DLA algorithm that fits all types of document layouts or that satisfies all analysis objectives has not yet been developed. In this survey paper, we present a critical study of different document layout analysis techniques. The study highlights the motivational reasons for pursuing DLA and discusses comprehensively the different phases of the DLA algorithms based on a general framework that is formed as an outcome of reviewing the research in the field. The DLA framework consists of preprocessing, layout analysis strategies, post-processing, and performance evaluation phases. Overall, the article delivers an essential baseline for pursuing further research in document layout analysis.
76 citations
27 Jun 2008
TL;DR: This thesis presents a number of task-specific search solutions and tries to set them into more generic frameworks, taking a look at the three areas (1) context adaptivity of search, (2) efficient XML retrieval, and (3) entity ranking.
Abstract: Text retrieval has been an active area of research for decades. Several issues have been studied over the entire period, such as the development of statistical models for the estimation of relevance, or the challenge of keeping retrieval tasks efficient with ever-growing text collections. Especially in the last decade, we have also seen a diversification of retrieval tasks. Passage or XML retrieval systems allow a more focused search. Question answering or expert search systems do not even return a ranked list of text units, but, for instance, persons with expertise on a given topic. The sketched situation forms the starting point of this thesis, which presents a number of task-specific search solutions and tries to set them into more generic frameworks. In particular, we take a look at the three areas (1) context adaptivity of search, (2) efficient XML retrieval, and (3) entity ranking.
In the first case, we show how different types of context information can be incorporated in the retrieval of documents. When users are searching for information, the search task is typically part of a wider working process. This search context, however, is often not reflected by the few search keywords stated to the retrieval system, though it can contain valuable information for query refinement. We address with this work two research questions related to the aim of developing context-aware retrieval systems. First, we show how already available information about the user’s context can be employed effectively to gain highly precise search results. Second, we investigate how such meta-data about the search context can be gathered. The proposed “query profiles” have a central role in the query refinement process. They automatically detect necessary context information and help the user to explicitly express context-dependent search constraints. The effectiveness of the approach is tested with retrieval experiments on newspaper data.
When documents are not regarded as a simple sequence of words, but their content is structured in a machine-readable form, it is attractive to try to develop retrieval systems that make use of the additional structure information. Structured retrieval first asks for the design of a suitable language that enables the user to express queries on content and structure. We investigate here whether and how existing query languages support the basic needs of structured querying. However, our main focus lies on the efficiency of structured retrieval systems. Conventional inverted indices for document retrieval systems are not suitable for maintaining structure indices. We identify base operations involved in the execution of structured queries and show how they can be supported by new indices and algorithms on a database system. Efficient query processing has to be concerned with the optimization of query plans as well. We investigate low-level query plans of physical database operators for the execution of simple query patterns. Furthermore, it is demonstrated how complex queries benefit from higher-level query optimization.
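One base operation that structured queries commonly rely on is a containment join: finding the elements of a given tag whose region covers an occurrence of a query term. The following is a minimal sketch of that idea, not the index design proposed in the thesis. The class name `StructureIndex`, the (start, end) token-offset encoding of elements, and the toy documents are all assumptions made for illustration.

```python
from bisect import bisect_left
from collections import defaultdict


class StructureIndex:
    """Toy index (hypothetical): term positions plus element regions."""

    def __init__(self):
        # term -> sorted list of (doc_id, token_position)
        self.term_postings = defaultdict(list)
        # tag -> sorted list of (doc_id, start_position, end_position)
        self.element_postings = defaultdict(list)

    def add_document(self, doc_id, elements, tokens):
        # elements: iterable of (tag, start, end), positions inclusive
        # tokens: iterable of (position, term)
        for tag, start, end in elements:
            self.element_postings[tag].append((doc_id, start, end))
        for pos, term in tokens:
            self.term_postings[term.lower()].append((doc_id, pos))

    def finalize(self):
        # Sort postings once so that lookups can use binary search.
        for postings in self.term_postings.values():
            postings.sort()
        for postings in self.element_postings.values():
            postings.sort()

    def elements_containing(self, tag, term):
        """Containment join: elements named `tag` that cover `term`."""
        hits = self.term_postings.get(term.lower(), [])
        result = []
        for doc_id, start, end in self.element_postings.get(tag, []):
            # First term occurrence at or after the element's start.
            i = bisect_left(hits, (doc_id, start))
            if i < len(hits) and hits[i][0] == doc_id and hits[i][1] <= end:
                result.append((doc_id, start, end))
        return result


# Assumed toy data: two documents, each with a title and a body element.
index = StructureIndex()
index.add_document(
    1,
    elements=[("title", 0, 2), ("body", 3, 6)],
    tokens=[(0, "efficient"), (1, "xml"), (2, "retrieval"),
            (3, "inverted"), (4, "indices"), (5, "store"), (6, "terms")],
)
index.add_document(
    2,
    elements=[("title", 0, 1), ("body", 2, 4)],
    tokens=[(0, "entity"), (1, "ranking"), (2, "uses"),
            (3, "xml"), (4, "markup")],
)
index.finalize()

print(index.elements_containing("title", "xml"))
print(index.elements_containing("body", "xml"))
```

A plain term-to-document inverted index could only report that both documents contain "xml"; keeping element regions alongside term positions is what lets the query distinguish a title match from a body match.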
New search tasks and interfaces for the presentation of search results, like faceted search applications, question answering, expert search, and automatic timeline construction, come with the need to rank entities instead of documents. By entities we mean unique (named) existences, such as persons, organizations or dates. Modern language processing tools are able to automatically detect and categorize named entities in large text collections. In order to estimate their relevance to a given search topic, we develop retrieval models for entities which are based on the relevance of texts that mention the entity. A graph-based relevance propagation framework is introduced for this purpose that makes it possible to derive the relevance of entities. Several options for the modeling of entity containment graphs and different relevance propagation approaches are tested, demonstrating the usefulness of the graph-based ranking framework.
76 citations
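The entity-ranking idea in the abstract above, where an entity's relevance is derived from the relevance of the texts that mention it, can be sketched as a single propagation step over a bipartite document-entity graph. This is only an illustrative simplification, not one of the propagation models evaluated in the thesis. The function name `rank_entities`, the proportional splitting rule, and the toy scores and entity names are assumptions.

```python
from collections import defaultdict


def rank_entities(doc_scores, mentions):
    """One-step relevance propagation (illustrative assumption).

    doc_scores: doc_id -> retrieval score of that document for the query
    mentions:   doc_id -> {entity: number of mentions in that document}

    Each document passes its score to the entities it mentions, split in
    proportion to their mention counts. Entities are returned sorted by
    descending score, with ties broken by name.
    """
    entity_scores = defaultdict(float)
    for doc_id, score in doc_scores.items():
        counts = mentions.get(doc_id, {})
        total = sum(counts.values())
        if total == 0:
            continue  # document mentions no known entity
        for entity, count in counts.items():
            entity_scores[entity] += score * count / total
    return sorted(entity_scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Assumed toy input: scores as they might come from a document ranker.
doc_scores = {"d1": 0.8, "d2": 0.4, "d3": 0.2}
mentions = {
    "d1": {"entity_a": 2, "entity_b": 2},
    "d2": {"entity_b": 1},
    "d3": {"entity_c": 1},
}

ranking = rank_entities(doc_scores, mentions)
for entity, score in ranking:
    print(f"{entity}\t{score:.2f}")
```

In this toy input, `entity_b` ranks first because it is mentioned in two relevant documents. Richer variants could weight edges differently or propagate over several steps, which is the kind of design choice the abstract says was compared.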