Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers

Journal Article

UMLS knowledge for biomedical language processing.

Alexa T. McCray¹, Alan R. Aronson, Allen C. Browne, T C Rindflesch, Razi A, Suresh Srinivasan•Institutions (1)

National Institutes of Health¹

01 Apr 1993-Bulletin of The Medical Library Association

TL;DR: The focus of the effort is the development of SPECIALIST, an experimental natural language processing system for the biomedical domain that includes a broad coverage parser supported by a large lexicon, modules that provide access to the extensive Unified Medical Language System Knowledge Sources, and a retrieval module that permits experiments in information retrieval.

Abstract: This paper describes efforts to provide access to the free text in biomedical databases. The focus of the effort is the development of SPECIALIST, an experimental natural language processing system for the biomedical domain. The system includes a broad coverage parser supported by a large lexicon, modules that provide access to the extensive Unified Medical Language System (UMLS) Knowledge Sources, and a retrieval module that permits experiments in information retrieval. The UMLS Metathesaurus and Semantic Network provide a rich source of biomedical concepts and their interrelationships. Investigations have been conducted to determine the type of information required to effect a map between the language of queries and the language of relevant documents. Mappings are never straightforward and often involve multiple inferences.

91 citations

Book Chapter

The impact of named entity normalization on information retrieval for question answering

Khalid, Valentin Jijkoun, M. de Rijke

01 Jan 2008-Lecture Notes in Computer Science

TL;DR: The authors evaluate two entity normalization methods based on Wikipedia in the context of both passage and document retrieval for question answering and find that even a simple normalization method leads to improvements of early precision, both for document and passage retrieval.

Abstract: In the named entity normalization task, a system identifies a canonical unambiguous referent for names like Bush or Alabama. Resolving synonymy and ambiguity of such names can benefit end-to-end information access tasks. We evaluate two entity normalization methods based on Wikipedia in the context of both passage and document retrieval for question answering. We find that even a simple normalization method leads to improvements of early precision, both for document and passage retrieval. Moreover, better normalization results in better retrieval performance.

91 citations
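
The idea in the abstract above, mapping names to canonical referents before matching queries to documents, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the alias table is hand-written rather than derived from Wikipedia, the matching is crude substring lookup, and the function names `normalize` and `retrieve` are hypothetical. It is not the authors' method.

```python
# Hypothetical alias table: surface form (lowercase) -> canonical referent.
# The paper derives such mappings from Wikipedia; this one is hand-written.
ALIASES = {
    "george w. bush": "George_W._Bush",
    "president bush": "George_W._Bush",
    "alabama": "Alabama",
}


def normalize(text, aliases):
    """Return the set of canonical referents whose aliases occur in `text`."""
    lowered = text.lower()
    found = set()
    # Try longer aliases first so a long name is not shadowed by a shorter one.
    for alias in sorted(aliases, key=len, reverse=True):
        if alias in lowered:
            found.add(aliases[alias])
            lowered = lowered.replace(alias, " ")
    return found


def retrieve(query, docs, aliases):
    """Rank document indices by the number of referents shared with the query."""
    query_entities = normalize(query, aliases)
    scored = [(len(query_entities & normalize(doc, aliases)), i)
              for i, doc in enumerate(docs)]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [i for overlap, i in ranked if overlap > 0]


docs = [
    "President Bush signed the bill.",
    "The bush in the garden needs trimming.",
    "George W. Bush visited Alabama.",
]
print(retrieve("Where did George W. Bush travel?", docs, ALIASES))  # [0, 2]
```

The first and third documents use different surface forms but resolve to the same referent, so both are returned, while the garden "bush" is not matched.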

Journal Article

Signature Detection and Matching for Document Image Retrieval

Guangyu Zhu¹, Yefeng Zheng², David Doermann¹, Stefan Jaeger•Institutions (2)

University of Maryland, College Park¹, Princeton University²

01 Nov 2009-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper proposes a novel multiscale approach to jointly detecting and segmenting signatures from document images, and quantitatively studies state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as query in document image retrieval.

Abstract: As one of the most pervasive methods of individual identification and document authentication, signatures present convincing evidence and provide an important form of indexing for effective document image processing and retrieval in a broad range of applications. However, detection and segmentation of free-form objects such as signatures from cluttered background is currently an open document analysis problem. In this paper, we focus on two fundamental problems in signature-based document image retrieval. First, we propose a novel multiscale approach to jointly detecting and segmenting signatures from document images. Rather than focusing on local features that typically have large variations, our approach captures the structural saliency using a signature production model and computes the dynamic curvature of 2D contour fragments over multiple scales. This detection framework is general and computationally tractable. Second, we treat the problem of signature retrieval in the unconstrained setting of translation, scale, and rotation invariant nonrigid shape matching. We propose two novel measures of shape dissimilarity based on anisotropic scaling and registration residual error and present a supervised learning framework for combining complementary shape information from different dissimilarity metrics using LDA. We quantitatively study state-of-the-art shape representations, shape matching algorithms, measures of dissimilarity, and the use of multiple instances as query in document image retrieval. We further demonstrate our matching techniques in offline signature verification. Extensive experiments using large real-world collections of English and Arabic machine-printed and handwritten documents demonstrate the excellent performance of our approaches.

89 citations

Proceedings Article

Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of Data

Nikita Zhiltsov¹, Alexander Kotov², Fedor Nikolaev²•Institutions (2)

Kazan Federal University¹, Wayne State University²

09 Aug 2015

TL;DR: A novel retrieval model that incorporates term dependencies into structured document retrieval is proposed and applied to ad-hoc entity retrieval in the Web of Data (ERWD); experiments indicate a significant improvement in retrieval accuracy over state-of-the-art retrieval models for ERWD.

Abstract: Previously proposed approaches to ad-hoc entity retrieval in the Web of Data (ERWD) used multi-fielded representation of entities and relied on standard unigram bag-of-words retrieval models. Although retrieval models incorporating term dependencies have been shown to be significantly more effective than the unigram bag-of-words ones for ad hoc document retrieval, it is not known whether accounting for term dependencies can improve retrieval from the Web of Data. In this work, we propose a novel retrieval model that incorporates term dependencies into structured document retrieval and apply it to the task of ERWD. In the proposed model, the document field weights and the relative importance of unigrams and bigrams are optimized with respect to the target retrieval metric using a learning-to-rank method. Experiments on a publicly available benchmark indicate significant improvement of the accuracy of retrieval results by the proposed model over state-of-the-art retrieval models for ERWD.

89 citations
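
The abstract above describes a model that combines per-field weights with unigram and bigram evidence. A simplified sketch of that general idea follows. It is not the authors' implementation: it scores only unigrams and ordered bigrams, uses a Dirichlet-smoothed mixture over fields, and fixes the field weights and the unigram/bigram balance by hand, whereas the paper optimizes them with a learning-to-rank method. All names, weights, and toy data are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Contiguous n-grams of a token list, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def collection_stats(docs, fields):
    """Per-field unigram and bigram counts over the whole collection."""
    stats = {f: {1: Counter(), 2: Counter()} for f in fields}
    for doc in docs:
        for f in fields:
            tokens = doc.get(f, [])
            stats[f][1].update(ngrams(tokens, 1))
            stats[f][2].update(ngrams(tokens, 2))
    return stats


def field_mixture_logprob(gram, doc, stats, field_weights, mu=10.0):
    """Log of the field-weighted, Dirichlet-smoothed probability of `gram`."""
    n = len(gram)
    total = 0.0
    for field, weight in field_weights.items():
        doc_counts = Counter(ngrams(doc.get(field, []), n))
        coll_counts = stats[field][n]
        # Additive floor keeps the background probability above zero.
        background = (coll_counts[gram] + 0.5) / (sum(coll_counts.values()) + 1.0)
        doc_len = sum(doc_counts.values())
        total += weight * (doc_counts[gram] + mu * background) / (doc_len + mu)
    return math.log(total)


def score(query, doc, stats, w_uni, w_bi, lam_uni=0.8, lam_bi=0.2):
    """Weighted sum of unigram and ordered-bigram matches across fields."""
    uni = sum(field_mixture_logprob(g, doc, stats, w_uni) for g in ngrams(query, 1))
    bi = sum(field_mixture_logprob(g, doc, stats, w_bi) for g in ngrams(query, 2))
    return lam_uni * uni + lam_bi * bi


# Toy entity descriptions with two fields (hypothetical data).
docs = [
    {"names": ["barack", "obama"], "attributes": ["president", "united", "states"]},
    {"names": ["obama", "city"], "attributes": ["barack", "street"]},
]
weights = {"names": 0.7, "attributes": 0.3}
stats = collection_stats(docs, weights)
query = ["barack", "obama"]
for doc in docs:
    print(doc["names"], round(score(query, doc, stats, weights, weights), 3))
```

In this toy collection the first entity scores higher, because the query terms appear adjacent and in order in its heavily weighted `names` field.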

Journal Article

Cognitive models in information retrieval—an evaluative review

P J Daniels¹•Institutions (1)

Northampton Community College¹

04 Dec 1986-Journal of Documentation

TL;DR: Work in the area of cognitive modelling is reviewed, with particular attention paid to user models (that is, the model held by a system of a user).

Abstract: Selected current and recent work in the area of cognitive modelling is reviewed. Particular attention is paid to user models (that is, the model held by a system of a user). The relevance of this work to information retrieval is assessed and some attempts to include user models in IR systems are discussed. Implications are drawn for future work in IR.

89 citations

Network Information

Performance

Metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
