Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published on this topic, receiving 214,383 citations.


Papers
Patent
IJsbrand Jan Aalbersberg
11 Jan 1995
TL;DR: A user interface for a full-text document retrieval computerized system comprises a display with a words window in which each query word is displayed by means of a distinctive representation uniquely associated with each displayed word.
Abstract: A user interface for a full-text document retrieval computerized system comprises a display with a words window in which each query word is displayed by means of a distinctive representation uniquely associated with each displayed word. In a subsequent results window, each document header or title or representation is accompanied by an indicator which employs the same distinctive representation to directly indicate to the user the relative contributions of the individual query words to each listed document. In a preferred embodiment, the distinctive representation is integrated with an associated weight first indicator in a words window, and in the results window the distinctive representations are also integrated with an associated weight second indicator. The distinctive representation can take several forms, such as by a different color or by means of hatching or shading or by displayed icons.

158 citations
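
The indicator described in this abstract shows how much each query word contributes to a listed document. A minimal sketch of that idea follows, assuming plain term frequency as the per-word weight; the patent abstract does not specify a weighting scheme, and the function name `term_contributions` is hypothetical.

```python
from collections import Counter


def term_contributions(query_words, document_text):
    """Return each query word's share of the document's total match score.

    Assumption: the score is raw term frequency, chosen only for illustration.
    """
    counts = Counter(document_text.lower().split())
    raw = {word: counts[word.lower()] for word in query_words}
    total = sum(raw.values())
    if total == 0:
        # No query word occurs in the document, so every share is zero.
        return {word: 0.0 for word in query_words}
    return {word: raw[word] / total for word in query_words}


shares = term_contributions(
    ["retrieval", "color"],
    "retrieval systems rank documents and retrieval uses color cues",
)
```

Each share could then drive the size of a colored or hatched segment next to the document title in a results list.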

Proceedings Article
01 Jan 2003
TL;DR: The first year of TREC Genomics Track featured two tasks: ad hoc retrieval and information extraction, which centered around the Gene Reference into Function (GeneRIF) resource of the National Library of Medicine.
Abstract: The first year of TREC Genomics Track featured two tasks: ad hoc retrieval and information extraction. Both tasks centered around the Gene Reference into Function (GeneRIF) resource of the National Library of Medicine, which was used both as pseudorelevance judgments for ad hoc document retrieval and as target text for information extraction. The track attracted 29 groups who participated in one or both tasks.

157 citations

Journal Article
TL;DR: Development of the Envision database, system software, and protocol for client-server communication builds upon work to identify and represent “objects” that will facilitate reuse and high-level communication of information from author to reader (user).
Abstract: Project Envision aims to build a “user-centered database from the computer science literature,” initially using the publications of the Association for Computing Machinery (ACM). Accordingly, we have interviewed potential users, as well as experts in library, information, and computer science, to understand their needs, to become aware of their perception of existing information systems, and to collect their recommendations. Design and formative usability evaluation of our interface have been based on those interviews, leading to innovative query formulation and search results screens that work well according to our usability testing. Our development of the Envision database, system software, and protocol for client-server communication builds upon work to identify and represent “objects” that will facilitate reuse and high-level communication of information from author to reader (user). All these efforts are leading not only to a usable prototype digital library but also to a set of nine principles for digital libraries, which we have tried to follow, covering issues of representation, architecture, and interfacing. © 1993 John Wiley & Sons, Inc.

157 citations

Proceedings Article
01 Jan 2003
TL;DR: NLP needs to be optimized for IR in order to be effective and document retrieval is not an ideal application for NLP, at least given the current state-of-the-art in NLP.
Abstract: Many Natural Language Processing (NLP) techniques have been used in Information Retrieval. The results are not encouraging. Simple methods (stopwording, Porter-style stemming, etc.) usually yield significant improvements, while higher-level processing (chunking, parsing, word sense disambiguation, etc.) yields only very small improvements or even a decrease in accuracy. At the same time, higher-level methods increase the processing and storage cost dramatically. This makes them hard to use on large collections. We review NLP techniques and come to the conclusion that (a) NLP needs to be optimized for IR in order to be effective and (b) document retrieval is not an ideal application for NLP, at least given the current state-of-the-art in NLP. Other IR-related tasks, e.g., question answering and information extraction, seem to be better suited.

156 citations
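
The "simple methods" named in this abstract, stopwording and stemming, can be sketched in a few lines. This is an illustration only: the stopword list is a small hypothetical sample, and `crude_stem` is a naive suffix stripper standing in for a real Porter stemmer, which applies a much larger set of rules.

```python
# Hypothetical, deliberately tiny stopword list for illustration.
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for"}

# Suffixes are tried in order; this is NOT the Porter algorithm.
SUFFIXES = ("ing", "ed", "es", "s")


def crude_stem(word):
    """Strip the first matching suffix if at least three characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Lowercase, split on whitespace, drop stopwords, and stem what is left."""
    return [crude_stem(tok) for tok in text.lower().split() if tok not in STOPWORDS]


terms = preprocess("the indexing of retrieved documents")
# terms == ["index", "retriev", "document"]
```

Both steps are cheap per token, which is consistent with the abstract's point that they scale to large collections where parsing and disambiguation do not.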


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111