Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have appeared on this topic, receiving 214,383 citations.


Papers
Proceedings Article
Prabhakar Raghavan
05 Jan 1997
TL;DR: An overview is given of some algorithmic problems arising in the representation of text/image/multimedia objects in a form amenable to automated searching, and in conducting these searches efficiently.
Abstract: We give an overview of some algorithmic problems arising in the representation of text/image/multimedia objects in a form amenable to automated searching, and in conducting these searches efficiently. These operations are central to information retrieval and digital library systems. [...]tional linguistics; (3) user interfaces and user models; (4) network and distributed retrieval issues including server/network performance and load balancing; (5) security, access control and rights management.

70 citations

Journal Article
TL;DR: In this paper, a computerized intermediary system is proposed to facilitate online document retrieval from large-scale data bases directly by users of the retrieved information, which does not require the user to be knowledgeable or undergo any training in the use of the underlying retrieval system.
Abstract: This paper concerns the provision of a computerized intermediary system to facilitate online document retrieval from large-scale data bases directly by users of the retrieved information. The system does not require the user to be knowledgeable or undergo any training in the use of the underlying retrieval system. The scope for a novel intermediary system relating to recent developments in expert systems has been identified and a system entitled CANSEARCH designed to enable doctors to specify queries to retrieve cancer-therapy-related documents stored in the MEDLINE data base. The design of the intermediary system uses the principle of search space abstraction, employing menu selection from a touch terminal and encapsulating the necessary intermediary expertise using rule-based techniques programmed in PROLOG. CANSEARCH performed well enough to justify the approach taken, suggesting that further development of CANSEARCH and of intermediary systems for document retrieval in other subject areas should be undertaken.

70 citations

Book
01 Jan 1998
TL;DR: An edited volume on logical and uncertainty models of information retrieval, with chapters including "The Flow of Information in Information Retrieval: Towards a General Framework for the Modelling of Information Retrieval" by M. Lalmas.
Abstract: List of Figures. List of Tables. Preface.
Part I: Genesis. 1. A Non-Classical Logic for Information Retrieval C.J. van Rijsbergen.
Part II: Logical Models. 2. Toward a Broader Logical Model for Information Retrieval Jian-Yun Nie, F. Lepage. 3. Experiences in Information Retrieval Modelling Using Structured Formalisms and Modal Logic J.-P. Chevallet, Y. Chiaramella. 4. Preferential Models of Query by Navigation P. Bruza, B. van Linder. 5. A Flexible Framework for Multimedia Information Retrieval A. Muller. 6. The Flow of Information in Information Retrieval: Towards a General Framework for the Modelling of Information Retrieval M. Lalmas. 7. Mirlog: A Logic for Multimedia Information Retrieval C. Meghini, et al.
Part III: Uncertainty Models. 8. Semantic Information Retrieval G. Amati, K. van Rijsbergen. 9. Information Retrieval with Probabilistic Datalog T. Roelleke, N. Fuhr. 10. Logical Imaging and Probabilistic Information Retrieval F. Crestani. 11. Simplicity and Information Retrieval G. Amati, K. van Rijsbergen.
Part IV: Meta-Models. 12. Towards an Axiomatic Aboutness Theory for Information Retrieval T. Huibers, B. Wondergem.

69 citations

Patent
03 Jun 2004
TL;DR: A question-answering system takes a question sentence in natural language as input. Its document retrieval part extracts a keyword from the question sentence and retrieves the document data containing that keyword from a document database.
Abstract: A question sentence input part of a question-answering system accepts a question sentence expressed in natural language. A document retrieval part of the system extracts a keyword from the question sentence and retrieves the document data containing the keyword from a document database. An answer candidate extracting part of the system extracts, from the retrieved document data, expressions that may be the answer as answer candidates. An answer type determination part of the system determines the answer type of each answer candidate. An answer table output part of the system classifies the answer candidates by answer type and outputs an answer table that lists, for each answer type, all or part of the answer candidates whose evaluation meets or exceeds a predetermined value.

69 citations
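
The five parts named in the patent abstract above (question input, document retrieval, answer candidate extraction, answer type determination, answer table output) can be illustrated with a toy pipeline. This is only a sketch under loud assumptions: the stopword list, the capitalisation/digit heuristic for candidates, the two answer types `NAME` and `NUMBER`, the document-count score, and every function name are hypothetical and are not taken from the patent.

```python
import re
from collections import defaultdict

# Hypothetical stopword list; the patent does not specify one.
STOPWORDS = {"who", "what", "when", "where", "is", "was", "the", "a", "of", "in", "did", "by"}


def extract_keywords(question):
    """Keep lowercase word tokens of the question that are not stopwords."""
    return [w for w in re.findall(r"[a-z0-9]+", question.lower()) if w not in STOPWORDS]


def retrieve(keywords, documents):
    """Return the documents that contain at least one keyword as a whole token."""
    hits = []
    for doc in documents:
        tokens = set(re.findall(r"[a-z0-9]+", doc.lower()))
        if any(k in tokens for k in keywords):
            hits.append(doc)
    return hits


def extract_candidates(doc, keywords):
    """Assumed heuristic: capitalised words and numbers that are not query terms."""
    out = []
    for tok in re.findall(r"[A-Za-z0-9]+", doc):
        low = tok.lower()
        if low in keywords or low in STOPWORDS:
            continue
        if tok.isdigit() or tok.istitle():
            out.append(tok)
    return out


def answer_type(candidate):
    """Assumed two-way typing: digits are NUMBER, everything else is NAME."""
    return "NUMBER" if candidate.isdigit() else "NAME"


def answer_table(question, documents, min_score=1):
    """Group candidates by type, keeping those found in at least min_score documents."""
    keywords = extract_keywords(question)
    scores = defaultdict(int)
    for doc in retrieve(keywords, documents):
        for cand in set(extract_candidates(doc, keywords)):
            scores[cand] += 1
    table = defaultdict(list)
    for cand, score in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0])):
        if score >= min_score:
            table[answer_type(cand)].append(cand)
    return dict(table)


# Made-up toy documents, for illustration only.
docs = ["Alice founded Acme in 1999", "Acme was founded by Alice", "Bob likes tea"]
table = answer_table("Who founded Acme?", docs)
strict = answer_table("Who founded Acme?", docs, min_score=2)
```

Here `table` groups `Alice` under `NAME` and `1999` under `NUMBER`, while raising `min_score` to 2 keeps only `Alice`, which mirrors the idea of listing candidates "having a predetermined evaluation or greater for each answer type".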

Proceedings Article
31 Oct 2005
TL;DR: This paper describes a mechanism based on controlled partitioning that can be adapted to suit different balances of insertion and querying operations, and is faster and scales better than previous methods.
Abstract: Inverted index structures are the mainstay of modern text retrieval systems. They can be constructed quickly using off-line merge-based methods, and provide efficient support for a variety of querying modes. In this paper we examine the task of on-line index construction -- that is, how to build an inverted index when the underlying data must be continuously queryable, and the documents must be indexed and available for search as soon as they are inserted. When straightforward approaches are used, document insertions become increasingly expensive as the size of the database grows. This paper describes a mechanism based on controlled partitioning that can be adapted to suit different balances of insertion and querying operations, and is faster and scales better than previous methods. Using experiments on 100GB of web data we demonstrate the efficiency of our methods in practice, showing that they dramatically reduce the cost of on-line index construction.

69 citations
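
The abstract above does not spell out its partitioning mechanism, so the following is only a minimal sketch of one way an on-line inverted index could be split into partitions: new documents go into a small in-memory buffer, the buffer is flushed as a new partition, and adjacent partitions are merged whenever the older one is less than `ratio` times the size of the newer one. The class name `PartitionedIndex`, the parameters `buffer_docs` and `ratio`, and the merge rule itself are assumptions for illustration, not the paper's algorithm.

```python
from collections import defaultdict


class PartitionedIndex:
    """Toy on-line inverted index: an in-memory buffer plus merged partitions."""

    def __init__(self, buffer_docs=2, ratio=2):
        self.buffer_docs = buffer_docs    # assumed: documents buffered before a flush
        self.ratio = ratio                # assumed: size ratio kept between neighbours
        self.buffer = defaultdict(list)   # term -> doc ids not yet flushed
        self.buffered = 0
        self.partitions = []              # list of (doc_count, {term: [doc ids]})
        self.next_id = 0

    def add(self, text):
        """Index a document; it is searchable as soon as this call returns."""
        doc_id = self.next_id
        self.next_id += 1
        for term in set(text.lower().split()):
            self.buffer[term].append(doc_id)
        self.buffered += 1
        if self.buffered >= self.buffer_docs:
            self._flush()
        return doc_id

    def _flush(self):
        """Turn the buffer into a partition, then merge while sizes are too close."""
        self.partitions.append((self.buffered, dict(self.buffer)))
        self.buffer = defaultdict(list)
        self.buffered = 0
        while (len(self.partitions) >= 2
               and self.partitions[-2][0] < self.ratio * self.partitions[-1][0]):
            new_n, new_p = self.partitions.pop()
            old_n, old_p = self.partitions.pop()
            # Older doc ids are smaller, so concatenation keeps postings sorted.
            merged = {t: old_p.get(t, []) + new_p.get(t, [])
                      for t in old_p.keys() | new_p.keys()}
            self.partitions.append((old_n + new_n, merged))

    def search(self, term):
        """Collect postings from every partition, oldest first, then the buffer."""
        term = term.lower()
        hits = []
        for _, postings in self.partitions:
            hits.extend(postings.get(term, []))
        hits.extend(self.buffer.get(term, []))
        return hits


index = PartitionedIndex()
for text in ["inverted index", "online index construction", "web data",
             "index merge", "query index"]:
    index.add(text)
result = index.search("index")
```

With these toy settings a query has to visit every partition plus the buffer, while each merge rewrites postings. Tuning `ratio` and `buffer_docs` trades one cost against the other, which is the kind of balance between insertion and querying operations that the abstract refers to.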


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
81% related
Metadata
43.9K papers, 642.7K citations
79% related
Recommender system
27.2K papers, 598K citations
79% related
Ontology (information science)
57K papers, 869.1K citations
78% related
Natural language
31.1K papers, 806.8K citations
77% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  9
2022  39
2021  107
2020  130
2019  144
2018  111