scispace - formally typeset
Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.


Papers
Proceedings Article
11 Aug 2002
TL;DR: The view presented in this paper is that the fundamental vocabulary of the system is the images in the database, and that relevance feedback is a document whose words are the images and which expresses the semantic intent of the user for that query.
Abstract: This paper proposes a novel view of the information generated by relevance feedback. The latent semantic analysis is adapted to this view to extract useful inter-query information. The view presented in this paper is that the fundamental vocabulary of the system is the images in the database and that relevance feedback is a document whose words are the images. A relevance feedback document contains the intra-query information which expresses the semantic intent of the user over that query. The inter-query information then takes the form of a collection of documents which can be subjected to latent semantic analysis. An algorithm to query the latent semantic index is presented and evaluated against real data sets.

63 citations
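The abstract above does not spell out its query algorithm. As an illustration only, here is a minimal sketch of the general idea, assuming binary relevance marks: each feedback session becomes a column of an image-by-session matrix, a rank-k latent space is obtained from the top left singular vectors (computed here by power iteration with deflation rather than a library SVD), and a new partial feedback vector is projected into that space to score every image. The function names and the toy data are hypothetical.

```python
import math

def feedback_matrix(sessions, num_images):
    """Image-by-session matrix: each feedback session is a 'document' whose words are image ids."""
    a = [[0.0] * len(sessions) for _ in range(num_images)]
    for col, relevant in enumerate(sessions):
        for image in relevant:
            a[image][col] = 1.0
    return a

def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]

def top_k_left_singular_vectors(a, k, iters=200):
    """Top-k left singular vectors of `a`, via power iteration with deflation on a @ a^T."""
    n = len(a)
    # m[i][j] = number of feedback documents in which images i and j were both marked relevant
    m = [[sum(x * y for x, y in zip(a[i], a[j])) for j in range(n)] for i in range(n)]
    basis = []
    for _ in range(k):
        v = [1.0] * n
        for _ in range(iters):
            w = mat_vec(m, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return basis  # the remaining spectrum is numerically zero
            v = [x / norm for x in w]
        eigenvalue = sum(x * y for x, y in zip(v, mat_vec(m, v)))
        basis.append(v)
        # Deflate so the next pass converges to the next-largest component.
        m = [[m[i][j] - eigenvalue * v[i] * v[j] for j in range(n)] for i in range(n)]
    return basis

def latent_scores(basis, query):
    """Project a query feedback vector onto the latent space and back (U_k U_k^T q)."""
    coords = [sum(u_i * q_i for u_i, q_i in zip(u, query)) for u in basis]
    return [sum(c * u[i] for c, u in zip(coords, basis)) for i in range(len(query))]

# Toy relevance-feedback history (hypothetical): images 0-2 tend to be relevant together.
sessions = [{0, 1, 2}, {0, 1}, {1, 2}, {3, 4}]
basis = top_k_left_singular_vectors(feedback_matrix(sessions, 5), k=2)
scores = latent_scores(basis, [1.0, 0.0, 0.0, 0.0, 0.0])  # user marks image 0 relevant
ranking = sorted(range(5), key=lambda i: scores[i], reverse=True)
```

In this toy history, images 1 and 2, which co-occurred with image 0 in earlier feedback documents, rank ahead of images 3 and 4, which never did. This is the inter-query information the abstract refers to, though the paper's own indexing and querying details may differ.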

Proceedings Article
01 Jul 1995
TL;DR: An approach called cooperative indexing is developed that provides a framework to achieve both scalability and full integration of IR and RDBMS technology; experimental findings validate the cooperative indexing scheme and suggest alternatives to further improve performance.
Abstract: The full integration of information retrieval (IR) features into a database management system (DBMS) has long been recognized as both a significant goal and a challenging undertaking. By full integration we mean: i) support for document storage, indexing, retrieval, and update, ii) transaction semantics, so that all database operations on documents have the ACID properties of atomicity, consistency, isolation, and durability, iii) concurrent addition, update, and retrieval of documents, and iv) database query language extensions to provide ranking for document retrieval operations. It is also necessary for the integrated offering to exhibit scalable performance for document indexing and retrieval processes. To identify the implementation requirements imposed by the desired level of integration, we layered a representative IR application on Oracle Rdb and then conducted a number of database load and document retrieval experiments. The results of these experiments suggest that infrastructural extensions are necessary to obtain both the desired level of IR integration and scalable performance. With the insight gained from our initial experiments, we developed an approach, called cooperative indexing, that provides a framework to achieve both scalability and full integration of IR and RDBMS technology. Prototype implementations of system-level extensions to support cooperative indexing were evaluated with a modified version of Oracle Rdb. Our experimental findings validate the cooperative indexing scheme and suggest alternatives to further improve performance.

63 citations

Journal Article
TL;DR: An expert system for online search assistance automatically reformulates queries to improve the search results, and ranks the retrieved passages to speed the identification of relevant information.
Abstract: Unfamiliarity with search tactics creates difficulties for many users of online retrieval systems. User observations indicate that even experienced searchers use vocabulary incorrectly and rarely reformulate their queries. To address these problems, an expert system for online search assistance was developed. This prototype automatically reformulates queries to improve the search results, and ranks the retrieved passages to speed the identification of relevant information. Users' search performance using the expert system was compared with their search performance on their own, and their search performance using an online thesaurus. The following conclusions were reached: (1) The expert system significantly reduced the number of queries necessary to find relevant passages compared with the user searching alone or with the thesaurus. (2) The expert system produced a marginally significant improvement in precision compared with the user searching on their own. There was no significant difference in the recall achieved by the three system configurations. (3) Overall, the expert system ranked relevant passages above irrelevant passages

63 citations

Book Chapter
13 May 1998
TL;DR: This paper shows how case-based reasoning techniques can be applied to document retrieval, gives an overview of the approach to Textual CBR, describes a particular application project, and evaluates the performance of the system.
Abstract: In this paper, we show how case-based reasoning (CBR) techniques can be applied to document retrieval. The fundamental idea is to automatically convert textual documents into appropriate case representations and use these to retrieve relevant documents in a problem situation. In contrast to Information Retrieval techniques, we assume that a Textual CBR system focuses on a particular domain and thus can employ knowledge from that domain. We give an overview of our approach to Textual CBR, describe a particular application project, and evaluate the performance of the system.

63 citations
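The paper's actual case representation is not given in this abstract. Purely as an illustration of "convert documents into cases and retrieve by similarity using domain knowledge", the sketch below assumes a hypothetical hand-built dictionary (DOMAIN_CONCEPTS) that maps surface phrases to domain concepts, and uses Jaccard overlap of concept sets as a stand-in similarity measure. Because "stuck" and "paper jam" map to the same concept, the query below matches a document that describes the fault in different words.

```python
import re

# Hypothetical domain knowledge: surface phrases mapped to domain concepts.
DOMAIN_CONCEPTS = {
    "printer": "device:printer",
    "paper jam": "fault:paper_jam",
    "jam": "fault:paper_jam",
    "stuck": "fault:paper_jam",
    "toner": "part:toner",
    "driver": "software:driver",
}

def to_case(text):
    """Convert free text into a case: the set of domain concepts it mentions."""
    lowered = text.lower()
    return {
        concept
        for phrase, concept in DOMAIN_CONCEPTS.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    }

def similarity(query_case, doc_case):
    """Jaccard overlap of concept sets (a stand-in for a domain-specific measure)."""
    if not query_case or not doc_case:
        return 0.0
    return len(query_case & doc_case) / len(query_case | doc_case)

def retrieve(problem, documents, top_n=2):
    """Rank documents by case similarity to the problem description; drop non-matches."""
    query_case = to_case(problem)
    scored = [(similarity(query_case, to_case(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:top_n] if score > 0]

docs = [
    "How to clear a paper jam in the printer",
    "Installing the printer driver on Windows",
    "Replacing the toner cartridge",
]
results = retrieve("Paper is stuck in my printer", docs)
```

Here the jam article ranks first with a full concept match, the driver article follows on the shared "printer" concept, and the toner article is dropped. A real Textual CBR system would derive the concept vocabulary and similarity weights from the target domain rather than from a fixed toy table.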

Proceedings Article
04 Feb 2013
TL;DR: This paper studies new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9]; it implements and compares Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches.
Abstract: Large web search engines use significant hardware and energy resources to process hundreds of millions of queries each day, and a lot of research has focused on how to improve query processing efficiency. One general class of optimizations called early termination techniques is used in all major engines, and essentially involves computing top results without an exhaustive traversal and scoring of all potentially relevant index entries. Recent work in [9,7] proposed several early termination algorithms for disjunctive top-k query processing, based on a new augmented index structure called Block-Max Index that enables aggressive skipping in the index. In this paper, we build on this work by studying new algorithms and optimizations for Block-Max indexes that achieve significant performance gains over the work in [9,7]. We start by implementing and comparing Block-Max oriented algorithms based on the well-known Maxscore and WAND approaches. Then we study how to build better Block-Max index structures and design better index-traversal strategies, resulting in new algorithms that achieve a factor of 2 speed-up over the best results in [9] with acceptable space overheads. We also describe and evaluate a hierarchical algorithm for a new recursive Block-Max index structure.

63 citations
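To make the early-termination idea concrete, here is a simplified Block-Max WAND sketch, not the paper's optimized algorithms. Assumptions of this sketch: each posting list stores precomputed positive per-document scores; blocks have a fixed, hypothetical block_size; and on a skip every list in the pivot prefix is advanced, rather than the single list a production implementation might choose. A document is fully scored only if both the global (WAND) bound and the tighter per-block bound exceed the current top-k threshold.

```python
import heapq
from bisect import bisect_left

INF = float("inf")

class PostingList:
    """A sorted posting list of (doc_id, score) with per-block maximum scores."""

    def __init__(self, postings, block_size=64):
        self.docs = [d for d, _ in postings]
        self.scores = [s for _, s in postings]
        starts = range(0, len(self.docs), block_size)
        # Last doc id and maximum score of each fixed-size block.
        self.block_last = [self.docs[min(i + block_size, len(self.docs)) - 1] for i in starts]
        self.block_max = [max(self.scores[i:i + block_size]) for i in starts]
        self.max_score = max(self.scores, default=0.0)
        self.pos = 0

    def doc(self):
        return self.docs[self.pos] if self.pos < len(self.docs) else INF

    def current_score(self):
        return self.scores[self.pos]

    def next_geq(self, target):
        """Move the cursor to the first posting with doc_id >= target."""
        self.pos = bisect_left(self.docs, target, self.pos)

    def block_max_for(self, target):
        """'Shallow' lookup: max score of the block that would contain target."""
        b = bisect_left(self.block_last, target)
        return self.block_max[b] if b < len(self.block_max) else 0.0

    def block_end_for(self, target):
        b = bisect_left(self.block_last, target)
        return self.block_last[b] if b < len(self.block_last) else INF

def block_max_wand(posting_lists, k):
    """Disjunctive top-k over positive scores; returns (score, doc_id) pairs, best first."""
    if k <= 0:
        return []
    lists = list(posting_lists)
    heap = []    # min-heap of (score, doc_id) holding the current top-k
    theta = 0.0  # a new document must score strictly above this to enter
    while True:
        lists.sort(key=lambda pl: pl.doc())
        # 1. Pivot: first list at which the sum of global max scores exceeds theta.
        acc, pivot = 0.0, None
        for i, pl in enumerate(lists):
            if pl.doc() == INF:
                break
            acc += pl.max_score
            if acc > theta:
                pivot = i
                break
        if pivot is None:
            break  # no unscored document can beat theta
        d = lists[pivot].doc()
        while pivot + 1 < len(lists) and lists[pivot + 1].doc() == d:
            pivot += 1  # include every list already positioned on d
        # 2. Block-max check: a tighter, block-local upper bound for documents >= d.
        bound = sum(lists[i].block_max_for(d) for i in range(pivot + 1))
        if bound > theta:
            if lists[0].doc() == d:
                # All lists containing d are in lists[0..pivot]: score d exactly.
                score = sum(lists[i].current_score() for i in range(pivot + 1))
                if len(heap) < k:
                    heapq.heappush(heap, (score, d))
                elif score > heap[0][0]:
                    heapq.heapreplace(heap, (score, d))
                if len(heap) == k:
                    theta = heap[0][0]
                for i in range(pivot + 1):
                    lists[i].next_geq(d + 1)
            else:
                # Documents before d cannot beat theta: move lagging lists up to d.
                for i in range(pivot + 1):
                    if lists[i].doc() < d:
                        lists[i].next_geq(d)
        else:
            # No document in [d, d_next) can beat theta: skip past the current blocks.
            d_next = min(lists[i].block_end_for(d) + 1 for i in range(pivot + 1))
            if pivot + 1 < len(lists):
                d_next = min(d_next, lists[pivot + 1].doc())
            for i in range(pivot + 1):
                if lists[i].doc() < d_next:
                    lists[i].next_geq(d_next)
    return sorted(heap, reverse=True)

lists = [
    PostingList([(1, 2.0), (3, 0.5), (7, 1.2), (9, 3.1)], block_size=2),
    PostingList([(3, 1.0), (4, 0.7), (9, 0.4)], block_size=2),
]
top = block_max_wand(lists, k=2)
```

The result should be the same top-k scores that exhaustive scoring would produce; only the amount of work changes. In this small example, document 4 is never scored because its block-level bound cannot beat the threshold set by documents 1 and 3.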


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111