Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.


Papers
Proceedings Article
Kenji Ono, Kazuo Sumita, Seiji Miike
05 Aug 1994
TL;DR: The system first extracts the rhetorical structure, the compound of the rhetorical relations between sentences, and then cuts out less important parts in the extracted structure to generate an abstract of the desired length.
Abstract: We have developed an automatic abstract generation system for Japanese expository writings based on rhetorical structure extraction. The system first extracts the rhetorical structure, the compound of the rhetorical relations between sentences, and then cuts out less important parts in the extracted structure to generate an abstract of the desired length. Evaluation of the generated abstract showed that it contains up to 74% of the most important sentences of the original text. The system is now utilized as a text browser for a prototypical interactive document retrieval system.

126 citations

Proceedings Article
01 Dec 1989
TL;DR: This paper discusses the implementation of a syntactic phrase generator, as well as the preliminary experiments with producing phrase clusters, and shows small improvements in retrieval effectiveness resulting from the use of phrase clusters.
Abstract: Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combining them to produce superior representations. In this paper we discuss our implementation of a syntactic phrase generator, as well as our preliminary experiments with producing phrase clusters. These experiments show small improvements in retrieval effectiveness resulting from the use of phrase clusters, but it is clear that corpora much larger than standard information retrieval test collections will be required to thoroughly evaluate the use of this technique.

125 citations

Proceedings Article
Azer Bestavros
25 Oct 1995
TL;DR: This work proposes a hierarchical demand-based replication strategy that optimally disseminates information from its producer to servers that are closer to its consumers, and shows that by disseminating the most popular documents on servers closer to clients, network traffic could be reduced considerably, while servers are load-balanced.
Abstract: Research on replication techniques to reduce traffic and minimize the latency of information retrieval in a distributed system has concentrated on client-based caching, whereby recently/frequently accessed information is cached at a client (or at a proxy thereof) in anticipation of future accesses. We believe that such myopic solutions, focussing exclusively on a particular client or set of clients, are likely to have a limited impact. Instead, we offer a solution that allows the replication of information to be done on a global supply/demand basis. We propose a hierarchical demand-based replication strategy that optimally disseminates information from its producer to servers that are closer to its consumers. The level of dissemination depends on the relative popularity of documents, and on the expected reduction in traffic that results from such dissemination. We used extensive HTTP logs to validate an analytical model of server popularity and file access profiles. Using that model we show that by disseminating the most popular documents on servers closer to clients, network traffic could be reduced considerably, while servers are load-balanced. We argue that this process could be generalized to provide for an automated server-based information dissemination protocol that will be more effective in reducing both network bandwidth and document retrieval times than client-based caching protocols.

125 citations
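
The popularity-driven idea in the abstract above can be illustrated with a small sketch. This is not the paper's hierarchical protocol or its analytical model: it is a simplified greedy selection, assuming an access log given as a list of requested paths, a hypothetical `sizes` mapping from path to bytes, and a hypothetical storage budget on the replica server. It picks the most requested documents that fit the budget and estimates what fraction of the transferred bytes the replica could then serve.

```python
from collections import Counter


def select_documents_to_replicate(access_log, sizes, capacity_bytes):
    """Greedily pick the most requested documents that fit the storage budget.

    access_log: list of requested document paths (one entry per request).
    sizes: dict mapping document path -> size in bytes (assumed known).
    capacity_bytes: storage available on the replica server (hypothetical).
    """
    counts = Counter(access_log)
    chosen = []
    used = 0
    # most_common() yields documents from most to least requested.
    for doc, _hits in counts.most_common():
        size = sizes[doc]
        if used + size <= capacity_bytes:
            chosen.append(doc)
            used += size
    return chosen


def traffic_saved_fraction(access_log, sizes, chosen):
    """Fraction of requested bytes that the replica could serve."""
    chosen_set = set(chosen)
    total = sum(sizes[doc] for doc in access_log)
    saved = sum(sizes[doc] for doc in access_log if doc in chosen_set)
    return saved / total if total else 0.0


log = ["/a", "/a", "/a", "/b", "/b", "/c"]
sizes = {"/a": 100, "/b": 300, "/c": 50}
replicated = select_documents_to_replicate(log, sizes, capacity_bytes=200)
fraction = traffic_saved_fraction(log, sizes, replicated)
```

A greedy pass by request count is only one possible policy; the abstract describes the level of dissemination as depending on both popularity and the expected traffic reduction, which this sketch does not model.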

Proceedings Article
IJsbrand Jan Aalbersberg
01 Jun 1992
TL;DR: This paper focuses on a relevance feedback technique that allows easily understandable and manageable user interfaces, and at the same time provides high-quality retrieval results.
Abstract: Although relevance feedback techniques have been investigated for more than 20 years, hardly any of these techniques has been implemented in a commercial full-text document retrieval system. In addition to pure performance problems, this is due to the fact that the application of relevance feedback techniques increases the complexity of the user interface and thus also the use of a document retrieval system. In this paper we concentrate on a relevance feedback technique that allows easily understandable and manageable user interfaces, and at the same time provides high-quality retrieval results. Moreover, the relevance feedback technique introduced unifies as well as improves other well-known relevance feedback techniques.

124 citations
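
For readers unfamiliar with relevance feedback, the following is a minimal sketch of the classic Rocchio query update, one of the well-known techniques in this area. It is not the technique introduced in the paper above, whose details the abstract does not give. The sparse dict representation and the weights `alpha`, `beta`, and `gamma` are illustrative assumptions.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Return a reweighted query vector using the Rocchio update.

    query: dict mapping term -> weight.
    relevant / nonrelevant: lists of document vectors (dicts of term -> weight)
    that the user judged relevant or not relevant.
    alpha, beta, gamma: illustrative weights, not taken from the source.
    """
    terms = set(query)
    for doc in relevant + nonrelevant:
        terms.update(doc)

    updated = {}
    for term in terms:
        weight = alpha * query.get(term, 0.0)
        if relevant:
            # Move toward the centroid of the relevant documents.
            weight += beta * sum(d.get(term, 0.0) for d in relevant) / len(relevant)
        if nonrelevant:
            # Move away from the centroid of the non-relevant documents.
            weight -= gamma * sum(d.get(term, 0.0) for d in nonrelevant) / len(nonrelevant)
        if weight > 0:
            # Terms with non-positive weight are dropped from the query.
            updated[term] = weight
    return updated


new_query = rocchio(
    {"retrieval": 1.0},
    relevant=[{"retrieval": 1.0, "feedback": 1.0}],
    nonrelevant=[{"cooking": 1.0}],
)
```

In this example the term "feedback" enters the query because it appears in a document judged relevant, while "cooking" is excluded because its weight would be negative.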

Patent
17 Jun 2004
TL;DR: In this article, a system and method for assisting a human document annotator in recording semantic judgments about the contents of sample electronic documents is described, where a system administrator configures and stores a document annotation definition at a server computer, providing a precise and consistent structure for annotating documents and portions of documents.
Abstract: A system and method are provided for assisting a human document annotator in recording semantic judgments about the contents of sample electronic documents. A system administrator first configures and stores a document annotation definition at a server computer, providing a precise and consistent structure for annotating documents and portions of documents. Documents intended to serve as sample documents for pattern matching against unknown documents are collected and stored at the server computer. A human annotator located at a client computer connected by a network to the server computer requests a display of a sample document to be annotated. A document is transmitted in an annotatable form from the server computer to the client computer. The human document annotator reviews the annotatable document, records semantic judgments about the document using interactive controls displayed with the document, and transmits a set of selected annotation values to the server computer. The server computer then stores the values and associates them with the document. The set of annotated documents, enhanced by the addition of structured semantic judgment information, then may be queried by other document management systems, improving the accuracy with which other systems perform automated document retrieval, comparison or filtering actions.

124 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111