Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have been published within this topic, receiving 214383 citations.

Papers published on a yearly basis

Papers

Proceedings Article•DOI•

Abstract generation based on rhetorical structure extraction

Kenji Ono¹, Kazuo Sumita¹, Seiji Miike¹•Institutions (1)

Toshiba¹

05 Aug 1994

TL;DR: The system first extracts the rhetorical structure, the compound of the rhetorical relations between sentences, and then cuts out less important parts in the extracted structure to generate an abstract of the desired length.

Abstract: We have developed an automatic abstract generation system for Japanese expository writings based on rhetorical structure extraction. The system first extracts the rhetorical structure, the compound of the rhetorical relations between sentences, and then cuts out less important parts in the extracted structure to generate an abstract of the desired length. Evaluation of the generated abstract showed that it contains at maximum 74% of the most important sentences of the original text. The system is now utilized as a text browser for a prototypical interactive document retrieval system.

126 citations

Proceedings Article•DOI•

Term clustering of syntactic phrases

David D. Lewis¹, W. B. Croft¹•Institutions (1)

University of Massachusetts Amherst¹

01 Dec 1989

TL;DR: This paper discusses the implementation of a syntactic phrase generator, as well as the preliminary experiments with producing phrase clusters, and shows small improvements in retrieval effectiveness resulting from the use of phrase clusters.

Abstract: Term clustering and syntactic phrase formation are methods for transforming natural language text. Both have had only mixed success as strategies for improving the quality of text representations for document retrieval. Since the strengths of these methods are complementary, we have explored combining them to produce superior representations. In this paper we discuss our implementation of a syntactic phrase generator, as well as our preliminary experiments with producing phrase clusters. These experiments show small improvements in retrieval effectiveness resulting from the use of phrase clusters, but it is clear that corpora much larger than standard information retrieval test collections will be required to thoroughly evaluate the use of this technique.

125 citations

Proceedings Article•DOI•

Demand-based document dissemination to reduce traffic and balance load in distributed information systems

Azer Bestavros¹•Institutions (1)

Boston University¹

25 Oct 1995

TL;DR: This work proposes a hierarchical demand-based replication strategy that optimally disseminates information from its producer to servers that are closer to its consumers, and shows that by disseminating the most popular documents on servers closer to clients, network traffic could be reduced considerably, while servers are load-balanced.

Abstract: Research on replication techniques to reduce traffic and minimize the latency of information retrieval in a distributed system has concentrated on client-based caching, whereby recently/frequently accessed information is cached at a client (or at a proxy thereof) in anticipation of future accesses. We believe that such myopic solutions, focussing exclusively on a particular client or set of clients, are likely to have a limited impact. Instead, we offer a solution that allows the replication of information to be done on a global supply/demand basis. We propose a hierarchical demand-based replication strategy that optimally disseminates information from its producer to servers that are closer to its consumers. The level of dissemination depends on the relative popularity of documents, and on the expected reduction in traffic that results from such dissemination. We used extensive HTTP logs to validate an analytical model of server popularity and file access profiles. Using that model we show that by disseminating the most popular documents on servers closer to clients, network traffic could be reduced considerably, while servers are load-balanced. We argue that this process could be generalized to provide for an automated server-based information dissemination protocol that will be more effective in reducing both network bandwidth and document retrieval times than client-based caching protocols.
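
Illustration (not from the paper): the idea of pushing the most popular documents to a server nearer the clients can be sketched as a simple greedy selection under a storage budget. This is only a rough sketch under stated assumptions. The `Doc` record, the function names, the sample numbers, and the requests-per-byte ranking are all hypothetical, and the greedy choice is neither the hierarchical strategy nor the optimal dissemination the abstract describes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    name: str
    size: int      # bytes (hypothetical sample value)
    requests: int  # requests seen in a log window (hypothetical sample value)


def choose_replicas(docs, capacity):
    """Greedily pick documents to copy to a nearby server holding `capacity` bytes.

    Assumption: documents are ranked by requests per stored byte. This is a
    heuristic, not the paper's analytical model.
    """
    chosen, used = [], 0
    for doc in sorted(docs, key=lambda d: d.requests / d.size, reverse=True):
        if used + doc.size <= capacity:
            chosen.append(doc)
            used += doc.size
    return chosen


def traffic_saved(docs, chosen):
    """Fraction of origin-server bytes that the replicas would serve instead."""
    total = sum(d.size * d.requests for d in docs)
    saved = sum(d.size * d.requests for d in chosen)
    return saved / total if total else 0.0


docs = [
    Doc("index.html", 10, 100),
    Doc("logo.gif", 20, 80),
    Doc("paper.ps", 100, 5),
    Doc("faq.html", 30, 30),
]
chosen = choose_replicas(docs, capacity=60)
print([d.name for d in chosen], traffic_saved(docs, chosen))
```

With these made-up numbers, the three small, frequently requested files fit in the budget, while the large, rarely requested one stays at the origin server.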

125 citations

Proceedings Article•DOI•

Incremental relevance feedback

IJsbrand Jan Aalbersberg¹•Institutions (1)

Philips¹

01 Jun 1992

TL;DR: This paper focuses on a relevance feedback technique that allows easily understandable and manageable user interfaces, and at the same time provides high-quality retrieval results.

Abstract: Although relevance feedback techniques have been investigated for more than 20 years, hardly any of these techniques has been implemented in a commercial full-text document retrieval system. In addition to pure performance problems, this is due to the fact that the application of relevance feedback techniques increases the complexity of the user interface and thus also the use of a document retrieval system. In this paper we concentrate on a relevance feedback technique that allows easily understandable and manageable user interfaces, and at the same time provides high-quality retrieval results. Moreover, the relevance feedback technique introduced unifies as well as improves other well-known relevance feedback techniques.
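
Illustration (not from the paper): the abstract does not give the update rule, so the sketch below uses a generic Rocchio-style adjustment applied after each single judgment, as one possible reading of "incremental" feedback. The function names, the `beta` and `gamma` weights, and the toy term vectors are assumptions chosen for illustration only.

```python
def update_query(query, doc, relevant, beta=0.75, gamma=0.15):
    """Return a new query vector after one relevance judgment.

    Rocchio-style assumption: add beta * doc for a relevant document,
    subtract gamma * doc for a non-relevant one, and drop terms whose
    weight is no longer positive.
    """
    weight = beta if relevant else -gamma
    updated = dict(query)
    for term, value in doc.items():
        updated[term] = updated.get(term, 0.0) + weight * value
    return {t: w for t, w in updated.items() if w > 0}


def score(query, doc):
    """Inner-product similarity between sparse term-weight dictionaries."""
    return sum(w * doc.get(t, 0.0) for t, w in query.items())


# Toy documents as sparse term-weight vectors (hypothetical data).
d1 = {"retrieval": 1.0, "feedback": 1.0}
d2 = {"retrieval": 1.0, "caching": 1.0}
d3 = {"feedback": 1.0, "interface": 1.0}

q = {"retrieval": 1.0}
q = update_query(q, d1, relevant=True)   # user marks d1 as relevant
q = update_query(q, d2, relevant=False)  # user marks d2 as not relevant
print(q, score(q, d3))
```

Updating after every judgment keeps the interaction simple: the user sees one document, gives one yes/no answer, and the ranking of the remaining documents is recomputed from the adjusted query.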

124 citations

Patent•

System and method for associating structured and manually selected annotations with electronic document contents

Jeffrey Glass, Elizabeth Derr

17 Jun 2004

TL;DR: In this article, a system and method for assisting a human document annotator in recording semantic judgments about the contents of sample electronic documents is described, where a system administrator configures and stores a document annotation definition at a server computer, providing a precise and consistent structure for annotating documents and portions of documents.

Abstract: A system and method are provided for assisting a human document annotator in recording semantic judgments about the contents of sample electronic documents. A system administrator first configures and stores a document annotation definition at a server computer, providing a precise and consistent structure for annotating documents and portions of documents. Documents intended to serve as sample documents for pattern matching against unknown documents are collected and stored at the server computer. A human annotator located at a client computer connected by a network to the server computer requests a display of a sample document to be annotated. A document is transmitted in an annotatable form from the server computer to the client computer. The human document annotator reviews the annotatable document, records semantic judgments about the document using interactive controls displayed with the document, and transmits a set of selected annotation values to the server computer. The server computer then stores the values and associates them with the document. The set of annotated documents, enhanced by the addition of structured semantic judgment information, then may be queried by other document management systems, improving the accuracy with which other systems perform automated document retrieval, comparison or filtering actions.

124 citations

Network Information

Performance

Metrics

Papers: 6,866
Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
