Topic
Document retrieval
About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published on this topic, receiving 214,383 citations.
Papers
Abstract: Evidence from available studies comparing manual and automatic text-retrieval systems does not support the conclusion that intellectual content analysis produces better results than comparable automatic systems.
321 citations
14 Aug 1997
TL;DR: It is shown that word-intersection clustering produces superior clusters and does so faster than standard techniques, and that the O(n log n) time phrase-intersection clustering method produces comparable clusters and does so more than two orders of magnitude faster than all methods tested.
Abstract: Conventional document retrieval systems (e.g., Alta Vista) return long lists of ranked documents in response to user queries. Recently, document clustering has been put forth as an alternative method of organizing retrieval results (Cutting et al. 1992). A person browsing the clusters can discover patterns that could be overlooked in the traditional presentation.
This paper describes two novel clustering methods that intersect the documents in a cluster to determine the set of words (or phrases) shared by all the documents in the cluster. We report on experiments that evaluate these intersection-based clustering methods on collections of snippets returned from Web search engines. First, we show that word-intersection clustering produces superior clusters and does so faster than standard techniques. Second, we show that our O(n log n) time phrase-intersection clustering method produces comparable clusters and does so more than two orders of magnitude faster than all methods tested.
321 citations
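The core idea in the entry above, characterising a cluster by the set of words shared by all of its documents, can be sketched as follows. This is a minimal illustration, assuming whitespace tokenisation and a simple greedy merge rule (merge the pair whose union shares the most words, stop below a threshold). The names `shared_words`, `word_intersection_cluster` and `min_shared` are hypothetical, and the merge criterion is a simplified stand-in, not the scoring function used in the paper.

```python
from itertools import combinations


def shared_words(docs):
    """Return the set of words that occur in every document of a cluster."""
    word_sets = [set(d.lower().split()) for d in docs]
    return set.intersection(*word_sets) if word_sets else set()


def word_intersection_cluster(docs, min_shared=1):
    """Greedy agglomerative clustering driven by word intersection.

    Assumption: at each step, merge the pair of clusters whose union still
    shares the most words; stop once no merge keeps at least `min_shared`
    words common to every document. This is an illustrative criterion only.
    """
    clusters = [[d] for d in docs]  # start with one cluster per snippet
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            score = len(shared_words(clusters[i] + clusters[j]))
            if best is None or score > best[0]:
                best = (score, i, j)
        score, i, j = best
        if score < min_shared:
            break  # every possible merge would lose too much shared vocabulary
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters
```

For example, the snippets "jaguar car speed", "jaguar car price", "jaguar cat habitat" and "jaguar cat diet" with `min_shared=2` split into a "jaguar car" cluster and a "jaguar cat" cluster, and each cluster's shared-word set doubles as a readable label for a user browsing the results.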
20 Jun 2002
Abstract: This text covers the emerging technologies of document retrieval, information extraction, and text categorization in a way which highlights commonalities in terms of both general principles and practical issues. It seeks to satisfy a need on the part of technology practitioners in the Internet space, faced with having to make difficult decisions as to what research has been done and what the best practices are. It is not intended as a vendor guide or as a recipe for building applications. But it does identify the key technologies, the issues involved, and the strengths and weaknesses of the various approaches. There is also a strong emphasis on evaluation in every chapter, both in terms of methodology (how to evaluate) and what controlled experimentation and industrial experience have to tell us.
321 citations
01 May 1995
TL;DR: The second Text Retrieval Conference (TREC-2), held in August 1993, was attended by about 150 people from 31 participating groups, who reported a wide variety of retrieval techniques, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching.
Abstract: The second Text Retrieval Conference (TREC-2) was held in August 1993 and was attended by about 150 people from 31 participating groups. The goal of the conference was to bring research groups together to discuss their work on a new large test collection. A wide variety of retrieval techniques was reported, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching. Because all results had been run through a common evaluation package, groups were able to compare the effectiveness of different techniques and discuss how differences between the systems affected performance.
318 citations
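The entry above credits a common evaluation package with making systems comparable. As an illustration of the kind of measure such a package computes, here is a sketch of non-interpolated average precision and its mean over queries, one of the measures commonly reported in TREC-style evaluations. This is not the evaluation package itself; the function names and the dictionary layout for runs and relevance judgements are assumptions made for the example.

```python
def average_precision(ranking, relevant):
    """Non-interpolated average precision of one ranked list of doc ids.

    Precision is taken at the rank of each relevant document retrieved,
    summed, and divided by the total number of relevant documents, so
    relevant documents that are never retrieved contribute zero.
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(runs, qrels):
    """Mean of per-query average precision.

    Assumed layout: `runs` maps query id -> ranked list of doc ids and
    `qrels` maps query id -> set of relevant doc ids. Queries missing from
    `runs` are scored as empty rankings.
    """
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores) if scores else 0.0
```

Because every system is scored against the same relevance judgements with the same function, a difference in the resulting figure can be attributed to the retrieval technique rather than to the measurement.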
13 Jan 1995
TL;DR: A document retrieval and accessing system is described in which documents are provided with links to other documents; selecting one or more of the links causes the corresponding documents to be retrieved and sent to the requesting party.
Abstract: A document retrieval and accessing system in which documents are provided with links to other documents. Selecting one or more of the links causes the corresponding documents to be retrieved and sent to the requesting party. The retrieved documents may in turn include links to further documents.
317 citations
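The linked-retrieval scheme in the entry above (select a link, receive the linked document, which may carry further links) can be modelled with a small in-memory store. This is a hypothetical sketch, not the patented system: `Document`, `DocumentStore`, `select_link` and `reachable` are illustrative names, and the breadth-first `reachable` helper is an added convenience showing how chains of links extend across documents.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    doc_id: str
    text: str
    links: List[str] = field(default_factory=list)  # ids of linked documents


class DocumentStore:
    """Hypothetical in-memory store of documents that link to other documents."""

    def __init__(self, documents):
        self._docs = {d.doc_id: d for d in documents}

    def retrieve(self, doc_id):
        """Return the requested document; raises KeyError if it is unknown."""
        return self._docs[doc_id]

    def select_link(self, doc, index):
        """Selecting the index-th link of `doc` retrieves the linked document."""
        return self.retrieve(doc.links[index])

    def reachable(self, start_id, max_depth):
        """Breadth-first list of document ids reachable within max_depth links."""
        seen = {start_id}
        order = [start_id]
        frontier = [start_id]
        for _ in range(max_depth):
            next_frontier = []
            for doc_id in frontier:
                for target in self._docs[doc_id].links:
                    # Skip dangling links and documents already visited (cycles).
                    if target in self._docs and target not in seen:
                        seen.add(target)
                        order.append(target)
                        next_frontier.append(target)
            frontier = next_frontier
        return order
```

Tracking visited ids matters because links may form cycles (a document can link back to one that linked to it), so a naive traversal would never terminate.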