Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.


Papers
Journal Article
Gerard Salton
Abstract: Evidence from available studies comparing manual and automatic text-retrieval systems does not support the conclusion that intellectual content analysis produces better results than comparable automatic systems.

321 citations

Proceedings Article
14 Aug 1997
TL;DR: It is shown that word-intersection clustering produces superior clusters and does so faster than standard techniques, and that the O(n log n) time phrase-intersection clustering method produces comparable clusters and does so more than two orders of magnitude faster than all methods tested.
Abstract: Conventional document retrieval systems (e.g., Alta Vista) return long lists of ranked documents in response to user queries. Recently, document clustering has been put forth as an alternative method of organizing retrieval results (Cutting et al. 1992). A person browsing the clusters can discover patterns that could be overlooked in the traditional presentation. This paper describes two novel clustering methods that intersect the documents in a cluster to determine the set of words (or phrases) shared by all the documents in the cluster. We report on experiments that evaluate these intersection-based clustering methods on collections of snippets returned from Web search engines. First, we show that word-intersection clustering produces superior clusters and does so faster than standard techniques. Second, we show that our O(n log n) time phrase-intersection clustering method produces comparable clusters and does so more than two orders of magnitude faster than all methods tested.

321 citations

Monograph
20 Jun 2002
Abstract: This text covers the emerging technologies of document retrieval, information extraction, and text categorization in a way which highlights commonalities in terms of both general principles and practical issues. It seeks to satisfy a need on the part of technology practitioners in the Internet space, faced with having to make difficult decisions as to what research has been done and what the best practices are. It is not intended as a vendor guide or as a recipe for building applications. But it does identify the key technologies, the issues involved, and the strengths and weaknesses of the various approaches. There is also a strong emphasis on evaluation in every chapter, both in terms of methodology (how to evaluate) and what controlled experimentation and industrial experience have to tell us.

321 citations

Journal Article
01 May 1995
TL;DR: The second Text Retrieval Conference (TREC-2) was held in August 1993 and was attended by about 150 people involved in 31 participating groups, who reported on a wide variety of retrieval techniques, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching.
Abstract: The second Text Retrieval Conference (TREC-2) was held in August 1993 and was attended by about 150 people involved in 31 participating groups. The goal of the conference was to bring research groups together to discuss their work on a new large test collection. A wide variety of retrieval techniques were reported on, including methods using automatic thesauri, sophisticated term weighting, natural language techniques, relevance feedback, and advanced pattern matching. As results had been run through a common evaluation package, groups were able to compare the effectiveness of different techniques and discuss how differences between the systems affected performance.

318 citations

Patent
Gregory J. Wolff
13 Jan 1995
TL;DR: A document retrieval and accessing system in which documents are provided with links to other documents; selecting one or more of the links causes the corresponding documents to be retrieved and sent to the requesting party.
Abstract: A document retrieval and accessing system in which documents are provided with links to other documents. Selection of one or more of the links causes the corresponding documents to be retrieved and sent to the requesting party. The retrieved documents may in turn include links to further documents.

317 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
81% related
Metadata
43.9K papers, 642.7K citations
79% related
Recommender system
27.2K papers, 598K citations
79% related
Ontology (information science)
57K papers, 869.1K citations
78% related
Natural language
31.1K papers, 806.8K citations
77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111