Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have appeared within this topic, receiving 214383 citations.


Papers
Patent
08 Nov 1996
TL;DR: This patent describes a method for directly translating a paper document into a hypertext-based format so that it can be accessed through the Internet using current browsers such as Mosaic, Netscape and Microsoft's Explorer.
Abstract: Paper documents are automatically converted into a hypertext-based format so that they can be accessed through electronic networks, including the Internet, or via non-volatile transfer media such as disks or CD-ROMs. The invention generalizes the concept of form-based recognition while extending the concept of document retrieval to include document structure knowledge, thereby providing the advantages found in both form-based recognition (utilization of document structure knowledge) and image-based information retrieval (robustness). In a preferred embodiment, a method according to the invention enables direct translation of a paper document into a hypertext-based format so that it may be directly accessed through the Internet using current browsers such as Mosaic, Netscape and Microsoft's Explorer.

170 citations

Journal ArticleDOI
TL;DR: This paper shows how aboutness is related to probability of satisfaction and shows that about is, in fact, not the central concept in a theory of document retrieval.
Abstract: The primary objective of this paper is to examine the concept of about as it is used in its information retrieval sense when, for example, an indexer judges that a document is (or is not) about some given subject. The problem with about is that it is a very complex notion and we are unable to say precisely what it is we do when we make a judgment of aboutness. Since about is at the heart of indexing, how are we to formulate any proper theory of indexing if we cannot explicate precisely the key concept of about? In this paper we look at this concept of about and offer a solution to the problem mentioned; it consists of an operational definition of about which interprets about in terms of search behavior. A second objective of this paper is to show that about is, in fact, not the central concept in a theory of document retrieval. A document retrieval system ought to rank the documents it returns (in response to a search query) not according to the degree to which they are about the topic sought by the inquiring patron, but rather according to the probability that they will satisfy that person's information need. This paper shows how aboutness is related to probability of satisfaction.

170 citations

Journal ArticleDOI
TL;DR: This paper reports experiments with a term weighting model incorporating relevance information in which it is assumed that index terms are distributed dependently and argues that if high recall searches are required, relevance feedback based on the modified dependence model may be superior to the widely used Boolean search.
Abstract: This paper reports experiments with a term weighting model incorporating relevance information in which it is assumed that index terms are distributed dependently. Initially this model was tested with complete relevance information against a similar model which assumes index terms are distributed independently. The experiments demonstrated conclusively that index terms are not independent for a number of diverse document collections. It was concluded that the use of relevance information together with dependence information could potentially improve retrieval effectiveness. As a result of further experiments the initial strict dependence model was modified and in particular a new relevance‐based term weight was developed. This modified dependence model was then used as the basis for relevance feedback, i.e. with partial relevance information only, and significant increases in retrieval effectiveness were achieved. The evaluation method used in the feedback experiments emphasized the effect of the feedback on documents which the potential user would not previously have seen. Finally the incorporation of relevance feedback in an operational system is considered and in particular it is argued that if high recall searches are required, relevance feedback based on the modified dependence model may be superior to the widely used Boolean search.

170 citations
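
The abstract above contrasts its dependence model with one that treats index terms as independent, without giving either formula. As a rough illustration only, the sketch below uses a widely known independence-based relevance weight (the Robertson/Sparck Jones form with 0.5 smoothing); treating it as the paper's baseline is an assumption, and it is not the paper's modified dependence model. The function names, documents, and feedback counts are hypothetical.

```python
import math


def rsj_weight(r: int, n: int, R: int, N: int) -> float:
    """Relevance weight of one term, assuming terms occur independently.

    r: judged-relevant documents containing the term
    n: documents in the collection containing the term
    R: judged-relevant documents
    N: documents in the collection
    The 0.5 terms are the usual smoothing that avoids division by zero.
    """
    relevant_odds = (r + 0.5) / (R - r + 0.5)
    non_relevant_odds = (n - r + 0.5) / (N - n - R + r + 0.5)
    return math.log(relevant_odds / non_relevant_odds)


def score(doc_terms: set[str], weights: dict[str, float]) -> float:
    """Sum the weights of the query terms present in the document."""
    return sum(w for term, w in weights.items() if term in doc_terms)


# Hypothetical feedback counts: 100 documents, 10 judged relevant.
weights = {
    "retrieval": rsj_weight(r=5, n=10, R=10, N=100),  # common in relevant docs
    "system": rsj_weight(r=0, n=10, R=10, N=100),     # absent from relevant docs
}

# Rank three toy documents by their summed term weights, best first.
ranked = sorted(
    [{"retrieval", "index"}, {"system"}, {"retrieval", "system"}],
    key=lambda doc: score(doc, weights),
    reverse=True,
)
```

A term seen mostly in judged-relevant documents receives a positive weight and one seen only in non-relevant documents a negative weight, so feedback reorders the documents the user has not yet seen.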

Journal ArticleDOI
TL;DR: The authors' technique combines several data compression features to provide economical storage, faster indexing, and accelerated searches.
Abstract: The continually growing Web challenges information retrieval systems to deliver data quickly. The authors' technique combines several data compression features to provide economical storage, faster indexing, and accelerated searches.

169 citations

Book ChapterDOI
09 Sep 1996
TL;DR: This paper presents experiments that analyze the factors affecting dictionary-based methods for cross-lingual retrieval, along with methods that dramatically reduce the errors such an approach usually makes.
Abstract: Multi-lingual information retrieval (IR) has largely been limited to the development of systems for use with a specific foreign language. The explosion in the availability of electronic media in languages other than English makes the development of IR systems that can cross language boundaries increasingly important. In this paper, we present experiments that analyze the factors that affect dictionary-based methods for cross-lingual retrieval and present methods that dramatically reduce the errors such an approach usually makes.

168 citations
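
To make the idea of dictionary-based cross-lingual retrieval concrete, here is a minimal sketch of word-by-word query translation. It is not the paper's method: the abstract does not say which error factors it studies, so the ambiguity shown here is only one plausible example, and the function name and the tiny Spanish-to-English dictionary are hypothetical.

```python
def translate_query(query_terms: list[str],
                    bilingual_dict: dict[str, list[str]]) -> list[str]:
    """Replace each source-language term with all of its dictionary translations."""
    translated: list[str] = []
    for term in query_terms:
        # Terms missing from the dictionary (e.g. proper nouns) pass through unchanged.
        translated.extend(bilingual_dict.get(term, [term]))
    return translated


# Hypothetical toy dictionary; "banco" is ambiguous between two senses.
toy_dict = {
    "banco": ["bank", "bench"],
    "río": ["river"],
}

english_query = translate_query(["banco", "río", "Ebro"], toy_dict)
```

Here the translated query contains both "bank" and "bench", so an unrelated sense enters the search alongside the intended one; naive word-by-word substitution has no way to choose between them.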


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  9
2022  39
2021  107
2020  130
2019  144
2018  111