Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers published on a yearly basis: 1969 to 2023

Papers

Journal Article

Finding information on the World Wide Web: the retrieval effectiveness of search engines

Michael D. Gordon¹, Praveen Pathak¹

University of Michigan¹

01 Mar 1999-Information Processing and Management

TL;DR: Traditional information retrieval measures of recall and precision at varying numbers of retrieved documents are calculated and used as the bases for statistical comparisons of retrieval effectiveness among the eight search engines.

Abstract: Search engines are essential for finding information on the World Wide Web. We conducted a study to see how effective eight search engines are. Expert searchers sought information on the Web for users who had legitimate needs for information, and these users assessed the relevance of the information retrieved. We calculated traditional information retrieval measures of recall and precision at varying numbers of retrieved documents and used these as the bases for statistical comparisons of retrieval effectiveness among the eight search engines. We also calculated the likelihood that a document retrieved by one search engine was retrieved by other search engines as well.
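The precision and recall measures "at varying numbers of retrieved documents" mentioned in this abstract can be illustrated with a minimal Python sketch. The function name, document IDs, and relevance judgments below are hypothetical, and the sketch uses the common precision@k and recall@k definitions rather than the exact procedure of the study.

```python
def precision_recall_at_k(ranked, relevant, k):
    """Return (precision, recall) computed over the top-k entries of a ranked list.

    ranked   -- list of document IDs in the order the engine returned them
    relevant -- set of document IDs judged relevant by the user
    k        -- cutoff: how many retrieved documents to consider
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = ranked[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k                               # share of the top k that is relevant
    recall = hits / len(relevant) if relevant else 0.0  # share of relevant docs found so far
    return precision, recall


# Hypothetical ranking from one engine and hypothetical relevance judgments.
ranked = ["d3", "d7", "d1", "d9", "d4"]
relevant = {"d1", "d3", "d5"}

for k in (1, 3, 5):
    p, r = precision_recall_at_k(ranked, relevant, k)
    print(f"k={k}: precision={p:.2f} recall={r:.2f}")
```

Computing both measures at several cutoffs gives one curve per engine, which is the kind of data a statistical comparison across engines could be based on.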

382 citations

Journal Article

Document ranking and the vector-space model

Dik Lun Lee¹, Huei Chuang, Kent E. Seamons

Hong Kong University of Science and Technology¹

01 Mar 1997-IEEE Software

TL;DR: Using several simplifications of the vector-space model for text retrieval queries, the authors seek the optimal balance between processing efficiency and retrieval effectiveness as expressed in relevant document rankings.

Abstract: Efficient and effective text retrieval techniques are critical in managing the increasing amount of textual information available in electronic form. Yet text retrieval is a daunting task because it is difficult to extract the semantics of natural language texts. Many problems must be resolved before natural language processing techniques can be effectively applied to a large collection of texts. Most existing text retrieval techniques rely on indexing keywords. Unfortunately, keywords or index terms alone cannot adequately capture the document contents, resulting in poor retrieval performance. Yet keyword indexing is widely used in commercial systems because it is still the most viable way by far to process large amounts of text. Using several simplifications of the vector-space model for text retrieval queries, the authors seek the optimal balance between processing efficiency and retrieval effectiveness as expressed in relevant document rankings.
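As a rough illustration of ranking with the vector-space model, the sketch below weights keywords by TF-IDF and orders documents by cosine similarity to the query. It is a generic textbook variant, not one of the simplifications studied by the authors; the tokenizer, the `log(N/df) + 1` weighting, and the sample documents are all assumptions made for the example.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def build_index(docs):
    """Return (idf, vectors): per-term IDF weights and one TF-IDF dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log(n / count) + 1.0 for t, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()} for toks in tokenized]
    return idf, vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return (score, doc_index) pairs, best match first."""
    idf, vectors = build_index(docs)
    # Query terms that never occur in the collection carry no weight.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return sorted(scores, key=lambda s: (-s[0], s[1]))


# Hypothetical three-document collection.
docs = [
    "text retrieval with keyword indexing",
    "vector space model for text retrieval",
    "image compression hardware",
]
for score, i in rank("vector space retrieval", docs):
    print(f"{score:.3f}  {docs[i]}")
```

The trade-off the abstract describes shows up even here: every query touches every document vector, so cheaper weighting or pruning schemes save time at some cost in ranking quality.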

382 citations

Book

Understanding Search Engines: Mathematical Modeling and Text Retrieval

Michael W. Berry¹, Murray Browne¹

University of Tennessee¹

01 Jul 1999

TL;DR: The authors bridge the gap between applied mathematics and information retrieval and discuss current problems in information retrieval that may be unfamiliar to applied mathematicians and computer scientists.

Abstract: A discussion of many of the key design issues for building search engines. It emphasizes the important roles that applied mathematics can play in improving information retrieval. The authors discuss not only important data structures, algorithms and software, but also user-centred issues such as interfaces, manual indexing, and document preparation. The authors bridge the gap between applied mathematics and information retrieval. They discuss some of the current problems in information retrieval that may not be familiar to applied mathematicians and computer scientists and present some of the driving computational methods (SVD, SDD) for automated conceptual indexing. This book introduces topics in a non-technical way and provides insights into common problems found in information retrieval. The more mathematical details are provided in sidebars or offset from the regular text.

381 citations

Proceedings Article

Two supervised learning approaches for name disambiguation in author citations

Hui Han¹, C. Lee Giles¹, Hongyuan Zha¹, Cheng Li², Kostas Tsioutsiouliklis³

Pennsylvania State University¹, Harvard University², Princeton University³

07 Jun 2004

TL;DR: Two supervised learning approaches to disambiguate authors in citations are investigated: one uses the naive Bayes probability model, a generative model; the other uses support vector machines (SVMs) and the vector space representation of citations, a discriminative model.

Abstract: Due to name abbreviations, identical names, name misspellings, and pseudonyms in publications or bibliographies (citations), an author may have multiple names and multiple authors may share the same name. Such name ambiguity affects the performance of document retrieval, Web search, database integration, and may cause improper attribution to authors. We investigate two supervised learning approaches to disambiguate authors in the citations. One approach uses the naive Bayes probability model, a generative model; the other uses support vector machines (SVMs) [V. Vapnik (1995)] and the vector space representation of citations, a discriminative model. Both approaches utilize three types of citation attributes: coauthor names, the title of the paper, and the title of the journal or proceeding. We illustrate these two approaches on two types of data, one collected from the Web, mainly publication lists from homepages, the other collected from the DBLP citation databases.
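The generative approach can be sketched as a small multinomial naive Bayes classifier over the three attribute types named in the abstract (coauthor names, paper title, venue title). This is only an illustrative simplification, not the probability model from the paper; the class name, the Laplace smoothing, the author labels, and the citations are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesDisambiguator:
    """Multinomial naive Bayes over citation attributes (illustrative sketch)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                          # Laplace smoothing constant (assumption)
        self.class_counts = Counter()               # citations seen per author identity
        self.feature_counts = defaultdict(Counter)  # feature counts per author identity
        self.vocab = set()

    @staticmethod
    def features(citation):
        # Each feature is tagged with its attribute type so a word in a title
        # is not confused with the same word in a venue name.
        feats = [("coauthor", c.lower()) for c in citation["coauthors"]]
        feats += [("title", w) for w in citation["title"].lower().split()]
        feats += [("venue", w) for w in citation["venue"].lower().split()]
        return feats

    def fit(self, citations, labels):
        for citation, label in zip(citations, labels):
            self.class_counts[label] += 1
            for f in self.features(citation):
                self.feature_counts[label][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, citation):
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            denom = sum(self.feature_counts[label].values()) + self.alpha * len(self.vocab)
            for f in self.features(citation):
                score += math.log((self.feature_counts[label][f] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training citations for two distinct authors who share one name.
train = [
    ({"coauthors": ["A. Kumar"], "title": "kernel methods for text classification",
      "venue": "machine learning journal"}, "smith_ml"),
    ({"coauthors": ["A. Kumar", "B. Chen"], "title": "support vector learning",
      "venue": "neural computation"}, "smith_ml"),
    ({"coauthors": ["C. Lopez"], "title": "protein folding dynamics",
      "venue": "journal of molecular biology"}, "smith_bio"),
    ({"coauthors": ["C. Lopez", "D. Park"], "title": "gene expression in yeast",
      "venue": "cell biology"}, "smith_bio"),
]
model = NaiveBayesDisambiguator().fit([c for c, _ in train], [y for _, y in train])
query = {"coauthors": ["B. Chen"], "title": "learning text kernels", "venue": "machine learning"}
print(model.predict(query))
```

A discriminative alternative would feed the same tagged features, as a vector, to a classifier such as an SVM instead of estimating per-author feature probabilities.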

378 citations

Journal Article

Access methods for text

Chris Faloutsos¹

University of Toronto¹

01 Mar 1985-ACM Computing Surveys

TL;DR: This paper compares text retrieval methods intended for office systems with methods from database systems and from information retrieval systems, and examines the most interesting representatives of each class.

Abstract: This paper compares text retrieval methods intended for office systems. The operational requirements of the office environment are discussed, and retrieval methods from database systems and from information retrieval systems are examined. We classify these methods and examine the most interesting representatives of each class. Attempts to speed up retrieval with special purpose hardware are also presented, and issues such as approximate string matching and compression are discussed. A qualitative comparison of the examined methods is presented. The signature file method is discussed in more detail.
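To give a feel for the signature file method mentioned at the end of this abstract, here is a minimal sketch of one well-known variant, superimposed coding: each word sets a few bits in a fixed-width bit string, a document's signature is the OR of its word signatures, and a query is checked against signatures before the text itself is read. The signature width, bits per word, hash choice, and sample documents are assumptions for illustration, not parameters taken from the survey.

```python
import hashlib

SIG_BITS = 64        # signature width in bits (assumed value)
BITS_PER_WORD = 3    # bits set by each word (assumed value)


def word_signature(word):
    """Hash a word to a bit pattern with at most BITS_PER_WORD bits set."""
    sig = 0
    for i in range(BITS_PER_WORD):
        digest = hashlib.sha256(f"{i}:{word.lower()}".encode()).digest()
        position = int.from_bytes(digest[:4], "big") % SIG_BITS
        sig |= 1 << position
    return sig


def text_signature(text):
    """Superimpose (bitwise OR) the signatures of all words in the text."""
    sig = 0
    for word in text.split():
        sig |= word_signature(word)
    return sig


def search(query, docs, signatures):
    """Return indices of documents containing every query word."""
    q = text_signature(query)
    # Step 1: cheap filter. A document can only match if all query bits are set.
    candidates = [i for i, s in enumerate(signatures) if s & q == q]
    # Step 2: verify against the text, because the filter can yield false drops
    # (signatures that match although the words are absent).
    terms = {w.lower() for w in query.split()}
    return [i for i in candidates if terms <= {w.lower() for w in docs[i].split()}]


# Hypothetical office documents.
docs = [
    "office memo about budget approval",
    "retrieval methods for office systems",
    "approximate string matching",
]
signatures = [text_signature(d) for d in docs]
print(search("office systems", docs, signatures))
```

The filter never misses a true match, since every bit set by a query word is also set in any document containing that word; the verification step only removes false drops.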

375 citations

Metrics

Papers: 6,866
Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
