Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers published on a yearly basis

[Chart: number of papers published per year, 1969 to 2023]

Papers

Journal Article

The Nature of Relevance in Information Retrieval: An Empirical Study

Taemin Kim Park

29 Sep 2015-The Library Quarterly

TL;DR: Using naturalistic inquiry methodology, an empirical study of user-based relevance interpretations is reported that reflects the nature of the thought processes of users who are evaluating bibliographic citations produced by a document retrieval system.

Abstract: Experimental research in information retrieval (IR) depends on the idea of relevance. Because of its key role in IR, recent questions about relevance have raised issues of methodological concern and have shaken the philosophical foundations of IR theory development. Despite an existing set of theoretical definitions of this concept, our understanding of relevance from users' perspectives is still limited. Using naturalistic inquiry methodology, this article reports an empirical study of user-based relevance interpretations. A model is presented that reflects the nature of the thought processes of users who are evaluating bibliographic citations produced by a document retrieval system. Three major categories of variables affecting relevance assessments (internal context, external context, and problem context) are identified and described. Users' relevance assessments involve multiple layers of interpretations that are derived from individuals' experiences, perceptions, and private knowledge related to the pa...

201 citations

Proceedings Article

Incremental updates of inverted lists for text document retrieval

Anthony Tomasic¹, Hector Garcia-Molina¹, Kurt A. Shoens²

Stanford University¹, IBM²

24 May 1994

TL;DR: In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index that dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list.

Abstract: With the proliferation of the world's “information highways” a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering trade-offs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.

200 citations
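
The dual-structure idea summarized in the entry above (short and long inverted lists kept apart and handled differently) can be illustrated with a minimal sketch. This is not the paper's algorithm or storage layout: the class name `DualStructureIndex`, the `threshold` cutoff, and the use of in-memory Python containers are all assumptions made for illustration.

```python
from collections import defaultdict


class DualStructureIndex:
    """Toy inverted index that separates short and long posting lists.

    Hypothetical illustration only: a real system would place the two
    kinds of list in different on-disk structures.
    """

    def __init__(self, threshold=4):
        self.threshold = threshold      # assumed cutoff, in postings
        self.short = defaultdict(list)  # term -> short posting list
        self.long = {}                  # term -> long posting list

    def add_document(self, doc_id, text):
        # Incremental update: index each distinct term of the new document.
        for term in set(text.lower().split()):
            self._add_posting(term, doc_id)

    def _add_posting(self, term, doc_id):
        if term in self.long:
            self.long[term].append(doc_id)
            return
        postings = self.short[term]
        postings.append(doc_id)
        # Migrate the list once it grows past the cutoff.
        if len(postings) > self.threshold:
            self.long[term] = self.short.pop(term)

    def is_long(self, term):
        return term in self.long

    def postings(self, term):
        if term in self.long:
            return list(self.long[term])
        return list(self.short.get(term, []))


index = DualStructureIndex(threshold=2)
index.add_document(1, "retrieval index")
index.add_document(2, "retrieval")
index.add_document(3, "retrieval")
```

In this sketch a frequent term such as `retrieval` migrates to the long structure after its third posting, while a rare term such as `index` stays in the short one.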

Proceedings Article

EntityRank: searching entities directly and holistically

Tao Cheng¹, Xifeng Yan¹, Kevin Chen-Chuan Chang¹

University of Illinois at Urbana–Champaign¹

23 Sep 2007

TL;DR: This work focuses on the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking.

Abstract: As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. While entities appear in many pages, current engines only find each page individually. Toward searching directly and holistically for finding information of finer granularity, we study the problem of entity search, a significant departure from traditional document retrieval. We focus on the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We evaluate our online prototype over a 2TB Web corpus, and show that EntityRank performs effectively.

200 citations

Journal Article

Subword-based approaches for spoken document retrieval

Kenney Ng¹, Victor W. Zue¹

Massachusetts Institute of Technology¹

01 Oct 2000-Speech Communication

TL;DR: It is found that with the appropriate subword units, it is possible to achieve performance comparable to that of text-based word units if the underlying phonetic units are recognized correctly.

200 citations

Journal Article

Information extraction: beyond document retrieval

Robert Gaizauskas¹, Yorick Wilks¹

University of Sheffield¹

01 Mar 1998-Journal of Documentation

TL;DR: A synoptic view of the growth of the text processing technology of information extraction whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates is given.

Abstract: In this paper we give a synoptic view of the growth of the text processing technology of information extraction (IE) whose function is to extract information about a pre‐specified set of entities, relations or events from natural language texts and to record this information in structured representations called templates. Here we describe the nature of the IE task, review the history of the area from its origins in AI work in the 1960s and 70s till the present, discuss the techniques being used to carry out the task, describe application areas where IE systems are or are about to be at work, and conclude with a discussion of the challenges facing the area. What emerges is a picture of an exciting new text processing technology with a host of new applications, both on its own and in conjunction with other technologies, such as information retrieval, machine translation and data mining.

199 citations

Metrics

Papers: 6,866
Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
