Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

[Chart: Papers published on a yearly basis, 1969 to 2023]

Papers

Proceedings Article

Web object retrieval

Zaiqing Nie¹, Yunxiao Ma¹, Shuming Shi¹, Ji-Rong Wen¹, Wei-Ying Ma¹

Microsoft¹

08 May 2007

TL;DR: This paper proposes several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features, and concludes that the hybrid model is superior because it takes into account extraction errors at varying levels.

Abstract: The primary function of current Web search engines is essentially relevance ranking at the document level. However, a wealth of structured information about real-world objects is embedded in static Web pages and online Web databases. Document-level information retrieval can unfortunately lead to highly inaccurate relevance ranking in answering object-oriented queries. In this paper, we propose a paradigm shift to enable searching at the object level. In traditional information retrieval models, documents are taken as the retrieval units and the content of a document is considered reliable. However, this reliability assumption is no longer valid in the object retrieval context, where multiple copies of information about the same object typically exist. These copies may be inconsistent because of the varying quality of Web sites and the limited performance of current information extraction techniques. If we simply combine the noisy and inaccurate attribute information extracted from different sources, we may not be able to achieve satisfactory retrieval performance. In this paper, we propose several language models for Web object retrieval, namely an unstructured object retrieval model, a structured object retrieval model, and a hybrid model with both structured and unstructured retrieval features. We test these models on a paper search engine and compare their performance. We conclude that the hybrid model is superior because it takes into account extraction errors at varying levels.
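The abstract contrasts an unstructured model (all extracted text about an object pooled together), a structured model (separate attribute fields) and a hybrid of the two. The paper's exact formulation is not given here, so the following is only a minimal sketch of the general idea using query-likelihood scoring with Jelinek-Mercer smoothing. The field names, the field weights `weights`, the smoothing parameter `lam` and the interpolation parameter `alpha` (standing in for how far extracted attributes are trusted) are illustrative assumptions, not the authors' models.

```python
import math
from collections import Counter

def jm_prob(term, tokens, coll_tf, coll_len, lam=0.5):
    # Jelinek-Mercer smoothing: mix the maximum-likelihood estimate from the
    # object's text with the collection-wide estimate.
    p_doc = Counter(tokens)[term] / len(tokens) if tokens else 0.0
    return (1 - lam) * p_doc + lam * coll_tf[term] / coll_len

def p_unstructured(term, fields, coll_tf, coll_len, lam=0.5):
    # Unstructured view: pool all extracted text about the object into one bag of words.
    tokens = [t for text in fields.values() for t in text.split()]
    return jm_prob(term, tokens, coll_tf, coll_len, lam)

def p_structured(term, fields, weights, coll_tf, coll_len, lam=0.5):
    # Structured view: weighted mixture of per-attribute language models
    # (weights are assumed to sum to 1).
    return sum(w * jm_prob(term, fields.get(f, "").split(), coll_tf, coll_len, lam)
               for f, w in weights.items())

def hybrid_score(query, fields, weights, coll_tf, coll_len, alpha=0.5, lam=0.5):
    # Hypothetical hybrid: interpolate the two views per query term. A larger
    # alpha trusts the extracted attributes more; a smaller one falls back to raw text.
    # Assumes every query term occurs somewhere in the collection (else log(0)).
    score = 0.0
    for term in query.split():
        p = (alpha * p_structured(term, fields, weights, coll_tf, coll_len, lam)
             + (1 - alpha) * p_unstructured(term, fields, coll_tf, coll_len, lam))
        score += math.log(p)  # query likelihood, accumulated in log space
    return score

# Toy collection of "paper" objects with two extracted attributes.
objects = {
    "paper1": {"title": "web object retrieval", "authors": "nie ma shi"},
    "paper2": {"title": "semantic web search", "authors": "jiang tan"},
}
coll_tf = Counter(t for obj in objects.values() for text in obj.values() for t in text.split())
coll_len = sum(coll_tf.values())
weights = {"title": 0.7, "authors": 0.3}
ranked = sorted(objects,
                key=lambda k: hybrid_score("object retrieval", objects[k], weights, coll_tf, coll_len),
                reverse=True)
print(ranked)  # ['paper1', 'paper2']
```

Setting `alpha` to 0 or 1 recovers the pure unstructured or pure structured view in this sketch; tuning it is one plausible way to express varying confidence in the extraction, but it is an assumption rather than the method reported in the paper.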

129 citations

Journal Article

Learning and inferencing in user ontology for personalized Semantic Web search

Xing Jiang¹, Ah-Hwee Tan¹

Nanyang Technological University¹

15 Jul 2009, Information Sciences

TL;DR: The proposed user ontology model with the spreading activation based inferencing procedure has been incorporated into a semantic search engine, called OntoSearch, to provide personalized document retrieval services.
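Spreading activation, in its generic form, propagates an initial activation from seed concepts along weighted links, attenuating it at each hop and letting only sufficiently activated nodes fire. The sketch below shows that generic procedure on a toy concept graph; it is not OntoSearch's inferencing procedure, and the graph, link weights, `decay`, `threshold` and `max_steps` are hypothetical values chosen for illustration.

```python
def spread_activation(graph, seeds, decay=0.5, threshold=0.1, max_steps=3):
    """Generic spreading activation (illustrative sketch).

    graph: node -> list of (neighbor, link_weight)
    seeds: node -> initial activation
    """
    activation = dict(seeds)
    frontier = dict(seeds)
    for _ in range(max_steps):
        pulses = {}
        for node, act in frontier.items():
            if act < threshold:
                continue  # below the firing threshold: this node does not propagate
            for neighbor, weight in graph.get(node, []):
                # Activation is attenuated by the link weight and a global decay factor.
                pulses[neighbor] = pulses.get(neighbor, 0.0) + act * weight * decay
        if not pulses:
            break  # nothing fired this round, so the spread has stopped
        for node, pulse in pulses.items():
            activation[node] = activation.get(node, 0.0) + pulse
        frontier = pulses
    return activation

# Hypothetical user-interest graph over ontology concepts.
graph = {
    "semantic web": [("ontology", 0.8), ("rdf", 0.6)],
    "ontology": [("reasoning", 0.9)],
}
seeds = {"semantic web": 1.0}
scores = spread_activation(graph, seeds)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # ['semantic web', 'ontology', 'rdf', 'reasoning']
```

Raising `threshold` makes the spread more conservative (with `threshold=0.5` in this toy graph, "reasoning" is never reached). How such activation values would be combined with document scores for personalized retrieval is left open here.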

128 citations

Journal Article

The InfoSky visual explorer: exploiting hierarchical structure and document similarities

Keith Andrews¹, Wolfgang Kienreich, Vedran Sabol, Jutta Becker, Georg Droschl, Frank Kappe, Michael Granitzer, Peter Auer¹, Klaus Tochtermann

Graz University of Technology¹

01 Dec 2002, Information Visualization

TL;DR: InfoSky is a system enabling users to explore large, hierarchically structured document collections using a planar graphical representation with variable magnification, and can map metadata such as document size or age to attributes of the visualisation such as colour and luminance.

Abstract: InfoSky is a system enabling users to explore large, hierarchically structured document collections. Similar to a real-world telescope, InfoSky employs a planar graphical representation with variable magnification. Documents of similar content are placed close to each other and are visualised as stars, forming clusters with distinct shapes. For greater performance, the hierarchical structure is exploited and force-directed placement is applied recursively at each level on far fewer objects, rather than on the whole corpus. Collections of documents at a particular level in the hierarchy are visualised with bounding polygons using a modified weighted Voronoi diagram. Their area is related to the number of documents contained. Textual labels are displayed dynamically during navigation, adjusting to the visualisation content. Navigation is animated and provides a seamless zooming transition between summary and detail view. Users can map metadata such as document size or age to attributes of the visualisation such as colour and luminance. Queries can be made and matching documents or collections are highlighted. Formative usability testing is ongoing; a small baseline experiment comparing the telescope browser to a tree browser is discussed.
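The performance argument for recursive placement can be made concrete by counting pairwise interactions, since a basic force-directed step considers every pair of objects being laid out. The sketch below compares one flat pass over the whole corpus with separate passes among the children of each node in a hierarchy. The tree shape (3 collections, 2 sub-collections each, 100 documents per leaf) is a hypothetical example, and the count ignores InfoSky's other components, such as the Voronoi-based bounding polygons.

```python
def flat_pairs(n):
    # A single force-directed pass over n objects considers every pair once.
    return n * (n - 1) // 2

def count_documents(tree):
    # Leaves are lists of document ids; inner nodes are dicts of sub-collections.
    if isinstance(tree, list):
        return len(tree)
    return sum(count_documents(child) for child in tree.values())

def hierarchical_pairs(tree):
    # Layout runs separately among the children of each node, so each pass
    # only sees the siblings at that level rather than the whole corpus.
    k = len(tree)
    pairs = k * (k - 1) // 2
    if isinstance(tree, list):
        return pairs
    return pairs + sum(hierarchical_pairs(child) for child in tree.values())

# Hypothetical hierarchy: 3 collections x 2 sub-collections x 100 documents.
tree = {f"c{i}": {f"s{j}": list(range(100)) for j in range(2)} for i in range(3)}
n = count_documents(tree)
print(n, flat_pairs(n), hierarchical_pairs(tree))  # 600 179700 29706
```

In this toy case the per-level passes consider roughly a sixth of the pairs of a single flat pass; the actual saving depends on how documents are distributed over the hierarchy.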

128 citations

Novelty and diversity metrics for recommender systems: Choice, discovery and relevance

Pablo Castells, Saúl Vargas, Jun Wang

01 Jan 2011

TL;DR: This is an electronic version of the paper presented at the International Workshop on Diversity in Document Retrieval, held in Dublin in 2011.


128 citations

Proceedings Article

Overview of the TREC 2019 deep learning track

Nick Craswell¹, Bhaskar Mitra¹, Emine Yilmaz², Daniel Campos¹, Ellen M. Voorhees³

Microsoft¹, University College London², National Institute of Standards and Technology³

01 Mar 2020

TL;DR: The Deep Learning Track is the first TREC track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets.

Abstract: The Deep Learning Track is a new track for TREC 2019, with the goal of studying ad hoc ranking in a large data regime. It is the first track with large human-labeled training sets, introducing two sets corresponding to two tasks, each with rigorous TREC-style blind evaluation and reusable test sets. The document retrieval task has a corpus of 3.2 million documents with 367 thousand training queries, for which we generate a reusable test set of 43 queries. The passage retrieval task has a corpus of 8.8 million passages with 503 thousand training queries, for which we generate a reusable test set of 43 queries. This year 15 groups submitted a total of 75 runs, using various combinations of deep learning, transfer learning and traditional IR ranking methods. Deep learning runs significantly outperformed traditional IR runs. Possible explanations for this result are that we introduced large training data and we included deep models trained on such data in our judging pools, whereas some past studies did not have such training data or pooling.

128 citations

Metrics

6,866 papers
224,605 citations

Number of papers in the topic, 2018 to 2023
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
