Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 papers have been published within this topic, receiving 214,383 citations.

[Figure: Papers published on a yearly basis]

Papers

Posted Content

Latent Collaborative Retrieval

Jason Weston¹, Chong Wang², Ron Weiss¹, Adam Berenzweig¹

Google¹, Princeton University²

18 Jun 2012, arXiv: Information Retrieval

TL;DR: This paper introduces a factorized model for recommending items to a user with respect to a given query; the model optimizes the top-ranked items returned for that query and user, and empirical results show it outperforming several baselines.

Abstract: Retrieval tasks typically require a ranking of items given a query. Collaborative filtering tasks, on the other hand, learn to model users' preferences over items. In this paper we study the joint problem of recommending items to a user with respect to a given query, which is a surprisingly common task. This setup differs from the standard collaborative filtering one in that we are given a query × user × item tensor for training instead of the more traditional user × item matrix. Compared to document retrieval, we do have a query, but we may or may not have content features (we will consider both cases), and we can also take account of the user's profile. We introduce a factorized model for this new task that optimizes the top-ranked items returned for the given query and user. We report empirical results in which it outperforms several baselines.

40 citations
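The abstract of this paper does not spell out the model, so the following is only a minimal sketch of the general idea of factorized query × user × item scoring. It assumes, hypothetically, that every query, user, and item has an embedding of the same dimension and that an item is scored by a trilinear (CP-style) product of the three vectors; the paper's actual factorization and its top-of-ranking training objective may differ. The names `make_embeddings`, `score`, and `top_k` are illustrative, not from the paper.

```python
import random


def make_embeddings(n, dim, rng):
    """Return n random dim-dimensional vectors (stand-ins for learned embeddings)."""
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n)]


def score(query_vec, user_vec, item_vec):
    """Trilinear (CP-style) score: sum over k of q_k * u_k * i_k.

    Assumed for illustration only; not necessarily the paper's model.
    """
    return sum(q * u * i for q, u, i in zip(query_vec, user_vec, item_vec))


def top_k(query_vec, user_vec, item_vecs, k):
    """Return indices of the k highest-scoring items for one (query, user) pair."""
    scored = [(score(query_vec, user_vec, v), idx) for idx, v in enumerate(item_vecs)]
    scored.sort(reverse=True)  # highest score first
    return [idx for _, idx in scored[:k]]


if __name__ == "__main__":
    rng = random.Random(0)
    queries = make_embeddings(3, 8, rng)
    users = make_embeddings(4, 8, rng)
    items = make_embeddings(10, 8, rng)
    # Same query, two different users: the user factor can reorder the results.
    print(top_k(queries[0], users[0], items, k=3))
    print(top_k(queries[0], users[1], items, k=3))
```

Because the user vector scales each latent coordinate, one query can produce different top-k lists for different users, which is the query-plus-user setting the abstract describes. Learning the embeddings so that relevant items reach the top of the ranking is left out of this sketch.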

Journal Article

The evaluation of automatic retrieval procedures: Selected test results using the SMART system

Gerard Salton¹

Harvard University¹

01 Jul 1965, American Documentation

TL;DR: The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system.

Abstract: The generation of effective methods for the evaluation of information retrieval systems and techniques is becoming increasingly important as more and more systems are designed and implemented. The present report deals with the evaluation of a variety of automatic indexing and retrieval procedures incorporated into the SMART automatic document retrieval system. The design of the SMART system is first briefly reviewed. The document file, search requests, and other parameters affecting the evaluation system are then examined in detail, and the measures used to assess the effectiveness of the retrieval performance are described. The main test results are given and tentative conclusions are reached concerning the design of fully automatic information systems.

40 citations
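The SMART abstract refers to measures of retrieval effectiveness without naming them. Recall and precision are the measures classically associated with SMART-era evaluation, so the sketch below assumes those: it computes both for a retrieved set and traces (precision, recall) pairs down a ranked result list. The function names are hypothetical and the document IDs are toy data.

```python
def precision_recall(retrieved, relevant):
    """Precision and recall of a retrieved set against a set of relevant documents."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


def pr_curve(ranking, relevant):
    """(precision, recall) after each rank position of a ranked result list."""
    relevant_set = set(relevant)
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant_set:
            hits += 1
        recall = hits / len(relevant_set) if relevant_set else 0.0
        points.append((hits / rank, recall))
    return points


print(precision_recall(["d1", "d2"], ["d1", "d3"]))  # (0.5, 0.5)
print(pr_curve(["d1", "d2", "d3", "d4"], ["d1", "d3"]))
```

The curve makes the usual trade-off visible: going deeper in the ranking can only raise recall, while precision rises or falls depending on whether each new document is relevant.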

Evaluation of noisy transcripts for spoken document retrieval

Laurens Bastiaan van der Werff

05 Jul 2012

TL;DR: A novel framework is introduced in which evaluation is extrinsic and query-dependent but does not depend on relevance judgments; it is expected to help optimize the configuration of ASR systems that transcribe (large) speech collections for use in Spoken Document Retrieval.

Abstract: Spoken Document Retrieval (SDR) is usually implemented by running an Information Retrieval (IR) engine on speech transcripts produced by an Automatic Speech Recognition (ASR) system. These transcripts generally contain a substantial number of transcription errors (noise) and are mostly unstructured. This thesis addresses two challenges that arise when doing IR on this type of source material: (i) segmentation of speech transcripts into suitable retrieval units, and (ii) evaluation of the impact of transcript noise on the results of an IR task. It is shown that intrinsic evaluation leads to different conclusions about the quality of automatic story boundaries than extrinsic evaluation using Mean Average Precision (MAP). This indicates that, for automatic story segmentation in search applications, the traditionally used (intrinsic) segmentation cost may not be a good performance target. The best performance in an SDR context was achieved using lexical cohesion-based approaches, rather than the statistical approaches that were most popular in story segmentation benchmarks.

For the evaluation of speech transcript noise in an SDR context, a novel framework is introduced in which evaluation is extrinsic and query-dependent but does not depend on relevance judgments. This is achieved by directly comparing the ranked result lists of IR tasks run on a reference transcription and on an ASR-derived transcription. The resulting measures are highly correlated with MAP, making it possible to do extrinsic evaluation of ASR transcripts for ad hoc collections while using a similar amount of reference material as the popular intrinsic metric Word Error Rate (WER). The proposed evaluation methods are expected to be helpful for optimizing the configuration of ASR systems that transcribe (large) speech collections for use in Spoken Document Retrieval, rather than for the more traditional dictation tasks.

40 citations
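The thesis abstract says the noise-evaluation framework compares the ranked result lists obtained from a reference transcript and from an ASR transcript, but it does not give the measures themselves. As an assumption for illustration only, the sketch below treats the top-`n` documents retrieved from the reference transcript as pseudo-relevant and scores the ASR-based ranking with average precision, averaged over queries in the style of MAP. This is not the thesis's measure; `pseudo_ap`, `mean_pseudo_ap`, and the cutoff `n` are hypothetical.

```python
def average_precision(ranking, relevant):
    """Average precision of a ranked list against a set of relevant documents."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant_set:
            hits += 1
            total += hits / rank  # precision at this relevant document's rank
    return total / len(relevant_set)


def pseudo_ap(reference_ranking, asr_ranking, n):
    """AP of the ASR-based ranking, treating the reference top-n as pseudo-relevant."""
    return average_precision(asr_ranking, reference_ranking[:n])


def mean_pseudo_ap(runs, n):
    """Mean over queries; runs is a list of (reference_ranking, asr_ranking) pairs."""
    return sum(pseudo_ap(ref, asr, n) for ref, asr in runs) / len(runs)


reference = ["a", "b", "c", "d"]  # ranking from the reference transcript
asr = ["b", "x", "a", "c"]        # ranking from the ASR transcript
print(pseudo_ap(reference, asr, n=2))
```

When the ASR transcript causes no retrieval differences the two rankings coincide and the score is 1.0; transcription noise that pushes the reference's top documents down the ASR ranking lowers it, and no human relevance judgments are needed.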

Proceedings Article

ANU/ACSys TREC-5 Experiments.

David Hawking¹, Paul B. Thistlewaite¹, Peter Bailey¹

Australian National University¹

01 Jan 1996

TL;DR: In this article, distance-based relevance scoring (spans) is evaluated in the context of the ad hoc retrieval task, and lightweight probe queries are shown to be an effective method for identifying promising information servers in a database merging task.

Abstract: A number of experiments conducted within the framework of the TREC-5 conference using the Parallel Document Retrieval Engine (PADRE) are reported. Several of the experiments involve the use of distance-based relevance scoring (spans). In the context of the ad hoc retrieval task, this scoring method is shown to be capable of very good precision-recall performance, provided that good queries are available. Span queries are also applied to processing a larger (4.5 gigabyte) collection, to retrieval over OCR-corrupted data, and to a database merging task. Lightweight probe queries are shown to be an effective method for identifying promising information servers in the context of the latter task. New techniques for automatically generating more conventional weighted-term queries from short topic descriptions have also been devised and are evaluated.

40 citations
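The TREC-5 abstract does not define PADRE's span scoring, so the following is a minimal sketch of one generic form of distance-based relevance scoring, assumed for illustration: find the shortest window of a document that contains every distinct query term, and score the document by how compact that window is. `shortest_span` and `span_score` are hypothetical names, and the scoring formula is not taken from the paper.

```python
from collections import Counter


def shortest_span(tokens, terms):
    """Length of the shortest token window containing every distinct term, or None."""
    needed = set(terms)
    if not needed:
        return None
    counts = Counter()  # occurrences of each needed term inside the current window
    covered = 0         # number of distinct needed terms currently in the window
    best = None
    left = 0
    for right, tok in enumerate(tokens):
        if tok in needed:
            counts[tok] += 1
            if counts[tok] == 1:
                covered += 1
        # Shrink the window from the left while it still covers all terms.
        while covered == len(needed):
            length = right - left + 1
            if best is None or length < best:
                best = length
            left_tok = tokens[left]
            if left_tok in needed:
                counts[left_tok] -= 1
                if counts[left_tok] == 0:
                    covered -= 1
            left += 1
    return best


def span_score(tokens, terms):
    """Hypothetical compactness score: distinct terms / shortest span (0.0 if absent)."""
    span = shortest_span(tokens, terms)
    return 0.0 if span is None else len(set(terms)) / span


doc = "the cat sat on the mat with a cat".split()
print(span_score(doc, ["cat", "mat"]))  # 0.5: "mat with a cat" spans 4 tokens
```

A higher score means the query terms occur closer together; a document that lacks any query term scores 0.0. The sliding window keeps the search linear in document length rather than comparing every pair of term positions.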

Journal Article

Free text vs. controlled vocabulary; a reassessment

C. P. R. Dubois

01 Jan 1987, Online Information Review

TL;DR: A comparative study of the advantages and disadvantages of natural language and controlled vocabulary for automated document retrieval.

Abstract: A comparative study of the advantages and disadvantages of natural language and controlled vocabulary for automated document retrieval. The author also analyzes the relevance of each method in the context of expert systems or full-text databases.

40 citations

Metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
