Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have appeared within this topic, receiving 214383 citations.

Papers

Alternative Approaches for Cross-Language Text Retrieval

Douglas W. Oard¹

University of Maryland, College Park¹

01 Jan 1997

TL;DR: It is shown that users who are able to read more than one language will likely prefer a multilingual text retrieval system over a collection of monolingual systems and cross language text retrieval was selected as the preferred term in the interest of standardization.

Abstract: The explosive growth of the Internet and other sources of networked information have made automatic mediation of access to networked information sources an increasingly important problem. Much of this information is expressed as electronic text, and it is becoming practical to automatically convert some printed documents and recorded speech to electronic text as well. Thus, automated systems capable of detecting useful documents are finding widespread application. With even a small number of languages it can be inconvenient to issue the same query repeatedly in every language, so users who are able to read more than one language will likely prefer a multilingual text retrieval system over a collection of monolingual systems. And since reading ability in a language does not always imply fluent writing ability in that language, such users will likely find cross-language text retrieval particularly useful for languages in which they are less confident of their ability to express their information needs effectively. The use of such systems can also be beneficial if the user is able to read only a single language. For example, when only a small portion of the document collection will ever be examined by the user, performing retrieval before translation can be significantly more economical than performing translation before retrieval. So when the application is sufficiently important to justify the time and effort required for translation, those costs can be minimized if an effective cross-language text retrieval system is available. Even when translation is not available, there are circumstances in which cross-language text retrieval could be useful to a monolingual user. For example, a researcher might find a paper published in an unfamiliar language useful if that paper contains references to works by the same author that are in the researcher's native language. Multilingual text retrieval can be defined as selection of useful documents from collections that may contain several languages (English, French, Chinese, etc.). This formulation allows for the possibility that individual documents might contain more than one language, a common occurrence in some applications. Both cross-language and within-language retrieval are included in this formulation, but it is the cross-language aspect of the problem which distinguishes multilingual text retrieval from its well-studied monolingual counterpart. At the SIGIR workshop on Cross-Linguistic Information Retrieval, the participants discussed the proliferation of terminology being used to describe the field and settled on "Cross-Language" as the best single description of the salient aspect of the problem. "Multilingual" was felt to be too broad, since that term has also been used to describe systems able to perform within-language retrieval in more than one language but that lack any cross-language capability. "Cross-lingual" and "cross-linguistic" were felt to be equally good descriptions of the field, but "cross-language" was selected as the preferred term in the interest of standardization. Unfortunately, at about the same time the U.S. Defense Advanced Research Projects Agency (DARPA) introduced "translingual" as their preferred term, so we are still some distance from reaching consensus on this matter.

99 citations

Proceedings Article

Web image retrieval re-ranking with relevance model

W.-H. Lin¹, Rong Jin¹, Alexander G. Hauptmann¹

Carnegie Mellon University¹

13 Oct 2003

TL;DR: A re-ranking method to improve Web image retrieval by reordering the images retrieved from an image search engine based on a relevance model, which is a probabilistic model that evaluates the relevance of the HTML document linking to the image, and assigns a probability of relevance.

Abstract: Web image retrieval is a challenging task that requires efforts from image processing, link structure analysis, and Web text retrieval. Since content-based image retrieval is still considered very difficult, most current large-scale Web image search engines exploit text and link structure to "understand" the content of the Web images. However, local text information, such as caption, filenames and adjacent text, is not always reliable and informative. Therefore, global information should be taken into account when a Web image retrieval system makes relevance judgment. We propose a re-ranking method to improve Web image retrieval by reordering the images retrieved from an image search engine. The re-ranking process is based on a relevance model, which is a probabilistic model that evaluates the relevance of the HTML document linking to the image, and assigns a probability of relevance. The experimental results showed that the re-ranked image retrieval achieved better performance than original Web image retrieval, suggesting the effectiveness of the re-ranking method. The relevance model is learned from the Internet without preparing any training data and is independent of the underlying algorithm of the image search engines. The re-ranking process should be applicable to any image search engine with little effort.

99 citations

Journal Article

Threshold values and Boolean retrieval systems

Duncan A. Buell¹, Donald H. Kraft¹

Louisiana State University¹

01 Jan 1981-Information Processing and Management

TL;DR: It is shown that the concept of threshold values resolves the problems inherent with relevance weights, and possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations are explored.

Abstract: Several papers have appeared that have analyzed recent developments in the problem of processing, in a document retrieval system, queries expressed as Boolean expressions. The purpose of this paper is to continue that analysis. We shall show that the concept of threshold values resolves the problems inherent with relevance weights. Moreover, we shall explore possible evaluation mechanisms for retrieval of documents, based on fuzzy-set-theoretic considerations.

99 citations

Proceedings Article

Phonetic confusion matrix based spoken document retrieval

Savitha Srinivasan¹, Dragutin Petkovic¹

IBM¹

01 Jul 2000

TL;DR: This work proposes a novel method for phonetic retrieval in the CueVideo system based on the probabilistic formulation of term weighting using phone confusion data in a Bayesian framework and evaluates this method of spoken document retrieval against word-based retrieval for the search levels identified in a realistic video-based distributed learning setting.

Abstract: Combined word-based index and phonetic indexes have been used to improve the performance of spoken document retrieval systems primarily by addressing the out-of-vocabulary retrieval problem. However, a known problem with phonetic recognition is its limited accuracy in comparison with word level recognition. We propose a novel method for phonetic retrieval in the CueVideo system based on the probabilistic formulation of term weighting using phone confusion data in a Bayesian framework. We evaluate this method of spoken document retrieval against word-based retrieval for the search levels identified in a realistic video-based distributed learning setting. Using our test data, we achieved an average recall of 0.88 with an average precision of 0.69 for retrieval of out-of-vocabulary words on phonetic transcripts with 35% word error rate. For in-vocabulary words, we achieved a 17% improvement in recall over word-based retrieval with a 17% loss in precision for word error rates ranging from 35 to 65%.

98 citations

Proceedings Article

The impact of named entity normalization on information retrieval for question answering

Mahboob Alam Khalid, Valentin Jijkoun¹, Maarten de Rijke¹

University of Amsterdam¹

30 Mar 2008

TL;DR: It is found that even a simple normalization method leads to improvements of early precision, both for document and passage retrieval, and better normalization results in better retrieval performance.

Abstract: In the named entity normalization task, a system identifies a canonical unambiguous referent for names like Bush or Alabama. Resolving synonymy and ambiguity of such names can benefit end-to-end information access tasks. We evaluate two entity normalization methods based on Wikipedia in the context of both passage and document retrieval for question answering. We find that even a simple normalization method leads to improvements of early precision, both for document and passage retrieval. Moreover, better normalization results in better retrieval performance.

98 citations

Network Information

Performance

Metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
