Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published on this topic, receiving 214,383 citations.

Papers

Proceedings Article

Information retrieval on the semantic web

Urvi Shah, Tim Finin¹, Anupam Joshi¹, R. Scott Cost¹, James Mayfield²

University of Maryland, Baltimore County¹, Johns Hopkins University Applied Physics Laboratory²

04 Nov 2002

TL;DR: An approach to retrieval of documents that contain both free text and semantically enriched markup is described, in which both documents and queries can be marked up with statements in the DAML+OIL semantic web language.

Abstract: We describe an approach to retrieval of documents that contain both free text and semantically enriched markup. In particular, we present the design and a prototype implementation of a framework in which both documents and queries can be marked up with statements in the DAML+OIL semantic web language. These statements provide both structured and semi-structured information about the documents and their content. We claim that indexing text and semantic markup together will significantly improve retrieval performance. Our approach allows inferencing to be done over this information at several points: when a document is indexed, when a query is processed and when query results are evaluated.

227 citations
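
The idea of indexing free text and semantic markup together can be illustrated with a minimal sketch, assuming markup statements arrive as (subject, predicate, object) triples. The composite term encoding, the `HybridIndex` class and its term-overlap scoring are hypothetical illustrations, not the authors' implementation, and the inference steps the abstract mentions are not modeled.

```python
from collections import defaultdict

PUNCT = ".,;:!?"


def text_terms(text):
    """Lowercased word tokens from free text, with edge punctuation stripped."""
    return [w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)]


def markup_terms(triples):
    """Hypothetical encoding of markup statements as index terms:
    one term for the full (subject, predicate, object) triple and one
    for the predicate:object pair, so partial matches still score."""
    terms = []
    for s, p, o in triples:
        terms.append(f"{s}|{p}|{o}")
        terms.append(f"{p}:{o}")
    return terms


class HybridIndex:
    """One inverted index holding both word terms and markup terms."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids

    def add(self, doc_id, text, triples=()):
        for term in text_terms(text) + markup_terms(triples):
            self.postings[term].add(doc_id)

    def search(self, text="", triples=()):
        """Score documents by how many distinct query terms they contain."""
        scores = defaultdict(int)
        for term in set(text_terms(text) + markup_terms(triples)):
            for doc_id in self.postings.get(term, ()):
                scores[doc_id] += 1
        return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, a query for the word `retrieval` plus the statement `("event", "location", "Baltimore")` ranks a document carrying that statement above one that merely mentions Baltimore in its text, because the word token `baltimore` and the markup terms are distinct entries in the same index.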

Proceedings Article

Generating phonetic cognates to handle named entities in English-Chinese cross-language spoken document retrieval

Helen Meng¹, Wai-Kit Lo¹, Berlin Chen², Karen P. Tang³

The Chinese University of Hong Kong¹, National Taiwan University², Princeton University³

01 Jan 2001

TL;DR: A technique is presented that takes in a name spelling and automatically generates a phonetic cognate in terms of Chinese syllables for use in retrieval; experiments show consistent improvements in retrieval performance when named entities are included in this way.

Abstract: We have developed a technique for automatic transliteration of named entities for English-Chinese cross-language spoken document retrieval (CL-SDR). Our retrieval system integrates machine translation, speech recognition and information retrieval technologies. An English news story forms a textual query that is automatically translated into Chinese words, which are mapped into Mandarin syllables by pronunciation dictionary lookup. Mandarin radio news broadcasts form spoken documents that are indexed by word and syllable recognition. The information retrieval engine performs matching at both the word and syllable scales. The English queries contain many named entities that tend to be out-of-vocabulary words for machine translation and speech recognition, and so are omitted from retrieval. Names are often transliterated across languages and are generally important for retrieval. We present a technique that takes in a name spelling and automatically generates a phonetic cognate in terms of Chinese syllables to be used in retrieval. Experiments show consistent improvements in retrieval performance when named entities are included in this way.

226 citations
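
Matching at the syllable scale can be sketched as follows, assuming one common approach (an assumption here, not stated in the abstract): counting overlapping syllable bigrams shared by the query's cognate and the recognized syllable stream, which tolerates an isolated recognition error. The `TOY_COGNATES` table is a hypothetical hand-written stand-in for the automatic name-to-syllable generator the paper describes, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical toy table standing in for the paper's automatic
# name-spelling -> Chinese-syllable generator (not reproduced here).
TOY_COGNATES = {"clinton": ["ke", "lin", "dun"]}


def syllable_bigrams(syllables):
    """Multiset of overlapping syllable bigrams, e.g. (ke, lin), (lin, dun)."""
    return Counter(zip(syllables, syllables[1:]))


def bigram_overlap(query_sylls, doc_sylls):
    """Count query bigrams that also occur in the document's syllable stream."""
    q, d = syllable_bigrams(query_sylls), syllable_bigrams(doc_sylls)
    return sum(min(count, d[bigram]) for bigram, count in q.items())


def score_name(name, doc_sylls):
    """Syllable-scale score for an English name against recognized syllables."""
    sylls = TOY_COGNATES.get(name.lower())
    if sylls is None:
        return 0  # no cognate available: the name cannot contribute
    return bigram_overlap(sylls, doc_sylls)
```

A recognized stream containing `ke lin dun` matches both bigrams of the cognate, while a stream with a misrecognized final syllable such as `ke lin deng` still matches one, so the name contributes partial evidence instead of none.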

Automatic Cross-Language Retrieval Using Latent Semantic Indexing

Susan T. Dumais, Todd A. Letsche, Michael L. Littman, Thomas K. Landauer

01 Jan 1997

TL;DR: A method is described for fully automated cross-language document retrieval in which no query translation is required; this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI).

Abstract: We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multilingual semantic space using Latent Semantic Indexing (LSI). Strong test results for the cross-language LSI (CL-LSI) method are presented for a new French-English collection. We also provide evidence that this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI), and explore several practical training methods. By all available measures, CL-LSI performs quite well and is widely applicable.

225 citations
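
The construction of a multilingual semantic space can be sketched with a truncated SVD, assuming each training document is the concatenation of an English text and its French translation. Monolingual texts and queries are then folded into the space as q^T U_k S_k^-1 and compared by cosine. The toy corpus, whitespace tokenization and raw term counts (no term weighting) are simplifying assumptions, and the function names are illustrative.

```python
import numpy as np


def term_doc_matrix(docs, vocab):
    """Raw term-count matrix: rows are vocabulary terms, columns are documents."""
    index = {t: i for i, t in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            if tok in index:
                A[index[tok], j] += 1
    return A


def train_cl_lsi(dual_docs, k):
    """dual_docs: each string holds a document's English text plus its French translation."""
    vocab = sorted({t for d in dual_docs for t in d.split()})
    A = term_doc_matrix(dual_docs, vocab)
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    return vocab, U[:, :k], s[:k]  # keep the k largest singular triplets


def fold_in(text, vocab, Uk, sk):
    """Project a monolingual text into the k-dim space: q^T U_k S_k^-1."""
    q = term_doc_matrix([text], vocab)[:, 0]
    return (q @ Uk) / sk


def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na and nb else 0.0
```

Because `cat` and `chat` co-occur only in the combined training documents, an English query such as `cat` lands close to a French text such as `chien chat` in the reduced space even though the two share no words, which is the sense in which no query translation is required.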

Journal Article

Filtered document retrieval with frequency-sorted indexes

Michael Persin¹, Justin Zobel¹, Ron Sacks-Davis¹

RMIT University¹

01 Sep 1996, Journal of the Association for Information Science and Technology

TL;DR: An evaluation technique is proposed that reduces costs by recognizing early which documents are likely to be highly ranked, and frequency sorting is shown to lead to a net reduction in index size, regardless of whether the index is compressed.

Abstract: Ranking techniques are effective at finding answers in document collections but can be expensive to evaluate. We propose an evaluation technique that uses early recognition of which documents are likely to be highly ranked to reduce costs; for our test data, queries are evaluated in 2% of the memory of the standard implementation without degradation in retrieval effectiveness. CPU time and disk traffic can also be dramatically reduced by designing inverted indexes explicitly to support the technique. The principle of the index design is that inverted lists are sorted by decreasing within-document frequency rather than by document number; in our experiments, this reduces CPU time and disk traffic to around one third of the original requirement. We also show that frequency sorting can lead to a net reduction in index size, regardless of whether the index is compressed.

225 citations
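
Filtered evaluation over frequency-sorted lists can be sketched as follows, under stated assumptions. Each term's postings are stored as (f_dt, doc_id) pairs in decreasing f_dt order, terms are processed rarest first, and two thresholds relative to the current largest accumulator decide whether a posting may open a new accumulator (`c_ins`) and whether the rest of the list can be skipped (`c_add`). The weighting f_dt * (log(N/df) + 1), the threshold names and their default values are illustrative assumptions, not the paper's exact cosine formulation.

```python
import math
from collections import defaultdict


def build_freq_sorted_index(docs):
    """docs: {doc_id: text}. Returns term -> [(f_dt, doc_id)] by decreasing f_dt."""
    counts = defaultdict(dict)
    for doc_id, text in docs.items():
        for tok in text.lower().split():
            counts[tok][doc_id] = counts[tok].get(doc_id, 0) + 1
    return {t: sorted(((f, d) for d, f in posts.items()), key=lambda x: (-x[0], x[1]))
            for t, posts in counts.items()}


def filtered_rank(query, index, n_docs, c_ins=0.2, c_add=0.05):
    """Accumulator-based ranking with insertion and addition thresholds."""
    terms = [t for t in set(query.lower().split()) if t in index]
    idf = {t: math.log(n_docs / len(index[t])) + 1.0 for t in terms}  # assumed weighting
    acc, s_max = {}, 0.0
    for t in sorted(terms, key=lambda t: (-idf[t], t)):  # rarest terms first
        for f_dt, d in index[t]:
            c = f_dt * idf[t]
            if c < c_add * s_max:
                break  # list is frequency-sorted: all later postings are smaller
            if d in acc:
                acc[d] += c
            elif c >= c_ins * s_max:
                acc[d] = c
            else:
                continue  # too small to open a new accumulator
            s_max = max(s_max, acc[d])
    return sorted(acc.items(), key=lambda kv: (-kv[1], kv[0]))
```

Because each list is sorted by decreasing f_dt, once one posting falls below the addition threshold every later posting in that list does too, so the loop stops reading it; that early exit, together with the limit on new accumulators, is where the memory, CPU and disk savings come from. Setting both thresholds to zero recovers exhaustive evaluation.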

Book Chapter

Automatic Cross-Language Information Retrieval Using Latent Semantic Indexing

Michael L. Littman¹, Susan T. Dumais, Thomas K. Landauer²

Brown University¹, University of Colorado Boulder²

01 Jan 1998

TL;DR: This work describes a method for fully automated cross-language document retrieval in which no query translation is required, and provides some evidence that this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI).

Abstract: We describe a method for fully automated cross-language document retrieval in which no query translation is required. Queries in one language can retrieve documents in other languages (as well as the original language). This is accomplished by a method that automatically constructs a multi-lingual semantic space using Latent Semantic Indexing (LSI). We present strong preliminary test results for our cross-language LSI (CL-LSI) method for a French-English collection. We also provide some evidence that this automatic method performs comparably to a retrieval method based on machine translation (MT-LSI).

225 citations

Metrics

Papers: 6,866

Citations: 224,605

Number of papers on the topic, by year
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
