Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.

Papers

Proceedings Article

A hybrid word / phoneme-based approach for improved vocabulary-independent search in spontaneous speech.

Peng Yu¹, Frank Seide¹

Microsoft¹

04 Oct 2004

TL;DR: In this paper, the authors presented a system for phonetic indexing and searching of spontaneous speech based on phoneme lattices and combined it with word-based search into a hybrid approach.

Abstract: For efficient organization of speech recordings – meetings, interviews, voice mails, and lectures – being able to search for spoken keywords is essential. Today, most spoken document retrieval systems use large-vocabulary recognition. For the above scenarios, such systems suffer from the unpredictable domain, out-of-vocabulary queries, and generally high word-error rate (WER). In [1], we presented a system for phonetic indexing and searching of spontaneous speech. It is vocabulary-independent and based on phoneme lattices. In the present paper, we propose to combine it with word-based search into a hybrid approach. We explore two methods of combination: posterior combination (merging search results of a word-based and a phoneme-based system) and prior combination (combining word and phoneme language models and vocabularies to form a hybrid recognizer). The search accuracy of our best purely phonetic baseline is 64% (Figure of Merit), and our purely word-based baselines are below 50%. The new hybrid approach achieves 73% if the recognizer uses a language model that matches the test-set domain. With a mismatched language model, 71% is achieved. Our results show that the proposed hybrid model benefits from the best of two worlds: word-level language context and the robustness of phonetic search to unknown words and domain mismatch.

87 citations
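
The posterior combination described in the abstract above merges the hit lists of two independent systems. A minimal sketch, assuming each system returns a dict of hits keyed by a hypothetical `(recording_id, time_slot)` pair with a posterior-like score in [0, 1], and assuming plain linear interpolation with a tunable `weight`; the paper's actual merging rule may differ:

```python
from collections import defaultdict


def posterior_combination(word_hits, phone_hits, weight=0.5):
    """Merge word-based and phoneme-based keyword hits into one ranked list.

    word_hits, phone_hits: dicts mapping a hit key (hypothetical
    (recording_id, time_slot) tuples) to a score in [0, 1].
    A hit missing from one system simply contributes 0 from that system.
    weight: interpolation weight of the word-based system (an assumption).
    """
    merged = defaultdict(float)
    for key, score in word_hits.items():
        merged[key] += weight * score
    for key, score in phone_hits.items():
        merged[key] += (1.0 - weight) * score
    # Highest combined score first; ties broken by key for a stable order.
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
```

A hit found by only one system keeps that system's weighted score, so phonetic hits for out-of-vocabulary keywords can still surface. Tuning `weight` on held-out data is one natural way to trade the word-level context of one system against the robustness of the other.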

Journal Article

Genetic Programming-Based Discovery of Ranking Functions for Effective Web Search

Weiguo Fan¹, Michael D. Gordon², Praveen Pathak²

Virginia Tech¹, University of Michigan²

01 Apr 2005, Journal of Management Information Systems

TL;DR: The authors use genetic programming to discover new ranking functions for the Web-based information-seeking task, and the retrieval performance of the discovered functions is found to be superior to that of well-known ranking strategies from the information retrieval literature.

Abstract: Web search engines have become an integral part of the daily life of a knowledge worker, who depends on these search engines to retrieve relevant information from the Web or from the company's vast document databases. Current search engines are very fast in terms of their response time to a user query. But their usefulness to the user in terms of retrieval performance leaves a lot to be desired. Typically, the user has to sift through a lot of nonrelevant documents to get only a few relevant ones for the user's information needs. Ranking functions play a very important role in the search engine retrieval performance. In this paper, we describe a methodology using genetic programming to discover new ranking functions for the Web-based information-seeking task. We exploit the content as well as structural information in the Web documents in the discovery process. The discovery process is carried out for both the ad hoc task and the routing task in retrieval. For either of the retrieval tasks, the retrieval performance of these newly discovered ranking functions has been found to be superior to the performance obtained by well-known ranking strategies in the information retrieval literature.

86 citations
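
Tree-based genetic programming usually represents each candidate ranking function as an expression tree over term and document statistics, then evolves a population of trees through selection, crossover and mutation. The sketch below covers only the representation step: growing a random tree and evaluating it as a ranking function. The terminal set (`tf`, `df`, `N`, `dl`, constants), the function set and the protected operators are illustrative assumptions, not the authors' configuration; the evolutionary loop and the retrieval-based fitness measure are omitted.

```python
import math
import random

# Terminals: per-(term, document) statistics assumed to come from an index.
TERMINALS = ["tf", "df", "N", "dl", "const"]


def protected_div(a, b):
    # Common GP convention: division by (near) zero yields 1.0 instead of failing.
    return a / b if abs(b) > 1e-9 else 1.0


def protected_log(a):
    # Common GP convention: log of a non-positive value is defined as 0.0.
    return math.log(a) if a > 0 else 0.0


# Function set: name -> (arity, implementation).
FUNCTIONS = {
    "add": (2, lambda a, b: a + b),
    "mul": (2, lambda a, b: a * b),
    "div": (2, protected_div),
    "log": (1, protected_log),
}


def random_tree(rng, depth=3):
    """Grow a random tree of nested tuples: (op, child, ...), a terminal name, or ("const", value)."""
    if depth == 0 or rng.random() < 0.3:
        t = rng.choice(TERMINALS)
        return ("const", round(rng.uniform(0.1, 2.0), 2)) if t == "const" else t
    op = rng.choice(list(FUNCTIONS))
    arity = FUNCTIONS[op][0]
    return (op,) + tuple(random_tree(rng, depth - 1) for _ in range(arity))


def evaluate(tree, feats):
    """Evaluate a tree for one (term, document) pair given a dict of statistics."""
    if isinstance(tree, str):
        return feats[tree]
    if tree[0] == "const":
        return tree[1]
    _, fn = FUNCTIONS[tree[0]]
    return fn(*(evaluate(child, feats) for child in tree[1:]))


def score(tree, term_feats):
    """Document score: sum of the tree's value over all matching query terms."""
    return sum(evaluate(tree, f) for f in term_feats)
```

A hand-written tree such as `("mul", "tf", ("log", ("div", "N", "df")))` encodes a tf-idf-style weight, which hints at the search space: GP can rediscover classical formulas or combine the same statistics in new ways, and a fitness function would compare them by retrieval effectiveness.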

Proceedings Article

Integrating Stance Detection and Fact Checking in a Unified Corpus

Ramy Baly¹, Mitra Mohtarami¹, James Glass¹, Lluís Màrquez², Alessandro Moschitti², Preslav Nakov²

Massachusetts Institute of Technology¹, Qatar Computing Research Institute²

01 Jun 2018

TL;DR: The authors capture the interdependencies between fact checking, document retrieval, source credibility, stance detection and rationale extraction by annotating all of them in the same corpus, and implement this setup on an Arabic fact-checking corpus.

Abstract: A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.), determining the stance of each document with respect to the claim, and finally making a prediction about the claim’s factuality by aggregating the strength of the stances, while taking the reliability of the source into account. Moreover, a fact checking system should be able to explain its decision by providing relevant extracts (rationales) from the documents. Yet, this setup is not directly supported by existing datasets, which treat fact checking, document retrieval, source credibility, stance detection and rationale extraction as independent tasks. In this paper, we support the interdependencies between these tasks as annotations in the same corpus. We implement this setup on an Arabic fact checking corpus, the first of its kind.

86 citations
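
The abstract above describes predicting a claim's factuality by aggregating the stances of retrieved documents while taking source reliability into account. A minimal sketch of one such aggregation, assuming four stance labels (`agree`, `disagree`, `discuss`, `unrelated`), a per-document classifier confidence, and a source-reliability score in [0, 1]; the labels, the polarity values, the reliability-weighted average and the hypothetical `Evidence` type are illustrative assumptions, not the authors' model:

```python
from dataclasses import dataclass

# Assumed polarity of each stance toward the claim.
POLARITY = {"agree": 1.0, "disagree": -1.0, "discuss": 0.0, "unrelated": 0.0}


@dataclass
class Evidence:
    stance: str         # predicted stance of the document toward the claim
    confidence: float   # stance classifier confidence in [0, 1]
    reliability: float  # assumed source-credibility score in [0, 1]


def predict_factuality(evidence, threshold=0.0):
    """Return (label, score) with label in {"true", "false", "unverified"}.

    Unrelated documents are ignored; the remaining stances are averaged,
    weighting each by its source's reliability.
    """
    relevant = [e for e in evidence if e.stance != "unrelated"]
    if not relevant:
        return "unverified", 0.0
    total = sum(POLARITY[e.stance] * e.confidence * e.reliability for e in relevant)
    weight = sum(e.reliability for e in relevant) or 1.0  # guard against all-zero reliability
    score = total / weight
    return ("true" if score > threshold else "false"), score
```

Dividing by the summed reliability keeps the score in [-1, 1] regardless of how many documents were retrieved, so a single threshold can be applied across claims.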

Proceedings Article

A comparison of indexing techniques for Japanese text retrieval

Hideo Fujii, W. Bruce Croft

01 Jul 1993

TL;DR: Using structured queries, character-based indexing performed as well as, or slightly better than, word-based indexing. This has practical significance, since character-based indexing is considerably faster than traditional word-based indexing.

Abstract: A series of Japanese full-text retrieval experiments was conducted using an inference network document retrieval model. The retrieval performance of two major indexing methods, character-based and word-based, was evaluated. Using structured queries, character-based indexing performed as well as, or slightly better than, the word-based system. This result has practical significance, since character-based indexing is considerably faster than traditional word-based indexing. All the queries in this experiment were automatically formulated from natural language input.

86 citations
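
Character-based indexing sidesteps word segmentation, which Japanese text otherwise needs because words are not separated by spaces. A minimal sketch, assuming single characters as the index unit with positional postings, so that a multi-character query word is matched as a contiguous run of characters (in the spirit of a structured phrase query). The indexing unit and the function names are illustrative assumptions; the paper's inference-network model and its exact query operators are not reproduced here.

```python
from collections import defaultdict


def build_char_index(docs):
    """Positional inverted index keyed by single characters: char -> {doc_id: [positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            if not ch.isspace():
                index[ch][doc_id].append(pos)
    return index


def phrase_match(index, word):
    """Return the ids of documents in which `word` occurs as a contiguous character run."""
    if not word:
        return set()
    matches = set()
    # Only documents containing the first character can match.
    for doc_id in index.get(word[0], {}):
        # Position sets of each query character in this document (.get avoids growing the index).
        pos_sets = [set(index.get(ch, {}).get(doc_id, ())) for ch in word]
        # A match needs character i at position start + i for every i.
        if any(all(start + i in pos_sets[i] for i in range(len(word))) for start in pos_sets[0]):
            matches.add(doc_id)
    return matches
```

For example, with documents `"情報検索の研究"` and `"検索エンジンと情報"`, the query `"検索"` matches both, while `"情報検索"` matches only the first. A word-based index would instead run a morphological analyzer over every document before posting terms; skipping that step is one plausible reason character-based indexing is faster, at the cost of typically more postings and phrase checks at query time.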

Book Chapter

Knowledge-Based Approaches to Query Expansion in Information Retrieval

Richard C. Bodner¹, Fei Song¹

University of Guelph¹

21 May 1996

TL;DR: This paper explores extending traditional information retrieval systems with knowledge-based approaches that automatically expand natural language queries, and shows that certain knowledge-based approaches increase retrieval performance.

Abstract: Textual information is becoming increasingly available in electronic form. Users need tools to sift through non-relevant information and retrieve only those pieces relevant to their needs. Traditional methods such as Boolean operators and key terms have, to some extent, reached their limitations. An emerging trend is to combine traditional information retrieval and artificial intelligence techniques. This paper explores the possibility of extending traditional information retrieval systems with knowledge-based approaches to automatically expand natural language queries. Two types of knowledge bases, one domain-specific and one of general world knowledge, are used in the expansion process. Experiments are also conducted using different search strategies and various combinations of the knowledge bases. Our results show that an increase in retrieval performance can be obtained using certain knowledge-based approaches.

86 citations
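
A minimal sketch of knowledge-based query expansion, assuming each knowledge base can be reduced to a mapping from a term to related terms, and that expansion terms receive a lower weight than the user's own terms. The dictionaries, the single `expansion_weight` and the one-hop expansion are illustrative assumptions; the knowledge bases used in the paper are not reproduced here.

```python
def expand_query(terms, knowledge_bases, expansion_weight=0.5):
    """Return {term: weight} for an expanded query.

    terms: the user's original query terms (weight 1.0).
    knowledge_bases: list of dicts mapping a term to related terms, e.g. a
        hypothetical domain-specific KB and a hypothetical general KB.
    Related terms are added with `expansion_weight`; expansion is one hop only,
    so related terms are not themselves expanded further.
    """
    weights = {t: 1.0 for t in terms}
    for kb in knowledge_bases:
        for t in terms:
            for related in kb.get(t, ()):
                # Keep the highest weight seen, so an original term is never down-weighted.
                weights[related] = max(weights.get(related, 0.0), expansion_weight)
    return weights
```

Passing different subsets of knowledge bases (domain-specific only, general only, or both) is one simple way to run the kind of combination experiments the abstract describes.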

Metrics

6,866 papers
224,605 citations

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
