Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have been published within this topic, receiving 214383 citations.


Papers
Proceedings ArticleDOI
Peng Yu, Frank Seide
04 Oct 2004
TL;DR: In this paper, the authors presented a system for phonetic indexing and searching of spontaneous speech based on phoneme lattices and combined it with word-based search into a hybrid approach.
Abstract: For efficient organization of speech recordings – meetings, interviews, voice mails, and lectures – being able to search for spoken keywords is essential. Today, most spoken document retrieval systems use large-vocabulary recognition. For the above scenarios, such systems suffer from the unpredictable domain, out-of-vocabulary queries, and generally high word-error rate (WER). In [1], we presented a system for phonetic indexing and searching of spontaneous speech. It is vocabulary-independent and based on phoneme lattices. In the present paper, we propose to combine it with word-based search into a hybrid approach. We explore two methods of combination: posterior combination (merging search results of a word-based and a phoneme-based system) and prior combination (combining word and phoneme language models and vocabularies to form a hybrid recognizer). The search accuracy of our best purely phonetic baseline is 64% (Figure of Merit), and our purely word-based baselines are below 50%. The new hybrid approach achieves 73% if the recognizer uses a language model that matches the test-set domain. With a mismatched language model, 71% is achieved. Our results show that the proposed hybrid model benefits from the best of two worlds: word-level language context and robustness of phonetic search to unknown words and domain mismatch.

87 citations

Journal ArticleDOI
TL;DR: A methodology using genetic programming is described for discovering new ranking functions for the Web-based information-seeking task; the retrieval performance of these newly discovered ranking functions is found to be superior to that of well-known ranking strategies in the information retrieval literature.
Abstract: Web search engines have become an integral part of the daily life of a knowledge worker, who depends on these search engines to retrieve relevant information from the Web or from the company's vast document databases. Current search engines are very fast in terms of their response time to a user query. But their usefulness to the user in terms of retrieval performance leaves a lot to be desired. Typically, the user has to sift through a lot of nonrelevant documents to get only a few relevant ones for the user's information needs. Ranking functions play a very important role in the search engine retrieval performance. In this paper, we describe a methodology using genetic programming to discover new ranking functions for the Web-based information-seeking task. We exploit the content as well as structural information in the Web documents in the discovery process. The discovery process is carried out for both the ad hoc task and the routing task in retrieval. For either of the retrieval tasks, the retrieval performance of these newly discovered ranking functions has been found to be superior to the performance obtained by well-known ranking strategies in the information retrieval literature.

86 citations

Proceedings ArticleDOI
01 Jun 2018
TL;DR: In this paper, the authors support the interdependencies between fact checking, document retrieval, source credibility, stance detection and rationale extraction as annotations in the same corpus, and implement this setup on an Arabic fact checking corpus.
Abstract: A reasonable approach for fact checking a claim involves retrieving potentially relevant documents from different sources (e.g., news websites, social media, etc.), determining the stance of each document with respect to the claim, and finally making a prediction about the claim’s factuality by aggregating the strength of the stances, while taking the reliability of the source into account. Moreover, a fact checking system should be able to explain its decision by providing relevant extracts (rationales) from the documents. Yet, this setup is not directly supported by existing datasets, which treat fact checking, document retrieval, source credibility, stance detection and rationale extraction as independent tasks. In this paper, we support the interdependencies between these tasks as annotations in the same corpus. We implement this setup on an Arabic fact checking corpus, the first of its kind.

86 citations

Proceedings ArticleDOI
01 Jul 1993
TL;DR: Using structured queries, character-based indexing performed retrieval as well as, or slightly better than, the word-based system, which has practical significance since character-based indexing is considerably faster than traditional word-based indexing.
Abstract: A series of Japanese full-text retrieval experiments was conducted using an inference network document retrieval model. The retrieval performance of two major indexing methods, character-based and word-based, was evaluated. Using structured queries, the character-based indexing performed retrieval as well as, or slightly better than, the word-based system. This result has practical significance since the character-based indexing speed is considerably faster than the traditional word-based indexing. All the queries in this experiment were automatically formulated from natural language input.

86 citations

Book ChapterDOI
21 May 1996
TL;DR: This paper explores the possibility of extending traditional information retrieval systems with knowledge-based approaches to automatically expand natural language queries and shows that an increase in retrieval performance can be obtained using certain knowledge-based approaches.
Abstract: Textual information is becoming increasingly available in electronic forms. Users need tools to sift through non-relevant information and retrieve only those pieces relevant to their needs. The traditional methods such as Boolean operators and key terms have somehow reached their limitations. An emerging trend is to combine the traditional information retrieval and artificial intelligence techniques. This paper explores the possibility of extending traditional information retrieval systems with knowledge-based approaches to automatically expand natural language queries. Two types of knowledge-bases, a domain-specific and a general world knowledge, are used in the expansion process. Experiments are also conducted using different search strategies and various combinations of the knowledge-bases. Our results show that an increase in retrieval performance can be obtained using certain knowledge-based approaches.

86 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
81% related
Metadata
43.9K papers, 642.7K citations
79% related
Recommender system
27.2K papers, 598K citations
79% related
Ontology (information science)
57K papers, 869.1K citations
78% related
Natural language
31.1K papers, 806.8K citations
77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111