Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.


Papers
Journal Article
TL;DR: Two programs are described, INDEX and INDEXD, which locate repeated phrases in a document, gather statistical information about them, and rank them according to their value as index phrases, showing promise as the basis for a sophisticated conceptual indexing system.
Abstract: In recent years researchers have become increasingly convinced that the performance of information retrieval systems can be greatly enhanced by the use of key phrases for automatic conceptual document indexing and retrieval. In this article we describe two programs, INDEX and INDEXD, which locate repeated phrases in a document, gather statistical information about them, and rank them according to their value as index phrases. The programs show promise as the basis for a sophisticated conceptual indexing system. The simpler program, INDEX, ranks phrases in such a way that frequently occurring phrases which contain several frequently occurring words are given a high ranking. INDEXD is an extension of INDEX which incorporates a dictionary for stemming, weighting of words and validation of syntax of output phrases. Sample output of both programs is included, and we discuss plans to combine INDEXD with linguistic and artificial intelligence techniques to provide a general conceptual phrase-indexing system that can incorporate expert knowledge about a given application area. © 1990 John Wiley & Sons, Inc.
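
The ranking idea described in this abstract (repeated phrases made of frequently occurring words rank highest) can be illustrated with a minimal sketch. This is not the INDEX program: the scoring formula, the function name `rank_repeated_phrases`, and the phrase lengths are assumptions chosen for illustration, and no stemming, dictionary, or syntax validation is attempted.

```python
from collections import Counter
import re


def rank_repeated_phrases(text, min_len=2, max_len=3):
    """Rank repeated word n-grams by a simple frequency-based score.

    Hypothetical scoring (an assumption, not the published INDEX formula):
    score = phrase_count * sum(word_count for each word in the phrase).
    """
    # Lowercase and keep alphabetic tokens only; punctuation is ignored,
    # so phrases may span sentence boundaries in this simplified version.
    words = re.findall(r"[a-z]+", text.lower())
    word_counts = Counter(words)

    # Count every n-gram of the allowed lengths.
    phrase_counts = Counter()
    for n in range(min_len, max_len + 1):
        for i in range(len(words) - n + 1):
            phrase_counts[tuple(words[i:i + n])] += 1

    scored = []
    for phrase, count in phrase_counts.items():
        if count < 2:
            continue  # keep only phrases that repeat
        score = count * sum(word_counts[w] for w in phrase)
        scored.append((" ".join(phrase), score))

    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Calling `rank_repeated_phrases(document_text)` returns `(phrase, score)` pairs in descending score order. Stop words are not filtered here, so on real text a stop list would likely be needed before the scores become useful.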

39 citations

01 Jan 1999
TL;DR: This paper presents a meta-modelling architecture suitable for media asset management systems in corporations, audio-visual broadcast servers, and personal media servers for consumers that was designed for media search on the Web.
Abstract: Multimedia search and retrieval has become an active research field thanks to the increasing demand that accompanies many new practical applications. The applications include large-scale multimedia search engines on the Web, media asset management systems in corporations, audio-visual broadcast servers, and personal media servers for consumers. Diverse requirements derived from these applications impose great challenges and incentives for research in this field.

39 citations

Proceedings Article
Hamed Zamani, Nick Craswell
25 Jul 2020
TL;DR: Macaw is an open-source framework with a modular architecture for CIS research that supports multi-turn, multi-modal, and mixed-initiative interactions, and enables research for tasks such as document retrieval, question answering, recommendation, and structured data exploration.
Abstract: Conversational information seeking (CIS) has been recognized as a major emerging research area in information retrieval. Such research will require data and tools to allow the implementation and study of conversational systems. This paper introduces Macaw, an open-source framework with a modular architecture for CIS research. Macaw supports multi-turn, multi-modal, and mixed-initiative interactions, and enables research for tasks such as document retrieval, question answering, recommendation, and structured data exploration. It has a modular design to encourage the study of new CIS algorithms, which can be evaluated in batch mode. It can also integrate with a user interface, which allows user studies and data collection in an interactive mode, where the back end can be fully algorithmic or a Wizard of Oz setup. Macaw is distributed under the MIT License.

39 citations

Patent
23 Oct 2001
TL;DR: A retrieval support device is presented that automatically obtains output results from a retrieval engine specialized in a specific field, without imposing the burden of selecting the retrieval engine on the user.
Abstract: PROBLEM TO BE SOLVED: To provide a retrieval support device capable of automatically obtaining output results of a retrieval engine specialized in a specific field, without imposing the burden of selecting the retrieval engine on a user. SOLUTION: Retrieval input sentences acquired from a client terminal device are parsed and matched against example sentences registered in advance. These example sentences are associated with retrieval engines specialized in specific fields, so the retrieval engine suited to the retrieval input sentence can be selected from the match. Next, a keyword is extracted from the retrieval input sentence and, if necessary, subjected to predetermined processing, such as converting an abbreviated designation into its formal name, to create a retrieval request sentence. The retrieval request sentence is then sent to the preselected retrieval engine, and the retrieval results are presented at the client terminal device. COPYRIGHT: (C)2003,JPO

39 citations

Journal Article
TL;DR: Experimental results show that the method consistently achieves better retrieval performance than using only the 1-best transcripts in statistical retrieval, outperforms a recently proposed lattice-based vector space retrieval method, and also compares favorably with a lattice-based retrieval method based on the Okapi BM25 model.
Abstract: Recent research efforts on spoken document retrieval have tried to overcome the low quality of 1-best automatic speech recognition transcripts, especially in the case of conversational speech, by using statistics derived from speech lattices containing multiple transcription hypotheses as output by a speech recognizer. We present a method for lattice-based spoken document retrieval based on a statistical n-gram modeling approach to information retrieval. In this statistical lattice-based retrieval (SLBR) method, a smoothed statistical model is estimated for each document from the expected counts of words given the information in a lattice, and the relevance of each document to a query is measured as a probability under such a model. We investigate the efficacy of our method under various parameter settings of the speech recognition and lattice processing engines, using the Fisher English Corpus of conversational telephone speech. Experimental results show that our method consistently achieves better retrieval performance than using only the 1-best transcripts in statistical retrieval, outperforms a recently proposed lattice-based vector space retrieval method, and also compares favorably with a lattice-based retrieval method based on the Okapi BM25 model.
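
The scoring step described in this abstract (expected word counts from a lattice, a smoothed per-document model, relevance as query probability) can be sketched as follows. This is a simplified illustration, not the authors' SLBR implementation: it assumes a unigram model, Jelinek-Mercer interpolation with weight `lam`, and a lattice already reduced to hypothetical `(word, posterior)` arc pairs. The function names are likewise illustrative.

```python
import math
from collections import defaultdict


def expected_counts(lattice_arcs):
    """Sum arc posteriors per word to get expected word counts.

    `lattice_arcs` is an assumed simplification of a speech lattice:
    an iterable of (word, posterior_probability) pairs.
    """
    counts = defaultdict(float)
    for word, posterior in lattice_arcs:
        counts[word] += posterior
    return dict(counts)


def query_log_likelihood(query, doc_counts, collection_counts, lam=0.5):
    """Log-probability of the query under a smoothed unigram document model.

    Smoothing is Jelinek-Mercer interpolation with the collection model
    (an assumption; the abstract only says the model is smoothed).
    """
    doc_total = sum(doc_counts.values())
    coll_total = sum(collection_counts.values())
    score = 0.0
    for word in query:
        p_doc = doc_counts.get(word, 0.0) / doc_total if doc_total else 0.0
        p_coll = collection_counts.get(word, 0.0) / coll_total if coll_total else 0.0
        p = lam * p_doc + (1.0 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # word unseen in the whole collection
        score += math.log(p)
    return score
```

Documents would then be ranked by descending `query_log_likelihood`. Because expected counts keep probability mass for words that are absent from the 1-best transcript, a document can still match a query term that the recognizer ranked second.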

39 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
81% related
Metadata
43.9K papers, 642.7K citations
79% related
Recommender system
27.2K papers, 598K citations
79% related
Ontology (information science)
57K papers, 869.1K citations
78% related
Natural language
31.1K papers, 806.8K citations
77% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111