SciSpace (formerly Typeset)

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.


Papers
Proceedings Article | DOI
28 Jul 2003
TL;DR: The paper describes how the NTCIR-3 Patent Retrieval Collection, a test collection for patent retrieval, was produced. The collection includes two years of Japanese patent applications and 31 topics created by professional patent searchers. Experimental results obtained with the collection are also reported.
Abstract: Reflecting the rapid growth in the utilization of large test collections for information retrieval since the 1990s, extensive comparative experiments have been performed to explore the effectiveness of various retrieval models. However, most collections were intended for retrieving newspaper articles and technical abstracts. In this paper, we describe the process of producing a test collection for patent retrieval, the NTCIR-3 Patent Retrieval Collection, which includes two years of Japanese patent applications and 31 topics produced by professional patent searchers. We also report experimental results obtained by using this collection to re-examine the effectiveness of existing retrieval models in the context of patent retrieval. The relative superiority among existing retrieval models did not significantly differ depending on the document genre, that is, patents and newspaper articles. Issues related to patent retrieval are also discussed.

57 citations
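
The NTCIR-3 abstract describes re-examining the effectiveness of retrieval models on a test collection of topics with relevance judgments. A minimal sketch of how such a comparison is commonly scored, assuming binary relevance judgments and (non-interpolated) mean average precision as the metric. The abstract does not state which measure the authors used, and the run and judgment data below are hypothetical.

```python
from typing import Dict, List, Set


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Non-interpolated average precision for one topic (binary relevance)."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant doc
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(run: Dict[str, List[str]], qrels: Dict[str, Set[str]]) -> float:
    """Average of per-topic AP over every judged topic (missing runs score 0)."""
    topics = list(qrels)
    if not topics:
        return 0.0
    return sum(average_precision(run.get(t, []), qrels[t]) for t in topics) / len(topics)


# Hypothetical judgments (qrels) and one system's ranked run for two topics.
qrels = {"t1": {"d1", "d3"}, "t2": {"d7"}}
run = {"t1": ["d1", "d2", "d3"], "t2": ["d5", "d7"]}
print(mean_average_precision(run, qrels))
```

Comparing two retrieval models then amounts to computing this score for each model's run over the same topics, usually alongside a significance test across topics.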

Book Chapter | DOI
21 Aug 2003
TL;DR: Monolingual, bilingual, and multilingual retrieval experiments on the CLEF 2003 test collection show that document translation-based retrieval performs slightly better than query translation-based retrieval on that collection.
Abstract: This paper describes monolingual, bilingual, and multilingual retrieval experiments using the CLEF 2003 test collection. The paper compares query translation-based multilingual retrieval with document translation-based multilingual retrieval where the documents are translated into the query language by translating the document words individually using machine translation systems or statistical translation lexicons derived from parallel texts. The multilingual retrieval results show that document translation-based retrieval is slightly better than the query translation-based retrieval on the CLEF 2003 test collection. Furthermore, combining query translation and document translation in multilingual retrieval achieves even better performance.

57 citations
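
The CLEF 2003 abstract contrasts translating the query into the document language with translating each document word by word into the query language, and reports that combining the two helps. A minimal sketch of both directions, assuming a hypothetical probabilistic translation lexicon that keeps only the most probable translation of each word, a toy term-overlap score in place of a real retrieval model, and a min-max-normalized linear combination as the fusion step. The paper's actual translation resources and fusion method may differ.

```python
from collections import Counter
from typing import Dict, List

Lexicon = Dict[str, Dict[str, float]]  # source word -> {target word: probability}


def translate_words(words: List[str], lexicon: Lexicon) -> List[str]:
    """Word-by-word translation keeping each word's most probable translation.
    Words missing from the lexicon are kept unchanged (e.g. names, numbers)."""
    out = []
    for w in words:
        options = lexicon.get(w)
        out.append(max(options, key=options.get) if options else w)
    return out


def overlap_score(query: List[str], doc: List[str]) -> float:
    """Toy term-frequency overlap, standing in for a real retrieval model."""
    tf = Counter(doc)
    return float(sum(tf[t] for t in query))


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {d: (s - lo) / span if span else 0.0 for d, s in scores.items()}


def combine(qt: Dict[str, float], dt: Dict[str, float], weight: float = 0.5) -> Dict[str, float]:
    """Linearly merge normalized query-translation and document-translation scores."""
    qt_n, dt_n = min_max(qt), min_max(dt)
    docs = set(qt_n) | set(dt_n)
    return {d: weight * qt_n.get(d, 0.0) + (1 - weight) * dt_n.get(d, 0.0) for d in docs}


# Hypothetical German documents, English query, and lexicons in both directions.
lexicon_de_en = {"haus": {"house": 0.8, "home": 0.2}, "preis": {"price": 0.7, "prize": 0.3},
                 "auto": {"car": 0.9, "auto": 0.1}}
lexicon_en_de = {"house": {"haus": 0.9, "heim": 0.1}, "price": {"preis": 0.6, "kosten": 0.4}}
docs_de = {"d1": ["haus", "preis", "hoch"], "d2": ["auto", "preis"]}
query_en = ["house", "price"]

# Query translation: move the query into the document language.
query_de = translate_words(query_en, lexicon_en_de)
qt_scores = {d: overlap_score(query_de, toks) for d, toks in docs_de.items()}
# Document translation: move every document into the query language.
dt_scores = {d: overlap_score(query_en, translate_words(toks, lexicon_de_en))
             for d, toks in docs_de.items()}
print(combine(qt_scores, dt_scores))
```

Normalizing before mixing keeps one direction's score scale from dominating the other; the mixing weight would normally be tuned on held-out topics.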

Patent
03 Jun 2002
TL;DR: This patent describes a document search and retrieval system and a program product for it. Search requests are provided to the system through a user interface, and a document decomposer splits documents into individual document components.
Abstract: A document search and retrieval system and program product therefor. Search requests are provided to the system through a user interface. A document decomposer decomposes documents into individual document components. Document components and corresponding searchable indices for each are stored in a Component Library. A search unit searches stored document components responsive to search queries. A results validator compares document hitlists with a document type identified in a search query to select valid hitlist entries for a final hitlist. A document view assembly module collects identified document components and assembles them into a document for view at the user interface.

57 citations
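
The patent abstract names a pipeline of a document decomposer, a Component Library with per-component indices, a search unit, a results validator that filters hits by document type, and a view assembly module. A minimal in-memory sketch of that flow, assuming (hypothetically) that components are paragraphs separated by blank lines and that each component is indexed by lowercase whitespace tokens. The class and method names are illustrative, not the patent's.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Set, Tuple

Key = Tuple[str, int]  # (document id, component position)


@dataclass
class Component:
    doc_id: str
    doc_type: str
    position: int
    text: str


class ComponentLibrary:
    """Hypothetical store of document components plus an inverted index per component."""

    def __init__(self) -> None:
        self.components: Dict[Key, Component] = {}
        self.index: Dict[str, Set[Key]] = defaultdict(set)

    def add_document(self, doc_id: str, doc_type: str, text: str) -> None:
        # Decomposer (assumption): one component per non-empty blank-line-separated paragraph.
        parts = [p.strip() for p in text.split("\n\n") if p.strip()]
        for pos, part in enumerate(parts):
            key = (doc_id, pos)
            self.components[key] = Component(doc_id, doc_type, pos, part)
            for term in part.lower().split():
                self.index[term].add(key)

    def search(self, term: str) -> Set[Key]:
        """Search unit: components containing the term."""
        return set(self.index.get(term.lower(), set()))

    def validate(self, hits: Set[Key], wanted_type: str) -> Set[Key]:
        """Results validator: keep hits whose document type matches the query's type."""
        return {k for k in hits if self.components[k].doc_type == wanted_type}

    def assemble(self, doc_id: str) -> str:
        """View assembly: rebuild a document from its stored components in order."""
        parts = sorted((c for c in self.components.values() if c.doc_id == doc_id),
                       key=lambda c: c.position)
        return "\n\n".join(c.text for c in parts)


lib = ComponentLibrary()
lib.add_document("p1", "patent", "Claims text\n\nDescription of a widget")
lib.add_document("m1", "memo", "widget memo")
hits = lib.validate(lib.search("widget"), "patent")
print(hits)
print(lib.assemble("p1"))
```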

Proceedings Article | DOI
30 Mar 2008
TL;DR: Several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. The proposed document-homogeneity measures are also effective for integrating document-query and passage-query similarity information in document retrieval.
Abstract: We show that several previously proposed passage-based document ranking principles, along with some new ones, can be derived from the same probabilistic model. We use language models to instantiate specific algorithms, and propose a passage language model that integrates information from the ambient document to an extent controlled by the estimated document homogeneity. Several document-homogeneity measures that we propose yield passage language models that are more effective than the standard passage model for basic document retrieval and for constructing and utilizing passage-based relevance models; the latter outperform a document-based relevance model. We also show that the homogeneity measures are effective means for integrating document-query and passage-query similarity information for document retrieval.

57 citations
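
The passage-language-model abstract describes a passage model that borrows from its ambient document to an extent controlled by the document's estimated homogeneity. A minimal sketch of that idea, assuming linear interpolation with the homogeneity score as the document's weight, Jelinek-Mercer smoothing against a corpus model, a hypothetical cosine-based homogeneity measure, and ranking each document by its best-scoring passage. These are illustrative choices; the paper's own homogeneity measures and estimators may differ.

```python
import math
from collections import Counter
from typing import Dict, List


def mle(tokens: List[str]) -> Dict[str, float]:
    """Maximum-likelihood unigram model."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def homogeneity(passages: List[List[str]]) -> float:
    """Hypothetical measure: mean cosine similarity of each passage to the whole document."""
    doc = Counter(w for p in passages for w in p)
    return sum(cosine(Counter(p), doc) for p in passages) / len(passages)


def passage_lm(passage: List[str], doc_model: Dict[str, float],
               corpus_model: Dict[str, float], h: float,
               lam_corpus: float = 0.1) -> Dict[str, float]:
    """Mix passage and document MLEs (document weight = h), then smooth with the corpus."""
    p_model = mle(passage)
    vocab = set(p_model) | set(doc_model) | set(corpus_model)
    return {w: (1 - lam_corpus) * ((1 - h) * p_model.get(w, 0.0) + h * doc_model.get(w, 0.0))
               + lam_corpus * corpus_model.get(w, 0.0)
            for w in vocab}


def score_document(query: List[str], passages: List[List[str]],
                   corpus_model: Dict[str, float]) -> float:
    """Rank a document by its best passage's query log-likelihood."""
    doc_model = mle([w for p in passages for w in p])
    h = homogeneity(passages)
    best = -math.inf
    for passage in passages:
        lm = passage_lm(passage, doc_model, corpus_model, h)
        # Tiny floor guards against query terms unseen even in the corpus model.
        best = max(best, sum(math.log(lm.get(q, 1e-12)) for q in query))
    return best


corpus = mle(["a", "b", "c", "d"])  # hypothetical corpus model
print(score_document(["a"], [["a", "b"], ["c"]], corpus))
```

When a document is highly homogeneous, every passage model moves toward the document model, so passage-level evidence matters less; for heterogeneous documents the passage's own statistics dominate.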

Journal Article | DOI
Gerard Salton
TL;DR: User queries and document abstracts available in natural language form in both English and French were compared in an automatic information retrieval environment. The results indicate that the automatic indexing and retrieval techniques used are equally effective on query and document texts in both languages.

57 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations (81% related)
Metadata: 43.9K papers, 642.7K citations (79% related)
Recommender system: 27.2K papers, 598K citations (79% related)
Ontology (information science): 57K papers, 869.1K citations (78% related)
Natural language: 31.1K papers, 806.8K citations (77% related)
Performance Metrics
Number of papers in the topic in previous years

Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111