Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published on this topic, receiving 214,383 citations.


Papers
Proceedings ArticleDOI
20 Jul 2008
TL;DR: In this paper, the authors address a specific enterprise document search scenario, where the information need is expressed using a short query (of a few keywords) together with examples of key reference pages, and investigate how the examples can be utilized to improve the end-to-end performance on the document retrieval task.
Abstract: We address a specific enterprise document search scenario, where the information need is expressed in an elaborate manner. In our scenario, information needs are expressed using a short query (of a few keywords) together with examples of key reference pages. Given this setup, we investigate how the examples can be utilized to improve the end-to-end performance on the document retrieval task. Our approach is based on a language modeling framework, where the query model is modified to resemble the example pages. We compare several methods for sampling expansion terms from the example pages to support query-dependent and query-independent query expansion; the latter is motivated by the wish to increase "aspect recall", and attempts to uncover aspects of the information need not captured by the query. For evaluation purposes we use the CSIRO data set created for the TREC 2007 Enterprise track. The best performance is achieved by query models based on query-independent sampling of expansion terms from the example documents.

45 citations
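
The abstract above describes modifying a query model so that it resembles a set of example pages. A minimal sketch of that general idea follows, assuming a simple linear interpolation between the query's term distribution and a distribution over the most frequent terms in the examples. The function names, the weight `lam`, the cutoff `k`, and the frequency-based term selection are illustrative assumptions, not the sampling methods evaluated in the paper.

```python
from collections import Counter


def term_distribution(texts):
    """Maximum-likelihood term distribution over whitespace tokens."""
    counts = Counter()
    for text in texts:
        counts.update(text.lower().split())
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def expand_query_model(query, example_docs, lam=0.5, k=5):
    """Interpolate the query model with the top-k example terms.

    Assumption: expansion terms are picked by raw frequency in the
    examples, independently of the query (ties broken alphabetically).
    """
    p_query = term_distribution([query])
    p_examples = term_distribution(example_docs)

    # Keep the k most probable example terms and renormalize them.
    top = sorted(p_examples.items(), key=lambda kv: (-kv[1], kv[0]))[:k]
    norm = sum(p for _, p in top)
    p_expansion = {term: p / norm for term, p in top}

    # Mixture: (1 - lam) * P(t | query) + lam * P(t | examples).
    terms = set(p_query) | set(p_expansion)
    return {
        t: (1 - lam) * p_query.get(t, 0.0) + lam * p_expansion.get(t, 0.0)
        for t in terms
    }


model = expand_query_model(
    "solar energy",
    ["solar panels convert sunlight", "solar energy storage batteries"],
    lam=0.4,
    k=3,
)
```

The resulting `model` still sums to one, and terms that occur only in the examples (here `batteries` and `convert`) receive non-zero weight, which is the effect that query expansion aims for.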

Journal ArticleDOI
TL;DR: The earlier model is extended to include interactions among terms, which allows one to decide whether to retrieve a document by taking into consideration occurrences of all the words in the text.
Abstract: This paper begins with a review of earlier work in which a model of word occurrence formed the basis of a decision-making procedure for indexing or, more generally, retrieving documents in response to a request. In the earlier work words were considered individually. This paper extends the earlier model to include interactions among terms. The elaborated model allows one to decide whether to retrieve a document by taking into consideration occurrences of all the words in the text. Retrieval in response to Boolean expressions is also considered, as are procedures for ranking documents in accordance with their assessed relevance to a request. The discussion is within the framework of Bayesian decision theory.

45 citations

Journal ArticleDOI
TL;DR: Different types of similarity, such as lexical and semantic similarity, are described; these play an important role in the categorization of texts and documents.
Abstract: With the large number of documents on the web, there is an increasing need to retrieve the most relevant document. There are different techniques for retrieving the most relevant document from a large corpus. Similarity between words, sentences, paragraphs and documents is an important component in various tasks such as information retrieval, document clustering, word-sense disambiguation, automatic essay scoring, short answer grading, machine translation and text summarization. Text similarity means that the user's query text is matched against the document text, and on the basis of this matching the user retrieves the most relevant documents. Text similarity also plays an important role in the categorization of texts and documents. We can measure the similarity between words, sentences, paragraphs and documents to categorize them efficiently. On the basis of this categorization, we can retrieve the most relevant document for the user's query. This paper describes different types of similarity, such as lexical similarity and semantic similarity.

45 citations
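
The abstract above describes matching a user's query text against document text to retrieve the most relevant documents. As a rough illustration of lexical similarity only, the sketch below computes Jaccard and cosine similarity over whitespace tokens and ranks documents by cosine score. The helper names and the whitespace tokenizer are assumptions made for this example; the paper itself does not prescribe an implementation, and semantic similarity is not covered here.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def jaccard(a, b):
    """Share of distinct tokens common to both texts."""
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    """Cosine of the angle between the two term-count vectors."""
    ca, cb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(ca[t] * cb[t] for t in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs):
    """Order documents from most to least similar to the query."""
    return sorted(docs, key=lambda d: cosine(query, d), reverse=True)


ranked = rank(
    "document retrieval",
    ["cooking recipes", "document retrieval systems"],
)
```

Both measures only see surface word overlap, so a query and a document that use different words for the same concept would score zero; that gap is what semantic similarity measures try to close.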

Book ChapterDOI
25 Jul 1997
TL;DR: The objective is to provide a tool that helps find documents related to a given query, such as answers in Frequently Asked Questions databases, by developing a running prototype system that is currently under practical evaluation.
Abstract: This paper reports on a project on document retrieval in an industrial setting. The objective is to provide a tool that helps find documents related to a given query, such as answers in Frequently Asked Questions databases. A CBR approach has been used to develop a running prototype system, which is currently under practical evaluation.

45 citations


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111