SciSpace (formerly Typeset)
Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6,821 publications have been published within this topic, receiving 214,383 citations.


Papers
Report (DOI)
01 Jan 2005
TL;DR: The High Accuracy Retrieval from Documents (HARD) track explores methods for improving the accuracy of document retrieval systems by considering three questions, including: can additional metadata about the query, the searcher, or the context of the search provide more focused and more accurate results?
Abstract: The effectiveness of ad-hoc retrieval systems appears to have reached a plateau. After several years of 10% gains every year in TREC, improvements dwindled or even stopped. This lack of progress was undoubtedly one of the reasons behind abandoning (suspending) the ad-hoc track of TREC after TREC-9. One plausible reason that document retrieval has been unable to improve is that the nature of the task requires systems to adopt "one size fits all" approaches. Given a query, a system will generally do best to return results that are good for an "average" user. Doing otherwise (i.e., targeting the results at a particular type of user) might produce substantial improvements on a query, but it is just as likely (in a TREC environment) to cause horrible degradation. By ignoring the user (or, more accurately, by treating all users identically), systems cannot possibly advance beyond a particular level of average accuracy for a specific user. The goal of this track is to bring the user out of hiding, making him or her an integral part of both the search process and the evaluation. Systems have not just a query to chew on but also as much information as possible about the person making the request, ranging from biographical data, through information-seeking context, to the expected type of result.

137 citations

Proceedings Article (DOI)
04 Jan 2000
TL;DR: This work examines applying GAs to adapt various matching functions, with the aim of achieving better retrieval performance than that obtained with a single matching function.
Abstract: Knowledge-intensive organizations have a vast array of information contained in large document repositories. With the advent of e-commerce and corporate intranets/extranets, these repositories are expected to grow at a fast pace. This explosive growth has led to huge, fragmented, and unstructured document collections. Although it has become easier to collect and store information in document collections, it has become increasingly difficult to retrieve relevant information from them. This paper addresses the issue of improving retrieval performance (in terms of precision and recall) for retrieval from document collections. There are three important paradigms of research in information retrieval (IR): probabilistic IR, knowledge-based IR, and artificial-intelligence-based techniques such as neural networks and symbolic learning. Very few researchers have tried to use evolutionary algorithms such as genetic algorithms (GAs). Previous attempts at using GAs have concentrated on modifying document representations or query representations. This work looks at the possibility of applying GAs to adapt various matching functions. It is hoped that such an adaptation of the matching functions will lead to better retrieval performance than that obtained by using a single matching function. An overall matching function is treated as a weighted combination of the scores produced by individual matching functions. This overall score is used to rank and retrieve documents. The weights associated with the individual functions are searched using GAs. The idea is tested on a real document collection, the Cranfield collection. The results look very encouraging.

136 citations
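The abstract above treats the overall score as a weighted combination of individual matching-function scores, with the weights found by a GA. A minimal Python sketch of that idea, under stated assumptions: the three matching functions (cosine, Jaccard, term overlap), the fitness measure (mean average precision), and the GA operators (truncation selection, one-point crossover, Gaussian mutation) are illustrative choices, not the paper's, and the tiny corpus and relevance judgments are hypothetical stand-ins for the Cranfield collection.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus and relevance judgments (NOT the Cranfield collection).
DOCS = {
    "d1": "boundary layer flow over a flat plate",
    "d2": "heat transfer in laminar boundary layer",
    "d3": "supersonic flow around a wedge",
    "d4": "structural vibration of thin plates",
}
QUERIES = {
    "q1": ("boundary layer heat transfer", {"d2"}),
    "q2": ("supersonic wedge flow", {"d3"}),
}


def tokens(text):
    return text.lower().split()


# Three illustrative matching functions (assumed; not taken from the paper).
def cosine(query, doc):
    qc, dc = Counter(tokens(query)), Counter(tokens(doc))
    dot = sum(qc[t] * dc[t] for t in qc)
    norm = math.sqrt(sum(v * v for v in qc.values())) * math.sqrt(sum(v * v for v in dc.values()))
    return dot / norm if norm else 0.0


def jaccard(query, doc):
    qs, ds = set(tokens(query)), set(tokens(doc))
    union = qs | ds
    return len(qs & ds) / len(union) if union else 0.0


def overlap(query, doc):
    return float(len(set(tokens(query)) & set(tokens(doc))))


MATCHERS = [cosine, jaccard, overlap]


def combined_score(weights, query, doc):
    # Overall matching function: weighted sum of the individual scores.
    return sum(w * f(query, doc) for w, f in zip(weights, MATCHERS))


def average_precision(ranking, relevant):
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def fitness(weights):
    # Fitness = mean average precision over the training queries.
    aps = []
    for query, relevant in QUERIES.values():
        ranking = sorted(DOCS, key=lambda d: combined_score(weights, query, DOCS[d]), reverse=True)
        aps.append(average_precision(ranking, relevant))
    return sum(aps) / len(aps)


def evolve(pop_size=20, generations=30, sigma=0.1, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in MATCHERS] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection; parents survive (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, len(MATCHERS))  # one-point crossover point
            # Gaussian mutation on the crossed-over child, clipped so weights stay >= 0.
            child = [max(0.0, w + rng.gauss(0.0, sigma)) for w in a[:cut] + b[cut:]]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```

Calling `evolve()` returns one weight per matching function. On this deliberately tiny example almost any positive weight vector already ranks the relevant document first, so the search only becomes meaningful on a real collection with many queries where the matching functions disagree.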

Patent
29 Apr 2002
TL;DR: By arranging a search results display area and a topic word display area adjacently on a retrieval assisting interface, users can browse both title information and topic information; by providing search-result analysis means such as a "mark title" button for emphasizing documents that contain designated topic words, along with a "mark topic word" button for emphasizing topic words contained in a designated document, users can readily analyze search results from various standpoints.
Abstract: The invention achieves efficient analysis of search results, which is required when examining search queries, by listing both the title information of a retrieved document group and the information for the group as a whole. By arranging a search results display area and a topic word display area adjacently on a retrieval assisting interface, users can browse both title information and topic information; by providing search-result analysis means such as a "mark title" button for emphasizing documents that contain designated topic words, along with a "mark topic word" button for emphasizing topic words contained in a designated document, users can readily analyze search results from various standpoints.

136 citations

Proceedings Article (DOI)
01 Aug 1999
TL;DR: Phrasier is an interactive system for browsing, querying and relating documents within a digital library; it exploits keyphrases automatically extracted from source documents to create links to similar documents and to suggest appropriate query phrases to users.
Abstract: Users' information needs are often too complex to be effectively expressed in standard query interfaces to full-text retrieval systems. A typical need is to find documents that are similar to a given source document, yet describing the content of a document in a few terms is a difficult task. We describe Phrasier, an interactive system for browsing, querying and relating documents within a digital library. Phrasier exploits keyphrases that have been automatically extracted from source documents to create links to similar documents and to suggest appropriate query phrases to users. Phrasier's keyphrase-based retrieval engine returns ranked lists of documents that are similar to a given source text. Evaluation indicates that Phrasier's keyphrase-based retrieval performs as well as full-text retrieval when recall and relevance scores assigned by human assessors are considered.

136 citations
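The Phrasier abstract says its engine ranks documents by similarity to a source text using automatically extracted keyphrases, but it does not give the scoring formula. A minimal Python sketch under loud assumptions: the keyphrases are taken as already extracted (the hypothetical `KEYPHRASES` index), and documents are scored simply by how many keyphrases they share with the source text. This shared-phrase count is a stand-in technique, not Phrasier's actual ranking function.

```python
import re

# Hypothetical keyphrase index: keyphrases assumed to have been extracted
# automatically from each document (stored lowercase, single-spaced).
KEYPHRASES = {
    "doc_a": {"digital library", "keyphrase extraction", "query interface"},
    "doc_b": {"digital library", "metadata"},
    "doc_c": {"neural network", "speech recognition"},
}


def normalize(text):
    # Lowercase and collapse the text to single-spaced alphanumeric words.
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def phrases_in(text, vocabulary):
    # Known keyphrases that occur in the text as whole-word sequences.
    padded = f" {normalize(text)} "
    return {p for p in vocabulary if f" {p} " in padded}


def rank_similar(source_text, index):
    # Rank documents by how many keyphrases they share with the source text.
    vocabulary = set().union(*index.values())
    source_phrases = phrases_in(source_text, vocabulary)
    scored = [(len(source_phrases & phrases), doc_id) for doc_id, phrases in index.items()]
    scored = [(n, doc_id) for n, doc_id in scored if n > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # most shared first, ties by id
    return [(doc_id, n) for n, doc_id in scored]


print(rank_similar("A query interface for a digital library using keyphrase extraction", KEYPHRASES))
# -> [('doc_a', 3), ('doc_b', 1)]
```

The same scoring could in principle serve both uses the abstract mentions: ranking similar documents for a source text, and proposing the matched keyphrases as query suggestions.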

Patent
02 Nov 2001
TL;DR: A system and method for document filtering and selection based on quality automatically makes value judgments for document retrieval: items of data, such as documents, are automatically assigned a value.
Abstract: A system and method for document filtering and selection based on quality automatically operates to make value judgments for document retrieval. Items of data, e.g. documents, are automatically associated with a value. Items of data may then be selected based on that value, so that they are not only relevant to the specific subject or topic requested but also desirable according to certain criteria, including each document's quality. A specific application of the invention is a filter for computerized bulletin boards. Many of these systems, also known as discussion groups, receive thousands of new messages per day, and readers and human editors do not have time to classify new messages by quality quickly. Messages may be ranked by quality automatically, performing the same function as a human editor or moderator. Values and qualities may be assigned by interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment, or any combination thereof, for example.

135 citations
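The patent abstract describes assigning each item a value from several quality criteria and selecting items that are both on the requested topic and of sufficient value. A minimal Python sketch, assuming the per-criterion scores already exist (how they are computed is not specified in the abstract), a linear weighted combination, and a fixed threshold; `Message`, `WEIGHTS`, `value`, `select`, and the sample messages are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_id: str
    topic: str
    # criterion name -> score in [0, 1], assumed to come from some automatic scorer
    criteria: dict = field(default_factory=dict)


# Hypothetical weights; a negative weight penalizes a criterion such as obscenity.
WEIGHTS = {"interestingness": 0.5, "timeliness": 0.3, "humor": 0.2, "obscenity": -1.0}


def value(msg, weights=WEIGHTS):
    # Combine the per-criterion scores into one value (missing criteria count as 0).
    return sum(w * msg.criteria.get(name, 0.0) for name, w in weights.items())


def select(messages, topic, threshold=0.3, weights=WEIGHTS):
    # Keep on-topic messages whose value clears the threshold, highest value first.
    kept = [m for m in messages if m.topic == topic and value(m, weights) >= threshold]
    return sorted(kept, key=lambda m: value(m, weights), reverse=True)


SAMPLE = [
    Message("m1", "gardening", {"interestingness": 0.9, "timeliness": 0.8}),
    Message("m2", "gardening", {"interestingness": 0.6, "obscenity": 0.9}),
    Message("m3", "gardening", {"timeliness": 0.5, "humor": 0.9}),
    Message("m4", "cooking", {"interestingness": 1.0}),
]

print([m.msg_id for m in select(SAMPLE, "gardening")])  # -> ['m1', 'm3']
```

Here `m2` is dropped by the obscenity penalty and `m4` by the topic filter, mirroring the abstract's point that selection depends on both the requested topic and quality.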


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations, 81% related
Metadata: 43.9K papers, 642.7K citations, 79% related
Recommender system: 27.2K papers, 598K citations, 79% related
Ontology (information science): 57K papers, 869.1K citations, 78% related
Natural language: 31.1K papers, 806.8K citations, 77% related
Performance Metrics
No. of papers in the topic in previous years

Year   Papers
2023   9
2022   39
2021   107
2020   130
2019   144
2018   111