Topic

Document retrieval

About: Document retrieval is a research topic. Over its lifetime, 6821 publications have been published within this topic, receiving 214383 citations.

Papers

Report•DOI•

HARD Track Overview in TREC 2003 High Accuracy Retrieval from Documents

James Allan

01 Jan 2005

TL;DR: The High Accuracy Retrieval from Documents (HARD) track explores methods for improving the accuracy of document retrieval systems by considering three questions, including: can additional metadata about the query, the searcher, or the context of the search provide more focused and more accurate results?

Abstract: The effectiveness of ad-hoc retrieval systems appears to have reached a plateau. After several years of 10% gains every year in TREC, improvements dwindled or even stopped. This lack of progress was undoubtedly one of the reasons behind suspending the ad-hoc TREC after TREC-9. One plausible reason that document retrieval has been unable to improve is that the nature of the task requires that systems adopt "one size fits all" approaches. Given a query, a system will generally do best to return results that are good for an "average" user. Doing otherwise (i.e., targeting the results for a particular type of user) might result in substantial improvements on a query, but it is just as likely (in a TREC environment) to cause horrible degradation. By ignoring the user (or, more accurately, by treating all users identically), systems cannot possibly advance beyond a particular level of accuracy on average for a specific user. The goal of this track is to bring the user out of hiding, making him or her an integral part of both the search process and the evaluation. Systems do not have just a query to chew on, but also have as much information as possible about the person making the request, ranging from biographical data, through information seeking context, to expected type of result.

137 citations

Proceedings Article•DOI•

Effective information retrieval using genetic algorithms based matching functions adaptation

Praveen Pathak¹, Michael D. Gordon², Weiguo Fan²•Institutions (2)

Purdue University¹, University of Michigan²

04 Jan 2000

TL;DR: This work looks at the possibility of applying GAs to adapt various matching functions in order to lead to a better retrieval performance than that obtained by using a single matching function.

Abstract: Knowledge intensive organizations have a vast array of information contained in large document repositories. With the advent of E-commerce and corporate intranets/extranets, these repositories are expected to grow at a fast pace. This explosive growth has led to huge, fragmented, and unstructured document collections. Although it has become easier to collect and store information in document collections, it has become increasingly difficult to retrieve relevant information from these large document collections. This paper addresses the issue of improving retrieval performance (in terms of precision and recall) for retrieval from document collections. There are three important paradigms of research in the area of information retrieval (IR): Probabilistic IR, Knowledge-based IR, and Artificial Intelligence based techniques like neural networks and symbolic learning. Very few researchers have tried to use evolutionary algorithms like genetic algorithms (GAs). Previous attempts at using GAs have concentrated on modifying document representations or modifying query representations. This work looks at the possibility of applying GAs to adapt various matching functions. It is hoped that such an adaptation of the matching functions will lead to a better retrieval performance than that obtained by using a single matching function. An overall matching function is treated as a weighted combination of scores produced by individual matching functions. This overall score is used to rank and retrieve documents. Weights associated with individual functions are searched using Genetic Algorithms. The idea is tested on a real document collection called the Cranfield collection. The results look very encouraging.
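
The idea of searching for the weights of a combined matching function can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy scores, the use of average precision as the fitness, and the particular GA operators (truncation selection, one-point crossover, Gaussian mutation).

```python
import random

def combined_score(weights, scores):
    """Weighted sum of the scores given to one document by each matching function."""
    return sum(w * s for w, s in zip(weights, scores))

def rank(weights, doc_scores):
    """Return document ids ordered by descending combined score."""
    return sorted(doc_scores,
                  key=lambda d: combined_score(weights, doc_scores[d]),
                  reverse=True)

def average_precision(ranking, relevant):
    """Mean of the precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for position, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant) if relevant else 0.0

def evolve_weights(doc_scores, relevant, n_funcs,
                   pop_size=20, generations=30, seed=0):
    """Search for weights in [0, 1] with a small, illustrative GA."""
    rng = random.Random(seed)

    def fitness(weights):
        return average_precision(rank(weights, doc_scores), relevant)

    population = [[rng.random() for _ in range(n_funcs)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: the better half survives unchanged (elitism).
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # One-point crossover between two parents.
            cut = rng.randrange(1, n_funcs) if n_funcs > 1 else 0
            child = a[:cut] + b[cut:]
            # Gaussian mutation of one weight, clamped to [0, 1].
            i = rng.randrange(n_funcs)
            child[i] = min(1.0, max(0.0, child[i] + rng.gauss(0, 0.1)))
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

# Toy data (hypothetical): scores from two matching functions per document.
doc_scores = {"d1": (0.9, 0.1), "d2": (0.2, 0.8),
              "d3": (0.5, 0.5), "d4": (0.1, 0.3)}
relevant = {"d2", "d3"}

best = evolve_weights(doc_scores, relevant, n_funcs=2)
print(best, average_precision(rank(best, doc_scores), relevant))
```

In this toy collection any weight vector that favours the second function ranks both relevant documents first, so the search has an easy optimum; a real evaluation would use relevance judgments over many queries, as the abstract describes for the Cranfield collection.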

136 citations

Patent•

Document retrieval assisting method and system for the same and document retrieval service using the same

Shingo Nishioka¹, Makoto Iwayama¹, Kazuhiro Ono¹, Akihiko Takano¹, Yoshiki Niwa¹, Atsuko Yamaguchi¹•Institutions (1)

Hitachi¹

29 Apr 2002

TL;DR: By arranging the search results display area and the topic word display area adjacently on a retrieval assisting interface, users can browse the title information and topic information; by arranging search results analysis means such as a mark title button for emphasizing documents containing designated topic words, along with a mark topic word button for emphasizing topic words contained in a designated document, users can analyze search results readily from various standpoints.

Abstract: Efficient analysis of search results, which is required for the examination of search queries, is achieved by listing both the title information of a retrieved document group and the whole information. By arranging the search results display area and the topic word display area adjacently on a retrieval assisting interface, users can browse the title information and topic information; by arranging search results analysis means such as a mark title button for emphasizing documents containing designated topic words, along with a mark topic word button for emphasizing topic words contained in a designated document, users can analyze search results readily from various standpoints.

136 citations

Proceedings Article•DOI•

Phrasier: a system for interactive document retrieval using keyphrases

Steve Jones¹, Mark S. Staveley¹•Institutions (1)

University of Waikato¹

01 Aug 1999

TL;DR: Phrasier, an interactive system for browsing, querying and relating documents within a digital library that exploits keyphrases that have been automatically extracted from source documents to create links to similar documents and to suggest appropriate query phrases to users.

Abstract: Users' information needs are often too complex to be effectively expressed in standard query interfaces to full-text retrieval systems. A typical need is to find documents that are similar to a given source document, yet describing the content of a document in a few terms is a difficult task. We describe Phrasier, an interactive system for browsing, querying and relating documents within a digital library. Phrasier exploits keyphrases that have been automatically extracted from source documents to create links to similar documents and to suggest appropriate query phrases to users. Phrasier's keyphrase-based retrieval engine returns ranked lists of documents that are similar to a given source text. Evaluation indicates that Phrasier's keyphrase-based retrieval performs as well as full-text retrieval when recall and relevance scores assigned by human assessors are considered.
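
A rough sketch of keyphrase-based similarity ranking follows. It assumes the keyphrases have already been extracted for every document and uses Jaccard overlap as the similarity measure; both the measure and the function names are illustrative assumptions, since the abstract does not say how Phrasier scores documents.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of keyphrases."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_documents(source_phrases, library, top_k=3):
    """Rank library documents by keyphrase overlap with the source text.

    `library` maps a document id to its (pre-extracted) keyphrases.
    Documents sharing no keyphrase with the source are left out.
    """
    scored = [(jaccard(source_phrases, phrases), doc_id)
              for doc_id, phrases in library.items()]
    scored = [(score, doc_id) for score, doc_id in scored if score > 0]
    # Highest score first; ties broken by document id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, score) for score, doc_id in scored[:top_k]]

# Hypothetical keyphrase sets for three documents and one source text.
library = {
    "a": {"digital library", "keyphrase extraction"},
    "b": {"genetic algorithm"},
    "c": {"digital library", "query interface", "full-text retrieval"},
}
source = {"digital library", "keyphrase extraction", "query interface"}

for doc_id, score in similar_documents(source, library):
    print(doc_id, round(score, 3))
```

The shared keyphrases behind each match could also be shown to the user as suggested query phrases, which is the other use of keyphrases the abstract mentions.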

136 citations

Patent•

Method and system for selecting documents by measuring document quality

Charles Elkan¹•Institutions (1)

University of California¹

02 Nov 2001

TL;DR: A system and method for document filtering and selection based on quality automatically makes value judgments for document retrieval; items of data, e.g. documents, are automatically associated with a value.

Abstract: A system and method for document filtering and selection based on quality automatically operates to make value judgments for document retrieval. Items of data, e.g. documents, are automatically associated with a value. Items of data may then be selected based upon value, so that the selected items not only match the specific subject or topic requested but are also desirable according to certain criteria, including each document's quality. A specific application of the invention is to a filter for computerized bulletin boards. Many of these systems, also known as discussion groups, have thousands of new messages per day. Readers and human editors do not have time to classify new messages by quality quickly. Messages may be ranked by quality automatically, to perform the same function performed by a human editor or moderator. Values and qualities may be assigned by interestingness, appropriateness, timeliness, humor, style of language, obscenity, sentiment, and any combinations thereof, for example.

135 citations

Performance

Metrics

Papers: 6,866

Citations: 224,605

No. of papers in the topic in previous years
Year	Papers
2023	9
2022	39
2021	107
2020	130
2019	144
2018	111
