Topic

Document retrieval

About: Document retrieval is a research topic. Over the lifetime, 6821 publications have been published within this topic receiving 214383 citations.


Papers
Proceedings Article
01 Jan 2005
TL;DR: The TREC 2005 QA track, as discussed by the authors, contained three tasks: the main question answering task (the same as the single TREC 2004 QA task), the document ranking task, and the relationship task.
Abstract: The TREC 2005 Question Answering track contained three tasks: the main question answering task, the document ranking task, and the relationship task. The main task was the same as the single TREC 2004 QA task. In the main task, question series were used to define a set of targets. Each series was about a single target and contained factoid and list questions. The final question in the series was an "Other" question that asked for additional information about the target that was not covered by previous questions in the series. The document ranking task was to return a ranked list of documents for each question from a subset of the questions in the main task, where the documents were thought to contain an answer to the question. In the relationship task, systems were given TREC-like topic statements that ended with a question asking for evidence for a particular relationship.

The goal of the TREC question answering (QA) track is to foster research on systems that return answers themselves, rather than documents containing answers, in response to a question. The track started in TREC-8 (1999), with the first several editions of the track focused on factoid questions. A factoid question is a fact-based, short answer question such as "How many calories are there in a Big Mac?". The task in the TREC 2003 QA track was a combined task that contained list and definition questions in addition to factoid questions [1]. A list question asks for different instances of a particular kind of information to be returned, such as "List the names of chewing gums". Answering such questions requires a system to assemble an answer from information located in multiple documents. A definition question asks for interesting information about a particular person or thing, such as "Who is Vlad the Impaler?" or "What is a golden parachute?". Definition questions also require systems to locate information in multiple documents, but in this case the information of interest is much less crisply delineated.

The TREC 2004 test set contained factoid and list questions grouped into different series, where each series had the target of a definition associated with it [2]. Each question in a series asked for some information about the target. In addition, the final question in each series was an explicit "Other" question, which was to be interpreted as "Tell me other interesting things about this target I don't know enough to ask directly". This last question is roughly equivalent to the definition questions in the TREC 2003 task.

Several concerns regarding the TREC 2005 QA track were raised during the TREC 2004 QA breakout session. Since the TREC 2004 task was rather different from previous years' tasks, there was a desire to repeat the task largely unchanged. There was also a desire to build infrastructure that would allow a closer examination of the role document retrieval techniques play in supporting QA technology. As a result of this discussion, the main task for the 2005 QA track was decided to be essentially the same as the 2004 task, in that the test set would consist of a set of question series where each series asks for information regarding a particular target. As in TREC 2004, the targets included people, organizations, and other entities (things); unlike TREC 2004, the target could also be an event. Events were added since the document set from which the answers are to be drawn consists of newswire articles.

The runs were evaluated using the same methodology as in TREC 2004, except that the primary measure was the per-series score instead of the combined component score. The document ranking task was added to the TREC 2005 track to address the concern regarding document retrieval and QA. The task was to submit, for a subset of 50 of the questions in the main task, a ranked list of up to 1000 documents for each question. Groups whose primary emphasis was document retrieval rather than QA were allowed to participate in the document ranking task without submitting actual answers for the main task. However, all TREC 2005 submissions to the main task were required to include a ranked list of documents for each question in the document ranking task.
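As a rough illustration of what a document ranking submission involves, the sketch below writes a ranked list of up to 1000 documents per question in the standard TREC run-file format (question id, "Q0", document id, rank, score, run tag). The toy scorer and document ids are placeholders for illustration only, not the systems or collection used in the track.

```python
# Sketch only: write one ranked list of up to 1000 documents per question
# in the standard TREC run format:
#   <question id> Q0 <document id> <rank> <score> <run tag>

def write_run(questions, score_docs, out_path, tag="example_run", depth=1000):
    """questions: dict of question id -> question text.
    score_docs: callable returning (doc_id, score) pairs, best first."""
    with open(out_path, "w") as out:
        for qid, qtext in questions.items():
            for rank, (doc_id, score) in enumerate(score_docs(qtext)[:depth], 1):
                out.write(f"{qid} Q0 {doc_id} {rank} {score:.4f} {tag}\n")

# Toy scorer: counts query terms shared with each (hypothetical) document.
DOCS = {"NYT001": "big mac calories fast food", "NYT002": "chewing gum brands list"}

def toy_scorer(question):
    qterms = set(question.lower().split())
    scored = [(d, len(qterms & set(text.split()))) for d, text in DOCS.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

write_run({"66.1": "how many calories are there in a big mac"}, toy_scorer, "example.run")
```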

130 citations

Proceedings ArticleDOI
Joel L. Fagan
01 Nov 1987
TL;DR: An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented.
Abstract: An automatic phrase indexing method based on the term discrimination model is described, and the results of retrieval experiments on five document collections are presented. Problems related to this non-syntactic phrase construction method are discussed, and some possible solutions are proposed that make use of information about the syntactic structure of document and query texts.
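The abstract does not spell out the phrase-construction procedure, so the sketch below is only an illustrative stand-in: it builds non-syntactic two-word phrase index terms from adjacent content words, using a simple frequency threshold where the paper's method would apply term-discrimination statistics.

```python
# Illustrative only: non-syntactic phrase index terms from adjacent content
# words. The frequency threshold is a stand-in assumption; the paper selects
# phrase components using the term discrimination model.

from collections import Counter

STOPWORDS = {"the", "of", "and", "a", "an", "in", "to", "is", "for", "on"}

def phrase_terms(text, min_component_freq=2):
    words = [w.strip(".,;:").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    freq = Counter(words)
    # Pair each content word with its immediate neighbour, keeping a pair only
    # when both components occur often enough to be plausible phrase components.
    return [f"{w1} {w2}" for w1, w2 in zip(words, words[1:])
            if freq[w1] >= min_component_freq and freq[w2] >= min_component_freq]

print(phrase_terms("information retrieval systems index documents; "
                   "retrieval systems rank documents for retrieval"))
```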

130 citations

Journal Article
TL;DR: A series of studies explored the effects of domain expertise and search expertise in hypertext or full-text CD-ROM databases to investigate how highly interactive electronic access to primary information affects information seeking.

130 citations

Proceedings ArticleDOI
J.R. Smith
21 Jun 1998
TL;DR: This work addresses the growing need for establishing a common content-based image retrieval test-bed and establishes a benchmark set of images and queries for this type of retrieval.
Abstract: One of the most significant problems in content-based image retrieval results from the lack of a common test-bed for researchers. Although many published articles report on content-based retrieval results using color photographs, there has been little effort in establishing a benchmark set of images and queries. Doing so would have many benefits in advancing the technology and utility of content-based image retrieval systems. We address the growing need for establishing a common content-based image retrieval test-bed.

129 citations

Journal ArticleDOI
01 Apr 2014
TL;DR: Overall, the findings suggest that language modeling techniques improve document retrieval, with lemmatization producing the best result.
Abstract: The current study compares document retrieval precision performance based on language modeling techniques, particularly stemming and lemmatization. Stemming is a procedure that reduces all words with the same stem to a common form, whereas lemmatization removes inflectional endings and returns the base or dictionary form of a word. Comparisons were also made between these two techniques and a baseline ranking algorithm (i.e. one with no language processing). A search engine was developed and the algorithms were tested on a test collection. Both mean average precision and histograms indicate that stemming and lemmatization outperform the baseline algorithm. Of the two language modeling techniques, lemmatization produced better precision than stemming, although the differences were not significant. Overall, the findings suggest that language modeling techniques improve document retrieval, with lemmatization producing the best result.
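For readers unfamiliar with the two normalization steps being compared, the minimal sketch below contrasts them; the paper does not name a toolkit, so NLTK's PorterStemmer and WordNetLemmatizer are assumptions made here for illustration.

```python
# Minimal sketch, assuming NLTK (the paper does not name its toolkit).
# Stemming truncates words to a common stem; lemmatization maps them to a
# dictionary form. Run nltk.download("wordnet") once for the lemmatizer data.

from nltk.stem import PorterStemmer, WordNetLemmatizer

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()

terms = ["studies", "running", "retrieved", "corpora"]

print([stemmer.stem(t) for t in terms])          # e.g. ['studi', 'run', 'retriev', 'corpora']
print([lemmatizer.lemmatize(t) for t in terms])  # e.g. ['study', 'running', 'retrieved', 'corpus']
```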

129 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
81% related
Metadata
43.9K papers, 642.7K citations
79% related
Recommender system
27.2K papers, 598K citations
79% related
Ontology (information science)
57K papers, 869.1K citations
78% related
Natural language
31.1K papers, 806.8K citations
77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    9
2022    39
2021    107
2020    130
2019    144
2018    111