Open Access · Report · DOI

HARD Track Overview in TREC 2004: High Accuracy Retrieval from Documents

James Allan
TLDR
The High Accuracy Retrieval from Documents (HARD) track explores methods for improving the accuracy of document retrieval systems by considering three questions, the first of which is whether additional metadata about the query, the searcher, or the context of the search can provide more focused and, therefore, more accurate results.
Abstract
The High Accuracy Retrieval from Documents (HARD) track explores methods for improving the accuracy of document retrieval systems. It does so by considering three questions. Can additional metadata about the query, the searcher, or the context of the search provide more focused and, therefore, more accurate results? These metadata items generally do not directly affect whether or not a document is on topic, but they do affect whether it is relevant. For example, a person looking for introductory material will not find an on-topic but highly technical document relevant. Can highly focused, short-duration interaction with the searcher be used to improve the accuracy of a system? Participants created "clarification forms" generated in response to a query -- and leveraging any information available in the corpus -- that were filled out by the searcher. Typical clarification questions might ask whether some titles seem relevant, whether some words or names are on topic, or whether a short passage of text is related. Can passage retrieval be used to effectively focus attention on relevant material, increasing accuracy by eliminating unwanted text in an otherwise useful document? For this aspect of the problem, there are challenges in finding relevant passages, but also in determining how best to evaluate the results. The HARD track ran for the second time in TREC 2004. It used a new corpus and a new set of 50 topics for evaluation. All topics included metadata information, and clarification forms were considered for each of them. Because of the expense of sub-document relevance judging, only half of the topics were used in the passage-level evaluation. A total of 16 sites participated in HARD, up from 14 sites the year before. Interest remains strong, so the HARD track will run again in TREC 2005, but because of funding uncertainties it will only address a subset of the issues.
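To make the clarification-form idea concrete, here is a minimal, purely illustrative Python sketch. The actual HARD track forms were system-generated HTML pages completed by assessors within a time limit; the class, helper, and field names below (ClarificationForm, build_form, and so on) are hypothetical and only mirror the three kinds of questions the abstract describes.

from dataclasses import dataclass, field

@dataclass
class ClarificationForm:
    """Illustrative container for the kinds of clarification questions described above."""
    topic_id: str
    candidate_titles: list[str] = field(default_factory=list)    # "does this title look relevant?"
    candidate_keywords: list[str] = field(default_factory=list)  # "is this word or name on topic?"
    candidate_passages: list[str] = field(default_factory=list)  # "is this short passage related?"

def build_form(topic_id: str, top_docs: list[dict], max_items: int = 5) -> ClarificationForm:
    """Assemble a form from a system's initial top-ranked documents (hypothetical helper)."""
    docs = top_docs[:max_items]
    return ClarificationForm(
        topic_id=topic_id,
        candidate_titles=[d.get("title", "") for d in docs],
        candidate_keywords=sorted({w for d in docs for w in d.get("text", "").split()[:10]}),
        candidate_passages=[d.get("text", "")[:200] for d in docs],
    )

A participant's system would generate one such form per topic from its own initial retrieval run, and the searcher's answers would then be fed back into a second, refined run.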



Citations
Proceedings Article · DOI

Improving web search ranking by incorporating user behavior information

TL;DR: In this paper, the authors show that incorporating implicit feedback can augment other features, improving the accuracy of a competitive web search ranking algorithm by as much as 31% relative to the original performance.
Journal Article · DOI

Exploratory Search: Beyond the Query-Response Paradigm

TL;DR: This lecture introduces exploratory search, relates it to relevant extant research, outlines the features of exploratory search systems, discusses the evaluation of these systems, and suggests some future directions for supporting exploratory search.
Proceedings Article · DOI

Learning user interaction models for predicting web search result preferences

TL;DR: This work presents a real-world study of modeling the behavior of web search users to predict web search result preferences and generalizes the approach to model user behavior beyond clickthrough, which results in higher preference prediction accuracy than models based on clickthrough information alone.
Book

Methods for Evaluating Interactive Information Retrieval Systems with Users

TL;DR: In this paper, the authors provide an overview of and guidance on the evaluation of interactive information retrieval systems with users, presenting core instruments, data collection techniques, and measures, along with a discussion of outstanding challenges and future research directions.
Journal Article · DOI

Mining meaning from Wikipedia

TL;DR: This article focuses on research that extracts and makes use of the concepts, relations, facts and descriptions found in Wikipedia, and organizes the work into four broad categories: applying Wikipedia to natural language processing, using it to facilitate information retrieval, using it to facilitate information extraction, and using it as a resource for ontology building.
References
Proceedings Article · DOI

Retrieval evaluation with incomplete information

TL;DR: It is shown that current evaluation measures are not robust to substantially incomplete relevance judgments, and a new measure is introduced that is both highly correlated with existing measures when complete judgments are available and more robust to incomplete judgment sets.
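The measure introduced in this reference is widely identified as bpref (Buckley and Voorhees, SIGIR 2004). As a rough illustration only, the Python sketch below follows one common, trec_eval-style formulation of such a preference-based score; the function name, signature, and the exact cap handling are assumptions here and vary slightly across implementations.

def bpref(ranking, relevant, nonrelevant):
    """Preference-based score for one topic; unjudged documents are ignored.

    ranking      -- list of doc ids in retrieval order
    relevant     -- set of doc ids judged relevant
    nonrelevant  -- set of doc ids judged non-relevant
    """
    R, N = len(relevant), len(nonrelevant)
    if R == 0:
        return 0.0
    denom = min(R, N)
    nonrel_seen = 0
    score = 0.0
    for doc in ranking:
        if doc in relevant:
            # penalise this relevant doc by the judged non-relevant docs ranked above it
            score += 1.0 if denom == 0 else 1.0 - min(nonrel_seen, denom) / denom
        elif doc in nonrelevant:
            nonrel_seen += 1
    return score / R

Because the score is computed only over judged documents, removing unjudged documents from the pool changes it far less than it changes precision-based measures, which is the robustness property the reference describes.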
Proceedings Article · DOI

Overview of TREC 2004

TL;DR: The thirteenth Text REtrieval Conference, TREC 2004, was held at the National Institute of Standards and Technology (NIST) November 16–19, 2004.