Showing papers presented at "International ACM SIGIR Conference on Research and Development in Information Retrieval in 2006"

PDF

Open Access

Proceedings Article•DOI•

LDA-based document models for ad-hoc retrieval

[...]

Xing Wei¹, W. Bruce Croft¹•Institutions (1)

06 Aug 2006

TL;DR: This paper proposes an LDA-based document model within the language modeling framework, and evaluates it on several TREC collections, and shows that improvements over retrieval using cluster-based models can be obtained with reasonable efficiency.

...read moreread less

Abstract: Search algorithms incorporating some form of topic model have a long history in information retrieval. For example, cluster-based retrieval has been studied since the 60s and has recently produced good results in the language model framework. An approach to building topic models based on a formal generative model of documents, Latent Dirichlet Allocation (LDA), is heavily cited in the machine learning literature, but its feasibility and effectiveness in information retrieval is mostly unknown. In this paper, we study how to efficiently use LDA to improve ad-hoc retrieval. We propose an LDA-based document model within the language modeling framework, and evaluate it on several TREC collections. Gibbs sampling is employed to conduct approximate inference in LDA and the computational complexity is analyzed. We show that improvements over retrieval using cluster-based models can be obtained with reasonable efficiency.

...read moreread less

1,148 citations

Proceedings Article•DOI•

Improving web search ranking by incorporating user behavior information

[...]

Eugene Agichtein¹, Eric D. Brill¹, Susan T. Dumais¹•Institutions (1)

Microsoft¹

06 Aug 2006

TL;DR: In this paper, the authors show that incorporating implicit feedback can augment other features, improving the accuracy of a competitive web search ranking algorithm by as much as 31% relative to the original performance.

...read moreread less

Abstract: We show that incorporating user behavior data can significantly improve ordering of top results in real web search setting. We examine alternatives for incorporating feedback into the ranking process and explore the contributions of user feedback compared to other common web search features. We report results of a large scale evaluation over 3,000 queries and 12 million user interactions with a popular web search engine. We show that incorporating implicit feedback can augment other features, improving the accuracy of a competitive web search ranking algorithms by as much as 31% relative to the original performance.

...read moreread less

1,119 citations

Proceedings Article•DOI•

Unifying user-based and item-based collaborative filtering approaches by similarity fusion

[...]

Jun Wang¹, Arjen P. de Vries¹, Marcel J. T. Reinders¹•Institutions (1)

Delft University of Technology¹

06 Aug 2006

TL;DR: This paper re-formulates the memory-based collaborative filtering problem in a generative probabilistic framework, treating individual user-item ratings as predictors of missing ratings and shows that the proposed methods are indeed more robust against data sparsity and give better recommendations.

...read moreread less

Abstract: Memory-based methods for collaborative filtering predict new ratings by averaging (weighted) ratings between, respectively, pairs of similar users or items. In practice, a large number of ratings from similar users or similar items are not available, due to the sparsity inherent to rating data. Consequently, prediction quality can be poor. This paper re-formulates the memory-based collaborative filtering problem in a generative probabilistic framework, treating individual user-item ratings as predictors of missing ratings. The final rating is estimated by fusing predictions from three sources: predictions based on ratings of the same item by other users, predictions based on different item ratings made by the same user, and, third, ratings predicted based on data from other but similar users rating other but similar items. Existing user-based and item-based approaches correspond to the two simple cases of our framework. The complete model is however more robust to data sparsity, because the different types of ratings are used in concert, while additional ratings from similar users towards similar items are employed as a background model to smooth the predictions. Experiments demonstrate that the proposed methods are indeed more robust against data sparsity and give better recommendations.

...read moreread less

716 citations

Proceedings Article•DOI•

Adapting ranking SVM to document retrieval

[...]

Yunbo Cao¹, Jun Xu², Tie-Yan Liu¹, Hang Li¹, Yalou Huang², Hsiao-Wuen Hon¹ - Show less +2 more•Institutions (2)

Microsoft¹, Nankai University²

06 Aug 2006

TL;DR: Experimental results show that the modifications made in conventional Ranking SVM can outperform the conventional ranking SVM and other existing methods for document retrieval on two datasets and employ two methods to conduct optimization on the loss function: gradient descent and quadratic programming.

...read moreread less

Abstract: The paper is concerned with applying learning to rank to document retrieval. Ranking SVM is a typical method of learning to rank. We point out that there are two factors one must consider when applying Ranking SVM, in general a "learning to rank" method, to document retrieval. First, correctly ranking documents on the top of the result list is crucial for an Information Retrieval system. One must conduct training in a way that such ranked results are accurate. Second, the number of relevant documents can vary from query to query. One must avoid training a model biased toward queries with a large number of relevant documents. Previously, when existing methods that include Ranking SVM were applied to document retrieval, none of the two factors was taken into consideration. We show it is possible to make modifications in conventional Ranking SVM, so it can be better used for document retrieval. Specifically, we modify the "Hinge Loss" function in Ranking SVM to deal with the problems described above. We employ two methods to conduct optimization on the loss function: gradient descent and quadratic programming. Experimental results show that our method, referred to as Ranking SVM for IR, can outperform the conventional Ranking SVM and other existing methods for document retrieval on two datasets.

...read moreread less

648 citations

Proceedings Article•DOI•

Formal models for expert finding in enterprise corpora

[...]

Krisztian Balog¹, Leif Azzopardi², Maarten de Rijke¹•Institutions (2)

University of Amsterdam¹, University of Strathclyde²

06 Aug 2006

TL;DR: This work presents two general strategies to expert searching given a document collection which are formalized using generative probabilistic models, and shows that the second strategy consistently outperforms the first.

...read moreread less

Abstract: Searching an organization's document repositories for experts provides a cost effective solution for the task of expert finding. We present two general strategies to expert searching given a document collection which are formalized using generative probabilistic models. The first of these directly models an expert's knowledge based on the documents that they are associated with, whilst the second locates documents on topic, and then finds the associated expert. Forming reliable associations is crucial to the performance of expert finding systems. Consequently, in our evaluation we compare the different approaches, exploring a variety of associations along with other operational parameters (such as topicality). Using the TREC Enterprise corpora, we show that the second strategy consistently outperforms the first. A comparison against other unsupervised techniques, reveals that our second model delivers excellent performance.

...read moreread less

624 citations

Proceedings Article•DOI•

Learning user interaction models for predicting web search result preferences

[...]

Eugene Agichtein¹, Eric D. Brill¹, Susan T. Dumais¹, Robert J. Ragno¹•Institutions (1)

Microsoft¹

06 Aug 2006

TL;DR: This work presents a real-world study of modeling the behavior of web search users to predict web search result preferences and generalizes the approach to model user behavior beyond clickthrough, which results in higher preference prediction accuracy than models based on clickthrough information alone.

...read moreread less

Abstract: Evaluating user preferences of web search results is crucial for search engine development, deployment, and maintenance. We present a real-world study of modeling the behavior of web search users to predict web search result preferences. Accurate modeling and interpretation of user behavior has important applications to ranking, click spam detection, web search personalization, and other tasks. Our key insight to improving robustness of interpreting implicit feedback is to model query-dependent deviations from the expected "noisy" user behavior. We show that our model of clickthrough interpretation improves prediction accuracy over state-of-the-art clickthrough methods. We generalize our approach to model user behavior beyond clickthrough, which results in higher preference prediction accuracy than models based on clickthrough information alone. We report results of a large-scale experimental evaluation that show substantial improvements over published implicit feedback interpretation methods.

...read moreread less

575 citations

Proceedings Article•DOI•

Finding near-duplicate web pages: a large-scale evaluation of algorithms

[...]

Monika Henzinger¹•Institutions (1)

Google¹

06 Aug 2006

TL;DR: A combined algorithm is presented which achieves precision 0.79 with 79% of the recall of the other algorithms, and since Charikar's algorithm finds more near-duplicate pairs on different sites, it achieves a better precision overall than Broder et al.'s algorithm.

...read moreread less

Abstract: Broder et al.'s [3] shingling algorithm and Charikar's [4] random projection based approach are considered "state-of-the-art" algorithms for finding near-duplicate web pages. Both algorithms were either developed at or used by popular web search engines. We compare the two algorithms on a very large scale, namely on a set of 1.6B distinct web pages. The results show that neither of the algorithms works well for finding near-duplicate pairs on the same site, while both achieve high precision for near-duplicate pairs on different sites. Since Charikar's algorithm finds more near-duplicate pairs on different sites, it achieves a better precision overall, namely 0.50 versus 0.38 for Broder et al.'s algorithm. We present a combined algorithm which achieves precision 0.79 with 79% of the recall of the other algorithms.

...read moreread less

506 citations

Proceedings Article•DOI•

Identifying comparative sentences in text documents

[...]

Nitin Jindal¹, Bing Liu¹•Institutions (1)

University of Illinois at Chicago¹

06 Aug 2006

TL;DR: This paper first categorizes comparative sentences into different types, and then presents a novel integrated pattern discovery and supervised learning approach to identifying comparative sentences from text documents.

...read moreread less

Abstract: This paper studies the problem of identifying comparative sentences in text documents. The problem is related to but quite different from sentiment/opinion sentence identification or classification. Sentiment classification studies the problem of classifying a document or a sentence based on the subjective opinion of the author. An important application area of sentiment/opinion identification is business intelligence as a product manufacturer always wants to know consumers' opinions on its products. Comparisons on the other hand can be subjective or objective. Furthermore, a comparison is not concerned with an object in isolation. Instead, it compares the object with others. An example opinion sentence is "the sound quality of CD player X is poor". An example comparative sentence is "the sound quality of CD player X is not as good as that of CD player Y". Clearly, these two sentences give different information. Their language constructs are quite different too. Identifying comparative sentences is also useful in practice because direct comparisons are perhaps one of the most convincing ways of evaluation, which may even be more important than opinions on each individual object. This paper proposes to study the comparative sentence identification problem. It first categorizes comparative sentences into different types, and then presents a novel integrated pattern discovery and supervised learning approach to identifying comparative sentences from text documents. Experiment results using three types of documents, news articles, consumer reviews of products, and Internet forum postings, show a precision of 79% and recall of 81%. More detailed results are given in the paper.

...read moreread less

457 citations

Proceedings Article•DOI•

User performance versus precision measures for simple search tasks

[...]

Andrew Turpin¹, Falk Scholer¹•Institutions (1)

RMIT University¹

06 Aug 2006

TL;DR: This study evaluates two different information retrieval tasks on TREC Web-track data: a precision-based user task, measured by the length of time that users need to find a single document that is relevant to a TREC topic; and, a simple recall-based task, represented by the total number of relevant documents that users can identify within five minutes.

...read moreread less

Abstract: Several recent studies have demonstrated that the type of improvements in information retrieval system effectiveness reported in forums such as SIGIR and TREC do not translate into a benefit for users. Two of the studies used an instance recall task, and a third used a question answering task, so perhaps it is unsurprising that the precision based measures of IR system effectiveness on one-shot query evaluation do not correlate with user performance on these tasks. In this study, we evaluate two different information retrieval tasks on TREC Web-track data: a precision-based user task, measured by the length of time that users need to find a single document that is relevant to a TREC topic; and, a simple recall-based task, represented by the total number of relevant documents that users can identify within five minutes. Users employ search engines with controlled mean average precision (MAP) of between 55% and 95%. Our results show that there is no significant relationship between system effectiveness measured by MAP and the precision-based task. A significant, but weak relationship is present for the precision at one document returned metric. A weak relationship is present between MAP and the simple recall-based task.

...read moreread less

414 citations

Proceedings Article•DOI•

A framework to predict the quality of answers with non-textual features

[...]

Jiwoon Jeon¹, W. Bruce Croft¹, Joon Ho Lee, Soyeon Park²•Institutions (2)

University of Massachusetts Amherst¹, Duksung Women's University²

06 Aug 2006

TL;DR: This paper presents a framework to use non-textual features to predict the quality of documents and shows the quality measure can be successfully incorporated into the language modeling-based retrieval model.

...read moreread less

Abstract: New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we present a framework to use non-textual features to predict the quality of documents. We also show our quality measure can be successfully incorporated into the language modeling-based retrieval model. We test our approach on a collection of question and answer pairs gathered from a community based question answering service where people ask and answer questions. Experimental results using our quality measure show a significant improvement over our baseline.

...read moreread less

383 citations

Proceedings Article•DOI•

Less is more: probabilistic models for retrieving fewer relevant documents

[...]

Harr Chen¹, David R. Karger¹•Institutions (1)

Massachusetts Institute of Technology¹

06 Aug 2006

TL;DR: This work considers a number of information retrieval metrics from the literature, including the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations.

...read moreread less

Abstract: Traditionally, information retrieval systems aim to maximize the number of relevant documents returned to a user within some window of the top. For that goal, the probability ranking principle, which ranks documents in decreasing order of probability of relevance, is provably optimal. However, there are many scenarios in which that ranking does not optimize for the users information need. One example is when the user would be satisfied with some limited number of relevant documents, rather than needing all relevant documents. We show that in such a scenario, an attempt to return many relevant documents can actually reduce the chances of finding any relevant documents. We consider a number of information retrieval metrics from the literature, including the rank of the first relevant result, the %no metric that penalizes a system only for retrieving no relevant results near the top, and the diversity of retrieved results when queries have multiple interpretations. We observe that given a probabilistic model of relevance, it is appropriate to rank so as to directly optimize these metrics in expectation. While doing so may be computationally intractable, we show that a simple greedy optimization algorithm that approximately optimizes the given objectives produces rankings for TREC queries that outperform the standard approach based on the probability ranking principle.

...read moreread less

Journal Article•DOI•

The Wikipedia XML corpus

[...]

Ludovic Denoyer, Patrick Gallinari

01 Jun 2006

TL;DR: This encyclopedia is composed of millions of articles in different languages and anyone can edit an article using a wiki markup language that offers a simplified alternative to HTML.

...read moreread less

Abstract: Wikipedia is a well known free content, multilingual encyclopedia written collaboratively by contributors around the world. Anybody can edit an article using a wiki markup language that offers a simplified alternative to HTML. This encyclopedia is composed of millions of articles in different languages.

...read moreread less

Proceedings Article•DOI•

Building bridges for web query classification

[...]

Dou Shen¹, Jian-Tao Sun², Qiang Yang¹, Zheng Chen²•Institutions (2)

Hong Kong University of Science and Technology¹, Microsoft²

06 Aug 2006

TL;DR: A novel approach for QC is presented that outperforms the winning solution of the ACM KDDCUP 2005 competition and introduces category selection as a new method for narrowing down the scope of the intermediate taxonomy based on which the authors classify the queries.

...read moreread less

Abstract: Web query classification (QC) aims to classify Web users' queries, which are often short and ambiguous, into a set of target categories QC has many applications including page ranking in Web search, targeted advertisement in response to queries, and personalization In this paper, we present a novel approach for QC that outperforms the winning solution of the ACM KDDCUP 2005 competition, whose objective is to classify 800,000 real user queries In our approach, we first build a bridging classifier on an intermediate taxonomy in an offline mode This classifier is then used in an online mode to map user queries to the target categories via the above intermediate taxonomy A major innovation is that by leveraging the similarity distribution over the intermediate taxonomy, we do not need to retrain a new classifier for each new set of target categories, and therefore the bridging classifier needs to be trained only once In addition, we introduce category selection as a new method for narrowing down the scope of the intermediate taxonomy based on which we classify the queries Category selection can improve both efficiency and effectiveness of the online classification By combining our algorithm with the winning solution of KDDCUP 2005, we made an improvement by 97% and 38% in terms of precision and F1 respectively compared with the best results of KDDCUP 2005

...read moreread less

Proceedings Article•DOI•

Improving personalized web search using result diversification

[...]

Filip Radlinski¹, Susan T. Dumais²•Institutions (2)

Cornell University¹, Microsoft²

06 Aug 2006

TL;DR: Three methods to increase the diversity of the top results are proposed and evaluated and the effectiveness of these methods is evaluated.

...read moreread less

Abstract: We present and evaluate methods for diversifying search results to improve personalized web search. A common personalization approach involves reranking the top N search results such that documents likely to be preferred by the user are presented higher. The usefulness of reranking is limited in part by the number and diversity of results considered. We propose three methods to increase the diversity of the top results and evaluate the effectiveness of these methods.

...read moreread less

Proceedings Article•DOI•

Large scale semi-supervised linear SVMs

[...]

Vikas Sindhwani¹, S. Sathiya Keerthi²•Institutions (2)

University of Chicago¹, Yahoo!²

06 Aug 2006

TL;DR: An implementation of Transductive SVM (TSVM) that is significantly more efficient and scalable than currently used dual techniques, for linear classification problems involving large, sparse datasets, and a variant of TSVM that involves multiple switching of labels.

...read moreread less

Abstract: Large scale learning is often realistic only in a semi-supervised setting where a small set of labeled examples is available together with a large collection of unlabeled data. In many information retrieval and data mining applications, linear classifiers are strongly preferred because of their ease of implementation, interpretability and empirical performance. In this work, we present a family of semi-supervised linear support vector classifiers that are designed to handle partially-labeled sparse datasets with possibly very large number of examples and features. At their core, our algorithms employ recently developed modified finite Newton techniques. Our contributions in this paper are as follows: (a) We provide an implementation of Transductive SVM (TSVM) that is significantly more efficient and scalable than currently used dual techniques, for linear classification problems involving large, sparse datasets. (b) We propose a variant of TSVM that involves multiple switching of labels. Experimental results show that this variant provides an order of magnitude further improvement in training efficiency. (c) We present a new algorithm for semi-supervised learning based on a Deterministic Annealing (DA) approach. This algorithm alleviates the problem of local minimum in the TSVM optimization procedure while also being computationally attractive. We conduct an empirical study on several document classification tasks which confirms the value of our methods in large scale semi-supervised settings.

...read moreread less

Proceedings Article•DOI•

Minimal test collections for retrieval evaluation

[...]

Ben Carterette¹, James Allan¹, Ramesh K. Sitaraman¹•Institutions (1)

University of Massachusetts Amherst¹

06 Aug 2006

TL;DR: This work links evaluation with test collection construction to gain an understanding of the minimal judging effort that must be done to have high confidence in the outcome of an evaluation.

...read moreread less

Abstract: Accurate estimation of information retrieval evaluation metrics such as average precision require large sets of relevance judgments. Building sets large enough for evaluation of real-world implementations is at best inefficient, at worst infeasible. In this work we link evaluation with test collection construction to gain an understanding of the minimal judging effort that must be done to have high confidence in the outcome of an evaluation. A new way of looking at average precision leads to a natural algorithm for selecting documents to judge and allows us to estimate the degree of confidence by defining a distribution over possible document judgments. A study with annotators shows that this method can be used by a small group of researchers to rank a set of systems in under three hours with 95% confidence. Information retrieval metrics such as average precision require large sets of relevance judgments to be accurately estimated. Building these sets is infeasible and often inefficient for many real-world retrieval implementations. We present a new way of looking at average precision that allows us to estimate the confidence in an evaluation based on the size of the test collection. We use this to build an algorithm for selecting the best documents to judge to have maximum confidence in an evaluation with a minimal number of relevance judgments. A study with annotators shows how the algorithm can be used by a small group of researchers to quickly rank a set of systems with 95% confidence.

...read moreread less

Proceedings Article•DOI•

A compositional context sensitive multi-document summarizer: exploring the factors that influence summarization

[...]

Ani Nenkova¹, Lucy Vanderwende², Kathleen McKeown¹•Institutions (2)

Stanford University¹, Microsoft²

06 Aug 2006

TL;DR: The research shows that a frequency based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.

...read moreread less

Abstract: The usual approach for automatic summarization is sentence extraction, where key sentences from the input documents are selected based on a suite of features. While word frequency often is used as a feature in summarization, its impact on system performance has not been isolated. In this paper, we study the contribution to summarization of three factors related to frequency: content word frequency, composition functions for estimating sentence importance from word frequency, and adjustment of frequency weights based on context. We carry out our analysis using datasets from the Document Understanding Conferences, studying not only the impact of these features on automatic summarizers, but also their role in human summarization. Our research shows that a frequency based summarizer can achieve performance comparable to that of state-of-the-art systems, but only with a good composition function; context sensitivity improves performance and significantly reduces repetition.

...read moreread less

Proceedings Article•DOI•

Building a test collection for complex document information processing

[...]

David Lewis, Gady Agam¹, Shlomo Argamon¹, Ophir Frieder¹, David A. Grossman¹, Jefferson Heard¹ - Show less +2 more•Institutions (1)

Illinois Institute of Technology¹

06 Aug 2006

TL;DR: A 1.5 terabyte dataset is assembled to support evaluation of both end-to-end complex document information processing (CDIP) tasks (e.g., text retrieval and data mining) as well as component technologies such as optical character recognition (OCR), document structure analysis, signature matching, and authorship attribution.

...read moreread less

Abstract: Research and development of information access technology for scanned paper documents has been hampered by the lack of public test collections of realistic scope and complexity As part of a project to create a prototype system for search and mining of masses of document images, we are assembling a 15 terabyte dataset to support evaluation of both end-to-end complex document information processing (CDIP) tasks (eg, text retrieval and data mining) as well as component technologies such as optical character recognition (OCR), document structure analysis, signature matching, and authorship attribution

...read moreread less

Proceedings Article•DOI•

Regularized estimation of mixture models for robust pseudo-relevance feedback

[...]

Tao Tao¹, ChengXiang Zhai¹•Institutions (1)

University of Illinois at Urbana–Champaign¹

06 Aug 2006

TL;DR: A more robust method for pseudo feedback based on statistical language models to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the originalquery.

...read moreread less

Abstract: Pseudo-relevance feedback has proven to be an effective strategy for improving retrieval accuracy in all retrieval models. However the performance of existing pseudo feedback methods is often affected significantly by some parameters, such as the number of feedback documents to use and the relative weight of original query terms; these parameters generally have to be set by trial-and-error without any guidance. In this paper, we present a more robust method for pseudo feedback based on statistical language models. Our main idea is to integrate the original query with feedback documents in a single probabilistic mixture model and regularize the estimation of the language model parameters in the model so that the information in the feedback documents can be gradually added to the original query. Unlike most existing feedback methods, our new method has no parameter to tune. Experiment results on two representative data sets show that the new method is significantly more robust than a state-of-the-art baseline language modeling approach for feedback with comparable or better retrieval accuracy.

...read moreread less

Proceedings Article•DOI•

Improving the estimation of relevance models using large external corpora

[...]

Fernando Diaz¹, Donald Metzler¹•Institutions (1)

University of Massachusetts Amherst¹

06 Aug 2006

TL;DR: The results show that using a high quality corpus that is comparable to the evaluation corpus can be as, if not more, effective than using the web.

...read moreread less

Abstract: Information retrieval algorithms leverage various collection statistics to improve performance. Because these statistics are often computed on a relatively small evaluation corpus, we believe using larger, non-evaluation corpora should improve performance. Specifically, we advocate incorporating external corpora based on language modeling. We refer to this process as external expansion. When compared to traditional pseudo-relevance feedback techniques, external expansion is more stable across topics and up to 10% more effective in terms of mean average precision. Our results show that using a high quality corpus that is comparable to the evaluation corpus can be as, if not more, effective than using the web. Our results also show that external expansion outperforms simulated relevance feedback. In addition, we propose a method for predicting the extent to which external expansion will improve retrieval performance. Our new measure demonstrates positive correlation with improvements in mean average precision.

...read moreread less

Journal Article•DOI•

A reference collection for web spam

[...]

Carlos Castillo¹, Debora Donato¹, Luca Becchetti, Paolo Boldi, Stefano Leonardi, Massimo Santini, Sebastiano Vigna - Show less +3 more•Institutions (1)

Yahoo!¹

01 Dec 2006

TL;DR: This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.

...read moreread less

Abstract: We describe the WEBSPAM-UK2006 collection, a large set of Web pages that have been manually annotated with labels indicating if the hosts are include Web spam aspects or not. This is the first publicly available Web spam collection that includes page contents and links, and that has been labelled by a large and diverse set of judges.

...read moreread less

Proceedings Article•DOI•

Evaluating evaluation metrics based on the bootstrap

[...]

Tetsuya Sakai¹•Institutions (1)

Toshiba¹

06 Aug 2006

TL;DR: This paper describes how the Bootstrap approach to statistics can be applied to the evaluation of IR effectiveness metrics, and argues that Bootstrap Hypothesis Tests deserve more attention from the IR community, as they are based on fewer assumptions than traditional statistical significance tests.

...read moreread less

Abstract: This paper describes how the Bootstrap approach to statistics can be applied to the evaluation of IR effectiveness metrics. First, we argue that Bootstrap Hypothesis Tests deserve more attention from the IR community, as they are based on fewer assumptions than traditional statistical significance tests. We then describe straightforward methods for comparing the sensitivity of IR metrics based on Bootstrap Hypothesis Tests. Unlike the heuristics-based "swap" method proposed by Voorhees and Buckley, our method estimates the performance difference required to achieve a given significance level directly from Bootstrap Hypothesis Test results. In addition, we describe a simple way of examining the accuracy of rank correlation between two metrics based on the Bootstrap Estimate of Standard Error. We demonstrate the usefulness of our methods using test collections and runs from the NTCIR CLIR track for comparing seven IR metrics, including those that can handle graded relevance and those based on the Geometric Mean.

...read moreread less

Proceedings Article•DOI•

What makes a query difficult

[...]

David Carmel¹, Elad Yom-Tov¹, Adam Darlow¹, Dan Pelleg¹•Institutions (1)

IBM¹

06 Aug 2006

TL;DR: This work addresses a novel model that captures the main components of a topic and the relationship between those components and topic difficulty and demonstrates the applicability of the difficulty model for several uses such as predicting query difficulty, predicting the number of topic aspects expected to be covered by the search results, and analyzing the findability of a specific domain.

...read moreread less

Abstract: This work tries to answer the question of what makes a query difficult. It addresses a novel model that captures the main components of a topic and the relationship between those components and topic difficulty. The three components of a topic are the textual expression describing the information need (the query or queries), the set of documents relevant to the topic (the Qrels), and the entire collection of documents. We show experimentally that topic difficulty strongly depends on the distances between these components. In the absence of knowledge about one of the model components, the model is still useful by approximating the missing component based on the other components. We demonstrate the applicability of the difficulty model for several uses such as predicting query difficulty, predicting the number of topic aspects expected to be covered by the search results, and analyzing the findability of a specific domain.

...read moreread less

Proceedings Article•DOI•

Type less, find more: fast autocompletion search with a succinct index

[...]

Holger Bast¹, Ingmar Weber¹•Institutions (1)

Max Planck Society¹

06 Aug 2006

TL;DR: A new indexing data structure is presented that uses no more space than a state-of-the-art compressed inverted index, but with 10 times faster query processing times.

...read moreread less

Abstract: We consider the following full-text search autocompletion feature. Imagine a user of a search engine typing a query. Then with every letter being typed, we would like an instant display of completions of the last query word which would lead to good hits. At the same time, the best hits for any of these completions should be displayed. Known indexing data structures that apply to this problem either incur large processing times for a substantial class of queries, or they use a lot of space. We present a new indexing data structure that uses no more space than a state-of-the-art compressed inverted index, but with 10 times faster query processing times. Even on the large TREC Terabyte collection, which comprises over 25 million documents, we achieve, on a single machine and with the index on disk, average response times of one tenth of a second. We have built a full-fledged, interactive search engine that realizes the proposed autocompletion feature combined with support for proximity search, semi-structured (XML) text, subword and phrase completion, and semantic tags.

...read moreread less

Proceedings Article•DOI•

Term proximity scoring for ad-hoc retrieval on very large text collections

[...]

Stefan Büttcher¹, Charles L. A. Clarke¹, Brad Lushman¹•Institutions (1)

University of Waterloo¹

06 Aug 2006

TL;DR: The relative retrieval effectiveness of the retrieval method, compared to pure BM25, varies from collection to collection, and it is shown that for stemmed queries the impact of term proximity scoring is larger than for unstemmed queries.

...read moreread less

Abstract: We propose an integration of term proximity scoring into Okapi BM25. The relative retrieval effectiveness of our retrieval method, compared to pure BM25, varies from collection to collection.We present an experimental evaluation of our method and show that the gains achieved over BM25 as the size of the underlying text collection increases. We also show that for stemmed queries the impact of term proximity scoring is larger than for unstemmed queries.

...read moreread less

Proceedings Article•DOI•

Contextual search and name disambiguation in email using graphs

[...]

Einat Minkov¹, William W. Cohen¹, Andrew Y. Ng²•Institutions (2)

Carnegie Mellon University¹, Stanford University²

06 Aug 2006

TL;DR: This paper provides a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph and shows that reranking schemes based on the graph-walk similarity measures often outperform baseline methods and that further improvements can be obtained by use of appropriate learning methods.

...read moreread less

Abstract: Similarity measures for text have historically been an important tool for solving information retrieval problems. In many interesting settings, however, documents are often closely connected to other documents, as well as other non-textual objects: for instance, email messages are connected to other messages via header information. In this paper we consider extended similarity metrics for documents and other objects embedded in graphs, facilitated via a lazy graph walk. We provide a detailed instantiation of this framework for email data, where content, social networks and a timeline are integrated in a structural graph. The suggested framework is evaluated for two email-related problems: disambiguating names in email documents, and threading. We show that reranking schemes based on the graph-walk similarity measures often outperform baseline methods, and that further improvements can be obtained by use of appropriate learning methods.

...read moreread less

Proceedings Article•

Design Recommendations for Hierarchical Faceted Search Interfaces

[...]

M. Hearst

01 Jan 2006

Proceedings Article•DOI•

Graph-based text classification: learn from your neighbors

[...]

Ralitsa Angelova¹, Gerhard Weikum¹•Institutions (1)

Max Planck Society¹

06 Aug 2006

TL;DR: This paper presents a new method for graph-based classification, with particular emphasis on hyperlinked text documents but broader applicability, based on iterative relaxation labeling and can be combined with either Bayesian or SVM classifiers on the feature spaces of the given data items.

...read moreread less

Abstract: Automatic classification of data items, based on training samples, can be boosted by considering the neighborhood of data items in a graph structure (e.g., neighboring documents in a hyperlink environment or co-authors and their publications for bibliographic data entries). This paper presents a new method for graph-based classification, with particular emphasis on hyperlinked text documents but broader applicability. Our approach is based on iterative relaxation labeling and can be combined with either Bayesian or SVM classifiers on the feature spaces of the given data items. The graph neighborhood is taken into consideration to exploit locality patterns while at the same time avoiding overfitting. In contrast to prior work along these lines, our approach employs a number of novel techniques: dynamically inferring the link/class pattern in the graph in the run of the iterative relaxation labeling, judicious pruning of edges from the neighborhood graph based on node dissimilarities and node degrees, weighting the influence of edges based on a distance metric between the classification labels of interest and weighting edges by content similarity measures. Our techniques considerably improve the robustness and accuracy of the classification outcome, as shown in systematic experimental comparisons with previously published methods on three different real-world datasets.

...read moreread less

Proceedings Article•DOI•

Pruned query evaluation using pre-computed impacts

[...]

Vo Anh¹, Alistair Moffat¹•Institutions (1)

University of Melbourne¹

06 Aug 2006

TL;DR: This work proposes new pruning methods that make use of impact-sorted indexes, and these methods reduce the amount of computation performed, reduction of memory required for accumulators, and amount of data transferred from disk, and at the same time allow performance guarantees in terms of precision and mean average precision.

...read moreread less

Abstract: Exhaustive evaluation of ranked queries can be expensive, particularly when only a small subset of the overall ranking is required, or when queries contain common terms. This concern gives rise to techniques for dynamic query pruning, that is, methods for eliminating redundant parts of the usual exhaustive evaluation, yet still generating a demonstrably "good enough" set of answers to the query. In this work we propose new pruning methods that make use of impact-sorted indexes. Compared to exhaustive evaluation, the new methods reduce the amount of computation performed, reduce the amount of memory required for accumulators, reduce the amount of data transferred from disk, and at the same time allow performance guarantees in terms of precision and mean average precision. These strong claims are backed by experiments using the TREC Terabyte collection and queries.

...read moreread less

Proceedings Article•DOI•

Learning to advertise

[...]

Anisio Lacerda¹, Marco Cristo¹, Marcos André Gonçalves¹, Weiguo Fan², Nivio Ziviani¹, Berthier Ribeiro-Neto¹ - Show less +2 more•Institutions (2)

Universidade Federal de Minas Gerais¹, Virginia Tech²

06 Aug 2006

TL;DR: A new framework for associating ads with web pages based on Genetic Programming (GP), which aims at learning functions that select the most appropriate ads, given the contents of a Web page to optimize overall precision and minimize the number of misplacements.

...read moreread less

Abstract: Content-targeted advertising, the task of automatically associating ads to a Web page, constitutes a key Web monetization strategy nowadays. Further, it introduces new challenging technical problems and raises interesting questions. For instance, how to design ranking functions able to satisfy conflicting goals such as selecting advertisements (ads) that are relevant to the users and suitable and profitable to the publishers and advertisers? In this paper we propose a new framework for associating ads with web pages based on Genetic Programming (GP). Our GP method aims at learning functions that select the most appropriate ads, given the contents of a Web page. These ranking functions are designed to optimize overall precision and minimize the number of misplacements. By using a real ad collection and web pages from a newspaper, we obtained a gain over a state-of-the-art baseline method of 61.7% in average precision. Further, by evolving individuals to provide good ranking estimations, GP was able to discover ranking functions that are very effective in placing ads in web pages while avoiding irrelevant ones.

...read moreread less

Collapse