European Conference on Information Retrieval
About: The European Conference on Information Retrieval is an academic conference. It publishes mainly in the areas of computer science and ranking (information retrieval). Over its lifetime, the conference has published 2,006 papers, which have received 37,931 citations.
Topics: Computer science, Ranking (information retrieval), Relevance (information retrieval), Query expansion, Task (project management)
Papers published on a yearly basis
Papers
21 Mar 2005
TL;DR: A probabilistic setting is used which allows us to obtain posterior distributions on these performance indicators, rather than point estimates, and is applied to the case where different methods are run on different datasets from the same source.
Abstract: We address the problems of 1/ assessing the confidence of the standard point estimates, precision, recall and F-score, and 2/ comparing the results, in terms of precision, recall and F-score, obtained using two different methods. To do so, we use a probabilistic setting which allows us to obtain posterior distributions on these performance indicators, rather than point estimates. This framework is applied to the case where different methods are run on different datasets from the same source, as well as the standard situation where competing results are obtained on the same data.
1,402 citations
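The listing does not spell out the paper's exact probabilistic setting. One common instantiation puts a Beta posterior on precision: with tp true positives and fp false positives, and a uniform Beta(1, 1) prior, the posterior is Beta(tp + 1, fp + 1). A minimal sketch, sampling the posterior rather than using a closed-form interval (the counts below are hypothetical):

```python
import random

def precision_posterior(tp, fp, draws=100_000, seed=0):
    """Posterior samples for precision under a uniform Beta(1, 1) prior.

    With tp true positives and fp false positives, the posterior over
    precision is Beta(tp + 1, fp + 1).
    """
    rng = random.Random(seed)
    return [rng.betavariate(tp + 1, fp + 1) for _ in range(draws)]

def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior samples."""
    s = sorted(samples)
    lo = s[int((1 - level) / 2 * len(s))]
    hi = s[int((1 + level) / 2 * len(s)) - 1]
    return lo, hi

samples = precision_posterior(tp=45, fp=5)
lo, hi = credible_interval(samples)
# The point estimate is 45/50 = 0.90; the interval quantifies its uncertainty.
print(round(lo, 2), round(hi, 2))
```

The same construction applies to recall (with false negatives in place of false positives); comparing two methods amounts to comparing draws from their two posteriors.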
18 Apr 2011
TL;DR: This paper empirically compares the content of Twitter with a traditional news medium, the New York Times, using unsupervised topic modeling, and reports findings useful for downstream IR or DM applications.
Abstract: Twitter as a new form of social media can potentially contain much useful information, but content analysis on Twitter has not been well studied. In particular, it is not clear whether as an information source Twitter can be simply regarded as a faster news feed that covers mostly the same information as traditional news media. In this paper we empirically compare the content of Twitter with a traditional news medium, the New York Times, using unsupervised topic modeling. We use a Twitter-LDA model to discover topics from a representative sample of the entire Twitter. We then use text mining techniques to compare these Twitter topics with topics from the New York Times, taking into consideration topic categories and types. We also study the relation between the proportions of opinionated tweets and retweets and topic categories and types. Our comparisons show interesting and useful findings for downstream IR or DM applications.
1,193 citations
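The Twitter-LDA variant itself is not reproduced here. As an illustration of the underlying technique, a minimal collapsed Gibbs sampler for standard LDA can be sketched; the hyperparameters and the toy document interface are assumptions for the sketch:

```python
import random
from collections import defaultdict

def lda_gibbs(docs, n_topics=2, iters=200, alpha=0.1, beta=0.01, seed=0):
    """Minimal collapsed Gibbs sampler for standard LDA (illustrative only).

    `docs` is a list of token lists; returns per-topic word counts,
    from which topic-word distributions can be read off.
    """
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    V = len(vocab)
    # z[d][i]: current topic assignment of word i in document d
    z = [[rng.randrange(n_topics) for _ in d] for d in docs]
    n_dk = [[0] * n_topics for _ in docs]               # topic counts per doc
    n_kw = [defaultdict(int) for _ in range(n_topics)]  # word counts per topic
    n_k = [0] * n_topics                                # total words per topic
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove this token's assignment, then resample it
                k = z[d][i]
                n_dk[d][k] -= 1; n_kw[k][w] -= 1; n_k[k] -= 1
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1; n_kw[k][w] += 1; n_k[k] += 1
    return n_kw
```

Comparing two corpora, as the paper does, would then reduce to comparing the topic proportions each corpus assigns.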
28 Mar 2010
TL;DR: This poster discusses the characteristics needed in an information retrieval (IR) test collection to facilitate the evaluation of integrated search, i.e. search across a range of different sources with one search box and one ranked result list, and describes and analyses a new test collection built for this purpose.
Abstract: The poster discusses the characteristics needed in an information retrieval (IR) test collection to facilitate the evaluation of integrated search, i.e. search across a range of different sources but with one search box and one ranked result list, and describes and analyses a new test collection constructed for this purpose. The test collection consists of approx. 18,000 monographic records, 160,000 papers and journal articles in PDF and 275,000 abstracts with a varied set of metadata and vocabularies from the physics domain, 65 topics based on real work tasks and corresponding graded relevance assessments. The test collection may be used for systems- as well as user-oriented evaluation.
1,039 citations
02 Apr 2007
TL;DR: This work defines a general framework for inference in summarization and presents three algorithms: a greedy approximate method, a dynamic programming approach based on solutions to the knapsack problem, and an exact algorithm that uses an Integer Linear Programming formulation of the problem.
Abstract: In this work we study the theoretical and empirical properties of various global inference algorithms for multi-document summarization. We start by defining a general framework for inference in summarization. We then present three algorithms: The first is a greedy approximate method, the second a dynamic programming approach based on solutions to the knapsack problem, and the third is an exact algorithm that uses an Integer Linear Programming formulation of the problem. We empirically evaluate all three algorithms and show that, relative to the exact solution, the dynamic programming algorithm provides near optimal results with preferable scaling properties.
382 citations
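The knapsack-based dynamic program can be sketched as follows. This is a plain 0/1 knapsack over sentence scores and word counts, not necessarily the paper's exact formulation, and the scores are hypothetical stand-ins for a real relevance model:

```python
def knapsack_summary(sentences, budget):
    """Select sentences maximising total relevance under a word budget,
    via the classic 0/1 knapsack dynamic program.

    `sentences` is a list of (relevance_score, word_count) pairs.
    Returns (best_total_score, list_of_chosen_indices).
    """
    # best[b] = (total_score, chosen_indices) using at most b words
    best = [(0.0, [])] * (budget + 1)
    for i, (score, length) in enumerate(sentences):
        new_best = best[:]
        for b in range(length, budget + 1):
            # Reading from `best` (which excludes sentence i) keeps this 0/1
            cand = best[b - length][0] + score
            if cand > new_best[b][0]:
                new_best[b] = (cand, best[b - length][1] + [i])
        best = new_best
    return best[budget]

# Hypothetical (score, word_count) pairs with a 14-word budget:
print(knapsack_summary([(3.0, 10), (2.0, 9), (1.5, 4)], budget=14))
# → (4.5, [0, 2]): sentences 0 and 2 fit the budget with the highest total score
```

The greedy method from the abstract would instead repeatedly pick the sentence with the best score-per-word ratio that still fits, which is faster but not optimal.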
02 Apr 2007
TL;DR: This work formally evaluates and analyzes the methods on a query-query similarity task using 363,822 queries from a web search log, providing insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
Abstract: Measuring the similarity between documents and queries has been extensively studied in information retrieval. However, there are a growing number of tasks that require computing the similarity between two very short segments of text. These tasks include query reformulation, sponsored search, and image retrieval. Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context. In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures. We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures. We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log. Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.
354 citations
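A purely lexical measure of the kind the study examines can be sketched as a Jaccard overlap of token sets. The example queries below are hypothetical and illustrate the data-sparseness failure the abstract mentions:

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard overlap between the token sets of two short texts:
    |A ∩ B| / |A ∪ B| over lowercased whitespace tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

print(jaccard("apple ipod nano", "ipod nano 8gb"))  # → 0.5: shared surface terms
print(jaccard("flight tickets", "cheap airfare"))   # → 0.0: related, but no overlap
```

The second pair is semantically close yet scores zero, which is why the paper also considers stemming and language-modeling-based measures that bring in context beyond exact term matches.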