Journal ArticleDOI

Linear combination of component results in information retrieval

Shengli Wu
01 Jan 2012-Vol. 71, Iss: 1, pp 114-126
TL;DR: This paper uses the multiple linear regression technique with estimated relevance scores and judged scores to obtain suitable weights and shows that the linear combination method with such a weighting strategy steadily outperforms the best component system and other data fusion methods by large margins.
Abstract: In information retrieval, data fusion (also known as meta-search) has been investigated by many researchers. Previous investigation and experimentation demonstrate that the linear combination method is an effective data fusion method for combining multiple information retrieval results. One advantage is its flexibility, since different weights can be assigned to different component systems so as to obtain better fusion results. The key issue is how to assign good weights to all the component retrieval systems involved. Surprisingly, research in this field is limited and it is still an open question. In this paper, we use the multiple linear regression technique with estimated relevance scores and judged scores to obtain suitable weights. Although the multiple linear regression technique is not new, the way of using it in this paper has never been attempted before for the data fusion problem in information retrieval. Our experiments with five groups of runs submitted to TREC show that the linear combination method with such a weighting strategy steadily outperforms the best component system and other data fusion methods including CombSum, CombMNZ, PosFuse, MAPFuse, SegFuse, and the linear combination method with performance level/performance square weighting schemes by large margins.
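The weighting strategy described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `fit_weights` and `linear_combination`, the toy data, and the choice to fit an intercept and then leave it out when ranking are assumptions made for illustration. Component scores are assumed to be normalized to a common range already, and the score matrix is assumed to have linearly independent columns.

```python
def fit_weights(score_rows, judgments):
    """Least-squares fit of y ~ b0 + sum_i b_i * s_i; returns [b0, b1, ..., bn].

    score_rows: one list of component scores per (query, document) pair.
    judgments:  matching judged relevance values (e.g. 1 = relevant, 0 = not).
    """
    rows = [[1.0] + list(r) for r in score_rows]  # prepend intercept column
    k = len(rows[0])
    # Normal equations (X^T X) b = X^T y, stored as an augmented matrix.
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * y for r, y in zip(rows, judgments))]
           for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda idx: abs(aug[idx][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for idx in range(k):
            if idx != col:
                factor = aug[idx][col] / aug[col][col]
                aug[idx] = [a - factor * b for a, b in zip(aug[idx], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def linear_combination(doc_scores, weights):
    """Rank documents by the weighted sum of their component scores.

    doc_scores: doc id -> list of scores, one per component system.
    weights:    one weight per component system (intercept excluded, since a
                constant added to every document does not change the order).
    """
    fused = {doc: sum(w * s for w, s in zip(weights, scores))
             for doc, scores in doc_scores.items()}
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical training pairs: two component systems, five judged documents.
b0, *w = fit_weights([[0, 0], [1, 0], [0, 1], [1, 1], [0.5, 0.2]],
                     [0.2, 0.7, 0.5, 1.0, 0.51])
ranking = linear_combination(
    {"d1": [0.9, 0.1], "d2": [0.2, 0.8], "d3": [0.5, 0.5]}, w)
print(ranking)  # ['d1', 'd3', 'd2']
```

Solving the normal equations directly keeps the sketch dependency-free; with many component systems a library least-squares routine would be the more robust choice.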
Citations
Journal ArticleDOI
01 Feb 2013
TL;DR: A novel expert finding algorithm, ExpertRank, is proposed that evaluates expertise based on both document-based relevance and one's authority in his or her knowledge community; the PageRank algorithm is modified to evaluate one's authority so that the effect of certain biasing communication behavior in online communities is reduced.
Abstract: With increasing knowledge demands and limited availability of expertise and resources within organizations, professionals often rely on external sources when seeking knowledge. Online knowledge communities are Internet based virtual communities that specialize in knowledge seeking and sharing. They provide a virtual media environment where individuals with common interests seek and share knowledge across time and space. A large online community may have millions of participants who have accrued a large knowledge repository with millions of text documents. However, due to the low information quality of user-generated content, it is very challenging to develop an effective knowledge management system for facilitating knowledge seeking and sharing in online communities. Knowledge management literature suggests that effective knowledge management should make accessible not only written knowledge but also experts who are a source of information and can perform a given organizational or social function. Existing expert finding systems evaluate one's expertise based on either the contents of authored documents or one's social status within his or her knowledge community. However, very few studies consider both indicators collectively. In addition, very few studies focus on virtual communities where information quality is often poorer than that in organizational knowledge repositories. In this study we propose a novel expert finding algorithm, ExpertRank, that evaluates expertise based on both document-based relevance and one's authority in his or her knowledge community. We modify the PageRank algorithm to evaluate one's authority so that it reduces the effect of certain biasing communication behavior in online communities. We explore three different expert ranking strategies that combine document-based relevance and authority: linear combination, cascade ranking, and multiplication scaling. We evaluate ExpertRank using a popular online knowledge community. 
Experiments show that the proposed algorithm achieves the best performance when both document-based relevance and authority are considered.

186 citations

Journal ArticleDOI
Hao Wu, Yijian Pei, Bo Li, Zongzhan Kang, Xiaoxin Liu, Hao Li
TL;DR: This paper examines if data fusion can be helpful for improving effectiveness of item recommendation in social tagging systems and proposes a hybrid linear combination (HLC) model that is more flexible and robust than traditional data fusion models.
Abstract: Collaborative tagging systems have been popular on the Web. However, information overload results in an increasing need for recommender services from users, and thus item recommendation has been one of the key issues in such systems. In this paper, we examine whether data fusion can help improve the effectiveness of item recommendation in these systems. For this, we first summarize the state-of-the-art recommendation methods, which are classified into several categories according to their algorithmic principles. Then, we experiment with about 40 recommending components against datasets from three social tagging systems: Delicious, Lastfm and CiteULike. Based on these, several heuristic data fusion models, both rank-based and score-based, are used to combine selected components. We also put forward a hybrid linear combination (HLC) model for fusing item recommendations. We use four kinds of evaluation metrics, which respectively consider accuracy, inner-diversity, inter-diversity and novelty, to systematically assess the quality of recommendations obtained by various components or fusion models. According to the experimental results, combining evidence from separate components can improve the accuracy of recommendations, with little or no loss of recommendation diversity and novelty, if separate components suggest similar sets of relevant items but recommend different sets of non-relevant items. In particular, fusing recommendation sets formed from different combinations of profile representations and similarity functions in user-based and item-based collaborative filtering can significantly improve recommendation accuracy. 
In addition, some other useful findings are drawn: (i) using tags to represent user profiles or item profiles may not be as good as profiling users with items or profiling items with users; however, exploiting tags in topic models and random walks can notably improve the accuracy, diversity and novelty of recommendations; (ii) generally, user-based collaborative filtering, item-based collaborative filtering and random walk methods are robust for the task of item recommendation in social tagging systems, and thus can be chosen as the basic components of the data fusion process; and (iii) the proposed method (HLC) is more flexible and robust than traditional data fusion models.

33 citations

Proceedings ArticleDOI
27 Jun 2018
TL;DR: The goal of this half day, intermediate-level, tutorial is to provide a methodological view of the theoretical foundations of fusion approaches, the numerous fusion methods that have been devised and a variety of applications for which fusion techniques have been applied.
Abstract: Fusion is an important and central concept in Information Retrieval. The goal of fusion methods is to merge different sources of information so as to address a retrieval task. For example, in the adhoc retrieval setting, fusion methods have been applied to merge multiple document lists retrieved for a query. The lists could be retrieved using different query representations, document representations, ranking functions and corpora. The goal of this half day, intermediate-level, tutorial is to provide a methodological view of the theoretical foundations of fusion approaches, the numerous fusion methods that have been devised and a variety of applications for which fusion techniques have been applied.

23 citations


Cites background from "Linear combination of component res..."

  • ...– The Fusion Hypothesis [14, 31, 32, 56, 57, 81, 94] – Classifier Combination [93] – Fusion Frameworks [3, 53, 55, 88, 96, 99, 100, 102] • Fusion in Practice – Score-based (e....

    [...]

Journal ArticleDOI
Shengli Wu
TL;DR: A linear discriminant analysis (LDA) based approach to training weights is presented and the empirical investigation finds that Condorcet fusion is a good ranking-based method in good conditions, while weighted Condorcet fusion can make significant improvement over Condorcet fusion when the conditions are not favourable for Condorcet fusion.
Abstract: The Condorcet fusion is a distinctive fusion method and was found useful in information retrieval. Two basic requirements for the Condorcet fusion to improve retrieval effectiveness are: (1) all component systems involved should be more or less equally effective; and (2) each information retrieval system should be developed independently and thus each component result is more or less equally different from the others. These two requirements may not be satisfied in many cases, in which case weighted Condorcet fusion becomes a good option. However, how to assign weights for the weighted Condorcet has not been investigated. In this paper, we present a linear discriminant analysis (LDA) based approach to training weights. Some properties of Condorcet fusion and weighted Condorcet fusion are discussed. Experiments are conducted with three groups of runs submitted to TREC to evaluate the performance of a group of data fusion methods. The empirical investigation finds that Condorcet fusion is a good ranking-based method in good conditions, while weighted Condorcet fusion can make significant improvement over Condorcet fusion when the conditions are not favourable for Condorcet fusion. The experiments also show that the proposed LDA weighting schema is effective and Condorcet fusion with LDA based weighting schema is more effective than all other data fusion methods involved.

17 citations


Cites background or methods from "Linear combination of component res..."

  • ...…Condorcet Weighted Condorcet Weight assignment Linear discriminant analysis 0306-4573/$ - see front matter 2012 Elsevier Ltd doi:10.1016/j.ipm.2012.02.007 E-mail address: s.wu1@ulster.ac.uk Abstract: The Condorcet fusion is a distinctive fusion method and was found useful in…...

    [...]

  • ...The fusion technique can be useful for various tasks such as routing queries (Bigot, Chrisment, Dkaki, Hubert, & Mothe, 2011), people search (Macdonald, 2009), blog opinion retrieval (Wu, 2012a), automatic ranking a group of retrieval systems (Nuray & Can, 2006) and others....

    [...]

  • ...…& Shaw, 1994), CombMNZ (Fox et al., 1993; Fox & Shaw, 1994), the linear combination method (Bartell, Cottrell, & Belew, 1994; Thompson, 1993; Vogt & Cottrell, 1998; Vogt & Cottrell, 1999; Wu, 2012b; Wu, Bi, Zeng, & Han, 2009), the correlation methods (Wu & McClean, 2006a) are score-based methods....

    [...]

  • ...…a series of power function of performance was investigated in Wu et al. (2009), a weighting schema that considers both system performance and dissimilarity between systems was investigated in Wu and McClean (2006a), and a multiple linear regression based approach was investigated in Wu (2012b)....

    [...]

  • ...…Modlin, & Rao, 1993; Fox & Shaw, 1994), CombMNZ (Fox et al., 1993; Fox & Shaw, 1994), the linear combination methods (Vogt & Cottrell, 1998, 1999; Wu, 2012b), the Borda count (Aslam & Montague, 2001), the Bayesian fusion (Aslam & Montague, 2001), the Condorcet fusion (Montague & Aslam, 2002),…...

    [...]

Journal ArticleDOI
TL;DR: Assessment of the impact of personality in the accurate prediction of followees showed that an accurate appreciation of such predictive factors tied to a quantitative analysis of personality is crucial for guiding the search of potential followees, and thus, enhance recommendations.

16 citations

References
Proceedings ArticleDOI
01 Apr 2001
TL;DR: A set of techniques for the rank aggregation problem is developed and their performance is compared to that of well-known methods, with the goal of designing rank aggregation techniques that can be used to combat spam in Web searches.
Abstract: We consider the problem of combining ranking results from various sources. In the context of the Web, the main applications include building meta-search engines, combining ranking functions, selecting documents based on multiple criteria, and improving search precision through word associations. We develop a set of techniques for the rank aggregation problem and compare their performance to that of well-known methods. A primary goal of our work is to design rank aggregation techniques that can effectively combat "spam," a serious problem in Web searches. Experiments show that our methods are simple, efficient, and effective.

1,982 citations


"Linear combination of component res..." refers methods in this paper

  • ...In information retrieval, the data fusion technique has been investigated by many researchers and quite a few data fusion methods such as CombSum [9,10], CombMNZ [9,10], Borda count [2], Condorcet fusion [19], the correlation method [30], Markov chain-based methods [5,21], the linear combination method [3,26,27,29], the multiple criteria approach [8], and others [6,14,16,23,33], have been presented and investigated....

    [...]

  • ...In those aforementioned methods, CombSum [9,10], CombMNZ [9,10], the correlation method [30], the linear combination method [3,26,27,29] are score-based methods, while Condorcet fusion [19], Markov chain-based methods [5,21], and the multiple criteria approach [8] are rank-based methods....

    [...]

Proceedings Article
01 Jan 1994
TL;DR: This paper describes one method that has been shown to increase performance by combining the similarity values from five different retrieval runs using both vector space and P-norm extended boolean retrieval methods.
Abstract: The TREC-2 project at Virginia Tech focused on methods for combining the evidence from multiple retrieval runs to improve performance over any single retrieval method. This paper describes one such method that has been shown to increase performance by combining the similarity values from five different retrieval runs using both vector space and P-norm extended boolean retrieval methods.

1,106 citations


"Linear combination of component res..." refers methods in this paper

  • ...In information retrieval, the data fusion technique has been investigated by many researchers and quite a few data fusion methods such as CombSum [9,10], CombMNZ [9,10], Borda count [2], Condorcet fusion [19], the correlation method [30], Markov chain-based methods [5,21], the linear combination method [3,26,27,29], the multiple criteria approach [8], and others [6,14,16,23,33], have been presented and investigated....

    [...]

  • ...In those aforementioned methods, CombSum [9,10], CombMNZ [9,10], the correlation method [30], the linear combination method [3,26,27,29] are score-based methods, while Condorcet fusion [19], Markov chain-based methods [5,21], and the multiple criteria approach [8] are rank-based methods....

    [...]

  • ...Apart from the linear combination method with the trained weights by multiple Regression (referred to as LCR), CombSum [9,10], CombMNZ [9,10], PosFuse [16], MAPFuse [16], SegFuse [23], the linear combination method with performance level weighting (referred to as LCP [3,24]), and the linear combination method with performance square weighting (referred to as LCP2, [29]) are also involved in the experiment....

    [...]
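The two score-based baselines named in the excerpts above, CombSum and CombMNZ, can be sketched in a few lines. This is an illustrative sketch, not code from the cited work: the function names and the dictionary-based run format are assumptions, scores are assumed to be normalized beforehand, and "retrieved by a run" is taken here to mean "present in that run's result list".

```python
from collections import defaultdict


def comb_sum(runs):
    """Sum each document's scores over all runs.

    runs: list of dicts, each mapping doc id -> normalized score for one query.
    """
    fused = defaultdict(float)
    for run in runs:
        for doc, score in run.items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombSum score multiplied by the number of runs that retrieved the document."""
    sums = comb_sum(runs)
    hits = {doc: sum(1 for run in runs if doc in run) for doc in sums}
    return {doc: sums[doc] * hits[doc] for doc in sums}


# Hypothetical runs from two component systems.
runs = [{"a": 0.5, "b": 1.0}, {"a": 0.5, "c": 0.25}]
print(comb_sum(runs))  # {'a': 1.0, 'b': 1.0, 'c': 0.25}
print(comb_mnz(runs))  # {'a': 2.0, 'b': 1.0, 'c': 0.25}
```

In the toy data, documents "a" and "b" tie under CombSum, while CombMNZ favours "a" because both runs retrieved it.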

Proceedings ArticleDOI
01 Sep 2001
TL;DR: The experimental results show that metasearch algorithms based on the Borda and Bayesian models usually outperform the best input system and are competitive with, and often outperform, existing metasearch strategies.
Abstract: Given the ranked lists of documents returned by multiple search engines in response to a given query, the problem of metasearch is to combine these lists in a way which optimizes the performance of the combination. This paper makes three contributions to the problem of metasearch: (1) We describe and investigate a metasearch model based on an optimal democratic voting procedure, the Borda Count; (2) we describe and investigate a metasearch model based on Bayesian inference; and (3) we describe and investigate a model for obtaining upper bounds on the performance of metasearch algorithms. Our experimental results show that metasearch algorithms based on the Borda and Bayesian models usually outperform the best input system and are competitive with, and often outperform, existing metasearch strategies. Finally, our initial upper bounds demonstrate that there is much to learn about the limits of the performance of metasearch.

747 citations


"Linear combination of component res..." refers background or methods in this paper

  • ..., [2,24]) took a simple performance level policy....

    [...]

  • ...Different models [1,2,4,17,20] have been investigated for such a purpose....

    [...]

  • ...In information retrieval, the data fusion technique has been investigated by many researchers and quite a few data fusion methods such as CombSum [9,10], CombMNZ [9,10], Borda count [2], Condorcet fusion [19], the correlation method [30], Markov chain-based methods [5,21], the linear combination method [3,26,27,29], the multiple criteria approach [8], and others [6,14,16,23,33], have been presented and investigated....

    [...]

  • ...Weighted Borda count [2] is a variation of performance level weighting, in which all ranked documents are assigned scores corresponding to their respective Borda counts, and then the linear combination method with the performance level weighting is used for fusion....

    [...]

  • ...For example, Borda count [2] works like this: for a ranked list of t documents, the first document in the list is given a score of t, the second document in the list is given a score of t−1,....

    [...]
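The Borda scoring rule quoted above (score t for the first of t documents, t−1 for the second, and so on) and its weighted variant can be sketched as follows. The function names are hypothetical. Because the excerpt is truncated, the handling of documents that a list does not rank is an assumption here: they simply receive no points from that list. The per-list weights stand in for performance-level weights and are supplied by the caller.

```python
def borda_scores(ranked_docs):
    """Score a ranked list of t documents: first gets t, second t-1, ..., last 1."""
    t = len(ranked_docs)
    return {doc: t - position for position, doc in enumerate(ranked_docs)}


def borda_fuse(ranked_lists, weights=None):
    """Fuse ranked lists by (optionally weighted) Borda scores.

    weights: one weight per list; equal weights give the plain Borda count.
    Assumption: a document absent from a list gets 0 points from that list.
    """
    if weights is None:
        weights = [1.0] * len(ranked_lists)
    totals = {}
    for weight, ranked in zip(weights, ranked_lists):
        for doc, score in borda_scores(ranked).items():
            totals[doc] = totals.get(doc, 0.0) + weight * score
    # Sort by descending total; break ties by doc id for a stable result.
    return sorted(totals, key=lambda doc: (-totals[doc], doc))


lists = [["a", "b", "c"], ["b", "a", "d"]]
print(borda_fuse(lists))              # ['a', 'b', 'c', 'd']
print(borda_fuse(lists, [1.0, 2.0]))  # ['b', 'a', 'd', 'c']
```

With equal weights "a" and "b" tie at 5 points and are ordered by id; doubling the weight of the second list moves "b" ahead.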

Journal ArticleDOI
TL;DR: This paper addresses two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input.
Abstract: Stacked generalization is a general method of using a high-level model to combine lower-level models to achieve greater predictive accuracy. In this paper we address two crucial issues which have been considered to be a 'black art' in classification tasks ever since the introduction of stacked generalization in 1992 by Wolpert: the type of generalizer that is suitable to derive the higher-level model, and the kind of attributes that should be used as its input. We find that best results are obtained when the higher-level model combines the confidence (and not just the predictions) of the lower-level ones. We demonstrate the effectiveness of stacked generalization for combining three different types of learning algorithms for classification tasks. We also compare the performance of stacked generalization with majority vote and published results of arcing and bagging.

662 citations


"Linear combination of component res..." refers background in this paper

  • ...For example, for the classifier ensemble problem, researchers find that stacking with multi-response linear regression is a very successful meta-model [22,25]....

    [...]

Proceedings ArticleDOI
01 Jul 1997
TL;DR: This paper analyzes why improvements can be achieved with evidence combination, and proposes a combining method whose properties coincide with the rationale, and investigates the effect of using rank instead of similarity on retrieval effectiveness.
Abstract: It has been known that different representations of a query retrieve different sets of documents. Recent work suggests that significant improvements in retrieval performance can be achieved by combining multiple representations of an information need. However, little effort has been made to understand the reason why combining multiple sources of evidence improves retrieval effectiveness. In this paper we analyze why improvements can be achieved with evidence combination, and investigate how evidence should be combined. We describe a rationale for multiple evidence combination, and propose a combining method whose properties coincide with the rationale. We also investigate the effect of using rank instead of similarity on retrieval effectiveness.

653 citations


"Linear combination of component res..." refers methods in this paper

  • ...This normalization method, referred to as 0–1 linear score normalization method later in this paper, was used by Lee [15] and others in their experiments....

    [...]
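The excerpt above names the 0–1 linear score normalization method without giving its formula. A minimal sketch follows, assuming it is the usual min-max form in which the lowest score in a result list maps to 0 and the highest to 1. The function name and the handling of a list whose scores are all equal are assumptions.

```python
def zero_one_normalize(scores):
    """Linearly rescale one result list's scores into [0, 1].

    scores: doc id -> raw score from a single component system for one query.
    """
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: with no spread in the scores, map every document to 0.
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


print(zero_one_normalize({"a": 10.0, "b": 4.0, "c": 7.0}))
# {'a': 1.0, 'b': 0.0, 'c': 0.5}
```

Normalizing each list separately puts scores from different systems on a comparable scale before they are summed or linearly combined.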