TL;DR: This paper focuses on the data fusion approach to information search, in which each component search model contributes a result and all the results are combined by a fusion algorithm.
Abstract: Introduction. In the big data age, we have to deal with a tremendous amount of information, which can be collected from various types of sources. For information search systems such as Web search engines or online digital libraries, the collection of documents becomes larger and larger. For some queries, an information search system needs to retrieve a large number of documents. On the other hand, very often people are only willing to visit no more than a few top-ranked documents. Therefore, how to develop an information search system with desirable efficiency and effectiveness is a research problem. Method. In this paper, we focus on the data fusion approach to information search, in which each component search model contributes a result and all the results are combined by a fusion algorithm. Through empirical study, we are able to find a feasible combination method that balances effectiveness and efficiency in the context of data fusion. Analysis. It is a multi-optimisation problem that aims to balance effectiveness and efficiency. To support this, we need to understand how these two factors affect each other and to what extent. Results. Using some groups of historical runs from TREC to carry out the experiment, we find that using much less information (e.g., less than 10% of the documents in the experiment), good efficiency is achievable with marginal loss on effectiveness. Conclusions. We consider that the findings from our experiment are informative and this can be used as a guideline for providing more efficient search service in the big data environment.
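As a rough illustration of the trade-off described in this abstract, the sketch below fuses several ranked lists while reading only the top `depth` documents of each. The abstract does not name the fusion algorithm; CombSUM with min-max score normalisation is used here as an assumed stand-in, and `comb_sum` and `depth` are hypothetical names.

```python
from collections import defaultdict


def comb_sum(runs, depth=None):
    """Fuse ranked lists by summing min-max normalised scores (CombSUM).

    runs:  list of ranked lists, each a list of (doc_id, score) pairs
           sorted by score in descending order.
    depth: if given, only the top `depth` entries of each run are read,
           which reduces the amount of information the fusion step handles.
    """
    fused = defaultdict(float)
    for run in runs:
        top = run[:depth] if depth is not None else run
        if not top:
            continue
        scores = [score for _, score in top]
        lowest, highest = min(scores), max(scores)
        span = highest - lowest
        for doc_id, score in top:
            # Assumption: a run whose scores are all equal contributes 1.0 per document.
            fused[doc_id] += (score - lowest) / span if span > 0 else 1.0
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))
```

Calling `comb_sum(runs)` and `comb_sum(runs, depth=k)` on the same runs gives a simple way to compare the fused ranking obtained from full lists with the one obtained from truncated lists.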
04 Jun 2014
TL;DR: In this article, an information retrieval data fusion method based on retrieval result diversification is proposed, which can improve the effectiveness and diversity of fused results and can also be applied to fusion problems over different types of content, such as documents, pictures and medical records.
Abstract: The invention discloses an information retrieval data fusion method based on retrieval result diversification. The method comprises the following steps. Suppose there are t information retrieval systems in total; the t systems search the same database for the same query, and t results are obtained. For each result, the number of times its documents occur in the other results is counted, and the difference value of each retrieval result i (1<=i<=t) serves as its difference weight. The performance index ERR-IA20 is used for evaluation, and the performance value obtained serves as the performance weight of each information retrieval system. The difference weight and the performance weight are combined to calculate the comprehensive weight of each information retrieval system. The method is applied repeatedly over a group of queries, and the final weight of each information retrieval system is the average of the values obtained over all the queries. Finally, the retrieval results are fused by the linear combination method using the calculated final weights. The method can improve the effectiveness and diversity of fused results and can also be applied to fusion problems over different types of content, such as documents, pictures and medical records.
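The weighting steps in this abstract can be sketched as follows. The abstract does not give the exact formulas, so several choices here are assumptions: the difference value is taken as one minus the mean overlap with the other results, the two weights are combined by multiplication, the performance values (for example ERR-IA20 scores) are supplied by the caller, and documents are scored by normalised rank position in the linear combination. All function names are hypothetical.

```python
def difference_weight(results, i):
    """Assumed difference value of result i: 1 minus the mean fraction of
    the other results in which each of its documents also occurs."""
    others = [set(r) for j, r in enumerate(results) if j != i]
    docs = results[i]
    if not others or not docs:
        return 0.0
    occurrences = sum(sum(doc in other for other in others) for doc in docs)
    return 1.0 - occurrences / (len(docs) * len(others))


def system_weights(queries, performance):
    """Average the combined weight of each system over a group of queries.

    queries:     one entry per query, each a list of t ranked doc-id lists.
    performance: one entry per query, each a list of t performance values.
    """
    t = len(queries[0])
    totals = [0.0] * t
    for results, perf in zip(queries, performance):
        for i in range(t):
            # Assumption: difference and performance weights are multiplied.
            totals[i] += difference_weight(results, i) * perf[i]
    return [total / len(queries) for total in totals]


def linear_combination(results, weights):
    """Fuse ranked lists with a weighted sum of rank-based scores."""
    fused = {}
    for ranked, weight in zip(results, weights):
        n = len(ranked)
        for rank, doc in enumerate(ranked):
            # Assumption: the top document scores 1.0, decreasing linearly with rank.
            fused[doc] = fused.get(doc, 0.0) + weight * (n - rank) / n
    return sorted(fused, key=lambda doc: (-fused[doc], doc))
```

In this sketch, a system whose result shares few documents with the others receives a larger difference weight, so it is not drowned out by systems that return near-identical lists.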
27 Jun 2018
TL;DR: The goal of this half-day, intermediate-level tutorial is to provide a methodological view of the theoretical foundations of fusion approaches, the numerous fusion methods that have been devised and a variety of applications to which fusion techniques have been applied.
Abstract: Fusion is an important and central concept in Information Retrieval. The goal of fusion methods is to merge different sources of information so as to address a retrieval task. For example, in the ad hoc retrieval setting, fusion methods have been applied to merge multiple document lists retrieved for a query. The lists could be retrieved using different query representations, document representations, ranking functions and corpora. The goal of this half-day, intermediate-level tutorial is to provide a methodological view of the theoretical foundations of fusion approaches, the numerous fusion methods that have been devised and a variety of applications to which fusion techniques have been applied.
TL;DR: This paper divides the resolution of instance-level conflicts into two stages, reference reconciliation and data fusion, and defines a classification of conflicts, the strategies for dealing with them and how those strategies are implemented.
Abstract: With the progress of new information and communication technologies, there are more and more producers of data. The web, in turn, hosts a huge amount of all these kinds of data. Unfortunately, existing data is not clean: the same information appears in different sources, and data can be erroneous or incomplete. The aim of data integration systems is to offer the user a single interface for querying a number of sources. A key challenge for such systems is dealing with conflicting information from the same source or from different sources. In this paper, we present the resolution of conflicts at the instance level in two stages: reference reconciliation and data fusion. Reference reconciliation methods seek to decide whether two data descriptions refer to the same real-world entity. We define the principles of reconciliation, then distinguish reference reconciliation methods first by how they use the descriptions of references and then by how they acquire knowledge. We finish this section by discussing some data reconciliation issues that are the subject of current research. Data fusion, in turn, aims to merge duplicates into a single representation while resolving conflicts between the data. We first define the classification of conflicts, the strategies for dealing with conflicts and the implementation of conflict management strategies. We then present the relational operators and data fusion techniques. Likewise, we finish this section by discussing some data fusion issues that are the subject of current research.
11 Mar 2015
TL;DR: In this article, a data integration method supporting the diversification of information retrieval results is proposed. The method mainly relies on a complementary weight allocation strategy based on sub-topic coverage.
Abstract: The invention discloses a data integration method supporting the diversification of information retrieval results. The method mainly relies on a complementary weight allocation strategy based on sub-topic coverage. The complementary weight is calculated mainly through the following steps: given t information retrieval systems, each system retrieves a corresponding result r1, r2, ..., rt from the same database for a given query q; a super result r is built from two results ri and rj; ri, rj and r are then evaluated with a performance index to obtain their respective performance values; the complementation degree of ri with respect to rj is calculated from these performance values; and the complementary weight ci of each result ri (1<=i<=t) is calculated. The complementary weight is then used directly for the linear combination or as part of the linear combination weight. With this method, novelty can be considered in addition to diversification, the degree to which a result complements the whole can be quantified, and the method can be used to integrate various types of content such as texts and pictures.
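A minimal sketch of the complementary weight calculation follows. The abstract leaves the details open, so everything specific here is an assumption: the super result is built by round-robin interleaving, the complementation degree of ri with respect to rj is the performance gain of the super result over rj alone (floored at zero), the complementary weight ci is the mean of those gains, and `subtopic_recall` is only a stand-in for the unspecified performance index.

```python
from itertools import zip_longest


def super_result(ri, rj):
    """Assumed construction: round-robin interleave, duplicates dropped."""
    merged, seen = [], set()
    for pair in zip_longest(ri, rj):
        for doc in pair:
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def subtopic_recall(ranked, subtopics, k=10):
    """Stand-in metric: fraction of known sub-topics covered in the top k.

    subtopics maps a doc id to the set of sub-topics it covers."""
    all_topics = set().union(*subtopics.values()) if subtopics else set()
    if not all_topics:
        return 0.0
    covered = set()
    for doc in ranked[:k]:
        covered |= subtopics.get(doc, set())
    return len(covered) / len(all_topics)


def complementary_weights(results, metric):
    """Return one complementary weight per result.

    metric is any function mapping a ranked list to a performance value."""
    weights = []
    for i, ri in enumerate(results):
        gains = []
        for j, rj in enumerate(results):
            if j == i:
                continue
            # Assumed complementation degree: what ri adds on top of rj.
            gain = metric(super_result(ri, rj)) - metric(rj)
            gains.append(max(0.0, gain))
        weights.append(sum(gains) / len(gains) if gains else 0.0)
    return weights
```

Under these assumptions, a result that covers sub-topics missing from the other results receives a larger weight, which matches the abstract's aim of quantifying how much a result complements the whole.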