Author

Chunlan Huang

Bio: Chunlan Huang is an academic researcher from Jiangsu University. The author has contributed to research in topics: Search engine & Adversarial information retrieval. The author has an h-index of 1 and has co-authored 4 publications receiving 3 citations.

Papers
Journal Article
TL;DR: This paper focuses on the data fusion approach to information search, in which each component search model contributes a result and all the results are combined by a fusion algorithm.
Abstract: Introduction. In the big data age, we have to deal with a tremendous amount of information, which can be collected from various types of sources. For information search systems such as Web search engines or online digital libraries, the collection of documents becomes larger and larger. For some queries, an information search system needs to retrieve a large number of documents. On the other hand, very often people are only willing to visit no more than a few top-ranked documents. Therefore, how to develop an information search system with desirable efficiency and effectiveness is a research problem. Method. In this paper, we focus on the data fusion approach to information search, in which each component search model contributes a result and all the results are combined by a fusion algorithm. Through empirical study, we are able to find a feasible combination method that balances effectiveness and efficiency in the context of data fusion. Analysis. It is a multi-optimisation problem that aims to balance effectiveness and efficiency. To support this, we need to understand how these two factors affect each other and to what extent. Results. Using some groups of historical runs from TREC to carry out the experiment, we find that using much less information (e.g., less than 10% of the documents in the experiment), good efficiency is achievable with marginal loss on effectiveness. Conclusions. We consider that the findings from our experiment are informative and this can be used as a guideline for providing more efficient search service in the big data environment.
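
The idea of fusing only a small top portion of each component result can be sketched as follows. This is a minimal illustration, not the paper's implementation: the abstract does not name the fusion algorithm used, so plain score summation (CombSUM) is swapped in as an assumption, and the function names `truncate` and `comb_sum`, the sample runs, and the 50% cutoff are all hypothetical.

```python
import math
from collections import defaultdict


def truncate(run, fraction):
    """Keep only the top `fraction` of a ranked list (at least one document)."""
    keep = max(1, math.ceil(len(run) * fraction))
    return run[:keep]


def comb_sum(runs):
    """Fuse runs by summing each document's scores (CombSUM, assumed here)."""
    fused = defaultdict(float)
    for run in runs:
        for doc_id, score in run:
            fused[doc_id] += score
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(fused.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical component results: (document id, score), best first.
run_a = [("d1", 0.9), ("d2", 0.8), ("d3", 0.4), ("d4", 0.1)]
run_b = [("d2", 0.7), ("d5", 0.6), ("d1", 0.3), ("d6", 0.2)]

# Fuse only the top half of each run instead of the full lists.
fused_top = comb_sum([truncate(run_a, 0.5), truncate(run_b, 0.5)])
fused_ids = [doc_id for doc_id, _ in fused_top]
```

Truncating before fusion reduces the number of documents the fusion step has to process, which is the efficiency side of the trade-off the abstract describes; how much effectiveness is lost depends on the data and has to be measured.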

2 citations

Proceedings ArticleDOI
11 Sep 2014
TL;DR: Using 3 groups of historical runs from TREC for the experiment, it is found that with the weights trained by weighted linear regression, the linear combination method can achieve good results in effectiveness and efficiency.
Abstract: In the big data age, we have to deal with a tremendous amount of information, which is collected from various types of sources. For information retrieval systems, the collection of documents becomes larger and larger. For some queries, an information retrieval system needs to retrieve a large number of documents as the result. In reality, people very often care mainly about a few top-ranked documents rather than the complete long list of documents. In such a situation, how to develop a retrieval system with desirable efficiency and effectiveness is a research problem. In this paper, we focus on the data fusion approach to information retrieval, in which each component retrieval system contributes a result and all the results are combined by a combination method. The goal of this research is to find a feasible combination method that is able to balance effectiveness and efficiency. Using 3 groups of historical runs from TREC for the experiment, we find that with the weights trained by weighted linear regression, the linear combination method can achieve good results in effectiveness and efficiency.
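
A linear combination of component results can be sketched as below. This is only an illustration under stated assumptions: the weights are taken as given (the abstract says they are trained by weighted linear regression, which is not reproduced here), and the function name `linear_combination`, the sample runs, and the weight values are hypothetical.

```python
def linear_combination(runs, weights):
    """Rank documents by the weighted sum of their scores across runs.

    runs    -- list of dicts mapping document id to a (normalized) score
    weights -- one weight per run, assumed to be trained elsewhere
    """
    if len(runs) != len(weights):
        raise ValueError("need exactly one weight per run")
    fused = {}
    for run, weight in zip(runs, weights):
        for doc_id, score in run.items():
            # A document missing from a run simply contributes nothing for it.
            fused[doc_id] = fused.get(doc_id, 0.0) + weight * score
    # Highest fused score first; ties broken by document id.
    return sorted(fused, key=lambda doc_id: (-fused[doc_id], doc_id))


# Hypothetical component results and weights.
runs = [
    {"d1": 1.0, "d2": 0.5},
    {"d2": 1.0, "d3": 0.5},
]
weights = [0.6, 0.4]
ranking = linear_combination(runs, weights)
```

Here `d2` ranks first because both runs retrieve it, even though neither weight alone would put it above `d1`.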

1 citation

Patent
11 Mar 2015
TL;DR: In this article, a data integration method supporting the diversification of information retrieval results is proposed. The method is mainly based on a complementary weight allocation strategy covered by a sub-theme.
Abstract: The invention discloses a data integration method supporting the diversification of information retrieval results. The method is mainly based on a complementary weight allocation strategy covered by a sub-theme. The calculation of the complementary weight mainly comprises the following steps: providing t information retrieval systems, each of which retrieves a corresponding result r1, r2, ..., rt from the same database for a given query q; establishing a super result r on the basis of two results ri and rj; evaluating ri, rj and r with a performance index to obtain performance values, recorded as p(ri), p(rj) and p(r) respectively; calculating the complementation degree of ri with respect to rj from the performance values; and calculating the complementary weight ci of each result ri (1 ≤ i ≤ t). The complementary weight can then be used directly for the linear combination or as a part of the linear combination weight. By adopting the method, novelty can be considered on the basis of diversification, the degree to which a result complements the whole can be quantified, and the method can be used for integrating various types of data such as texts and pictures.
Patent
22 Oct 2014
TL;DR: In this article, a document score normalization method for the diversity of information retrieval results is proposed, where a method based on document ranking positions is utilized for normalizing scores, and the normalization score of the document is obtained by calculating the value of 1 - 0.2*ln(rank+1).
Abstract: The invention discloses a document score normalization method for the diversity of information retrieval results. A method based on document ranking positions is utilized for normalizing scores. Supposing the document ranking position is rank, the normalization score of the document is obtained by calculating the value of 1 - 0.2*ln(rank+1). The document score normalization method for the diversity of the information retrieval results is applicable to diversified targets of information retrieval results, gives document scores better comparability, and can be applied to data fusion of information retrieval results, distributed information retrieval and the like.
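
The rank-based normalization can be tried out with a few lines of code. This sketch reads the formula as 1 - 0.2*ln(rank+1) with a natural logarithm and assumes ranks start at 1 for the top document; the abstract does not state the starting index, and the function name `rank_to_score` is hypothetical.

```python
import math


def rank_to_score(rank):
    """Normalized score for a ranking position: 1 - 0.2 * ln(rank + 1).

    Assumes 1-based ranks (rank 1 is the top document).
    """
    if rank < 1:
        raise ValueError("rank is assumed to start at 1")
    return 1.0 - 0.2 * math.log(rank + 1)


# Scores for the first five ranking positions.
scores = [rank_to_score(rank) for rank in range(1, 6)]
```

Because the logarithm grows slowly, the score drops quickly over the first few positions and more gently afterwards. Under this reading the value becomes negative once ln(rank+1) exceeds 5, that is from rank 148 onward, so a caller fusing very long lists may want to truncate them or clip the score at zero.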

Cited by
Proceedings ArticleDOI
27 Jun 2018
TL;DR: The goal of this half day, intermediate-level, tutorial is to provide a methodological view of the theoretical foundations of fusion approaches, the numerous fusion methods that have been devised and a variety of applications for which fusion techniques have been applied.
Abstract: Fusion is an important and central concept in Information Retrieval. The goal of fusion methods is to merge different sources of information so as to address a retrieval task. For example, in the adhoc retrieval setting, fusion methods have been applied to merge multiple document lists retrieved for a query. The lists could be retrieved using different query representations, document representations, ranking functions and corpora. The goal of this half day, intermediate-level, tutorial is to provide a methodological view of the theoretical foundations of fusion approaches, the numerous fusion methods that have been devised and a variety of applications for which fusion techniques have been applied.

23 citations

Journal ArticleDOI
TL;DR: This paper divides the resolution of conflict at the instance level into two stages, reference reconciliation and data fusion, and first defines the classification of conflicts, the strategies for dealing with conflicts and the implementation of conflict management strategies.
Abstract: With the progress of new technologies of information and communication, more and more producers of data exist. On the other hand, the web forms a huge support of all these kinds of data. Unfortunately, existing data is not proper due to the existence of the same information in different sources, as well as erroneous and incomplete data. The aim of data integration systems is to offer to a user a unique interface to query a number of sources. A key challenge of such systems is to deal with conflicting information from the same source or from different sources. We present, in this paper, the resolution of conflict at the instance level into two stages: references reconciliation and data fusion. The reference reconciliation methods seek to decide if two data descriptions are references to the same entity in reality. We define the principles of reconciliation method then we distinguish the methods of reference reconciliation, first on how to use the descriptions of references, then the way to acquire knowledge. We finish this section by discussing some current data reconciliation issues that are the subject of current research. Data fusion in turn, has the objective to merge duplicates into a single representation while resolving conflicts between the data. We define first the conflicts classification, the strategies for dealing with conflicts and the implementing conflict management strategies. We present then, the relational operators and data fusion techniques. Likewise, we finish this section by discussing some current data fusion issues that are the subject of current research.

8 citations

Journal ArticleDOI
01 Jan 2017
TL;DR: This investigation uses the Modified Checkpoint based Apriori Algorithm to discover the combinations of factors that cause the most accidents.
Abstract: Accidents, especially railroad accidents, happen regularly in India, yet they could be reduced if the reasons behind them were examined. In our investigation we consider several factors, such as the roads connected to the junctions where the accidents happened, the environment in which the accident happened, whether the incident occurred during the day or at night, and more. We then use the Modified Checkpoint based Apriori Algorithm to discover the combinations of factors that cause the most accidents.

1 citation