Patent

Data integration method supporting diversification of information retrieving results

TL;DR: This patent proposes a data integration method supporting the diversification of information retrieval results, based mainly on a complementary weight allocation strategy driven by sub-topic coverage.
Abstract: The invention discloses a data integration method supporting the diversification of information retrieval results. The method is based mainly on a complementary weight allocation strategy driven by sub-topic coverage. The calculation of the complementary weights comprises the following steps. Given t information retrieval systems and a query q, each system retrieves its result r1, r2, ..., rt from the same database. For any two results ri and rj, a super-result r is constructed from them. The performance of ri, rj and r is then evaluated with a performance index. From these performance values, the complementarity degree of ri with respect to rj is calculated. The complementarity degrees are then used to calculate the complementary weight ci of each result ri (1 ≤ i ≤ t). The resulting complementary weights can be used directly as the weights of a linear combination, or as one component of the combined linear-combination weights. The method takes novelty into account in addition to diversity and quantifies how much each result complements the whole. It can also be used to integrate results of various media types, such as text and images.
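The abstract does not specify how the super-result is built, which performance index is used, or how complementarity is computed. The Python sketch below fills those gaps with labeled assumptions, purely to show how such weights can feed a linear combination:

- The super-result is a hypothetical round-robin interleaving of ri and rj.
- The performance index is subtopic recall at depth k (S-recall@k).
- The complementarity degree of ri with respect to rj is the gain p(r) - p(rj), clipped at zero.
- ci is the mean of those degrees over all j ≠ i.

None of these choices is confirmed by the patent. All function names are hypothetical.

```python
from itertools import zip_longest


def ranking(scores):
    """Order documents by descending retrieval score."""
    return sorted(scores, key=scores.get, reverse=True)


def super_result(ri, rj):
    """Hypothetical super-result: round-robin interleaving of two rankings, duplicates dropped."""
    merged, seen = [], set()
    for a, b in zip_longest(ri, rj):
        for doc in (a, b):
            if doc is not None and doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


def s_recall(docs, doc_subtopics, n_subtopics, k=10):
    """Assumed performance index: fraction of the query's subtopics covered by the top-k docs."""
    covered = set()
    for doc in docs[:k]:
        covered |= doc_subtopics.get(doc, set())
    return len(covered) / n_subtopics


def complementary_weights(results, doc_subtopics, n_subtopics, k=10):
    """Compute c_i for each result r_i (dict doc -> score). Requires at least two results."""
    ranks = [ranking(r) for r in results]
    perf = [s_recall(r, doc_subtopics, n_subtopics, k) for r in ranks]
    weights = []
    for i in range(len(results)):
        degrees = []
        for j in range(len(results)):
            if i == j:
                continue
            p_super = s_recall(super_result(ranks[i], ranks[j]), doc_subtopics, n_subtopics, k)
            # Assumed complementarity of r_i w.r.t. r_j: gain of the super-result over r_j alone.
            degrees.append(max(0.0, p_super - perf[j]))
        weights.append(sum(degrees) / len(degrees))
    return weights


def linear_combination(results, weights):
    """Fuse results by a weighted sum of per-system scores; returns the fused ranking."""
    fused = {}
    for scores, w in zip(results, weights):
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return ranking(fused)


# Toy data (made up for illustration): document -> set of subtopic ids it covers.
doc_subtopics = {"d1": {1}, "d2": {2}, "d3": {1}, "d4": {3}}
results = [{"d1": 0.9, "d3": 0.8}, {"d2": 0.9, "d4": 0.5}, {"d3": 0.7, "d1": 0.6}]
c = complementary_weights(results, doc_subtopics, n_subtopics=3)
fused = linear_combination(results, c)
```

In this toy data, the second system covers the two subtopics the others miss. It therefore receives the largest weight (2/3 versus 1/6 for each of the other two), and its documents d2 and d4 lead the fused ranking.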
References
Proceedings Article
01 Apr 2012
TL;DR: This paper revisits the classical problem of multi-query optimization in the context of RDF/SPARQL and proposes heuristic algorithms that partition the input batch of queries into groups such that each group of queries can be optimized together.
Abstract: This paper revisits the classical problem of multi-query optimization in the context of RDF/SPARQL. We show that the techniques developed for relational and semi-structured data/query languages are hard, if not impossible, to extend to the RDF data model and to graph query patterns expressed in SPARQL. In light of the NP-hardness of multi-query optimization for SPARQL, we propose heuristic algorithms that partition the input batch of queries into groups such that each group of queries can be optimized together. An essential component of the optimization is an efficient algorithm to discover the common sub-structures of multiple SPARQL queries, together with an effective cost model to compare candidate execution plans. Since our optimization techniques make no assumptions about the underlying SPARQL query engine, they have the advantage of being portable across different RDF stores. Extensive experimental studies, performed on three popular RDF stores, show that the proposed techniques are effective, efficient and scalable.

147 citations

Patent
20 Feb 2008
TL;DR: This patent discloses a system and method for comparative web search engines, search result summarization, web snippet processing, comparison analysis, information visualization, meta-clustering, and quantitative evaluation of web snippet quality.
Abstract: A system and method for comparative web search engines, search result summarization, web snippet processing, comparison analysis, information visualization, meta-clustering, and quantitative evaluation of web snippet quality are disclosed. The present invention extends the capabilities of web searching and information retrieval by providing a succinct comparative summary of search results at either the object or the thematic level.

62 citations

Patent
14 Oct 2008
TL;DR: This patent describes techniques for query processing in a multi-site search engine. Each site indexes a set of assigned web resources. For each term, each site calculates a site-specific upper bound on the term's contribution to the search engine's ranking function for a query containing that term.
Abstract: Techniques for query processing in a multi-site search engine are described. During an indexing phase, each site of a multi-site search engine indexes a set of assigned web resources and each site calculates, for each term in the set of assigned web resources, a site-specific upper bound ranking score on the contribution of the term to the search engine ranking function for a query containing the term. During a propagation phase, all sites exchange their site-specific upper bound ranking scores with each other. In response to a site receiving a query, the site determines the set of locally matching resources and compares the ranking score of a locally matching resource with the site-specific upper bound ranking scores for the terms of the query that were received during the propagation phase and determines whether to communicate the query to other sites. By exchanging appropriately defined site-specific upper bound ranking scores, the site initially receiving the query can determine whether the locally matching resources would be identical to the resources obtained from a single-site search system without having to communicate the query to each of the other sites.

17 citations
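The abstract of the multi-site search patent above describes the query-forwarding decision only informally. The Python sketch below is one reading of it, built on two assumptions the abstract does not state: the ranking function is additive over query terms, and the query asks for the top k results. Under those assumptions, the query is forwarded only to sites whose summed per-term upper bounds could beat the k-th best local score. The function name and its parameters are hypothetical.

```python
def sites_to_forward(local_scores, query_terms, remote_bounds, k=10):
    """Return the remote sites that could still contribute to the global top-k.

    local_scores:  {resource_id: score} for locally matching resources.
    remote_bounds: {site: {term: upper_bound}} received during the propagation phase.
    Assumes an additive ranking function, so a remote resource can score at most
    the sum of that site's per-term upper bounds over the query terms.
    """
    top = sorted(local_scores.values(), reverse=True)[:k]
    # With fewer than k local matches, any site that indexes a query term may help.
    threshold = top[-1] if len(top) == k else float("-inf")
    forward = []
    for site, bounds in remote_bounds.items():
        if not any(term in bounds for term in query_terms):
            continue  # the site indexes none of the query terms
        best_possible = sum(bounds.get(term, 0.0) for term in query_terms)
        # Strict comparison: a remote score that only ties the local k-th is not forwarded.
        if best_possible > threshold:
            forward.append(site)
    return forward
```

If the function returns an empty list, the local top-k is already guaranteed to match a single-site search under these assumptions, so the query is never sent to any other site.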

Patent
19 Feb 2014
TL;DR: This patent proposes a method for testing the combination properties of search-engine evaluation indexes. A testing device selects more than two datasets provided by TREC (Text REtrieval Conference) and scores the query results of each search engine on a dataset with an evaluation index. It then pairs the scores of all the search engines and applies a two-tailed t-test with a set threshold value to the matched pairs.
Abstract: A method for testing the combination properties of evaluation indexes of search engines comprises the following steps, each performed with a testing device:
- selecting more than two datasets from those provided by TREC (Text REtrieval Conference);
- calculating, dataset by dataset and according to an evaluation index, the scores of the query results returned by each search engine;
- pairing the scores of all the search engines in the dataset;
- performing a two-tailed t-test on each matched pair against a set threshold value, and judging whether the difference in search quality between the two engines is significant;
- after obtaining the t-test values for all matched pairs, calculating the proportion of matched pairs that show a significant difference.

The method applies the t-test to measuring the stability and sensitivity of evaluation indexes. The evaluation index with the best overall characteristics can then be identified by computing only a single value.

6 citations
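The core computation in the evaluation-index patent above is the share of search-engine pairs whose per-query scores differ significantly under a two-tailed paired t-test. It can be sketched with Python's standard library. The sketch compares |t| against a caller-supplied two-tailed critical value instead of computing a p-value; for example, about 3.182 for 3 degrees of freedom at α = 0.05. That design choice and the function names are assumptions for illustration.

```python
import math
from itertools import combinations
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic for two engines' per-query scores (same queries, same order)."""
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation; needs at least two queries
    if sd == 0:
        # Identical differences: either no difference at all or an unbounded statistic.
        return 0.0 if mean(diffs) == 0 else math.copysign(math.inf, mean(diffs))
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def significant_pair_ratio(scores, t_crit):
    """Fraction of engine pairs whose difference is significant (two-tailed: |t| > t_crit).

    scores: {engine_name: [metric value per query]} for one dataset and one evaluation index.
    """
    pairs = list(combinations(sorted(scores), 2))
    significant = sum(1 for a, b in pairs if abs(paired_t(scores[a], scores[b])) > t_crit)
    return significant / len(pairs)


# Toy per-query scores for three engines on four queries (made up for illustration).
scores = {"A": [0.5, 0.6, 0.7, 0.8], "B": [0.1, 0.2, 0.3, 0.5], "C": [0.5, 0.6, 0.7, 0.9]}
ratio = significant_pair_ratio(scores, t_crit=3.182)
```

Here the pairs A/B and B/C are significant and A/C is not, so the ratio is 2/3. Repeating this for each candidate evaluation index would give one number per index. The abstract suggests that this kind of number is used to judge sensitivity.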

Patent
22 May 2013
TL;DR: This patent proposes a self-adaptive data fusion method for information retrieval. The method can guarantee the effectiveness of the fused results, even when only a small amount of data is available.
Abstract: The invention discloses a self-adaptive data fusion method for information retrieval. For a group of member retrieval systems Li (1 ≤ i ≤ t), the method comprises the following steps:
(1) calculating the difference degree between the results of every two retrieval systems;
(2) calculating the differentiation weight of each system Li (1 ≤ i ≤ t) from the results of step 1;
(3) calculating the performance weight of each system with a performance-squared weighting scheme;
(4) calculating the final weight of each system from the results of steps 2 and 3;
(5) fusing the retrieved results by linear combination, using the weights calculated in step 4.

The weight-updating method considers both the performance of each member retrieval model and the differences among the member models. Weight updating needs only a small amount of data, such as the results produced for each query. Even with a small data volume, the method can guarantee the effectiveness of the fused results, which makes it suitable for self-adaptive data fusion in information retrieval.

3 citations
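The five steps of the self-adaptive fusion patent above can be sketched in Python. The abstract defines neither the difference degree nor how steps 2 and 3 are combined, so the sketch assumes the following illustrative rules, which are not the patented formulas:

- The difference degree is 1 minus the Jaccard overlap of two systems' top-k documents.
- The differentiation weight is the mean difference degree to all other systems.
- The performance weight is the square of a given performance score.
- The final weight is the product of the differentiation and performance weights.

The function names are hypothetical.

```python
def top_k(scores, k):
    """Top-k documents of one system's result (dict doc -> score)."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def difference_degree(ri, rj, k=10):
    """Step 1 (assumed measure): 1 - Jaccard overlap of the two top-k sets."""
    a, b = top_k(ri, k), top_k(rj, k)
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def fusion_weights(results, perf, k=10):
    """Steps 2-4: mean difference degree times squared performance. Requires t >= 2."""
    t = len(results)
    weights = []
    for i in range(t):
        diff = sum(difference_degree(results[i], results[j], k) for j in range(t) if j != i) / (t - 1)
        weights.append(diff * perf[i] ** 2)  # assumed combination rule: product
    return weights


def fuse(results, weights):
    """Step 5: linear combination of per-system scores; returns the fused ranking."""
    fused = {}
    for scores, w in zip(results, weights):
        for doc, s in scores.items():
            fused[doc] = fused.get(doc, 0.0) + w * s
    return sorted(fused, key=fused.get, reverse=True)


# Toy results and performance scores (made up for illustration).
results = [{"d1": 0.9, "d2": 0.5}, {"d1": 0.8, "d3": 0.4}, {"d4": 0.7, "d5": 0.6}]
perf = [0.5, 0.4, 0.2]
w = fusion_weights(results, perf)
ranked = fuse(results, w)
```

The third system shares no documents with the others, so it has the highest differentiation weight (1.0). However, its low performance score (0.2, squared to 0.04) keeps its final weight the smallest of the three.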