Showing papers by "Ye Yuan" published in 2013


Journal ArticleDOI
TL;DR: A filtering-and-verification strategy based on a probabilistic keyword index, PKIndex: path-based top-k probabilities are computed offline for each keyword and attached to PKIndex in an optimal, compressed way to improve search efficiency.
Abstract: As a popular search mechanism, keyword search has been applied to retrieve useful data in documents, texts, graphs, and even relational databases. However, so far there has been no work on keyword search over uncertain graph data, even though uncertain graphs are widely used in many real applications, such as modeling road networks, influence detection in social networks, and data analysis on PPI networks. Therefore, in this paper, we study the problem of top-k keyword search over uncertain graph data. Following a similar answer definition to that used for keyword search over deterministic graphs, we consider a subtree in the uncertain graph to be an answer to a keyword query if 1) it contains all the keywords; 2) it has a high score (defined by users or applications) based on keyword matching; and 3) it has low uncertainty. Keyword search over deterministic graphs is already a hard problem, as stated in [1], [2], [3]. Due to the existence of uncertainty, keyword search over uncertain graphs is much harder. Therefore, to improve the search efficiency, we employ a filtering-and-verification strategy based on a probabilistic keyword index, PKIndex. For each keyword, we compute path-based top-k probabilities offline and attach these values to PKIndex in an optimal, compressed way. In the filtering phase, we perform existence, path-based, and tree-based probabilistic pruning, which filters out most false subtrees. In the verification phase, we propose a sampling algorithm to verify the candidates. Extensive experimental results demonstrate the effectiveness of the proposed algorithms.
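
The abstract does not spell out the sampling algorithm used for verification, so the following is only an illustrative sketch of the general idea: sample possible worlds of an uncertain graph and count how often a root node is connected to every keyword node. It assumes undirected edges that exist independently with a given probability; the function names and the `(u, v, p)` edge format are hypothetical and not taken from the paper.

```python
import random
from collections import deque

def sample_world(edges, rng):
    """Draw one possible world: keep each edge independently with probability p."""
    return [(u, v) for (u, v, p) in edges if rng.random() < p]

def reaches_all(world_edges, root, targets):
    """Breadth-first search: is every target node connected to root in this world?"""
    adj = {}
    for u, v in world_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(targets) <= seen

def estimate_connection_probability(edges, root, targets, samples=2000, seed=0):
    """Monte Carlo estimate of P(root is connected to all targets)."""
    rng = random.Random(seed)
    hits = sum(
        reaches_all(sample_world(edges, rng), root, targets)
        for _ in range(samples)
    )
    return hits / samples
```

For a simple chain of two edges with probability 0.5 each, the estimate should approach 0.25 as the number of samples grows, since both edges must exist.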

48 citations


Journal ArticleDOI
TL;DR: An Extreme Learning Machine (ELM) based Multi-modality Classifier Combination Framework (MCCF) to improve the accuracy of semantic concept detection and achieve performance at extremely high speed is proposed.

22 citations


Proceedings ArticleDOI
27 Jun 2013
TL;DR: A two-phase filter-refine approach in MapReduce that translates the probabilistic skyline query into two decomposable computations for obtaining the final results, together with an optimized probabilistic skyline query processing algorithm that prunes unpromising data in both the filter and refine phases.
Abstract: MapReduce is a popular parallel programming model, and processing probabilistic skyline queries over uncertain data in the MapReduce framework is becoming an urgent problem. Implementing the probabilistic skyline query in MapReduce is nontrivial because the query is not decomposable. Therefore, in this paper, we propose a two-phase filter-refine approach in MapReduce that translates the probabilistic skyline query into two decomposable computations for obtaining the final results. First, we describe the whole filter-refine processing procedure and then propose an efficient probabilistic skyline query processing algorithm in MapReduce. Furthermore, to reduce the computation and communication cost, we develop an optimized probabilistic skyline query processing algorithm that prunes unpromising data in both the filter and refine phases. Finally, we conduct extensive experiments on synthetic data to verify the effectiveness and efficiency of the proposed filter-refine approach under various experimental settings.
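
As a rough illustration of why a filter-refine split can work, the sketch below uses a common tuple-level uncertainty model, which is an assumption here since the abstract does not state the data model: each tuple exists independently with some probability, smaller values are better, and a tuple's skyline probability is its own existence probability times the probability that no dominating tuple exists. Under that model, the probability computed within one partition is an upper bound on the global value, so tuples below the threshold can be dropped locally. The code runs in plain Python rather than on an actual MapReduce cluster, and all names are hypothetical.

```python
from math import prod

def dominates(a, b):
    """a dominates b: no worse in every dimension, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def survival(point, partition):
    """Probability that no tuple in `partition` both exists and dominates `point`."""
    return prod(1.0 - p for q, p in partition if dominates(q, point))

def filter_phase(partitions, threshold):
    """Per-partition step: keep tuples whose local (upper-bound) probability passes."""
    candidates = []
    for part in partitions:
        for point, p in part:
            if p * survival(point, part) >= threshold:
                candidates.append((point, p))
    return candidates

def refine_phase(candidates, partitions, threshold):
    """Global step: combine per-partition survival factors for each candidate."""
    result = {}
    for point, p in candidates:
        prob = p * prod(survival(point, part) for part in partitions)
        if prob >= threshold:
            result[point] = prob
    return result
```

Because the global probability is a product of per-partition factors, each factor can be computed independently, which is what makes the two steps decomposable in this simplified setting.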

9 citations


Book ChapterDOI
Keyan Cao, Donghong Han, Guoren Wang, Yachao Hu, Ye Yuan
04 Apr 2013
TL;DR: A new outlier concept for uncertain data streams based on possible worlds, a detection method designed to meet the demands of limited storage and real-time processing, and an efficient range query method based on the SM-tree (Statistics M-tree) that reduces redundant calculation.
Abstract: Outlier detection plays an important role in fraud detection, sensor networks, computer network management, and many other areas. As the streaming nature and uncertainty of data become more and more apparent, outlier detection on uncertain data streams has become a new research topic. First, we propose a new outlier concept for uncertain data streams based on possible worlds. Then an outlier detection method for uncertain data streams is proposed to meet the demands of limited storage and real-time processing. Next, a dynamic storage structure is designed for outlier detection on uncertain data streams over a sliding window, to meet the demands of limited storage and real-time response. Furthermore, an efficient range query method based on the SM-tree (Statistics M-tree) is proposed to reduce redundant calculation. Finally, the performance of our method is verified through a large number of simulation experiments. The experimental results show that our method is an effective way to solve the problem of outlier detection on uncertain data streams, and that it can significantly reduce execution time and storage space.
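
The abstract does not give the formal outlier definition, so the sketch below only illustrates one plausible possible-worlds formulation, stated here as an assumption: each item in the sliding window exists independently with some probability, and an item is likely an outlier when the probability that fewer than `k` of its neighbors (within `radius`) exist is high. The sum over possible worlds is computed with a small dynamic program instead of enumeration. One-dimensional values, the function names, and the parameters are all hypothetical, and a plain scan stands in for the SM-tree range query.

```python
from collections import deque

def prob_fewer_than_k(neighbor_probs, k):
    """P(fewer than k neighbors exist), aggregated over all possible worlds (k >= 1)."""
    # dist[j] holds P(exactly j of the neighbors seen so far exist), for j < k.
    dist = [1.0] + [0.0] * (k - 1)
    for p in neighbor_probs:
        for j in range(k - 1, 0, -1):
            dist[j] = dist[j] * (1 - p) + dist[j - 1] * p
        dist[0] *= (1 - p)
    return sum(dist)

def window_outlier_scores(stream, window_size, radius, k):
    """Score the items left in the sliding window after consuming the stream."""
    window = deque(maxlen=window_size)  # oldest items are evicted automatically
    for item in stream:
        window.append(item)
    scores = []
    for i, (value, _) in enumerate(window):
        # Linear scan for neighbors; an index would replace this in practice.
        probs = [p for j, (v, p) in enumerate(window)
                 if j != i and abs(v - value) <= radius]
        scores.append((value, prob_fewer_than_k(probs, k)))
    return scores
```

An item whose score exceeds a chosen threshold would be reported as an outlier under this assumed definition.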

3 citations