
Showing papers in "Frontiers of Computer Science in 2016"


Journal ArticleDOI
TL;DR: This literature review can serve as a good reference for researchers in the areas of scene text detection and recognition: it introduces up-to-date works, identifies state-of-the-art algorithms, and predicts potential research directions.
Abstract: Text, as one of the most influential inventions of humanity, has played an important role in human life from ancient times to the present. The rich and precise information embodied in text is very useful in a wide range of vision-based applications; therefore, text detection and recognition in natural scenes have become important and active research topics in computer vision and document analysis. Especially in recent years, the community has seen a surge of research efforts and substantial progress in these fields, though a variety of challenges (e.g., noise, blur, distortion, occlusion and variation) still remain. The purposes of this survey are three-fold: 1) introduce up-to-date works, 2) identify state-of-the-art algorithms, and 3) predict potential research directions in the future. Moreover, this paper provides comprehensive links to publicly available resources, including benchmark datasets, source codes, and online demos. In summary, this literature review can serve as a good reference for researchers in the areas of scene text detection and recognition.

369 citations


Journal ArticleDOI
TL;DR: In this paper, a multi-channels deep convolutional neural networks (MC-DCNN) is proposed for multivariate time series classification, which first learns features from individual univariate time-series in each channel, and combines information from all channels as feature representation at the final layer.
Abstract: Time series classification is relevant to many different domains, such as health informatics, finance, and bioinformatics. Due to its broad applications, researchers have developed many algorithms for this kind of task, e.g., multivariate time series classification. Among the classification algorithms, k-nearest neighbor (k-NN) classification (particularly 1-NN) combined with dynamic time warping (DTW) achieves state-of-the-art performance. The deficiency is that when the data set grows large, the time consumption of 1-NN with DTW becomes very expensive. In contrast to 1-NN with DTW, feature-based classification methods are more efficient but less effective, since their performance usually depends on the quality of hand-crafted features. In this paper, we aim to improve the performance of traditional feature-based approaches through feature learning techniques. Specifically, we propose a novel deep learning framework, multi-channels deep convolutional neural networks (MC-DCNN), for multivariate time series classification. This model first learns features from individual univariate time series in each channel, and combines information from all channels as the feature representation at the final layer. Then, the learnt features are fed into a multilayer perceptron (MLP) for classification. Finally, extensive experiments on real-world data sets show that our model is not only more efficient than the state of the art but also competitive in accuracy. This study implies that feature learning is worth investigating for the problem of time series classification.
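
To make the architecture concrete, below is a minimal sketch of the MC-DCNN idea in PyTorch (not the authors' code): each channel, i.e., each univariate series, gets its own 1D convolutional feature extractor, the per-channel features are concatenated, and an MLP performs the final classification. The layer sizes, kernel widths and example dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class MCDCNN(nn.Module):
    def __init__(self, n_channels, series_len, n_classes):
        super().__init__()
        # one independent 1D conv branch per univariate channel
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv1d(1, 8, kernel_size=5), nn.ReLU(), nn.MaxPool1d(2),
                nn.Conv1d(8, 8, kernel_size=5), nn.ReLU(), nn.MaxPool1d(2),
                nn.Flatten(),
            )
            for _ in range(n_channels)
        ])
        # infer the per-branch feature size from a dummy pass
        with torch.no_grad():
            feat = self.branches[0](torch.zeros(1, 1, series_len)).shape[1]
        self.mlp = nn.Sequential(
            nn.Linear(n_channels * feat, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):                       # x: (batch, n_channels, series_len)
        feats = [b(x[:, i:i + 1, :]) for i, b in enumerate(self.branches)]
        return self.mlp(torch.cat(feats, dim=1))

model = MCDCNN(n_channels=3, series_len=128, n_classes=4)
logits = model(torch.randn(16, 3, 128))         # (16, 4)
```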

168 citations


Journal ArticleDOI
TL;DR: A comprehensive survey on string similarity search and join is presented; it introduces widely-used similarity functions to quantify similarity and covers variants including approximate entity extraction, type-ahead search, and approximate substring matching.
Abstract: String similarity search and join are two important operations in data cleaning and integration, which extend traditional exact search and exact join operations in databases by tolerating the errors and inconsistencies in the data. They have many real-world applications, such as spell checking, duplicate detection, entity resolution, and webpage clustering. Although these two problems have been extensively studied in the recent decade, there is no thorough survey. In this paper, we present a comprehensive survey on string similarity search and join. We first give the problem definitions and introduce widely-used similarity functions to quantify the similarity. We then present an extensive set of algorithms for string similarity search and join. We also discuss their variants, including approximate entity extraction, type-ahead search, and approximate substring matching. Finally, we provide some open datasets and summarize some research challenges and open problems.
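
As a concrete illustration of the kind of primitive the survey covers (not taken from the survey itself), the sketch below performs a similarity search under an edit-distance threshold, combining the standard length filter with dynamic-programming verification; the example strings and threshold are made up.

```python
def edit_distance(s, t):
    """Levenshtein distance via the standard two-row dynamic program."""
    m, n = len(s), len(t)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return prev[n]

def similarity_search(query, strings, tau):
    """Return all strings within edit distance tau of the query."""
    return [s for s in strings
            if abs(len(s) - len(query)) <= tau      # length filter (cheap pruning)
            and edit_distance(query, s) <= tau]     # verification

print(similarity_search("surgery", ["surgeon", "surgry", "syrup"], 2))
```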

126 citations


Journal ArticleDOI
Zhi-Hua Zhou1
TL;DR: Current machine learning techniques have achieved great success; however, they have many deficiencies: they need large amounts of labeled training data, can hardly cope with environment changes, and are usually black-boxes, which is problematic in real tasks where decision reliability is crucial and rigorous judgment by human beings is critical.
Abstract: Current machine learning techniques have achieved great success; however, there are many deficiencies. First, to train a strong model, a large number of training examples is required, whereas collecting the data, particularly data with labels, is expensive or even difficult in many real tasks. Second, once a model has been trained, if the environment changes, which often happens in real tasks, the model can hardly perform well or even becomes useless. Third, the trained models are usually black-boxes, whereas people usually want to know what has been learned by the models, particularly in real tasks where decision reliability is crucial and rigorous judgment by human beings is critical.

99 citations


Journal ArticleDOI
TL;DR: This review of RDF data management considers centralized (warehousing) solutions, distributed solutions, and the techniques that have been developed for querying linked data.
Abstract: RDF is increasingly being used to encode data for the semantic web and data exchange. There have been a large number of works that address RDF data management following different approaches. In this paper we provide an overview of these works. This review considers centralized solutions (what are referred to as warehousing approaches), distributed solutions, and the techniques that have been developed for querying linked data. In each category, further classifications are provided that would assist readers in understanding the identifying characteristics of different approaches.

94 citations


Journal ArticleDOI
TL;DR: This paper adds discriminant information into CCA by using random cross-view correlations between within-class samples and proposes a new method for multi-view dimensionality reduction called canonical random correlation analysis (RCA).
Abstract: Canonical correlation analysis (CCA) is one of the most well-known methods to extract features from multi-view data and has attracted much attention in recent years. However, classical CCA is unsupervised and does not take discriminant information into account. In this paper, we add discriminant information into CCA by using random cross-view correlations between within-class samples and propose a new method for multi-view dimensionality reduction called canonical random correlation analysis (RCA). In RCA, two approaches for randomly generating cross-view correlation samples are developed on the basis of the bootstrap technique. Furthermore, kernel RCA (KRCA) is proposed to extract nonlinear correlations between different views. Experiments on several multi-view data sets show the effectiveness of the proposed methods.
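
The sketch below illustrates the flavour of the RCA idea rather than the authors' exact algorithm: cross-view sample pairs are bootstrapped from within-class samples so that the correlation maximized by CCA carries label information. scikit-learn's CCA is used as a stand-in for the projection step, and the data dimensions, class counts and number of pairs are made-up assumptions.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def random_within_class_pairs(X1, X2, y, n_pairs_per_class=100, seed=0):
    """Bootstrap cross-view pairs (view-1 sample, view-2 sample) from the same class."""
    rng = np.random.default_rng(seed)
    A, B = [], []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        i = rng.choice(idx, size=n_pairs_per_class, replace=True)   # bootstrap
        j = rng.choice(idx, size=n_pairs_per_class, replace=True)
        A.append(X1[i]); B.append(X2[j])
    return np.vstack(A), np.vstack(B)

# X1, X2: two views of the same labeled samples; y: class labels (toy data)
X1 = np.random.randn(60, 10); X2 = np.random.randn(60, 8)
y = np.repeat([0, 1, 2], 20)
A, B = random_within_class_pairs(X1, X2, y)
cca = CCA(n_components=3).fit(A, B)       # learn projections on the random pairs
Z1, Z2 = cca.transform(X1, X2)            # project the original views
```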

57 citations


Journal ArticleDOI
TL;DR: The proposed algorithm is evaluated using the CloudSim simulator and is found to offer better performance than other algorithms in terms of mean response time, effective network usage, load balancing, replication frequency, and storage usage.
Abstract: Cloud computing is becoming a very popular word in industry and is receiving a large amount of attention from the research community. Replica management is one of the most important issues in the cloud, as it can offer fast data access time, high data availability and reliability. By keeping all replicas active, the replicas may enhance the system's task success rate if the replicas and requests are reasonably distributed. However, appropriate replica placement in large-scale, dynamically scalable and totally virtualized data centers is much more complicated. To provide cost-effective availability, minimize the response time of applications and balance load for cloud storage, a new replica placement strategy is proposed. The replica placement is based on five important parameters: mean service time, failure probability, load variance, latency and storage usage. However, replication should be used wisely because the storage size of each site is limited; thus, a site must keep only the important replicas. We also present a new replica replacement strategy based on the availability of the file, the last time the replica was requested, the number of accesses, and the size of the replica. We evaluate our algorithm using the CloudSim simulator and find that it offers better performance in comparison with other algorithms in terms of mean response time, effective network usage, load balancing, replication frequency, and storage usage.
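
Purely as an illustration of how the five placement parameters might be combined into a single ranking (the equal weights and the linear form below are assumptions, not the paper's formulas), a candidate site could be scored as follows, with lower values preferred:

```python
def placement_score(site, weights=(0.2, 0.2, 0.2, 0.2, 0.2)):
    # assumed linear combination of the five parameters; all are "lower is better"
    metrics = (site["mean_service_time"], site["failure_probability"],
               site["load_variance"], site["latency"], site["storage_usage"])
    return sum(w * m for w, m in zip(weights, metrics))

def choose_site(candidate_sites):
    """Pick the candidate site with the lowest combined cost."""
    return min(candidate_sites, key=placement_score)

sites = [
    {"id": "s1", "mean_service_time": 0.3, "failure_probability": 0.05,
     "load_variance": 0.2, "latency": 0.1, "storage_usage": 0.7},
    {"id": "s2", "mean_service_time": 0.2, "failure_probability": 0.10,
     "load_variance": 0.1, "latency": 0.3, "storage_usage": 0.5},
]
print(choose_site(sites)["id"])   # "s2"
```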

55 citations


Journal ArticleDOI
TL;DR: This article introduces the readers to 1) VOT and its applications in other domains, 2) different issues which arise in it, 3) various classical as well as contemporary approaches for object tracking, 4) evaluation methodologies for VOT, and 5) online resources, i.e., annotated datasets and source code available for various tracking techniques.
Abstract: Visual object tracking (VOT) is an important subfield of computer vision. It has widespread application domains, and has been considered an important part of surveillance and security systems. VOT facilitates finding the position of the target in image coordinates of video frames. While doing this, VOT also faces many challenges such as noise, clutter, occlusion, rapid changes in object appearance, highly maneuvered (complex) object motion, and illumination changes. In recent years, VOT has made significant progress due to the availability of low-cost high-quality video cameras as well as fast computational resources, and many modern techniques have been proposed to handle the challenges faced by VOT. This article introduces the readers to 1) VOT and its applications in other domains, 2) different issues which arise in it, 3) various classical as well as contemporary approaches for object tracking, 4) evaluation methodologies for VOT, and 5) online resources, i.e., annotated datasets and source code available for various tracking techniques.

52 citations


Journal ArticleDOI
Dan Hao1, Lu Zhang1, Hong Mei1
TL;DR: The achievements of test-case prioritization from five aspects are reviewed: prioritization algorithms, coverage criteria, measurement, practical concerns involved, and application scenarios.
Abstract: Test-case prioritization, proposed at the end of the last century, aims to schedule the execution order of test cases so as to improve test effectiveness. In the past years, test-case prioritization has gained much attention and has made significant achievements in five aspects: prioritization algorithms, coverage criteria, measurement, practical concerns involved, and application scenarios. In this article, we first review the achievements of test-case prioritization from these five aspects and then give our perspectives on its challenges.
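
As background for the "prioritization algorithms" aspect (a classic baseline from the literature, not a contribution of this article), the sketch below implements a simplified greedy additional-coverage prioritization, which repeatedly picks the test covering the most statements not yet covered by earlier tests; the coverage data are made up.

```python
def additional_greedy(coverage):
    """coverage: dict test_id -> set of covered statement ids."""
    remaining = dict(coverage)
    covered, order = set(), []
    while remaining:
        # pick the test adding the most not-yet-covered statements
        best = max(remaining, key=lambda t: len(remaining[t] - covered))
        order.append(best)
        covered |= remaining.pop(best)
    return order

cov = {"t1": {1, 2, 3}, "t2": {3, 4}, "t3": {5}, "t4": {1, 4, 5}}
print(additional_greedy(cov))   # ['t1', 't4', 't2', 't3']
```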

48 citations


Journal ArticleDOI
Shuai Ma1, Jia Li1, Chunming Hu1, Xuelian Lin1, Jinpeng Huai1 
TL;DR: In this article, the authors argue that big graph search is the search paradigm filling the gap opened by social computing, formalize the graph search problem, and give an analysis of graph search from an evolutionary point of view, followed by evidence from both industry and academia.
Abstract: On one hand, compared with traditional relational and XML models, graphs have more expressive power and are widely used today. On the other hand, various applications of social computing trigger the pressing need for a new search paradigm. In this article, we argue that big graph search is the one filling this gap. We first introduce the application of graph search in various scenarios. We then formalize the graph search problem, and give an analysis of graph search from an evolutionary point of view, followed by evidence from both industry and academia. After that, we analyze the difficulties and challenges of big graph search. Finally, we present three classes of techniques towards big graph search: query techniques, data techniques and distributed computing techniques.

40 citations


Journal ArticleDOI
TL;DR: iGraph, an incremental graph processing system for dynamic graphs with continuous updates, is designed; experimental results on real-life datasets show that iGraph outperforms the original GraphX in terms of graph update and graph computation.
Abstract: With the popularity of social networks, the demand for real-time processing of graph data is increasing. However, most existing graph systems adopt a batch processing mode, so the overhead of maintaining and processing dynamic graphs is significantly high. In this paper, we design iGraph, an incremental graph processing system for dynamic graphs with continuous updates. The contributions of iGraph include: 1) a hash-based graph partition strategy to enable fine-grained graph updates; 2) a vertex-based graph computing model to support incremental data processing; 3) hotspot detection and rebalancing methods to address the workload imbalance problem during incremental processing. Through its general-purpose API, iGraph can be used to implement various graph processing algorithms such as PageRank. We have implemented iGraph on Apache Spark, and experimental results show that for real-life datasets, iGraph outperforms the original GraphX in terms of graph update and graph computation.
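
A minimal sketch of the hash-based partitioning ingredient (the partition count and routing rule are assumptions, not iGraph's implementation): each vertex is assigned to a partition by hashing its id, so an incremental edge update only touches the partition owning the affected vertex. Python's built-in hash is used here for simplicity.

```python
def partition_of(vertex_id, n_partitions):
    # assign a vertex to a partition by hashing its id
    return hash(vertex_id) % n_partitions

def apply_update(partitions, edge, n_partitions):
    """Route an edge insertion (src, dst) to the partition owning src."""
    src, dst = edge
    partitions[partition_of(src, n_partitions)].append((src, dst))

n = 4
partitions = [[] for _ in range(n)]
for edge in [("a", "b"), ("b", "c"), ("a", "c")]:
    apply_update(partitions, edge, n)
print([len(p) for p in partitions])
```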

Journal ArticleDOI
Kang Li1, Fazhi He1, Xiao Chen1
TL;DR: An online compressive feature selection algorithm (CFS) based on the CT framework that selects the features which have the largest margin when using them to classify positive samples and negative samples, and updates the unselected features slowly with a random learning rate.
Abstract: Recently, compressive tracking (CT) has attracted wide attention for its efficiency, accuracy and robustness on many challenging sequences. Its appearance model employs non-adaptive random projections that preserve the structure of the image feature space. A very sparse measurement matrix is used to extract features by multiplying it with the feature vector of the image patch. An adaptive Bayes classifier is trained using both positive samples and negative samples to separate the target from the background. In the CT framework, however, some features used for classification have weak discriminative abilities, which reduces the accuracy of the strong classifier. In this paper, we present an online compressive feature selection algorithm (CFS) based on the CT framework. It selects the features which have the largest margin when using them to classify positive samples and negative samples. For features that are not selected, we define a random learning rate to update them slowly. This lets those weak classifiers preserve more target information, which relieves drift when the appearance of the target changes heavily. Therefore, the classifier trained with those discriminative features couples its score in many challenging sequences, which leads to a more robust tracker. Numerous experiments show that our tracker achieves superior results compared with many state-of-the-art trackers.
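
The sketch below illustrates two ingredients mentioned in the abstract, under assumed dimensions and parameters rather than the paper's settings: compressive features obtained with a very sparse random measurement matrix, and selection of the features whose positive/negative mean gap (margin) is largest.

```python
import numpy as np

rng = np.random.default_rng(0)

def sparse_measurement_matrix(n_features, input_dim, s=3):
    """Achlioptas-style sparse matrix: entries are +-sqrt(s) with prob 1/(2s), else 0."""
    probs = [1 / (2 * s), 1 - 1 / s, 1 / (2 * s)]
    return rng.choice([np.sqrt(s), 0.0, -np.sqrt(s)],
                      size=(n_features, input_dim), p=probs)

R = sparse_measurement_matrix(n_features=50, input_dim=1024)
pos = rng.normal(1.0, 1.0, size=(40, 1024))    # stand-ins for positive patch features
neg = rng.normal(0.0, 1.0, size=(200, 1024))   # stand-ins for negative patch features
f_pos, f_neg = pos @ R.T, neg @ R.T            # compressive features

# keep the features whose positive/negative mean gap (margin) is largest
margin = np.abs(f_pos.mean(axis=0) - f_neg.mean(axis=0))
selected = np.argsort(margin)[-15:]
```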

Journal ArticleDOI
TL;DR: This paper reviews and categorizes sketch-based modeling systems in four aspects: the input, the knowledge they use, the modeling approach and the output, and discusses inherent challenges and open problems for researchers in the future.
Abstract: As 3D technology, including computer graphics, virtual reality and 3D printing, has developed rapidly in the past years, 3D models are in increasingly huge demand. Traditional 3D modeling platforms such as Maya and ZBrush utilize "windows, icons, menus, pointers" (WIMP) interface paradigms for fine-grained control to construct detailed models. However, the modeling process can be tedious and frustrating, and thus too hard for a novice user or even a well-trained artist. Therefore, a more intuitive interface is needed. Sketching, an intuitive communication and modeling tool for human beings, becomes the first choice of the modeling community. So far, various sketch-based modeling systems have been created and studied. In this paper, we attempt to show how these systems work and give a comprehensive survey. We review and categorize the systems in four aspects: the input, the knowledge they use, the modeling approach and the output. We also discuss inherent challenges and open problems for researchers in the future.

Journal ArticleDOI
TL;DR: This work analyzes user sentiment orientation to topics based on emotional phrases extracted from their posted comments, and builds a similar-orientation network (SON) where each vertex represents a user account on a social media site.
Abstract: Users of social media sites can use more than one account. These identities have pseudo-anonymous properties, and as such some users abuse multiple accounts to perform undesirable actions, such as posting false or misleading remarks, or comments that praise or defame the work of others. The detection of multiple user accounts that are controlled by an individual or organization is important. Herein, we define the problem as sockpuppet gang (SPG) detection. First, we analyze user sentiment orientation to topics based on emotional phrases extracted from their posted comments. Then we evaluate the similarity between the sentiment orientations of user account pairs, and build a similar-orientation network (SON) in which each vertex represents a user account on a social media site. In an SON, an edge exists only if the two user accounts have similar sentiment orientations to most topics. The boundary between detected SPGs may be indistinct, so by analyzing account posting behavior features we propose a multiple random walk method to iteratively re-measure the weight of each edge. Finally, we adopt multiple community detection algorithms to detect SPGs in the network. User accounts in the same SPG are considered to be controlled by the same individual or organization. In our experiments on real-world datasets, our method shows better performance than other contemporary methods.
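
A small illustrative sketch of building the similar-orientation network (SON); the per-topic orientation encoding (+1 praise, -1 defame) and the agreement threshold are assumptions standing in for the paper's sentiment analysis and similarity measure.

```python
import itertools

def similar(orient_a, orient_b, min_agree=0.7):
    """Accounts are linked if they agree on most topics both commented on."""
    shared = [t for t in orient_a if t in orient_b]
    if not shared:
        return False
    agree = sum(orient_a[t] == orient_b[t] for t in shared)
    return agree / len(shared) >= min_agree

def build_son(orientations):
    """orientations: dict account -> {topic: +1 (praise) / -1 (defame)}."""
    edges = []
    for a, b in itertools.combinations(orientations, 2):
        if similar(orientations[a], orientations[b]):
            edges.append((a, b))
    return edges

accounts = {"u1": {"movieX": 1, "bookY": 1}, "u2": {"movieX": 1, "bookY": 1},
            "u3": {"movieX": -1, "bookY": 1}}
print(build_son(accounts))   # [('u1', 'u2')]
```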

Journal ArticleDOI
TL;DR: The feasibility of generating code fragment summaries by using supervised learning algorithms is investigated; the results are promising, and mechanisms such as data-driven crowd enlistment can improve the efficacy of existing code fragment classifiers.
Abstract: Recent studies have applied different approaches for summarizing software artifacts, and yet very few efforts have been made in summarizing the source code fragments available on the web. This paper investigates the feasibility of generating code fragment summaries by using supervised learning algorithms. We hire a crowd of ten individuals from the same workplace to extract source code features on a corpus of 127 code fragments retrieved from Eclipse and NetBeans official frequently asked questions (FAQs). Human annotators suggest summary lines. Our machine learning algorithms produce better results with a precision of 82% and perform statistically better than existing code fragment classifiers. Evaluation of the algorithms on several statistical measures endorses our result. This result is promising, and employing mechanisms such as data-driven crowd enlistment can improve the efficacy of existing code fragment classifiers.

Journal ArticleDOI
TL;DR: This survey systematically considers sensing models, design issues and challenges in barrier coverage problem, and provides discussions on some extensions and variants of barrier coverage problems.
Abstract: For various applications, sensors are deployed to monitor belt regions so as to guarantee that every movement crossing a barrier of sensors will be detected in real time with high accuracy, while minimizing the need for human support. The barrier coverage problem is introduced to model these requirements, and has been examined thoroughly in the past decades. In this survey, we state the problem definitions and systematically consider sensing models, design issues and challenges in the barrier coverage problem. We also review representative algorithms. Furthermore, we provide discussions on some extensions and variants of barrier coverage problems.

Journal ArticleDOI
TL;DR: The present prototype was able to reproduce a plausible output range for different crops in terms of both the dynamics and final values of model state variables such as assimilate production, organ biomass, leaf area and architecture.
Abstract: In the last decade, functional-structural plant modelling (FSPM) has become a more widely accepted paradigm in crop and tree production, as 3D models for the most important crops have been proposed. Given the wider portfolio of available models, it is now appropriate to enter the next level in FSPM development by introducing more efficient methods for model development. This includes the consideration of model reuse (by modularisation), combination and comparison, and the enhancement of existing models. To facilitate this process, standards for design and communication need to be defined and established. We present a first step towards an efficient and general, i.e., not species-specific, FSPM, presently restricted to annual or bi-annual plants, but with the potential for extension and further generalization. Model structure is hierarchical and object-oriented, with plant organs being the base-level objects and the plant individual and canopy being the higher-level objects. Modules for the majority of physiological processes are incorporated, more than in other platforms that have a similar aim (e.g., photosynthesis, organ formation and growth). Simulation runs with several general parameter sets adopted from the literature show that the present prototype was able to reproduce a plausible output range for different crops (rapeseed, barley, etc.) in terms of both the dynamics and final values (at harvest time) of model state variables such as assimilate production, organ biomass, leaf area and architecture.

Journal ArticleDOI
TL;DR: It is shown that selecting the right combination of preprocessing methods has a considerable impact on the classification potential of a dataset and a significant relative improvement in predictive accuracy is obtained.
Abstract: The significance of the preprocessing stage in any data mining task is well known. Before attempting medical data classification, characteristics of medical datasets, including noise, incompleteness, and the existence of multiple and possibly irrelevant features, need to be addressed. In this paper, we show that selecting the right combination of preprocessing methods has a considerable impact on the classification potential of a dataset. The preprocessing operations considered include the discretization of numeric attributes, the selection of attribute subset(s), and the handling of missing values. The classification is performed by an ant colony optimization algorithm as a case study. Experimental results on 25 real-world medical datasets show that a significant relative improvement in predictive accuracy, exceeding 60% in some cases, is obtained.
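
To make the idea of a preprocessing combination concrete, here is a hedged scikit-learn sketch; the specific choices (median imputation, 5-bin discretization, chi-squared feature selection, and a decision tree as a stand-in classifier) are assumptions, whereas the paper's case study uses an ant colony optimization classifier.

```python
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import KBinsDiscretizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.tree import DecisionTreeClassifier

pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),                   # handle missing values
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal")),   # numeric -> bins
    ("select", SelectKBest(chi2, k=10)),                            # drop irrelevant features
    ("classify", DecisionTreeClassifier()),                         # stand-in classifier
])
# usage (hypothetical data): pipeline.fit(X_train, y_train); pipeline.score(X_test, y_test)
```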

Journal ArticleDOI
TL;DR: In this article, a systematic review of the recent advances and cutting-edge techniques for visual sentiment analysis is presented, in which detailed comparison as well as experimental evaluation are given over the cutting edge methods.
Abstract: Recent years have witnessed a rapid spread of multi-modality microblogs like Twitter and Sina Weibo composed of images, text and emoticons. Visual sentiment prediction of such microblog-based social media has recently attracted ever-increasing research focus with broad application prospects. In this paper, we give a systematic review of the recent advances and cutting-edge techniques for visual sentiment analysis. To this end, we review the most recent works on this topic, giving a detailed comparison as well as an experimental evaluation of the cutting-edge methods. We further reveal and discuss the future trends and potential directions for visual sentiment prediction.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed game theoretic framework helps to comprehend the information diffusion process better and can predict users’ forwarding behaviors with more accuracy than the previous studies.
Abstract: Social networks are fundamental mediums for the diffusion of information: contagions appear at some node of the network and get propagated over the edges. Prior research mainly focuses on each contagion spreading independently, regardless of multiple contagions' interactions as they propagate at the same time. In the real world, simultaneous news and events usually have to compete for users' attention to get propagated. In some other cases, they can cooperate with each other and achieve more influence. In this paper, an evolutionary game theoretic framework is proposed to model the interactions among multiple contagions. The basic idea is that different contagions in social networks are similar to multiple organisms in a population, and the diffusion process is one in which organisms interact and then evolve from one state to another. This framework statistically learns the payoffs as contagions interact with each other and builds the payoff matrix. Since learning payoffs for all pairs of contagions is almost impossible (quadratic in the number of contagions), a contagion clustering method is proposed in order to decrease the number of parameters to fit, which makes our approach efficient and scalable. To verify the proposed framework, we conduct experiments using a real-world information spreading dataset from Digg. Experimental results show that the proposed game theoretic framework helps to comprehend the information diffusion process better and can predict users' forwarding behaviors with more accuracy than previous studies. The analyses of the evolution dynamics of contagions and the evolutionarily stable strategy reveal whether a contagion can be promoted or suppressed by others in the diffusion process.
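
As a toy illustration of the evolutionary-game view (the payoff values and the discrete replicator update are assumptions, not the learned model), the share of attention of each contagion grows when its payoff against the current mix exceeds the population average:

```python
import numpy as np

payoff = np.array([[1.0, 0.4],     # payoff[i][j]: payoff of contagion i when meeting j
                   [1.2, 0.8]])    # with these values, contagion 1 suppresses contagion 0
x = np.array([0.5, 0.5])           # initial shares of attention for the two contagions

for _ in range(50):
    fitness = payoff @ x
    x = x * fitness / (x @ fitness)   # discrete replicator dynamics
print(x)                              # contagion 1 ends up dominating
```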

Journal ArticleDOI
TL;DR: This paper proposes a novel approach MADM for batch mode multi-label active learning that exploits representativeness and diversity in both the feature and label space by matching the distribution between labeled and unlabeled data.
Abstract: Multi-label learning is an effective framework for learning with objects that have multiple semantic labels, and has been successfully applied into many real-world tasks. In contrast with traditional single-label learning, the cost of labeling a multi-label example is rather high, thus it becomes an important task to train an effective multi-label learning model with as few labeled examples as possible. Active learning, which actively selects the most valuable data to query their labels, is the most important approach to reduce labeling cost. In this paper, we propose a novel approach MADM for batch mode multi-label active learning. On one hand, MADM exploits representativeness and diversity in both the feature and label space by matching the distribution between labeled and unlabeled data. On the other hand, it tends to query predicted positive instances, which are expected to be more informative than negative ones. Experiments on benchmark datasets demonstrate that the proposed approach can reduce the labeling cost significantly.

Journal ArticleDOI
TL;DR: The importance of routing algorithms in mesh NoC is highlighted, and representative routing algorithms with respective features are reviewed and summarized to shed light upon the future work of NoC routing algorithms.
Abstract: With the rapid development of the semiconductor industry, the number of cores integrated on a chip increases quickly, which brings tough challenges such as bandwidth, scalability and power into on-chip interconnection. Against this background, Network-on-Chip (NoC) has been proposed and is gradually replacing traditional on-chip interconnections such as shared buses and crossbars. For the convenience of physical layout, mesh is the most used topology in NoC design. The routing algorithm, which decides the paths of packets, has a significant impact on the latency and throughput of the network, and thus plays a vital role in a well-performing network. This study mainly focuses on the routing algorithms of mesh NoC. Depending on whether network information is taken into consideration in routing decisions, NoC routing algorithms can be roughly classified into oblivious routing and adaptive routing: oblivious routing costs less but lacks adaptiveness, while adaptive routing is the opposite. To combine the advantages of oblivious and adaptive routing algorithms, half-adaptive algorithms were proposed. In this paper, the concepts, taxonomy and features of NoC routing algorithms are introduced. Then the importance of routing algorithms in mesh NoC is highlighted, and representative routing algorithms with their respective features are reviewed and summarized. Finally, we try to shed light upon future work on NoC routing algorithms.
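
As a concrete example of an oblivious algorithm from this taxonomy, dimension-ordered XY routing moves a packet first along the X dimension and then along Y, ignoring network load; a minimal sketch:

```python
def xy_route(src, dst):
    """Return the hop-by-hop path from src to dst, each a (x, y) mesh coordinate."""
    x, y = src
    path = [src]
    while x != dst[0]:                      # route in X first
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                      # then route in Y
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

print(xy_route((0, 0), (2, 3)))
# [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (2, 3)]
```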

Journal ArticleDOI
TL;DR: A novel automated segmentation method based on an eight-neighbor region growing algorithm with left-right scanning and four-corner rotating and scanning is proposed to address the incomplete segmentation problem of traditional pulmonary parenchyma segmentation methods.
Abstract: To address the incomplete segmentation problem of traditional pulmonary parenchyma segmentation methods, a novel automated segmentation method based on an eight-neighbor region growing algorithm with left-right scanning and four-corner rotating and scanning is proposed in this paper. The proposed method consists of four main stages: image binarization, rough segmentation of the lung, image denoising and lung contour refining. First, the images are binarized and the regions of interest are extracted. After that, rough segmentation of the lung is performed through a general region growing method. Then the improved eight-neighbor region growing is used to remove noise from the upper, middle, and bottom regions of the lung. Finally, corrosion and expansion operations are utilized to smooth the lung boundary. The proposed method was validated on chest positron emission tomography-computed tomography (PET-CT) data of 30 cases from a hospital in Shanxi, China. Experimental results show that our method can achieve an average volume overlap ratio of 96.21 ± 0.39% with the manual segmentation results. Compared with existing methods, the proposed algorithm segments the lung in PET-CT images more efficiently and accurately.
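
For illustration, a compact sketch of plain eight-neighbor region growing, the building block the method extends (the intensity tolerance and seed are assumptions; the paper's full pipeline adds directional scanning, denoising and contour refinement):

```python
from collections import deque
import numpy as np

def region_grow_8(image, seed, tol=100):
    """Grow a region of pixels whose intensity is within tol of the seed value."""
    h, w = image.shape
    mask = np.zeros((h, w), dtype=bool)
    seed_val = float(image[seed])
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):               # visit all eight neighbors
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (0 <= rr < h and 0 <= cc < w and not mask[rr, cc]
                        and abs(float(image[rr, cc]) - seed_val) <= tol):
                    mask[rr, cc] = True
                    queue.append((rr, cc))
    return mask

img = np.random.randint(0, 255, size=(64, 64))   # stand-in for a CT slice
lung_mask = region_grow_8(img, seed=(32, 32), tol=60)
```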

Journal ArticleDOI
TL;DR: This paper systematically review the existing VM provisioning schemes and classify them in three main categories, and discusses the features and research status of each category, and introduces two recent solutions, VMThunder and VMThunder+, both of which can provision hundreds of VMs in seconds.
Abstract: The global data center market has been growing explosively in recent years. As the market grows, the demand for fast provisioning of virtual resources to support elastic, manageable, and economical computing over the cloud becomes high. Fast provisioning of large-scale virtual machines (VMs), in particular, is critical to guarantee quality of service (QoS). In this paper, we systematically review the existing VM provisioning schemes and classify them into three main categories. We discuss the features and research status of each category, and introduce two recent solutions, VMThunder and VMThunder+, both of which can provision hundreds of VMs in seconds.

Journal ArticleDOI
TL;DR: The results show that IPSETFUL can help identify most of the faults in the program with the given test suite and can save much effort in inspecting unfaulty program statements compared with the existing spectrum based fault localization techniques and the relevant state of the art technique.
Abstract: Fault localization is an important and challenging task during software testing. Among the techniques studied in this field, program spectrum based fault localization is a promising approach. To perform spectrum based fault localization, a set of test oracles should be provided, and the effectiveness of fault localization depends highly on the quality of the test oracles. Moreover, their effectiveness is usually affected when multiple simultaneous faults are present. Faced with multiple faults, it is difficult for developers to determine when to stop the fault localization process. To address these issues, we propose an iterative fault localization process, i.e., an iterative process of selecting test cases for effective fault localization (IPSETFUL), to identify as many faults as possible in the program until the stopping criterion is satisfied. It is performed based on a concept lattice of program spectrum (CLPS) proposed in our previous work. Based on the labeling approach of CLPS, program statements are categorized as dangerous statements, safe statements, and sensitive statements. To identify the faults, developers need to check the dangerous statements. Meanwhile, developers need to select a set of test cases covering the dangerous or sensitive statements from the original test suite, and a new CLPS is generated for the next iteration, which proceeds in the same way. This iterative process ends when there are no failing tests in the test suite and all statements on the CLPS have become safe statements. We conduct an empirical study on several subject programs, and the results show that IPSETFUL can help identify most of the faults in the program with the given test suite. Moreover, it can save much effort in inspecting unfaulty program statements compared with the existing spectrum based fault localization techniques and the relevant state-of-the-art technique.
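
For context on the program-spectrum ingredient (this is a standard suspiciousness metric from the spectrum-based literature, not the CLPS/IPSETFUL algorithm itself), a score such as Ochiai ranks statements by how strongly their execution correlates with failing tests:

```python
import math

def ochiai(failed_cover, failed_total, passed_cover):
    """failed_cover/passed_cover: counts of failing/passing tests covering the statement."""
    denom = math.sqrt(failed_total * (failed_cover + passed_cover))
    return failed_cover / denom if denom else 0.0

# statement covered by 3 of 4 failing tests and 1 passing test
print(round(ochiai(failed_cover=3, failed_total=4, passed_cover=1), 3))   # 0.75
```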

Journal ArticleDOI
TL;DR: This paper efficiently mines the top-k probabilistic prevalent co-locations over spatially uncertain data sets and develops an approximate algorithm with a compensation factor so that relatively large quantities of data can be processed quickly.
Abstract: A co-location pattern is a set of spatial features whose instances frequently appear in a spatial neighborhood. This paper efficiently mines the top-k probabilistic prevalent co-locations over spatially uncertain data sets and makes the following contributions: 1) the concept of the top-k probabilistic prevalent co-locations based on a possible world model is defined; 2) a framework for discovering the top-k probabilistic prevalent co-locations is set up; 3) a matrix method is proposed to improve the computation of the prevalence probability of a top-k candidate, and two pruning rules of the matrix block are given to accelerate the search for exact solutions; 4) a polynomial matrix is developed to further speed up the top-k candidate refinement process; 5) an approximate algorithm with a compensation factor is introduced so that relatively large quantities of data can be processed quickly. The efficiency of our proposed algorithms as well as the accuracy of the approximation algorithms is evaluated with an extensive set of experiments using both synthetic and real uncertain data sets.

Journal ArticleDOI
TL;DR: A preprocessing strategy removes mismatching graph pairs with significant differences, and a novel k-hop tree based indexing method builds indexes for each graph by grouping, for each key node, the nodes reachable within k hops while conserving structure.
Abstract: Graphs have been widely used for complex data representation in many real applications, such as social networks, bioinformatics, and computer vision. Therefore, graph similarity join has become imperative for integrating noisy and inconsistent data from multiple data sources. The edit distance is commonly used to measure the similarity between graphs. The graph similarity join problem studied in this paper is based on graph edit distance constraints. To accelerate the similarity join based on graph edit distance, we make use of a preprocessing strategy to remove mismatching graph pairs with significant differences. Then a novel method of building indexes for each graph is proposed by grouping the nodes which can be reached in k hops from each key node with structure conservation, which is the k-hop tree based indexing method. For each candidate pair, we propose a similarity computation algorithm with boundary filtering, which can be applied with good efficiency and effectiveness. Experiments on real and synthetic graph databases confirm that our method achieves good join quality in graph similarity join. Besides, the join process can be finished in polynomial time.
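
A simplified sketch of the indexing intuition (not the full k-hop tree with structure conservation): for each key node, collect the labels reachable within k hops, which can be compared cheaply to filter candidate pairs before expensive edit-distance verification; the toy graph below is made up.

```python
from collections import deque, Counter

def k_hop_labels(adj, labels, start, k):
    """adj: node -> list of neighbors; labels: node -> label.
    Returns the multiset of labels reachable from start within k hops (BFS)."""
    seen = {start}
    frontier = deque([(start, 0)])
    reached = Counter()
    while frontier:
        node, depth = frontier.popleft()
        reached[labels[node]] += 1
        if depth < k:
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return reached

adj = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b"]}
labels = {"a": "C", "b": "O", "c": "H", "d": "H"}
print(k_hop_labels(adj, labels, "a", k=2))   # Counter({'H': 2, 'C': 1, 'O': 1})
```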

Journal ArticleDOI
TL;DR: In this paper, a fault tolerant neural network interval type-2 fuzzy sliding mode controller (FTNNIT2FSMC) is proposed for 6 DOF octorotor helicopter control in the presence of actuator and sensor faults.
Abstract: In this paper, a robust controller for six degrees of freedom (6 DOF) octorotor helicopter control is proposed in the presence of actuator and sensor faults. Neural networks (NN), an interval type-2 fuzzy logic control (IT2FLC) approach and a sliding mode control (SMC) technique are used to design a controller, named the fault tolerant neural network interval type-2 fuzzy sliding mode controller (FTNNIT2FSMC), for each subsystem of the octorotor helicopter. The proposed control scheme allows avoiding difficult modeling, attenuating the chattering effect of the SMC, reducing the number of rules for the fuzzy controller, and guaranteeing the stability and robustness of the system. The simulation results show that the FTNNIT2FSMC can greatly alleviate the chattering effect and track well in the presence of actuator and sensor faults.

Journal ArticleDOI
TL;DR: A comprehensive survey on event based analysis over social multimedia data, including event enrichment, detection, and categorization is provided, which introduces each paradigm and summarizes related research efforts.
Abstract: Recent years have witnessed the rapid growth of social multimedia data available over the Internet. This age of huge media collections provides users with facilities to share and access data, while it also demands a revolution in data management techniques, since the exponential growth of social multimedia requires more scalable, effective and robust technologies to manage and index it. The event is one of the most important cues for recalling people's past memories, and the reminder value of an event makes it extremely helpful in organizing data. The study of event based analysis on social multimedia data has drawn intensive attention in the research community. In this article, we provide a comprehensive survey of event based analysis over social multimedia data, including event enrichment, detection, and categorization. We introduce each paradigm and summarize related research efforts. In addition, we also suggest the emerging trends in this research area.

Journal ArticleDOI
TL;DR: To overcome the small sample problem in hyperspectral image classification, correlation of spectral bands is fully utilized to generate multiple new sub-samples from each original sample.
Abstract: Because the labor needed to manually label a huge training sample set is usually not available, the problem of hyperspectral image classification often suffers from a lack of labeled training samples. At the same time, hyperspectral data represented in a large number of bands are usually highly correlated. In this paper, to overcome the small sample problem in hyperspectral image classification, the correlation between spectral bands is fully utilized to generate multiple new sub-samples from each original sample. The number of labeled training samples is thus increased several times. Experimental results demonstrate that the proposed method has an obvious advantage when the number of labeled samples is small.
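
A hedged sketch of the core idea; the grouping rule below (contiguous, equally sized band groups) is an assumption standing in for the paper's correlation-based construction. Each labeled spectrum is split into several band subsets, and every subset becomes a new training sample with the same label:

```python
import numpy as np

def make_sub_samples(X, y, n_groups=4):
    """X: (n_samples, n_bands) spectra; n_bands is assumed divisible by n_groups."""
    groups = np.split(np.arange(X.shape[1]), n_groups)   # adjacent bands are highly correlated
    X_new = np.vstack([X[:, g] for g in groups])         # each band subset becomes new samples
    y_new = np.tile(y, n_groups)                          # labels are copied along with them
    return X_new, y_new

X = np.random.rand(10, 200)       # 10 labeled pixels, 200 spectral bands (toy data)
y = np.random.randint(0, 3, 10)
X_big, y_big = make_sub_samples(X, y)
print(X_big.shape, y_big.shape)   # (40, 50) (40,): 4x as many samples, 50 bands each
```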