
Showing papers on "Outlier" published in 2017


Journal ArticleDOI
TL;DR: An R package, visreg, is introduced for the convenient visualization of the relationship between an outcome and an explanatory variable via short, simple function calls; it provides pointwise confidence bands and partial residuals to allow assessment of variability as well as outliers and other deviations from modeling assumptions.
Abstract: Regression models allow one to isolate the relationship between the outcome and an explanatory variable while the other variables are held constant. Here, we introduce an R package, visreg, for the convenient visualization of this relationship via short, simple function calls. In addition to estimates of this relationship, the package also provides pointwise confidence bands and partial residuals to allow assessment of variability as well as outliers and other deviations from modeling assumptions. The package provides several options for visualizing models with interactions, including lattice plots, contour plots, and both static and interactive perspective plots. The implementation of the package is designed to be fully object-oriented and interface seamlessly with R’s rich collection of model classes, allowing a consistent interface for visualizing not only linear models, but generalized linear models, proportional hazards models, generalized additive models, robust regression models, and many more.

682 citations


Proceedings ArticleDOI
13 Aug 2017
TL;DR: This work proposes a new approach to detect outliers in streaming univariate time series based on Extreme Value Theory that does not require hand-set thresholds and makes no assumption on the distribution: the main parameter is only the risk, which controls the number of false positives.
Abstract: Anomaly detection in time series has attracted considerable attention due to its importance in many real-world applications including intrusion detection, energy management and finance. As noted by Chandola, Banerjee and Kumar, most approaches for detecting outliers rely either on manually set thresholds or on assumptions about the distribution of the data. Here, we propose a new approach to detect outliers in streaming univariate time series based on Extreme Value Theory that does not require hand-set thresholds and makes no assumption on the distribution: the main parameter is only the risk, which controls the number of false positives. Our approach can be used for outlier detection, but more generally for automatically setting thresholds, making it useful in a wide range of situations. We also evaluate our algorithms on various real-world datasets, which confirm their soundness and efficiency.
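
The thresholding idea can be illustrated with a Peaks-Over-Threshold sketch: fit a Generalized Pareto Distribution to the excesses over an initial high threshold and convert the desired risk q into an anomaly threshold. The snippet below is a rough, static illustration of that idea; the function name, the initial-quantile parameter and the MLE fit are illustrative choices, not the paper's exact streaming procedure.

```python
import numpy as np
from scipy.stats import genpareto

def pot_threshold(data, q=1e-3, init_quantile=0.98):
    """Peaks-Over-Threshold thresholding: fit a GPD to excesses over an
    initial high threshold t and turn the risk q into an anomaly threshold.
    (Static sketch only; parameter names and the MLE fit are illustrative.)"""
    data = np.asarray(data, dtype=float)
    t = np.quantile(data, init_quantile)          # initial high threshold
    peaks = data[data > t] - t                    # excesses over t
    n, n_t = len(data), len(peaks)
    gamma, _, sigma = genpareto.fit(peaks, floc=0.0)
    if abs(gamma) < 1e-8:                         # exponential-tail limit
        return t - sigma * np.log(q * n / n_t)
    return t + (sigma / gamma) * ((q * n / n_t) ** (-gamma) - 1.0)

# Values above the returned threshold are flagged with risk roughly q.
rng = np.random.default_rng(0)
print(pot_threshold(rng.standard_normal(10_000), q=1e-4))
```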

265 citations


Journal ArticleDOI
TL;DR: It is empirically shown that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting.
Abstract: We empirically show that Bayesian inference can be inconsistent under misspecification in simple linear regression problems, both in a model averaging/selection and in a Bayesian ridge regression setting. We use the standard linear model, which assumes homoskedasticity, whereas the data are heteroskedastic (though, significantly, there are no outliers). As sample size increases, the posterior puts its mass on worse and worse models of ever higher dimension. This is caused by hypercompression, the phenomenon that the posterior puts its mass on distributions that have much larger KL divergence from the ground truth than their average, i.e. the Bayes predictive distribution. To remedy the problem, we equip the likelihood in Bayes' theorem with an exponent called the learning rate, and we propose the SafeBayesian method to learn the learning rate from the data. SafeBayes tends to select small learning rates, and regularizes more, as soon as hypercompression takes place. Its results on our data are quite encouraging.

227 citations


Journal ArticleDOI
Yang Zhao, Peng Liu, Zhenpo Wang, Lei Zhang, Jichao Hong
TL;DR: Applying the neural network algorithm, this paper combines fault and defect diagnosis results with big data statistical regulation to construct a more complete battery system fault diagnosis model.

226 citations


Journal ArticleDOI
TL;DR: A simple and effective density-based outlier detection approach with local kernel density estimation (KDE) and a Relative Density-based Outlier Score (RDOS) is introduced to measure local outlierness of objects.
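
A minimal sketch of the local-KDE idea behind such a score is given below: each point's density is estimated with a Gaussian kernel over its k nearest neighbours, and the outlier score is the ratio of the average neighbour density to the point's own density. This is a simplified reading of the TL;DR and omits the paper's exact neighbourhood construction; all names and parameters are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def rdos_like_scores(X, k=10, h=1.0):
    """Relative density-based outlier scoring, simplified: Gaussian-kernel
    density from the k nearest neighbours, then the ratio of the neighbours'
    average density to the point's own density (larger => more outlying)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)
    dist, idx = dist[:, 1:], idx[:, 1:]            # drop the point itself
    norm = 1.0 / ((2 * np.pi) ** (d / 2) * h ** d)
    dens = norm * np.mean(np.exp(-(dist ** 2) / (2 * h ** 2)), axis=1)
    return np.mean(dens[idx], axis=1) / dens

# Usage: the appended far-away point should receive a high score.
rng = np.random.default_rng(1)
X = np.vstack([rng.standard_normal((200, 2)), [[8.0, 8.0]]])
print(rdos_like_scores(X, k=10)[-1])
```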

201 citations


Journal ArticleDOI
TL;DR: An algorithm to decrease the size of the training set for kNN regression (DISKR), which first removes the outlier instances that impair the performance of the regressor, and then sorts the remaining instances by the difference in output between each instance and its nearest neighbors.

196 citations


Journal ArticleDOI
TL;DR: This paper proposes a k-means-type algorithm that provides data clustering and outlier detection simultaneously by incorporating an additional cluster into the objective function, designs an iterative procedure to optimize the objective function of the proposed algorithm, and establishes the convergence of the iterative procedure.

167 citations


Proceedings ArticleDOI
01 Oct 2017
TL;DR: In particular, this paper shows that, as far as Statistical Query algorithms are concerned, the computational complexity of learning Gaussian mixture models is inherently exponential in the dimension of the latent space, even though there is no such information-theoretic barrier.
Abstract: We describe a general technique that yields the first Statistical Query lower bounds for a range of fundamental high-dimensional learning problems involving Gaussian distributions. Our main results are for the problems of (1) learning Gaussian mixture models (GMMs), and (2) robust (agnostic) learning of a single unknown Gaussian distribution. For each of these problems, we show a super-polynomial gap between the (information-theoretic) sample complexity and the computational complexity of any Statistical Query algorithm for the problem. Statistical Query (SQ) algorithms are a class of algorithms that are only allowed to query expectations of functions of the distribution rather than directly access samples. This class of algorithms is quite broad: a wide range of known algorithmic techniques in machine learning are known to be implementable using SQs. Moreover, for the unsupervised learning problems studied in this paper, all known algorithms with non-trivial performance guarantees are SQ or are easily implementable using SQs. Our SQ lower bound for Problem (1) is qualitatively matched by known learning algorithms for GMMs. At a conceptual level, this result implies that, as far as SQ algorithms are concerned, the computational complexity of learning GMMs is inherently exponential in the dimension of the latent space, even though there is no such information-theoretic barrier. Our lower bound for Problem (2) implies that the accuracy of the robust learning algorithm in \cite{DiakonikolasKKLMS16} is essentially best possible among all polynomial-time SQ algorithms. On the positive side, we also give a new (SQ) learning algorithm for Problem (2) achieving the information-theoretically optimal accuracy, up to a constant factor, whose running time essentially matches our lower bound. Our algorithm relies on a filtering technique generalizing \cite{DiakonikolasKKLMS16} that removes outliers based on higher-order tensors. Our SQ lower bounds are attained via a unified moment-matching technique that is useful in other contexts and may be of broader interest. Our technique yields nearly-tight lower bounds for a number of related unsupervised estimation problems. Specifically, for the problems of (3) robust covariance estimation in spectral norm, and (4) robust sparse mean estimation, we establish a quadratic statistical–computational tradeoff for SQ algorithms, matching known upper bounds. Finally, our technique can be used to obtain tight sample complexity lower bounds for high-dimensional testing problems. Specifically, for the classical problem of robustly testing an unknown mean (known covariance) Gaussian, our technique implies an information-theoretic sample lower bound that scales linearly in the dimension. Our sample lower bound matches the sample complexity of the corresponding robust learning problem and separates the sample complexity of robust testing from standard (non-robust) testing. This separation is surprising because such a gap does not exist for the corresponding learning problem.

153 citations


Journal ArticleDOI
24 Mar 2017-PLOS ONE
TL;DR: Though it has been commonly accepted that there is no single best accuracy measure, it is suggested that UMBRAE could be a good choice to evaluate forecasting methods, especially for cases where measures based on geometric mean of relative errors, such as the geometric mean relative absolute error, are preferred.
Abstract: Many accuracy measures have been proposed in the past for time series forecasting comparisons. However, many of these measures suffer from one or more issues such as poor resistance to outliers and scale dependence. In this paper, while summarising commonly used accuracy measures, a special review is made on the symmetric mean absolute percentage error. Moreover, a new accuracy measure called the Unscaled Mean Bounded Relative Absolute Error (UMBRAE), which combines the best features of various alternative measures, is proposed to address the common issues of existing measures. A comparative evaluation on the proposed and related measures has been made with both synthetic and real-world data. The results indicate that the proposed measure, with user selectable benchmark, performs as well as or better than other measures on selected criteria. Though it has been commonly accepted that there is no single best accuracy measure, we suggest that UMBRAE could be a good choice to evaluate forecasting methods, especially for cases where measures based on geometric mean of relative errors, such as the geometric mean relative absolute error, are preferred.
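
As commonly stated, the bounded relative absolute error for each point is |e_t| / (|e_t| + |e*_t|), where e*_t is the error of a user-chosen benchmark method; these values are averaged and then unscaled to give UMBRAE. The Python below is a minimal sketch under that reading; the tie-handling value of 0.5 and the naive benchmark in the usage example are illustrative choices, not the paper's prescriptions.

```python
import numpy as np

def umbrae(actual, forecast, benchmark):
    """Unscaled Mean Bounded Relative Absolute Error: BRAE_t = |e_t|/(|e_t|+|e*_t|),
    averaged and then unscaled.  Values below 1 mean the forecast beats the
    benchmark; values above 1 mean it is worse."""
    actual, forecast, benchmark = map(np.asarray, (actual, forecast, benchmark))
    e = np.abs(actual - forecast)            # errors of the evaluated method
    e_star = np.abs(actual - benchmark)      # errors of the benchmark method
    # Ties where both errors are zero are scored 0.5 here (illustrative choice).
    brae = np.divide(e, e + e_star, out=np.full_like(e, 0.5), where=(e + e_star) > 0)
    mbrae = brae.mean()
    return mbrae / (1.0 - mbrae)

# Usage with a naive (previous-observation) benchmark:
y     = np.array([10.0, 12.0, 13.0, 12.5, 14.0])
yhat  = np.array([10.5, 11.5, 13.2, 12.0, 13.8])
naive = np.array([10.0, 10.0, 12.0, 13.0, 12.5])
print(umbrae(y, yhat, naive))
```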

150 citations


Journal ArticleDOI
TL;DR: CoP is the first robust PCA algorithm that is simultaneously non-iterative, provably robust to both unstructured and structured outliers, and able to tolerate a large number of unstructured outliers.
Abstract: This paper presents a remarkably simple, yet powerful, algorithm termed coherence pursuit (CoP) to robust principal component analysis (PCA). As inliers lie in a low-dimensional subspace and are mostly correlated, an inlier is likely to have strong mutual coherence with a large number of data points. By contrast, outliers either do not admit low-dimensional structures or form small clusters. In either case, an outlier is unlikely to bear strong resemblance to a large number of data points. Given that, CoP sets an outlier apart from an inlier by comparing their coherence with the rest of the data points. The mutual coherences are computed by forming the Gram matrix of the normalized data points. Subsequently, the sought subspace is recovered from the span of the subset of the data points that exhibit strong coherence with the rest of the data. As CoP only involves one simple matrix multiplication, it is significantly faster than the state-of-the-art robust PCA algorithms. We derive analytical performance guarantees for CoP under different models for the distributions of inliers and outliers in both noise-free and noisy settings. CoP is the first robust PCA algorithm that is simultaneously non-iterative, provably robust to both unstructured and structured outliers, and can tolerate a large number of unstructured outliers.
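
The coherence-scoring step described above is simple enough to sketch: normalize the data points, form the Gram matrix, and score each point by the norm of its row of mutual coherences; the subspace is then recovered from the highest-scoring points. The snippet below illustrates only the scoring step under those assumptions; the function name and the choice of p are illustrative.

```python
import numpy as np

def coherence_scores(D, p=2):
    """Coherence Pursuit-style scoring: columns of D are data points; each
    point is scored by the norm of its mutual coherences with all other
    points (high score => likely inlier)."""
    X = D / np.linalg.norm(D, axis=0, keepdims=True)   # normalise each column
    G = X.T @ X                                        # Gram / mutual-coherence matrix
    np.fill_diagonal(G, 0.0)                           # ignore self-coherence
    return np.linalg.norm(G, ord=p, axis=1)

# Usage: 100 inliers on a 2-D subspace of R^20 plus 5 random outliers.
rng = np.random.default_rng(2)
inliers = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 100))
outliers = rng.standard_normal((20, 5))
scores = coherence_scores(np.hstack([inliers, outliers]))
print(scores[:100].mean(), scores[100:].mean())        # outliers score lower
```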

142 citations


Journal ArticleDOI
TL;DR: Numerical studies and an industrial application of process network planning demonstrate that the proposed data-driven approach can effectively utilize useful information in massive data, better hedge against uncertainties, and yield less conservative solutions.

Posted Content
TL;DR: In this paper, the adaptive Huber regression is proposed for robust estimation and inference of regression parameters in both low and high dimensions, where the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness.
Abstract: Big data can easily be contaminated by outliers or contain variables with heavy-tailed distributions, which makes many conventional methods inadequate. To address this challenge, we propose the adaptive Huber regression for robust estimation and inference. The key observation is that the robustification parameter should adapt to the sample size, dimension and moments for optimal tradeoff between bias and robustness. Our theoretical framework deals with heavy-tailed distributions with bounded $(1+\delta)$-th moment for any $\delta > 0$. We establish a sharp phase transition for robust estimation of regression parameters in both low and high dimensions: when $\delta \geq 1$, the estimator admits a sub-Gaussian-type deviation bound without sub-Gaussian assumptions on the data, while only a slower rate is available in the regime $0<\delta< 1$. Furthermore, this transition is smooth and optimal. In addition, we extend the methodology to allow both heavy-tailed predictors and observation noise. Simulation studies lend further support to the theory. In a genetic study of cancer cell lines that exhibit heavy-tailedness, the proposed methods are shown to be more robust and predictive.
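
A rough sketch of the adaptive-Huber idea is shown below: fit regression coefficients under the Huber loss, with the robustification parameter tau growing with the sample size instead of being held fixed. The tau rule, the crude noise-scale estimate and the optimizer are illustrative placeholders, not the paper's tuning scheme or theory-backed choice.

```python
import numpy as np
from scipy.optimize import minimize

def huber(r, tau):
    """Huber loss applied elementwise to residuals r with threshold tau."""
    a = np.abs(r)
    return np.where(a <= tau, 0.5 * r ** 2, tau * a - 0.5 * tau ** 2)

def adaptive_huber_fit(X, y, c=1.0):
    """Huber regression with a sample-size-dependent robustification
    parameter.  The tau rule and scale estimate below are crude
    illustrative heuristics, not the paper's scheme."""
    n, d = X.shape
    sigma_hat = 1.4826 * np.median(np.abs(y - np.median(y)))   # MAD scale estimate
    tau = c * sigma_hat * np.sqrt(n / np.log(max(n * d, 3)))
    def objective(beta):
        return huber(y - X @ beta, tau).mean()
    return minimize(objective, np.zeros(d), method="L-BFGS-B").x

# Usage on data with heavy-tailed noise:
rng = np.random.default_rng(3)
X = rng.standard_normal((500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_t(df=1.5, size=500)
print(adaptive_huber_fit(X, y))     # rough estimates of the coefficients (1, -2, 0.5)
```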

Journal ArticleDOI
TL;DR: A cluster-based data analysis framework is proposed using recursive principal component analysis (R-PCA), which can aggregate the redundant data and detect the outliers at the same time, and efficiently aggregates the correlated sensor data with high recovery accuracy.
Abstract: Internet of Things (IoT) is emerging as the underlying technology of our connected society, which enables many advanced applications. In IoT-enabled applications, information about application surroundings is gathered by networked sensors, especially wireless sensors due to their advantage of infrastructure-free deployment. However, the pervasive deployment of wireless sensor nodes generates massive amounts of sensor data, and data outliers are frequently incurred due to the dynamic nature of wireless channels. As operation of IoT systems relies on sensor data, data redundancy and data outliers can significantly reduce the effectiveness of IoT applications or even mislead systems into unsafe conditions. In this paper, a cluster-based data analysis framework is proposed using recursive principal component analysis (R-PCA), which can aggregate the redundant data and detect the outliers at the same time. More specifically, at a cluster head, spatially correlated sensor data collected from cluster members are aggregated by extracting the principal components (PCs), and potential data outliers are determined by an abnormal squared prediction error score, which is defined as the square of the residual value after extraction of the PCs. With R-PCA, the parameters of the PCA model can be recursively updated to adapt to changes in IoT systems. The cluster-based data analysis framework also relieves the computational and processing burdens on sensor nodes. Simulations based on practical databases have confirmed that the proposed framework efficiently aggregates the correlated sensor data with high recovery accuracy. The data outlier detection accuracy is also improved by the proposed method compared to other existing algorithms.
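
The squared prediction error score mentioned above can be sketched in a few lines: project each observation onto the leading principal components and score it by the squared norm of the residual. The snippet is a static sketch of that scoring step only; the recursive update of the PCA model at the cluster head, which is the paper's contribution, is not shown, and the function name and injected outlier are illustrative.

```python
import numpy as np

def spe_scores(X, n_components=2):
    """Squared prediction error from a PCA model: project onto the leading
    principal components and score each row by the squared residual norm."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                       # d x k loading matrix
    residual = Xc - Xc @ P @ P.T                  # part not explained by the PCs
    return np.sum(residual ** 2, axis=1)          # large SPE => likely outlier

# Usage: four correlated sensor channels with one corrupted sample.
rng = np.random.default_rng(4)
base = rng.standard_normal((300, 1))
X = np.hstack([base + 0.05 * rng.standard_normal((300, 1)) for _ in range(4)])
X[42] += np.array([5.0, -4.0, 6.0, -5.0])         # injected outlier
print(np.argmax(spe_scores(X, n_components=1)))    # expected: 42
```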

Journal ArticleDOI
TL;DR: In this article, a new outlier-robust Student's t based Gaussian approximate filter is proposed to address the heavy-tailed process and measurement noises induced by the outlier measurements of velocity and range in cooperative localization of autonomous underwater vehicles (AUVs).
Abstract: In this paper, a new outlier-robust Student's t based Gaussian approximate filter is proposed to address the heavy-tailed process and measurement noises induced by the outlier measurements of velocity and range in cooperative localization of autonomous underwater vehicles (AUVs). The state vector, scale matrices, and degrees of freedom (DOF) parameters are jointly estimated based on the variational Bayesian approach by using the constructed Student's t based hierarchical Gaussian state-space model. The performances of the proposed filter and existing filters are tested in the cooperative localization of an AUV through a lake trial. Experimental results illustrate that the proposed filter has better localization accuracy and robustness than existing state-of-the-art outlier-robust filters.

Journal Article
TL;DR: In this paper, the authors propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods, based on splitting the data into nonoverlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures.
Abstract: We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups, evaluating the posterior distribution given each independent subgroup, and then combining the resulting measures. The main novelty of our approach is the proposed aggregation step, which is based on the evaluation of a median in the space of probability measures equipped with a suitable collection of distances that can be quickly and efficiently evaluated in practice. We present both theoretical and numerical evidence illustrating the improvements achieved by our method.
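
A toy illustration of the split-and-combine idea is given below: the data are split into subgroups, a posterior summary is computed per subgroup, and the summaries are combined with a geometric median (Weiszfeld's algorithm). The paper's aggregation is a median in a space of probability measures, not of point estimates, so this only conveys why a contaminated subgroup gets outvoted; all names and the flat-prior Gaussian-mean setup are illustrative.

```python
import numpy as np

def geometric_median(points, n_iter=100, eps=1e-8):
    """Weiszfeld's algorithm for the geometric median of a set of points."""
    z = points.mean(axis=0)
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(points - z, axis=1), eps)
        w = 1.0 / d
        z = (points * w[:, None]).sum(axis=0) / w.sum()
    return z

# Toy illustration: posterior means of a Gaussian mean (flat prior) computed
# on 10 non-overlapping subgroups; the subgroup that happens to contain the
# gross outliers is simply outvoted by the median.
rng = np.random.default_rng(5)
data = np.concatenate([rng.normal(0.0, 1.0, 990), rng.normal(50.0, 1.0, 10)])
subset_means = np.array([[s.mean()] for s in np.array_split(data, 10)])
print(geometric_median(subset_means))        # close to 0, unlike data.mean() ~ 0.5
```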

Book
12 Apr 2017
TL;DR: This book discusses a variety of methods for outlier ensembles and organizes them by the specific principles with which accuracy improvements are achieved, and covers the techniques with which such methods can be made more effective.
Abstract: This book discusses a variety of methods for outlier ensembles and organizes them by the specific principles with which accuracy improvements are achieved. In addition, it covers the techniques with which such methods can be made more effective. A formal classification of these methods is provided, and the circumstances in which they work well are examined. The authors cover how outlier ensembles relate (both theoretically and practically) to the ensemble techniques used commonly for other data mining problems like classification. The similarities and (subtle) differences in the ensemble techniques for the classification and outlier detection problems are explored. These subtle differences do impact the design of ensemble algorithms for the latter problem. This book can be used for courses in data mining and related curricula. Many illustrative examples and exercises are provided in order to facilitate classroom teaching. A familiarity is assumed to the outlier detection problem and also to generic problem of ensemble analysis in classification. This is because many of the ensemble methods discussed in this book are adaptations from their counterparts in the classification domain. Some techniques explained in this book, such as wagging, randomized feature weighting, and geometric subsampling, provide new insights that are not available elsewhere. Also included is an analysis of the performance of various types of base detectors and their relative effectiveness. The book is valuable for researchers and practitioners for leveraging ensemble methods into optimal algorithmic design.
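
As a concrete illustration of one generic ensemble idea covered in this literature, the sketch below builds several k-nearest-neighbour distance detectors on random subsamples and averages their z-normalised scores. It is a generic score-combination example, not a specific algorithm from the book; function names, the subsample ratio and k are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_score(train, test, k=5):
    """Base detector: distance to the k-th nearest neighbour in `train`."""
    nn = NearestNeighbors(n_neighbors=k).fit(train)
    dist, _ = nn.kneighbors(test)
    return dist[:, -1]

def ensemble_scores(X, n_members=20, subsample=0.5, k=5, seed=0):
    """Average of z-normalised scores from base detectors built on random
    subsamples of the data (a generic score-combination ensemble)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    combined = np.zeros(n)
    for _ in range(n_members):
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        s = knn_score(X[idx], X, k=k)
        combined += (s - s.mean()) / (s.std() + 1e-12)   # normalise before combining
    return combined / n_members

# Usage: two injected points far from the Gaussian bulk should rank highest.
rng = np.random.default_rng(6)
X = np.vstack([rng.standard_normal((300, 2)), [[7.0, 7.0], [-8.0, 6.0]]])
print(np.argsort(ensemble_scores(X))[-2:])               # expected: 300 and 301
```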

Journal ArticleDOI
TL;DR: This survey provides a comprehensive overview of existing outlier detection techniques specifically used for wireless sensor networks and presents a comparative table that serves as a guideline for selecting the technique adequate for an application in terms of characteristics such as detection mode, architectural structure and correlation extraction.

Posted Content
TL;DR: This work introduces a criterion, resilience, which allows properties of a dataset to be robustly computed, even in the presence of a large fraction of arbitrary additional data, and provides new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded kth moments.
Abstract: We introduce a criterion, resilience, which allows properties of a dataset (such as its mean or best low rank approximation) to be robustly computed, even in the presence of a large fraction of arbitrary additional data. Resilience is a weaker condition than most other properties considered so far in the literature, and yet enables robust estimation in a broader variety of settings. We provide new information-theoretic results on robust distribution learning, robust estimation of stochastic block models, and robust mean estimation under bounded $k$th moments. We also provide new algorithmic results on robust distribution learning, as well as robust mean estimation in $\ell_p$-norms. Among our proof techniques is a method for pruning a high-dimensional distribution with bounded $1$st moments to a stable "core" with bounded $2$nd moments, which may be of independent interest.

Journal ArticleDOI
TL;DR: A data-level solution is offered for the problem in question, its novelty lying in the effective elimination of majority instances without losing valuable information, achieved by incorporating aspects of outlier and redundancy detection into the baseline system.

Journal ArticleDOI
TL;DR: A hybrid semi-supervised anomaly detection model for high-dimensional data is proposed, consisting of a deep autoencoder (DAE) and an ensemble of k-nearest neighbor graph (k-NNG) based anomaly detectors.
Abstract: Anomaly detection, which aims to identify observations that deviate from a nominal sample, is a challenging task for high-dimensional data. Traditional distance-based anomaly detection methods compute the neighborhood distance between each pair of observations and suffer from the curse of dimensionality in high-dimensional space; for example, the distances between any pair of samples become similar and each sample may appear to be an outlier. In this paper, we propose a hybrid semi-supervised anomaly detection model for high-dimensional data that consists of two parts: a deep autoencoder (DAE) and an ensemble of k-nearest neighbor graph (k-NNG) based anomaly detectors. Benefiting from its ability of nonlinear mapping, the DAE is first trained to learn the intrinsic features of a high-dimensional dataset and represent the high-dimensional data in a more compact subspace. Several nonparametric kNN-based anomaly detectors are then built from different subsets that are randomly sampled from the whole dataset. The final prediction is made by all the anomaly detectors. The performance of the proposed method is evaluated on several real-life datasets, and the results confirm that the proposed hybrid model improves the detection accuracy and reduces the computational complexity.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: A new outlier detection method that combines tools from sparse representation with random walks on a graph and establishes a connection between inliers/outliers and essential/inessential states of the Markov chain, which allows us to detect outliers by using random walks.
Abstract: Many computer vision tasks involve processing large amounts of data contaminated by outliers, which need to be detected and rejected. While outlier detection methods based on robust statistics have existed for decades, only recently have methods based on sparse and low-rank representation been developed along with guarantees of correct outlier detection when the inliers lie in one or more low-dimensional subspaces. This paper proposes a new outlier detection method that combines tools from sparse representation with random walks on a graph. By exploiting the property that data points can be expressed as sparse linear combinations of each other, we obtain an asymmetric affinity matrix among data points, which we use to construct a weighted directed graph. By defining a suitable Markov Chain from this graph, we establish a connection between inliers/outliers and essential/inessential states of the Markov chain, which allows us to detect outliers by using random walks. We provide a theoretical analysis that justifies the correctness of our method under geometric and connectivity assumptions. Experimental results on image databases demonstrate its superiority with respect to state-of-the-art sparse and low-rank outlier detection methods.
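
A toy sketch of the random-walk ranking step is below: an affinity matrix is turned into a row-stochastic transition matrix, the walk is run for a fixed number of steps from a uniform start, and points carrying little probability mass are flagged as outliers. For brevity the affinity here is a thresholded cosine similarity rather than the paper's sparse self-representation, so this only conveys the Markov-chain idea; all names and parameters are illustrative.

```python
import numpy as np

def random_walk_scores(X, k=10, t=50):
    """Random-walk outlier ranking on a directed affinity graph: run a
    t-step walk from a uniform start; points holding little probability
    mass behave like inessential states and are flagged as outliers."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    A = np.abs(Xn @ Xn.T)                         # cosine-similarity affinity
    np.fill_diagonal(A, 0.0)
    thresh = np.sort(A, axis=1)[:, -k][:, None]   # keep each row's k strongest edges
    A = np.where(A >= thresh, A, 0.0)
    P = A / A.sum(axis=1, keepdims=True)          # row-stochastic transition matrix
    pi = np.full(len(X), 1.0 / len(X))
    for _ in range(t):
        pi = pi @ P
    return pi                                     # small value => likely outlier

# Usage: 100 points on a 2-D subspace of R^30 plus 8 random outliers.
rng = np.random.default_rng(7)
inliers = rng.standard_normal((100, 2)) @ rng.standard_normal((2, 30))
outliers = rng.standard_normal((8, 30))
scores = random_walk_scores(np.vstack([inliers, outliers]))
print(np.argsort(scores)[:8])                     # expected: indices 100-107
```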

Journal ArticleDOI
TL;DR: Comparisons to other robust randomised neural modelling techniques, including the probabilistic robust learning algorithm for neural networks with random weights and improved RVFL networks, indicate that the proposed RSCNs with KDE perform favourably and demonstrate good potential for real-world applications.

Proceedings ArticleDOI
01 Jul 2017
TL;DR: This work proposes an outlier-robust tensor principal component analysis method for simultaneous low-rank tensor recovery and outlier detection and develops a fast randomized algorithm that requires small sampling size yet can substantially accelerate OR-TPCA without performance drop.
Abstract: Low-rank tensor analysis is important for various real applications in computer vision. However, existing methods focus on recovering a low-rank tensor contaminated by Gaussian or gross sparse noise and hence cannot effectively handle outliers that are common in practical tensor data. To solve this issue, we propose an outlier-robust tensor principal component analysis (OR-TPCA) method for simultaneous low-rank tensor recovery and outlier detection. For intrinsically low-rank tensor observations with arbitrary outlier corruption, OR-TPCA is the first method that has provable performance guarantee for exactly recovering the tensor subspace and detecting outliers under mild conditions. Since tensor data are naturally high-dimensional and multi-way, we further develop a fast randomized algorithm that requires small sampling size yet can substantially accelerate OR-TPCA without performance drop. Experimental results on four tasks: outlier detection, clustering, semi-supervised and supervised learning, clearly demonstrate the advantages of our method.

Book ChapterDOI
04 Oct 2017
TL;DR: A local model of the ID of smooth functions is proposed and it is shown that under appropriate smoothness conditions, the cumulative distribution function of a distance distribution can be completely characterized by an equivalent notion of data discriminability.
Abstract: Researchers have long considered the analysis of similarity applications in terms of the intrinsic dimensionality (ID) of the data. This theory paper is concerned with a generalization of a discrete measure of ID, the expansion dimension, to the case of smooth functions in general, and distance distributions in particular. A local model of the ID of smooth functions is first proposed and then explained within the well-established statistical framework of extreme value theory (EVT). Moreover, it is shown that under appropriate smoothness conditions, the cumulative distribution function of a distance distribution can be completely characterized by an equivalent notion of data discriminability. As the local ID model makes no assumptions on the nature of the function (or distribution) other than continuous differentiability, its extreme generality makes it ideally suited for the non-parametric or unsupervised learning tasks that often arise in similarity applications. An extension of the local ID model is also provided that allows the local assessment of the rate of change of function growth, which is then shown to have potential implications for the detection of inliers and outliers.

Journal ArticleDOI
TL;DR: In this paper, the interquartile range (IQR) of magnitude measurements is used to detect variability on time-scales from minutes to decades, and it can be complemented by the ratio of the lightcurve variance to the mean square successive difference, 1/η, which is efficient in detecting variability on time-scales longer than the typical time interval between observations.
Abstract: Photometric measurements are prone to systematic errors presenting a challenge to low-amplitude variability detection. In search for a general-purpose variability detection technique able to recover a broad range of variability types including currently unknown ones, we test 18 statistical characteristics quantifying scatter and/or correlation between brightness measurements. We compare their performance in identifying variable objects in seven time series data sets obtained with telescopes ranging in size from a telephoto lens to 1m-class and probing variability on time-scales from minutes to decades. The test data sets together include lightcurves of 127539 objects, among them 1251 variable stars of various types and represent a range of observing conditions often found in ground-based variability surveys. The real data are complemented by simulations. We propose a combination of two indices that together recover a broad range of variability types from photometric data characterized by a wide variety of sampling patterns, photometric accuracies, and percentages of outlier measurements. The first index is the interquartile range (IQR) of magnitude measurements, sensitive to variability irrespective of a time-scale and resistant to outliers. It can be complemented by the ratio of the lightcurve variance to the mean square successive difference, 1/η, which is efficient in detecting variability on time-scales longer than the typical time interval between observations. Variable objects have larger 1/η and/or IQR values than non-variable objects of similar brightness. Another approach to variability detection is to combine many variability indices using principal component analysis. We present 124 previously unknown variable stars found in the test data.
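
Both indices are simple to compute for a single lightcurve, as sketched below: the IQR of the magnitudes and 1/η, the ratio of the lightcurve variance to the mean square successive difference. The snippet is a minimal, unweighted sketch; the paper applies these indices to survey data with photometric errors and irregular sampling, and the names and toy lightcurves here are illustrative.

```python
import numpy as np

def variability_indices(mag):
    """Two of the indices discussed above for a single lightcurve: the
    interquartile range of the magnitudes and 1/eta, the ratio of the
    lightcurve variance to the mean square successive difference."""
    mag = np.asarray(mag, dtype=float)
    q75, q25 = np.percentile(mag, [75, 25])
    iqr = q75 - q25
    msd = np.mean(np.diff(mag) ** 2)      # mean square successive difference
    inv_eta = np.var(mag) / msd           # large for slow, correlated variability
    return iqr, inv_eta

# Usage: a slowly varying sinusoidal lightcurve vs. pure noise.
rng = np.random.default_rng(8)
t = np.linspace(0, 10, 500)
variable = 0.3 * np.sin(2 * np.pi * t / 5) + 0.05 * rng.standard_normal(500)
constant = 0.05 * rng.standard_normal(500)
print(variability_indices(variable))      # both indices larger...
print(variability_indices(constant))      # ...than for the constant star
```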

Posted Content
TL;DR: A novel framework to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN), which can significantly outperform baseline methods when applied for detecting network outliers.
Abstract: In this paper, we propose a novel framework, called Semi-supervised Embedding in Attributed Networks with Outliers (SEANO), to learn a low-dimensional vector representation that systematically captures the topological proximity, attribute affinity and label similarity of vertices in a partially labeled attributed network (PLAN). Our method is designed to work in both transductive and inductive settings while explicitly alleviating noise effects from outliers. Experimental results on various datasets drawn from the web, text and image domains demonstrate the advantages of SEANO over state-of-the-art methods in semi-supervised classification under transductive as well as inductive settings. We also show that a subset of parameters in SEANO is interpretable as outlier score and can significantly outperform baseline methods when applied for detecting network outliers. Finally, we present the use of SEANO in a challenging real-world setting -- flood mapping of satellite images and show that it is able to outperform modern remote sensing algorithms for this task.

Journal ArticleDOI
01 Mar 2017
TL;DR: This work proposes a simple local search-based algorithm for k-means clustering with outliers and proves that this algorithm achieves constant-factor approximate solutions and can be combined with known sketching techniques to scale to large data sets.
Abstract: We study the problem of k-means clustering in the presence of outliers. The goal is to cluster a set of data points to minimize the variance of the points assigned to the same cluster, with the freedom of ignoring a small set of data points that can be labeled as outliers. Clustering with outliers has received a lot of attention in the data processing community, but practical, efficient, and provably good algorithms remain unknown for the most popular k-means objective. Our work proposes a simple local search-based algorithm for k-means clustering with outliers. We prove that this algorithm achieves constant-factor approximate solutions and can be combined with known sketching techniques to scale to large data sets. Using empirical evaluation on both synthetic and large-scale real-world data, we demonstrate that the algorithm dominates recently proposed heuristic approaches for the problem.
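
The objective being studied can be made concrete with a short sketch: cluster the points while ignoring the z points farthest from their nearest centre in each update. This is a simple Lloyd-style variant meant only to illustrate the clustering-with-outliers objective, not the paper's local-search algorithm with its constant-factor guarantee; all names and parameters are illustrative.

```python
import numpy as np

def kmeans_with_outliers(X, k, z, n_iter=50, seed=0):
    """Lloyd-style k-means that ignores the z points farthest from their
    nearest centre in every update, so gross outliers cannot drag centres."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        nearest, nearest_dist = d.argmin(axis=1), d.min(axis=1)
        outliers = np.argsort(nearest_dist)[-z:]        # ignore the z farthest points
        keep = np.ones(len(X), dtype=bool)
        keep[outliers] = False
        for j in range(k):
            members = X[keep & (nearest == j)]
            if len(members):
                centres[j] = members.mean(axis=0)
    return centres, outliers

# Usage: two Gaussian clusters plus three gross outliers (indices 200-202).
rng = np.random.default_rng(9)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(10, 1, (100, 2)),
               [[50.0, 50.0], [-40.0, 30.0], [60.0, -60.0]]])
centres, outliers = kmeans_with_outliers(X, k=2, z=3)
print(sorted(outliers.tolist()))          # expected: [200, 201, 202]
```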

Journal ArticleDOI
TL;DR: A novel deep structured framework is introduced to solve the challenging sequential outlier detection problem, using autoencoder models to capture the intrinsic difference between outliers and normal instances and integrating the models with recurrent neural networks so that learning can make use of previous context and the learners become more robust to warping along the time axis.

Abstract: Unsupervised outlier detection is a vital task with high impact on a wide variety of application domains, such as image analysis and video surveillance. It has also gained long-standing attention and has been extensively studied in multiple research areas. Detecting and acting on outliers as quickly as possible is imperative in order to protect networks and related stakeholders or to maintain the reliability of critical systems. However, outlier detection is difficult due to its one-class nature and challenges in feature construction. Sequential anomaly detection is even harder, with additional challenges from temporal correlation in the data, as well as the presence of noise and high dimensionality. In this paper, we introduce a novel deep structured framework to solve the challenging sequential outlier detection problem. We use autoencoder models to capture the intrinsic difference between outliers and normal instances and integrate the models with recurrent neural networks, which allow the learning to make use of previous context and make the learners more robust to warping along the time axis. Furthermore, we propose to use a layerwise training procedure, which significantly simplifies the training procedure and hence helps achieve efficient and scalable training. In addition, we investigate a fine-tuning step that updates all parameters by incorporating the temporal correlation in the sequence. We further apply our proposed models to conduct systematic experiments on five real-world benchmark data sets. Experimental results demonstrate the effectiveness of our model compared with other state-of-the-art approaches.

Journal ArticleDOI
TL;DR: In this article, the authors derive a robust multi-label active learning algorithm based on an MCC by merging uncertainty and representativeness, and propose an efficient alternating optimization method to solve it.
Abstract: Multi-label learning draws great interest in many real-world applications. It is a highly costly task for the oracle to assign many labels to one instance. Meanwhile, it is also hard to build a good model without diagnosing discriminative labels. Can we reduce the labeling costs and improve the ability to train a good model for multi-label learning simultaneously? Active learning addresses the problem of scarce training samples by querying the most valuable samples to achieve better performance at little cost. In multi-label active learning, some research has been done on querying the relevant labels with fewer training samples or querying all labels without diagnosing the discriminative information. None of these methods can effectively handle outlier labels in the measurement of uncertainty. Since the maximum correntropy criterion (MCC) provides a robust analysis for outliers in many machine learning and data mining algorithms, in this paper we derive a robust multi-label active learning algorithm based on an MCC by merging uncertainty and representativeness, and propose an efficient alternating optimization method to solve it. With MCC, our method can eliminate the influence of outlier labels that are not discriminative when measuring uncertainty. To further improve the quality of information measurement, we merge uncertainty and representativeness with the predicted labels of unknown data. This not only enhances the uncertainty measure but also improves the similarity measurement of multi-label data using label information. Experiments on benchmark multi-label data sets have shown superior performance over the state-of-the-art methods.

Journal ArticleDOI
TL;DR: The experimental results indicate that the proposed hybrid model has the best forecasting performance among all the mainstream wind speed forecasting models involved in the comparison.