
Showing papers on "Outlier" published in 2021


Journal ArticleDOI
TL;DR: In this paper, a taxonomy is presented based on the main aspects that characterize an outlier detection technique in the context of time series, and a structured and comprehensive state-of-the-art on unsupervised anomaly detection techniques is provided.
Abstract: Recent advances in technology have brought major breakthroughs in data collection, enabling a large amount of data to be gathered over time and thus generating time series. Mining this data has become an important task for researchers and practitioners in the past few years, including the detection of outliers or anomalies that may represent errors or events of interest. This review aims to provide a structured and comprehensive state-of-the-art on unsupervised outlier detection techniques in the context of time series. To this end, a taxonomy is presented based on the main aspects that characterize an outlier detection technique.

302 citations


Journal ArticleDOI
TL;DR: In this article, Monte Carlo simulations were used to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation.
Abstract: When data are not normally distributed, researchers are often uncertain whether it is legitimate to use tests that assume Gaussian errors, or whether one has to either model a more specific error structure or use randomization techniques. Here we use Monte Carlo simulations to explore the pros and cons of fitting Gaussian models to non-normal data in terms of risk of type I error, power and utility for parameter estimation. We find that Gaussian models are robust to non-normality over a wide range of conditions, meaning that p values remain fairly reliable except for data with influential outliers judged at strict alpha levels. Gaussian models also performed well in terms of power across all simulated scenarios. Parameter estimates were mostly unbiased and precise except if sample sizes were small or the distribution of the predictor was highly skewed. Transformation of data before analysis is often advisable and visual inspection for outliers and heteroscedasticity is important for assessment. In strong contrast, some non-Gaussian models and randomization techniques bear a range of risks that are often insufficiently known. High rates of false-positive conclusions can arise for instance when overdispersion in count data is not controlled appropriately or when randomization procedures ignore existing non-independencies in the data. Hence, newly developed statistical methods not only bring new opportunities, but they can also pose new threats to reliability. We argue that violating the normality assumption bears risks that are limited and manageable, while several more sophisticated approaches are relatively error prone and particularly difficult to check during peer review. Scientists and reviewers who are not fully aware of the risks might benefit from preferentially trusting Gaussian mixed models in which random effects account for non-independencies in the data.

123 citations
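
The kind of Monte Carlo check described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' simulation design: it assumes a two-sample comparison with no true difference, samples drawn from an exponential (skewed) distribution, and the large-sample critical value 1.96 in place of an exact t quantile. The names `welch_t` and `type_i_error_rate` are hypothetical.

```python
import random
import statistics


def welch_t(a, b):
    """Two-sample t statistic (Welch form) for the difference in means."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    std_err = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / std_err


def type_i_error_rate(draw, n=50, n_sim=2000, crit=1.96, seed=0):
    """Fraction of null simulations in which |t| exceeds the critical value.

    Both groups come from the same distribution, so every rejection is a
    false positive. `crit=1.96` is a normal approximation (an assumption).
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        a = [draw(rng) for _ in range(n)]
        b = [draw(rng) for _ in range(n)]
        if abs(welch_t(a, b)) > crit:
            rejections += 1
    return rejections / n_sim


# Skewed (exponential) data versus truly Gaussian data, same test.
skewed_rate = type_i_error_rate(lambda rng: rng.expovariate(1.0))
normal_rate = type_i_error_rate(lambda rng: rng.gauss(0.0, 1.0))
print(f"type I error, skewed data: {skewed_rate:.3f}")
print(f"type I error, normal data: {normal_rate:.3f}")
```

If the Gaussian test is robust in this setting, both rates should sit near the nominal 5% level. Heavier tails, smaller samples, or stricter alpha levels would need their own runs.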


Proceedings ArticleDOI
20 Jun 2021
TL;DR: PointDSC as mentioned in this paper proposes a non-local feature aggregation module, weighted by both feature and spatial coherence, for feature embedding of the input correspondences, and formulate a differentiable spectral matching module, supervised by pairwise spatial compatibility, to estimate the inlier confidence of each correspondence from the embedded features.
Abstract: Removing outlier correspondences is one of the critical steps for successful feature-based point cloud registration. Despite the increasing popularity of introducing deep learning techniques in this field, spatial consistency, which is essentially established by a Euclidean transformation between point clouds, has received almost no individual attention in existing learning frameworks. In this paper, we present PointDSC, a novel deep neural network that explicitly incorporates spatial consistency for pruning outlier correspondences. First, we propose a nonlocal feature aggregation module, weighted by both feature and spatial coherence, for feature embedding of the input correspondences. Second, we formulate a differentiable spectral matching module, supervised by pairwise spatial compatibility, to estimate the inlier confidence of each correspondence from the embedded features. With modest computation cost, our method outperforms the state-of-the-art hand-crafted and learning-based outlier rejection approaches on several real-world datasets by a significant margin. We also show its wide applicability by combining PointDSC with different 3D local descriptors. [code release]

112 citations


Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed an L₁-and-L₂-norm-oriented latent factor (L³F) model, which adopts twofold ideas: aggregating the L₁ norm's robustness and the L₂ norm's stability to form its loss, and adaptively adjusting the weights of the L₁ and L₂ norms in its loss.
Abstract: A recommender system (RS) is highly efficient in filtering people's desired information from high-dimensional and sparse (HiDS) data. To date, a latent factor (LF)-based approach becomes highly popular when implementing a RS. However, current LF models mostly adopt single distance-oriented Loss like an L₂ norm-oriented one, which ignores target data's characteristics described by other metrics like an L₁ norm-oriented one. To investigate this issue, this article proposes an L₁-and-L₂-norm-oriented LF (L³F) model. It adopts twofold ideas: 1) aggregating L₁ norm's robustness and L₂ norm's stability to form its Loss and 2) adaptively adjusting weights of L₁ and L₂ norms in its Loss. By doing so, it achieves fine aggregation effects with L₁ norm-oriented Loss's robustness and L₂ norm-oriented Loss's stability to precisely describe HiDS data with outliers. Experimental results on nine HiDS datasets generated by real systems show that an L³F model significantly outperforms state-of-the-art models in prediction accuracy for missing data of an HiDS dataset. Its computational efficiency is also comparable with the most efficient LF models. Hence, it has good potential for addressing HiDS data from real applications.

109 citations


Journal ArticleDOI
TL;DR: In this paper, a parameter-dependent set-membership filter was proposed for linear time-varying systems with norm-bounded noises and impulsive measurement outliers.
Abstract: This paper is concerned with the set-membership filtering problem for a class of linear time-varying systems with norm-bounded noises and impulsive measurement outliers. A new representation is proposed to model the measurement outlier by an impulsive signal whose minimum interval length (i.e., the minimum duration between two adjacent impulsive signals) and minimum norm (i.e., the minimum of the norms of all impulsive signals) are larger than certain thresholds that are adjustable according to engineering practice. In order to guarantee satisfactory filtering performance, a so-called parameter-dependent set-membership filter is put forward that is capable of generating a time-varying ellipsoidal region containing the true system state. First, a novel outlier detection strategy is developed, based on a dedicatedly constructed input-output model, to examine whether the received measurement is corrupted by an outlier. Then, through the outcome of the outlier detection, the gain matrix of the desired filter and the corresponding ellipsoidal region are calculated by solving two recursive difference equations. Furthermore, the ultimate boundedness issue on the time-varying ellipsoidal region is thoroughly investigated. Finally, a simulation example is provided to demonstrate the effectiveness of our proposed parameter-dependent set-membership filtering strategy.

87 citations


Journal ArticleDOI
TL;DR: A two-phase deep learning model is proposed and constructed for high-performance short-term wind direction forecasting and comparisons with benchmark prediction models show that the proposed network achieves superior performance.

84 citations


Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a smooth $L_{1}$-norm-oriented latent factor (SL-LF) model, which is more robust to outlier data.
Abstract: High-dimensional and sparse (HiDS) matrices commonly arise in various industrial applications, e.g., recommender systems (RSs), social networks, and wireless sensor networks. Since they contain rich information, how to accurately represent them is of great significance. A latent factor (LF) model is one of the most popular and successful ways to address this issue. Current LF models mostly adopt $L_{2}$-norm-oriented Loss to represent an HiDS matrix, i.e., they sum the errors between observed data and predicted ones with $L_{2}$-norm. Yet $L_{2}$-norm is sensitive to outlier data. Unfortunately, outlier data usually exist in such matrices. For example, an HiDS matrix from RSs commonly contains many outlier ratings due to some heedless/malicious users. To address this issue, this work proposes a smooth $L_{1}$-norm-oriented latent factor (SL-LF) model. Its main idea is to adopt smooth $L_{1}$-norm rather than $L_{2}$-norm to form its Loss, making it have both strong robustness and high accuracy in predicting the missing data of an HiDS matrix. Experimental results on eight HiDS matrices generated by industrial applications verify that the proposed SL-LF model not only is robust to the outlier data but also has significantly higher prediction accuracy than state-of-the-art models when they are used to predict the missing data of HiDS matrices.

84 citations
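
To illustrate why a smooth $L_{1}$ loss is less sensitive to outlier ratings than an $L_{2}$ loss, here is a small sketch. It assumes the widely used Huber-style definition of smooth $L_{1}$ (quadratic near zero, linear beyond a threshold `delta`); the abstract does not give the exact form used in the SL-LF model, so this definition and the function names are assumptions.

```python
def l2_loss(err):
    """Squared error: grows quadratically with the residual."""
    return err * err


def l2_grad(err):
    """Gradient of the squared error: unbounded in the residual."""
    return 2.0 * err


def smooth_l1_loss(err, delta=1.0):
    """Huber-style smooth L1 (assumed form): quadratic inside delta, linear outside."""
    a = abs(err)
    if a <= delta:
        return 0.5 * a * a / delta
    return a - 0.5 * delta


def smooth_l1_grad(err, delta=1.0):
    """Gradient of the smooth L1 loss: magnitude never exceeds 1."""
    if abs(err) <= delta:
        return err / delta
    return 1.0 if err > 0 else -1.0


# A typical prediction error and an outlier rating error (made-up values).
small_err, outlier_err = 0.5, 4.0
print(l2_loss(small_err), smooth_l1_loss(small_err))      # 0.25 0.125
print(l2_loss(outlier_err), smooth_l1_loss(outlier_err))  # 16.0 3.5
print(l2_grad(outlier_err), smooth_l1_grad(outlier_err))  # 8.0 1.0
```

Because the gradient is capped, a single extreme rating pulls the latent factors far less under smooth $L_{1}$ than under $L_{2}$, while small errors are still penalized smoothly.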


Journal ArticleDOI
TL;DR: The feasibility of detection of delamination is experimentally demonstrated, whose size is comparable to the ultrasonic wavelength with probability of detection better than 90% using <1% of the total number of samples required for conventional imaging, even under conditions wherein the SNR is as low as 5 dB.

75 citations


Posted Content
TL;DR: The proposed OpenGAN shows that a carefully selected GAN-discriminator on some real outlier data already achieves the state-of-the-art, and augments the available set of real open training examples with adversarially synthesized "fake" data, showing that OpenGAN significantly outperforms prior open-set methods.
Abstract: Real-world machine learning systems need to analyze novel testing data that differs from the training data. In K-way classification, this is crisply formulated as open-set recognition, core to which is the ability to discriminate open-set data outside the K closed-set classes. Two conceptually elegant ideas for open-set discrimination are: 1) discriminatively learning an open-vs-closed binary discriminator by exploiting some outlier data as the open-set, and 2) unsupervised learning the closed-set data distribution with a GAN and using its discriminator as the open-set likelihood function. However, the former generalizes poorly to diverse open test data due to overfitting to the training outliers, which unlikely exhaustively span the open-world. The latter does not work well, presumably due to the instable training of GANs. Motivated by the above, we propose OpenGAN, which addresses the limitation of each approach by combining them with several technical insights. First, we show that a carefully selected GAN-discriminator on some real outlier data already achieves the state-of-the-art. Second, we augment the available set of real open training examples with adversarially synthesized "fake" data. Third and most importantly, we build the discriminator over the features computed by the closed-world K-way networks. Extensive experiments show that OpenGAN significantly outperforms prior open-set methods.

72 citations


Journal ArticleDOI
TL;DR: A novel loss function is proposed that gives rise to a novel method, Outlier Exposure with Confidence Control (OECC), which achieves superior results in out-of-distribution detection with OE both on image and text classification tasks without requiring access to OOD samples.

67 citations


Journal ArticleDOI
TL;DR: A new adversarial network for simultaneous classification and fault detection is proposed and the discriminator of this model is designed to handle the generated faulty samples to prevent outliers and overfitting.

Journal ArticleDOI
TL;DR: A statistical similarity measure is introduced to quantify the similarity between two random vectors to develop a novel outlier-robust Kalman filtering framework and the approximation errors and the stability of the proposed filter are analyzed and discussed.
Abstract: In this article, a statistical similarity measure is introduced to quantify the similarity between two random vectors. The measure is, then, employed to develop a novel outlier-robust Kalman filtering framework. The approximation errors and the stability of the proposed filter are analyzed and discussed. To implement the filter, a fixed-point iterative algorithm and a separate iterative algorithm are given, and their local convergent conditions are also provided, and their comparisons have been made. In addition, selection of the similarity function is considered, and four exemplary similarity functions are established, from which the relations between our new method and existing outlier-robust Kalman filters are revealed. Simulation examples are used to illustrate the effectiveness and potential of the new filtering scheme.

Journal ArticleDOI
TL;DR: A novel parameter-dependent filtering approach is proposed to protect the filtering performance from IMOs by using a special outlier detection scheme, which is developed based on a particular input–output model.
Abstract: This paper is concerned with the ultimately bounded filtering problem for linear time-delay systems subject to norm-bounded disturbances and impulsive measurement outliers (IMOs). The considered IMOs are modeled by a sequence of impulsive signals with certain known minimum norm (i.e. the minimum of the norms of all impulsive signals). In order to characterize the occasional occurrence of IMOs, a sequence of independently and identically distributed random variables is introduced to depict the interval lengths (i.e. the durations between two adjacent IMOs) of the outliers. In order to achieve satisfactory filtering performance, a novel parameter-dependent filtering approach is proposed to protect the filtering performance from IMOs by using a special outlier detection scheme, which is developed based on a particular input-output model. The ultimate boundedness (in mean square) of the filtering error is investigated by using the stochastic analysis technique and Lyapunov-functional-like method. The desired filter gain matrix is derived through solving a constrained optimization problem. A simulation example is provided to demonstrate the effectiveness of our proposed

Posted Content
TL;DR: PatchCore as discussed by the authors uses a maximally representative memory bank of nominal patch-features, which achieves competitive inference times while achieving state-of-the-art performance for both detection and localization.
Abstract: Being able to spot defective parts is a critical component in large-scale industrial manufacturing. A particular challenge that we address in this work is the cold-start problem: fit a model using nominal (non-defective) example images only. While handcrafted solutions per class are possible, the goal is to build systems that work well simultaneously on many different tasks automatically. The best performing approaches combine embeddings from ImageNet models with an outlier detection model. In this paper, we extend on this line of work and propose PatchCore, which uses a maximally representative memory bank of nominal patch-features. PatchCore offers competitive inference times while achieving state-of-the-art performance for both detection and localization. On the standard dataset MVTec AD, PatchCore achieves an image-level anomaly detection AUROC score of $99.1\%$, more than halving the error compared to the next best competitor. We further report competitive results on two additional datasets and also find competitive results in the few samples regime.

Journal ArticleDOI
TL;DR: In this article, a review of robust M-estimators in various knowledge areas is presented, including the Weighted Least Squares estimator, the Contaminated Normal estimator (quasi-robust), the Huber estimator and the Smith estimator.
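
As a concrete instance of one estimator named above, the Huber M-estimator of location can be computed by iteratively reweighted averaging. This is a minimal sketch under stated assumptions, not code from the reviewed article: the tuning constant `c = 1.345`, the MAD-based scale with the factor 1.4826, and the function name `huber_location` are conventional or hypothetical choices.

```python
import statistics


def huber_location(xs, c=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location via iteratively reweighted averaging.

    Residuals within c robust-scale units get full weight; larger residuals
    are down-weighted in proportion to their size.
    """
    mu = statistics.median(xs)
    # Robust scale: median absolute deviation, rescaled for Gaussian data.
    mad = statistics.median([abs(x - mu) for x in xs])
    scale = 1.4826 * mad if mad > 0 else 1.0
    for _ in range(max_iter):
        weights = []
        for x in xs:
            r = abs(x - mu) / scale
            weights.append(1.0 if r <= c else c / r)
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Five consistent readings plus one gross outlier (made-up data).
data = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]
plain_mean = statistics.fmean(data)
robust_estimate = huber_location(data)
print(f"mean = {plain_mean:.2f}, Huber estimate = {robust_estimate:.2f}")
```

The ordinary mean is dragged toward the outlier, while the Huber estimate stays close to the bulk of the readings because the outlier's influence is bounded.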

Journal ArticleDOI
TL;DR: A novel self-paced dynamic infinite mixture model is presented to infer the dynamics of EEG fatigue signals and shows better performance in automatically identifying a pilot's brain workload.
Abstract: Current brain cognitive models are insufficient in handling outliers and dynamics of electroencephalogram (EEG) signals. This article presents a novel self-paced dynamic infinite mixture model to infer the dynamics of EEG fatigue signals. The instantaneous spectrum features provided by ensemble wavelet transform and Hilbert transform are extracted to form four fatigue indicators. The covariance of log likelihood of the complete data is proposed to accurately identify similar components and dynamics of the developed mixture model. Compared with its seven peers, the proposed model shows better performance in automatically identifying a pilot's brain workload.

Proceedings ArticleDOI
01 Jun 2021
TL;DR: Patch2Pix as discussed by the authors proposes a new perspective to estimate correspondences in a detect-to-refine manner, where they first predict patch-level match proposals and then refine them.
Abstract: The classical matching pipeline used for visual localization typically involves three steps: (i) local feature detection and description, (ii) feature matching, and (iii) outlier rejection. Recently emerged correspondence networks propose to perform those steps inside a single network but suffer from low matching resolution due to the memory bottleneck. In this work, we propose a new perspective to estimate correspondences in a detect-to-refine manner, where we first predict patch-level match proposals and then refine them. We present Patch2Pix, a novel refinement network that refines match proposals by regressing pixel-level matches from the local regions defined by those proposals and jointly rejecting outlier matches with confidence scores. Patch2Pix is weakly supervised to learn correspondences that are consistent with the epipolar geometry of an input image pair. We show that our refinement network significantly improves the performance of correspondence networks on image matching, homography estimation, and localization tasks. In addition, we show that our learned refinement generalizes to fully-supervised methods without retraining, which leads us to state-of-the-art localization performance. The code is available at https://github.com/GrumpyZhou/patch2pix.

Journal ArticleDOI
TL;DR: Wang et al. as discussed by the authors proposed the Clustering with outlier removal (COR) algorithm, where the original space is transformed into a binary space via generating basic partitions, and an auxiliary binary matrix is introduced so that COR completely and efficiently solves the challenging problem via a unified K-means.
Abstract: Cluster analysis and outlier detection are two continuously rising topics in data mining area, which in fact connect to each other deeply. Cluster structure is vulnerable to outliers; inversely, outliers are the points belonging to none of any clusters. Unfortunately, most existing studies do not notice the coupled relationship between these two tasks and handle them separately. In this article, we consider the joint cluster analysis and outlier detection problem, and propose the Clustering with Outlier Removal (COR) algorithm. Specifically, the original space is transformed into a binary space via generating basic partitions. We employ Holoentropy to measure the compactness of each cluster without involving several outlier candidates. To provide a neat and efficient solution, an auxiliary binary matrix is introduced so that COR completely and efficiently solves the challenging problem via a unified K-means-- with theoretical supports. Extensive experimental results on numerous data sets in various domains demonstrate the effectiveness and efficiency of COR significantly over state-of-the-art methods in terms of cluster validity and outlier detection. Some key factors including the basic partition number and generation strategy in COR with an application on abnormal flight trajectory detection are further analyzed for practical use.

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a mean-shift outlier detector, which replaces every object by the mean of its k-nearest neighbors and detects outliers based on the distance shifted.
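
The idea in this summary (replace each object by the mean of its k-nearest neighbors, then score it by how far it moved) can be sketched as follows. This is an illustrative reading of the one-sentence description, not the authors' implementation: it assumes a single shift pass, Euclidean distance, neighbors that exclude the point itself, and a hypothetical function name `mean_shift_scores`.

```python
import math


def mean_shift_scores(points, k=3):
    """Score each point by the distance to the mean of its k nearest neighbors.

    Points inside a dense group barely move; isolated points are pulled a
    long way toward the nearest group and therefore get large scores.
    """
    scores = []
    for i, p in enumerate(points):
        # Neighbors exclude the point itself (an assumption of this sketch).
        others = [q for j, q in enumerate(points) if j != i]
        others.sort(key=lambda q: math.dist(p, q))
        nbrs = others[:k]
        mean = tuple(sum(coord) / len(nbrs) for coord in zip(*nbrs))
        scores.append(math.dist(p, mean))
    return scores


# A small cluster around the unit square plus one distant point (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8)]
scores = mean_shift_scores(points, k=3)
most_outlying = scores.index(max(scores))
print(most_outlying, [round(s, 2) for s in scores])
```

A threshold on the shift distance, or a fixed number of top-scoring points, would then turn the scores into outlier labels; how that cut-off is chosen is not specified in the summary.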

Journal ArticleDOI
TL;DR: This work proposes to use a certain confidence-dependent saturation function to mitigate the side effects from the measurement outliers on the estimation error dynamics (EEDs) to ensure that the corresponding EED achieves the asymptotic stability with a prescribed $H_\infty$ performance index.
Abstract: In this brief, a new outlier-resistant state estimation (SE) problem is addressed for a class of recurrent neural networks (RNNs) with mixed time-delays. The mixed time delays comprise both discrete and distributed delays that occur frequently in signal transmissions among artificial neurons. Measurement outputs are sometimes subject to abnormal disturbances (resulting probably from sensor aging/outages/faults/failures and unpredictable environmental changes) leading to measurement outliers that would deteriorate the estimation performance if directly taken into the innovation in the estimator design. We propose to use a certain confidence-dependent saturation function to mitigate the side effects from the measurement outliers on the estimation error dynamics (EEDs). Through using a combination of Lyapunov–Krasovskii functional and inequality manipulations, a delay-dependent criterion is established for the existence of the outlier-resistant state estimator ensuring that the corresponding EED achieves the asymptotic stability with a prescribed $H_\infty$ performance index. Then, the explicit characterization of the estimator gain is obtained by solving a convex optimization problem. Finally, numerical simulation is carried out to demonstrate the usefulness of the derived theoretical results.

Journal ArticleDOI
TL;DR: A saturation function is employed in the filter structure to constrain the innovations contaminated by the measurement outliers, thereby maintaining satisfactory filtering performance, and the exponential boundedness of the filtering error dynamics is analyzed in the mean square sense.
Abstract: In this article, a new outlier-resistant recursive filtering problem (RF) is studied for a class of multisensor multirate networked systems under the weighted try-once-discard (WTOD) protocol. The sensors are sampled with a period that is different from the state updating period of the system. In order to lighten the communication burden and alleviate the network congestions, the WTOD protocol is implemented in the sensor-to-filter channel to schedule the order of the data transmission of the sensors. In the case of the measurement outliers, a saturation function is employed in the filter structure to constrain the innovations contaminated by the measurement outliers, thereby maintaining satisfactory filtering performance. By resorting to the solution to a matrix difference equation, an upper bound is first obtained on the covariance of the filtering error, and the gain matrix of the filter is then characterized to minimize the derived upper bound. Furthermore, the exponential boundedness of the filtering error dynamics is analyzed in the mean square sense. Finally, the usefulness of the proposed outlier-resistant RF scheme is verified by simulation examples.
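
The role of the saturation function can be illustrated with a scalar toy filter. This sketch only shows the clipping of the innovation; the fixed gain, the saturation bound, and the names `saturate` and `filter_track` are hypothetical, whereas the article derives its gain by minimizing an upper bound on the error covariance and also handles multirate sampling and the WTOD protocol, none of which is modeled here.

```python
def saturate(value, bound):
    """Clip a value to the interval [-bound, bound]."""
    return max(-bound, min(bound, value))


def filter_track(measurements, gain=0.5, bound=None, x0=0.0):
    """Scalar recursive filter: x <- x + gain * innovation.

    When `bound` is given, the innovation is saturated before the update,
    which limits how far one corrupted measurement can move the estimate.
    """
    x = x0
    estimates = []
    for y in measurements:
        innovation = y - x
        if bound is not None:
            innovation = saturate(innovation, bound)
        x = x + gain * innovation
        estimates.append(x)
    return estimates


# A constant signal of 1.0 with one corrupted measurement (made-up data).
measurements = [1.0, 1.0, 1.0, 100.0, 1.0, 1.0]
plain = filter_track(measurements, x0=1.0)
robust = filter_track(measurements, bound=2.0, x0=1.0)
print(plain)   # [1.0, 1.0, 1.0, 50.5, 25.75, 13.375]
print(robust)  # [1.0, 1.0, 1.0, 2.0, 1.5, 1.25]
```

With saturation, the outlier shifts the estimate by at most `gain * bound` in one step, so the filter recovers quickly. The trade-off is slower tracking of genuine large changes, which is why the bound has to be chosen with the expected innovation size in mind.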

Journal ArticleDOI
TL;DR: Wang et al. as mentioned in this paper employed stacking technique of ensemble learning to establish rockburst prediction models, which exhibits unique advantages especially when using imbalanced data, and the impact of class imbalance on the prediction accuracy and fitting effect of models was quantitatively discussed.
Abstract: Rockburst is a common dynamic geological hazard, severely restricting the development and utilization of underground space and resources. As the depth of excavation and mining increases, rockburst tends to occur frequently. Hence, it is necessary to carry out a study on rockburst prediction. Due to the nonlinear relationship between rockburst and its influencing factors, artificial intelligence was introduced. However, the collected data were typically imbalanced. Single algorithms trained by such data have low recognition for minority classes. In order to handle the problem, this paper employed stacking technique of ensemble learning to establish rockburst prediction models. In total, 246 sets of data were collected. In the preprocessing stage, three data mining techniques including principal component analysis, local outlier factor and expectation maximization algorithm were used for dimension reduction, outlier detection and outlier substitution, respectively. Then, the pre-processed data were split into a training set (75%) and a test set (25%) with stratified sampling. Based on the four classical single intelligent algorithms, namely k-nearest neighbors (KNN), support vector machine (SVM), deep neural network (DNN) and recurrent neural network (RNN), four ensemble models (KNN–RNN, SVM–RNN, DNN–RNN and KNN–SVM–DNN–RNN) were built by stacking technique of ensemble learning. The prediction performance of eight models was evaluated, and the differences between single models and ensemble models were analyzed. Additionally, a sensitivity analysis was conducted, revealing the importance of input variables on the models. Finally, the impact of class imbalance on the prediction accuracy and fitting effect of models was quantitatively discussed. The results showed that stacking technique of ensemble learning provides a new and promising way for rockburst prediction, which exhibits unique advantages especially when using imbalanced data.

Journal ArticleDOI
01 Apr 2021
TL;DR: An architectural scheme for designing a threat intelligence technique for web attacks to address these challenges through a four-step methodology and demonstrates that the proposed scheme outperforms four other competing machine learning mechanisms in terms of detection rate and false alarm rates.
Abstract: Web application attacks constitute considerable security threats to computer networks and end users. Existing threat detection methods are mostly designed on signature-based approaches which cannot recognize zero-day vulnerabilities. Moreover, with the minimal availability of real-world web attack data, the effectiveness of such approaches is limited further. In this paper, we propose an architectural scheme for designing a threat intelligence technique for web attacks to address these challenges through a four-step methodology: 1) collecting web attack data by crawling websites and accumulating network traffic for representing this data as feature vectors; 2) dynamically extracting important features using the Association Rule Mining (ARM) algorithm; 3 ) using these extracted features to simulate web attack data; and 4) proposing a new Outlier Gaussian Mixture (OGM) technique for detecting known as well as zero-day attacks based on the anomaly detection methodology. The performance of the scheme is appraised using two well-known datasets, namely, the Web Attack and UNSW-NB15 datasets. The empirical evaluations demonstrate that the proposed scheme outperforms four other competing machine learning mechanisms in terms of detection rate and false alarm rates on both the original as well as simulated web data.

Journal ArticleDOI
TL;DR: Simulations show that the proposed variational Bayesian adaptive Kalman filter, designed for inaccurate noise covariances in the presence of outliers, achieves better filtering accuracy than existing state-of-the-art filters in outlier environments.
Abstract: In this paper, a novel variational Bayesian (VB) adaptive Kalman filter with inaccurate nominal process and measurement noise covariances in the presence of outliers is proposed. The probability density functions of state transition and measurement likelihood are modeled as Gaussian-Gamma mixture distributions. The state, process and measurement noise covariances are jointly inferred by the VB technique. Computer simulations show that the proposed method has better filtering accuracy than existing state-of-the-art filters under outlier environments.

Journal ArticleDOI
15 Sep 2021-Energy
TL;DR: A combined architecture of Multivariate Long Short Term Memory (MLSTM) is proposed with Mahalanobis and Z-score transformations to improve the data to uncorrelated and standardized variance, thus making data more suitable for regression analysis.

Journal ArticleDOI
TL;DR: In this article, an outlier detection method based on clustering and local outlier factor (LOF) is proposed to detect electricity theft attacks in the smart grid, where customers whose load profiles are far from the cluster centers are selected as outlier candidates.
Abstract: As one of the key components of smart grid, advanced metering infrastructure (AMI) provides an immense number of data, making technologies such as data mining more suitable for electricity theft detection. However, due to the unbalanced dataset in the field of electricity theft, many AI-based methods such as deep learning are prone to under-fitting. To evade this problem and to detect as many types of theft attacks as possible, an outlier detection method based on clustering and local outlier factor (LOF) is proposed in this study. We firstly analyze the load profiles with $k$-means. Then, customers whose load profiles are far from the cluster centers are selected as outlier candidates. After that, the LOF is utilized to calculate the anomaly degrees of outlier candidates. Corresponding framework for practical application is then designed. Finally, numerical experiments based on realistic dataset show the good performance of the presented method.

Journal ArticleDOI
TL;DR: The proposed wind power probability density forecasting method, based on cubic spline interpolation and support vector quantile regression (CSI-SVQR), not only efficiently eliminates the outliers of wind power but also provides the probability density function, offering a complete description of wind power generation fluctuation.

Journal ArticleDOI
TL;DR: A self-adaptive mixture similarity function based on geometric distance and S-divergence is introduced for uncertain data clustering and it is demonstrated that the proposed method consistently defeats the state-of-the-art clustering algorithms.
Abstract: Nowadays, multi-view clustering is drawn more and more attention in the area of machine learning because real-world datasets frequently consist of multiple views. Moreover, it provides complementary and consensus information across multiple views. So, owing to the efficacy of revealing the concealed patterns in uncertain data, multiple views are considered in this study. But, a multi-view clustering algorithm is not alone sufficient to increase accuracy. A similarity measure is equally important in uncertain data clustering. However, existing similarity functions for clustering uncertain data afflict with several problems. Geometric distance-based similarity function cannot correctly capture the change between uncertain data with their distributions when they are massively location-wise overlapped. On the other hand, the divergence-based similarity function cannot discriminate against the change between various duos of absolutely disjointed uncertain data. Thus, a self-adaptive mixture similarity function based on geometric distance and S-divergence is introduced for uncertain data clustering. The proposed similarity function is integrated with k-medoids based multi-view clustering. The proposed method reduces the effect of outliers and noises since it uses the threshold-based residual objective function in k-medoids. Finally, extensive experimental results on synthetic and real-world uncertain datasets illustrate that the proposed method consistently defeats the state-of-the-art clustering algorithms. Experimental results also demonstrate the effectiveness and robustness of the proposed method against noise and outliers.

Journal ArticleDOI
TL;DR: This work models anomalies as persistent outliers and proposes to detect them via a cumulative sum-like algorithm, providing an asymptotic lower bound and an asymptotic approximation for the average false alarm period of the proposed algorithm.
Abstract: Timely detection of abrupt anomalies is crucial for real-time monitoring and security of modern systems producing high-dimensional data. With this goal, we propose effective and scalable algorithms. Proposed algorithms are nonparametric as both the nominal and anomalous multivariate data distributions are assumed unknown. We extract useful univariate summary statistics and perform anomaly detection in a single-dimensional space. We model anomalies as persistent outliers and propose to detect them via a cumulative sum-like algorithm. In case the observed data have a low intrinsic dimensionality, we find a submanifold in which the nominal data are embedded and evaluate whether the sequentially acquired data persistently deviate from the nominal submanifold. Further, in the general case, we determine an acceptance region for nominal data via Geometric Entropy Minimization and evaluate whether the sequentially observed data persistently fall outside the acceptance region. We provide an asymptotic lower bound and an asymptotic approximation for the average false alarm period of the proposed algorithm. Moreover, we provide a sufficient condition to asymptotically guarantee that the decision statistic of the proposed algorithm does not diverge in the absence of anomalies. Experiments illustrate the effectiveness of the proposed schemes in quick and accurate anomaly detection in high-dimensional settings.
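
The notion of treating an anomaly as a run of persistent outliers, rather than a single one, can be sketched with a basic cumulative sum-like rule. This is a generic illustration, not the decision statistic from the paper: the per-sample outlier evidence values, the `drift` subtracted at each step, the `alarm_level`, and the function name `cusum_alarm` are all assumptions.

```python
def cusum_alarm(evidence, drift=1.0, alarm_level=5.0):
    """Return the first index at which the cumulative statistic raises an alarm.

    The statistic accumulates (evidence - drift), is reset to zero whenever
    it would go negative, and triggers once it reaches alarm_level.
    Returns None if no alarm is raised.
    """
    s = 0.0
    for t, e in enumerate(evidence):
        s = max(0.0, s + e - drift)
        if s >= alarm_level:
            return t
    return None


# Per-sample outlier evidence (made-up values): one isolated spike versus a
# sustained change starting at index 3.
single_outlier = [0.2, 0.1, 4.0, 0.3, 0.2, 0.1, 0.2]
persistent = [0.2, 0.1, 0.3, 3.0, 3.0, 3.0, 3.0]
alarm_single = cusum_alarm(single_outlier)
alarm_persistent = cusum_alarm(persistent)
print(alarm_single, alarm_persistent)  # None 5
```

The isolated spike decays before reaching the alarm level, while the sustained deviation accumulates and triggers after three consecutive outlying samples. Raising `alarm_level` lengthens the false alarm period at the cost of a longer detection delay.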

Journal ArticleDOI
Zhong Yuan, Hongmei Chen, Tianrui Li, Jia Liu, Shu Wang
TL;DR: A hybrid feature outlier detection method based on fuzzy information entropy is constructed by using a fuzzy approximate space with a fuzzy similarity relation, and the resulting FIEOD algorithm is compared with the main outlier detection algorithms on public data.