
Showing papers on "Outlier" published in 2009


Journal ArticleDOI
TL;DR: The use of a similar approach for other kinds of correlated offset (such as overall measurement bias or regional offsets in the calibration curve) is discussed, and the implementation of these methods in OxCal v 4.1 is presented.
Abstract: The wide availability of precise radiocarbon dates has allowed researchers in a number of disciplines to address chronological questions at a resolution which was not possible 10 or 20 years ago. The use of Bayesian statistics for the analysis of groups of dates is becoming a common way to integrate all of the 14C evidence together. However, the models most often used make a number of assumptions that may not always be appropriate. In particular, there is an assumption that all of the 14C measurements are correct in their context and that the original 14C concentration of the sample is properly represented by the calibration curve. In practice, in any analysis of dates some are usually rejected as obvious outliers. However, there are Bayesian statistical methods which can be used to perform this rejection in a more objective way (Christen 1994b), but these are not often used. This paper discusses the underlying statistics and application of these methods, and extensions of them, as they are implemented in OxCal v 4.1. New methods are presented for the treatment of outliers, where the problems lie principally with the context rather than the 14C measurement. There is also a full treatment of outlier analysis for samples that are all of the same age, which takes account of the uncertainty in the calibration curve. All of these Bayesian approaches can be used either for outlier detection and rejection or in a model averaging approach where dates most likely to be outliers are downweighted. Another important subject is the consistent treatment of correlated uncertainties between a set of measurements and the calibration curve. This has already been discussed by Jones and Nicholls (2001) in the case of marine reservoir offsets. 
In this paper, the use of a similar approach for other kinds of correlated offset (such as overall measurement bias or regional offsets in the calibration curve) is discussed and the implementation of these methods in OxCal v 4.1 is presented.

917 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a new definition of depth for functional observations based on the graphic representation of the curves, which establishes the centrality of an observation and provides a natural center-outward ordering of the sample curves.
Abstract: The statistical analysis of functional data is a growing need in many research areas. In particular, a robust methodology is important to study curves, which are the output of many experiments in applied statistics. As a starting point for this robust analysis, we propose, analyze, and apply a new definition of depth for functional observations based on the graphic representation of the curves. Given a collection of functions, it establishes the “centrality” of an observation and provides a natural center-outward ordering of the sample curves. Robust statistics, such as the median function or a trimmed mean function, can be defined from this depth definition. Its finite-dimensional version provides a new depth for multivariate data that is computationally feasible and useful for studying high-dimensional observations. Thus, this new depth is also suitable for complex observations such as microarray data, images, and those arising in some recent marketing and financial studies. Natural properties of these ...
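The center-outward ordering induced by such a graph-based depth can be illustrated with the sample band depth, a closely related construction in which a curve is deep if many pairs of sample curves envelop it pointwise. The following is a minimal sketch under that simplified BD-style definition; the function name and the exclusion of the curve itself are illustrative choices, not the authors' exact formulation:

```python
from itertools import combinations

def band_depth(curves, c):
    """Fraction of pairs of other curves whose pointwise envelope
    entirely contains curve c (a simplified band-depth sketch)."""
    others = [u for u in curves if u != c]
    pairs = list(combinations(others, 2))
    inside = sum(
        all(min(a[t], b[t]) <= c[t] <= max(a[t], b[t]) for t in range(len(c)))
        for a, b in pairs
    )
    return inside / len(pairs)

curves = [[0, 0, 0], [1, 1, 1], [2, 2, 2], [10, 10, 10]]
# a central curve gets high depth; the outlying curve gets depth 0
```

Ranking curves by such a depth yields the natural center-outward ordering from which robust statistics like a median function or a trimmed mean function can be defined.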

534 citations


Proceedings ArticleDOI
02 Nov 2009
TL;DR: A local density-based outlier detection method providing an outlier "score" in the range [0, 1] that is directly interpretable as the probability of a data object being an outlier.
Abstract: Many outlier detection methods do not merely decide whether a single data object is or is not an outlier but also give an outlier score or "outlier factor" signaling "how much" the respective data object is an outlier. A major problem for any user not closely acquainted with the outlier detection method in question is how to interpret this "factor" in order to decide, from the numeric score, whether or not the data object indeed is an outlier. Here, we formulate a local density-based outlier detection method providing an outlier "score" in the range [0, 1] that is directly interpretable as the probability of a data object being an outlier.
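The core recipe, turning a local density deviation into a calibrated [0, 1] value via the Gaussian error function, can be sketched in a few lines. This is a simplified 1-D re-implementation of the general idea, not the authors' exact formulation; the function names and the choices k = 3 and lambda = 3 are illustrative:

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbours of points[i], excluding itself."""
    order = sorted(range(len(points)), key=lambda j: abs(points[j] - points[i]))
    return [j for j in order if j != i][:k]

def loop_scores(points, k=3, lam=3.0):
    """Map a local density deviation to a [0, 1] outlier 'probability'."""
    n = len(points)
    nbrs = [knn(points, i, k) for i in range(n)]
    # "standard distance": quadratic mean of the distances to the k neighbours
    sigma = [math.sqrt(sum((points[i] - points[j]) ** 2 for j in nbrs[i]) / k)
             for i in range(n)]
    # local density deviation relative to the neighbours' own standard distances
    plof = [sigma[i] / (sum(sigma[j] for j in nbrs[i]) / k) - 1.0
            for i in range(n)]
    nplof = lam * math.sqrt(sum(p * p for p in plof) / n)
    # the error function normalizes the deviation into a [0, 1] score
    return [max(0.0, math.erf(p / (nplof * math.sqrt(2.0)))) for p in plof]

data = [1.0, 1.1, 0.9, 1.2, 0.8, 1.05, 10.0]   # 10.0 is a gross outlier
scores = loop_scores(data)
```

The gross outlier's score clearly dominates the inliers' near-zero scores, and every score stays in [0, 1], which is what makes the output directly interpretable.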

481 citations


Proceedings ArticleDOI
01 Jan 2009
TL;DR: RANSAC (Random Sample Consensus) has been popular for regression problems with samples contaminated by outliers, but there have been few surveys and performance analyses of its many variants.
Abstract: RANSAC (Random Sample Consensus) has been popular for regression problems with samples contaminated by outliers. It has been a milestone of much research on robust estimators, but there have been few surveys and performance analyses of these methods. This paper categorizes them by their objectives: being accurate, being fast, and being robust. Performance evaluation was performed on line fitting with various data distributions, and planar homography estimation was used to assess performance on real data.
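A minimal RANSAC for the line-fitting setting used in the evaluation might look like the following generic textbook sketch; the function names, iteration count, and inlier tolerance are illustrative choices:

```python
import random

def fit_line(p1, p2):
    """Line through two points as (slope, intercept); assumes non-vertical."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def ransac_line(points, iters=200, tol=0.5, seed=0):
    """Keep the 2-point line hypothesis that gathers the most inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        p1, p2 = rng.sample(points, 2)
        if p1[0] == p2[0]:
            continue   # skip vertical hypotheses in this sketch
        m, b = fit_line(p1, p2)
        inliers = [(x, y) for x, y in points if abs(y - (m * x + b)) < tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (m, b), inliers
    return best_model, best_inliers

# ten points on y = 2x + 1 plus two gross outliers
pts = [(float(x), 2.0 * x + 1.0) for x in range(10)] + [(3.0, 40.0), (7.0, -20.0)]
model, inliers = ransac_line(pts)
```

The two contaminating points never enter the consensus set, so the recovered slope and intercept match the clean line; accuracy, speed, and robustness variants of RANSAC all modify pieces of this loop.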

449 citations


Book ChapterDOI
19 Apr 2009
TL;DR: A novel Local Distance-based Outlier Factor (LDOF) to measure the outlier-ness of objects in scattered datasets which is less sensitive to parameter values and compares favorably to classical KNN and LOF based outlier detection.
Abstract: Detecting outliers which are grossly different from or inconsistent with the remaining dataset is a major challenge in real-world KDD applications. Existing outlier detection methods are ineffective on scattered real-world datasets due to implicit data patterns and parameter setting issues. We define a novel Local Distance-based Outlier Factor (LDOF) to measure the outlier-ness of objects in scattered datasets which addresses these issues. LDOF uses the relative location of an object to its neighbours to determine the degree to which the object deviates from its neighbourhood. We present theoretical bounds on LDOF's false-detection probability. Experimentally, LDOF compares favorably to classical KNN and LOF based outlier detection. In particular it is less sensitive to parameter values.
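The LDOF idea, comparing an object's distance to its neighbours against the scatter among those neighbours themselves, can be sketched for 1-D data as follows; this is an illustrative re-implementation of the published formula, not the authors' code:

```python
def ldof(points, i, k=3):
    """Local Distance-based Outlier Factor of points[i]: mean distance to
    its k nearest neighbours divided by the mean pairwise ("inner")
    distance among those neighbours."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: abs(points[j] - points[i]))
    nn = others[:k]
    d_knn = sum(abs(points[i] - points[j]) for j in nn) / k
    inner = [abs(points[a] - points[b]) for a in nn for b in nn if a < b]
    d_inner = sum(inner) / len(inner)
    return d_knn / d_inner

data = [1.0, 1.2, 0.9, 1.1, 5.0]   # 5.0 sits far from the scattered cluster
```

An object deep inside its neighbourhood scores near (or below) 1, while an object far outside it scores much higher, which is what makes the factor usable across scattered datasets without delicate parameter tuning.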

356 citations


Book ChapterDOI
19 Apr 2009
TL;DR: This work proposes an original outlier detection schema that detects outliers in varying subspaces of a high dimensional feature space and shows that it is superior to existing full-dimensional approaches and scales well to high dimensional databases.
Abstract: We propose an original outlier detection schema that detects outliers in varying subspaces of a high dimensional feature space. In particular, for each object in the data set, we explore the axis-parallel subspace spanned by its neighbors and determine how much the object deviates from the neighbors in this subspace. In our experiments, we show that our novel subspace outlier detection is superior to existing full-dimensional approaches and scales well to high dimensional databases.
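The mechanics of exploring an axis-parallel subspace spanned by an object's neighbours can be sketched as below. This is a simplified illustration inspired by the described schema, not the authors' exact method; the alpha threshold, the variance criterion, and the normalization are all assumptions made for the example:

```python
import math

def subspace_outlier_score(points, i, k=4, alpha=0.8):
    """Deviation of points[i] from its neighbours, measured only in the
    axis-parallel subspace where those neighbours agree (low variance)."""
    nn = sorted((j for j in range(len(points)) if j != i),
                key=lambda j: math.dist(points[j], points[i]))[:k]
    dims = len(points[i])
    mean = [sum(points[j][d] for j in nn) / k for d in range(dims)]
    var = [sum((points[j][d] - mean[d]) ** 2 for j in nn) / k
           for d in range(dims)]
    avg_var = sum(var) / dims
    # keep the dimensions in which the neighbours cluster tightly
    subspace = ([d for d in range(dims) if var[d] < alpha * avg_var]
                or list(range(dims)))
    dev = math.sqrt(sum((points[i][d] - mean[d]) ** 2 for d in subspace))
    return dev / len(subspace)

# inliers vary along x but agree on y near 2; the outlier deviates in y only
data = [(0, 2.0), (1, 2.1), (2, 1.9), (3, 2.0), (4, 2.05), (2, 8.0)]
scores = [subspace_outlier_score(data, i) for i in range(len(data))]
```

A full-dimensional distance would partly mask the last point's deviation behind the irrelevant x spread; restricting the comparison to the low-variance y dimension exposes it.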

282 citations


Journal ArticleDOI
TL;DR: Time series modeling (i.e., auto-regressive models) is used in conjunction with Mahalanobis distance-based outlier detection algorithms to identify different types of structural changes, in the context of structural health monitoring (SHM), on different laboratory test structures.

250 citations


01 Jan 2009
TL;DR: The STAMP book introduces structural time series models and the ways in which they can be used to model a wide range of series; version 8 includes extensions and improvements for multivariate models.
Abstract: STAMP™ stands for Structural Time series Analyser, Modeller and Predictor. It is a menu-driven system designed to model, describe and predict time series. It is based on structural time series models. These models are set up in terms of components such as trends, seasonals and cycles, which have a direct interpretation. Estimation is carried out using state space methods and Kalman filtering. STAMP 8.2 for OxMetrics 6 handles time series with missing values. Explanatory variables with time varying coefficients and interventions can be included. Version 8 includes extensions and improvements for Multivariate Models: select components by equation, select regressors and interventions by equation, separate dependence structures for each component, wide choice of variance matrices, higher order multivariate components, missing observations allowed, forecasting, exact likelihood computation, automatic outlier and break detection, fixing parameters is made easy. Among the special features of STAMP are interactive model selection, a wide range of diagnostics, easy creation of model based forecasts, spectral filters, observation weight functions, and batch facilities. The STAMP book introduces structural time series models and the way in which they can be used to model a wide range of series.

238 citations


Proceedings ArticleDOI
12 May 2009
TL;DR: It is shown that by exploiting the nonholonomic constraints of wheeled vehicles it is possible to use a restrictive motion model which allows us to parameterize the motion with only 1 feature correspondence, which results in the most efficient algorithms for removing outliers.
Abstract: This paper presents a system capable of recovering the trajectory of a vehicle from the video input of a single camera at a very high frame-rate. The overall frame-rate is limited only by the feature extraction process, as the outlier removal and the motion estimation steps take less than 1 millisecond on a normal laptop computer. The algorithm relies on a novel way of removing the outliers of the feature matching process. We show that by exploiting the nonholonomic constraints of wheeled vehicles it is possible to use a restrictive motion model which allows us to parameterize the motion with only 1 feature correspondence. Using a single feature correspondence for motion estimation is the lowest model parameterization possible and results in the most efficient algorithms for removing outliers. Here we present two methods for outlier removal: one based on RANSAC and the other based on histogram voting. We demonstrate the approach using an omnidirectional camera placed on a vehicle during a peak-time tour in the city of Zurich. We show that the proposed algorithm is able to cope with the large amount of clutter of the city (other moving cars, buses, trams, pedestrians, sudden stops of the vehicle, etc.). Using the proposed approach, we cover one of the longest trajectories ever reported in real-time from a single omnidirectional camera and in cluttered urban scenes, up to 3 kilometers.

233 citations


Journal ArticleDOI
TL;DR: This work develops an approach that is flexible with respect to the outlier definition, computes the result in-network to reduce both bandwidth and energy consumption, uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms, and seamlessly accommodates dynamic updates to data.
Abstract: To address the problem of unsupervised outlier detection in wireless sensor networks, we develop an approach that (1) is flexible with respect to the outlier definition, (2) computes the result in-network to reduce both bandwidth and energy usage, (3) uses only single-hop communication, thus permitting very simple node failure detection and message reliability assurance mechanisms (e.g., carrier-sense), and (4) seamlessly accommodates dynamic updates to data. We examine performance using simulation with real sensor data streams. Our results demonstrate that our approach is accurate and imposes a reasonable communication load and level of power consumption.

201 citations


Journal ArticleDOI
01 Jan 2009
TL;DR: A set of novel algorithms, collectively called sequenceMiner, that detect and characterize anomalies in large sets of high-dimensional symbol sequences arising from recordings of switch sensors in the cockpits of commercial airliners.
Abstract: We present a set of novel algorithms which we call sequenceMiner that detect and characterize anomalies in large sets of high-dimensional symbol sequences that arise from recordings of switch sensors in the cockpits of commercial airliners. While the algorithms that we present are general and domain-independent, we focus on a specific problem that is critical to determining the system-wide health of a fleet of aircraft. The approach taken uses unsupervised clustering of sequences using the normalized length of the longest common subsequence as a similarity measure, followed by detailed outlier analysis to detect anomalies. In this method, an outlier sequence is defined as a sequence that is far away from the cluster center. We present new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence is deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence when compared to more normal sequences. In the final section of the paper, we demonstrate the effectiveness of sequenceMiner for anomaly detection on a real set of discrete-sequence data from a fleet of commercial airliners. We show that sequenceMiner discovers actionable and operationally significant safety events. We also compare our innovations with standard hidden Markov models, and show that our methods are superior.
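The similarity measure at the heart of the clustering step, the normalized longest common subsequence, can be written directly as a dynamic program. Normalizing by the longer sequence's length is one common convention; the paper's exact normalization may differ:

```python
def lcs_len(a, b):
    """Length of the longest common subsequence (classic dynamic program)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = (dp[i - 1][j - 1] + 1 if x == y
                        else max(dp[i - 1][j], dp[i][j - 1]))
    return dp[len(a)][len(b)]

def nlcs(a, b):
    """LCS length normalized to [0, 1] by the longer sequence's length."""
    return lcs_len(a, b) / max(len(a), len(b))
```

Because the LCS tolerates insertions and gaps, two switch-action sequences that share their overall pattern score high even when extra actions are interleaved, which is exactly the property the clustering relies on.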

Journal ArticleDOI
TL;DR: A new definition for outliers is presented: the cluster-based outlier, which is meaningful and attends to local data behavior. It is shown how to detect such outliers with the clustering algorithm LDBSCAN, which is capable of finding clusters and assigning LOF values to single points.
Abstract: Outlier detection has important applications in the field of data mining, such as fraud detection, customer behavior analysis, and intrusion detection. Outlier detection is the process of detecting the data objects which are grossly different from or inconsistent with the remaining set of data. Outliers are traditionally considered as single points; however, there is a key observation that many abnormal events have both temporal and spatial locality, which might form small clusters that also need to be deemed as outliers. In other words, not only a single point but also a small cluster can probably be an outlier. In this paper, we present a new definition for outliers: the cluster-based outlier, which is meaningful and attends to the local data behavior, and we show how to detect outliers with the clustering algorithm LDBSCAN (Duan et al. in Inf. Syst. 32(7):978–986, 2007), which is capable of finding clusters and assigning LOF (Breunig et al. in Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, ACM Press, pp. 93–104, 2000) to single points.

Book
26 May 2009
TL;DR: This book proposes robust alternatives to common methods used in statistics in general and in biostatistics in particular and illustrates their use on many biomedical datasets, with a particular emphasis put on practical data analysis.
Abstract: Robust statistics is an extension of classical statistics that specifically takes into account the concept that the underlying models used to describe data are only approximate. Its basic philosophy is to produce statistical procedures which are stable when the data do not exactly match the postulated models as it is the case for example with outliers. Robust Methods in Biostatistics proposes robust alternatives to common methods used in statistics in general and in biostatistics in particular and illustrates their use on many biomedical datasets. The methods introduced include robust estimation, testing, model selection, model check and diagnostics. They are developed for the following general classes of models: Linear regression. Generalized linear models. Linear mixed models. Marginal longitudinal data models. Cox survival analysis model. The methods are introduced both at a theoretical and applied level within the framework of each general class of models, with a particular emphasis put on practical data analysis. This book is of particular use for research students, applied statisticians and practitioners in the health field interested in more stable statistical techniques. An accompanying website provides R code for computing all of the methods described, as well as for analyzing all the datasets used in the book.

Journal ArticleDOI
TL;DR: A small sphere and large margin approach for novelty detection problems, where the majority of training data are normal examples and the training data also contain a small number of abnormal examples or outliers.
Abstract: We present a small sphere and large margin approach for novelty detection problems, where the majority of training data are normal examples. In addition, the training data also contain a small number of abnormal examples or outliers. The basic idea is to construct a hypersphere that contains most of the normal examples, such that the volume of this sphere is as small as possible, while at the same time the margin between the surface of this sphere and the outlier training data is as large as possible. This can result in a closed and tight boundary around the normal data. To build such a sphere, we only need to solve a convex optimization problem that can be efficiently solved with the existing software packages for training nu-support vector machines. Experimental results are provided to validate the effectiveness of the proposed algorithm.
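The geometry of the small-sphere/large-margin trade-off can be illustrated with a deliberately crude stand-in: center the sphere on the mean of the normal examples and put the radius midway between the farthest normal point and the nearest outlier. The paper instead solves a convex program with nu-SVM software; everything below (names, centering rule, radius rule) is an assumption made purely for illustration:

```python
import math

def sphere_boundary(normals, outliers):
    """Toy small-sphere/large-margin boundary: mean-centered sphere with
    radius halfway between the farthest normal example and the nearest
    labeled outlier, leaving margin on both sides."""
    dims = len(normals[0])
    center = [sum(p[d] for p in normals) / len(normals) for d in range(dims)]
    r_normal = max(math.dist(p, center) for p in normals)
    r_outlier = min(math.dist(p, center) for p in outliers)
    return center, (r_normal + r_outlier) / 2.0

normals = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1)]
outliers = [(4, 0), (0, 5)]
center, radius = sphere_boundary(normals, outliers)
```

The resulting boundary encloses every normal example tightly while keeping the labeled outliers outside with equal margin, which is the qualitative behavior the convex formulation optimizes properly.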

Journal ArticleDOI
TL;DR: In this article, the forward search is used to provide robust Mahalanobis distances for detecting the presence of outliers in a sample of multivariate normal data, with the distribution of the test statistic derived from results on order statistics and on estimation in truncated samples.
Abstract: We use the forward search to provide robust Mahalanobis distances to detect the presence of outliers in a sample of multivariate normal data. Theoretical results on order statistics and on estimation in truncated samples provide the distribution of our test statistic. We also introduce several new robust distances with associated distributional results. Comparisons of our procedure with tests using other robust Mahalanobis distances show the good size and high power of our procedure. We also provide a unification of results on correction factors for estimation from truncated samples.
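The robust-distance idea can be sketched with a toy version: estimate location and scatter from a trimmed subset of the data near the coordinatewise median, then score every point with a Mahalanobis distance from that clean fit. This is only a crude stand-in for the forward search (which grows the clean subset iteratively and supplies exact distributional results); the subset fraction and all function names are illustrative:

```python
import math
import statistics

def mahalanobis2(x, mean, cov):
    """Mahalanobis distance for 2-D data via the closed-form 2x2 inverse."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mean[0], x[1] - mean[1]]
    q = (dx[0] * (inv[0][0] * dx[0] + inv[0][1] * dx[1])
         + dx[1] * (inv[1][0] * dx[0] + inv[1][1] * dx[1]))
    return math.sqrt(q)

def mean_cov(points):
    """Sample mean and covariance of 2-D points."""
    n = len(points)
    m = [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]
    sxx = sum((p[0] - m[0]) ** 2 for p in points) / n
    syy = sum((p[1] - m[1]) ** 2 for p in points) / n
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points) / n
    return m, [[sxx, sxy], [sxy, syy]]

def robust_distances(points, subset_frac=0.75):
    """Fit mean/cov on the subset closest to the coordinatewise median,
    then score all points; outliers cannot distort the fit."""
    med = [statistics.median(p[0] for p in points),
           statistics.median(p[1] for p in points)]
    ranked = sorted(points, key=lambda p: math.dist(p, med))
    clean = ranked[:int(len(points) * subset_frac)]
    m, cov = mean_cov(clean)
    return [mahalanobis2(p, m, cov) for p in points]

pts = [(0, 0), (1, 0.5), (-1, -0.5), (0.5, 1), (-0.5, -1), (0.2, -0.3),
       (10, 10)]   # last point is a gross outlier
d = robust_distances(pts)
```

Because the outlier is excluded from the fitting subset, its distance stands far above the bulk instead of masking itself by inflating the covariance, which is the failure mode of the classical (non-robust) distance.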

Journal ArticleDOI
TL;DR: The proposed Graph Transformation Matching method, relying on finding a consensus nearest-neighbour graph emerging from candidate matches, is successfully applied in the context of constructing mosaics of retinal images, where feature points are extracted from properly segmented binary images.

Proceedings ArticleDOI
29 Mar 2009
TL;DR: This study proposes a method for detecting temporal outliers with an emphasis on historical similarity trends between data points, and experiments show that this approach is effective and efficient.
Abstract: Outlier detection in vehicle traffic data is a practical problem that has gained traction lately due to an increasing capability to track moving vehicles in city roads. In contrast to other applications, this particular domain includes a very dynamic dimension: time. Many existing algorithms have studied the problem of outlier detection at a single instant in time. This study proposes a method for detecting temporal outliers with an emphasis on historical similarity trends between data points. Outliers are calculated from drastic changes in the trends. Experiments with real world traffic data show that this approach is effective and efficient.

Journal ArticleDOI
TL;DR: Results show that the use of representative training data can help the classifier to produce more accurate and reliable results, and confirm the value of visualization tools for the assessment of training pixels through decision boundary analysis.
Abstract: Image classification is a complex process affected by some uncertainties and decisions made by the researchers. The accuracy achieved by a supervised classification is largely dependent upon the training data provided by the analyst. The use of representative training data sets is of significant importance for the performance of all classification methods. However, this issue is more important for neural network classifiers since they take each sample into consideration in the training stage. The representativeness is related to the size and quality of the training data that are highly important in assessing the accuracy of the thematic maps derived from remotely sensed data. Quality analysis of training data helps to identify outlier and mixed pixels that can undermine the reliability and accuracy of a classification resulting from an incorrect class boundary definition. Training data selection can be thought of as an iterative process conducted to form a representative data set after some refinements. Unfortunately, in many applications the quality of the training data is not questioned, and the data set is directly employed in the training stage. In order to increase the representativeness of the training data, a two-stage approach is presented, and performance tests are conducted for a selected region. Multi-layer perceptron model trained with backpropagation learning algorithm is employed to classify major land cover/land use classes present in the study area, the city of Trabzon in Turkey. Results show that the use of representative training data can help the classifier to produce more accurate and reliable results. An improvement of several percent in classification accuracy can make significant effect on the quality of the classified image. Results also confirm the value of visualization tools for the assessment of training pixels through decision boundary analysis.

Journal ArticleDOI
TL;DR: It is shown that standard high-breakdown affine equivariant estimators propagate outliers and therefore show poor breakdown behavior under componentwise contamination when the dimension d is high.
Abstract: We investigate the performance of robust estimates of multivariate location under nonstandard data contamination models such as componentwise outliers (i.e., contamination in each variable is independent from the other variables). This model brings up a possible new source of statistical error that we call "propagation of outliers." This source of error is unusual in the sense that it is generated by the data processing itself and takes place after the data has been collected. We define and derive the influence function of robust multivariate location estimates under flexible contamination models and use it to investigate the effect of propagation of outliers. Furthermore, we show that standard high-breakdown affine equivariant estimators propagate outliers and therefore show poor breakdown behavior under componentwise contamination when the dimension d is high.
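The severity of componentwise contamination is easy to quantify: if each coordinate is independently contaminated with probability eps, an observation is entirely clean with probability (1 - eps)^d, which collapses as the dimension d grows. A one-line check (the function name is illustrative):

```python
def clean_fraction(eps, d):
    """Probability that a d-dimensional observation has no contaminated
    coordinate, under independent componentwise contamination at rate eps."""
    return (1.0 - eps) ** d

# a modest 5% per-coordinate rate leaves most observations clean in 1-D,
# but very few fully clean observations once d reaches 50
low_dim = clean_fraction(0.05, 1)
high_dim = clean_fraction(0.05, 50)
```

This arithmetic is why estimators that treat each observation as wholly good or wholly bad break down under componentwise contamination in high dimension: almost no observation is wholly good.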

Journal ArticleDOI
TL;DR: A novel statistical depth, the kernelized spatial depth (KSD), generalizes the spatial depth via positive definite kernels; based on the KSD, a novel outlier detection algorithm is proposed, in which an observation with a depth value less than a threshold is declared an outlier.
Abstract: Statistical depth functions provide a center-outward ordering of multidimensional data from the deepest point. In this sense, depth functions can measure the extremeness or outlyingness of a data point with respect to a given data set. Hence, they can detect outliers: observations that appear extreme relative to the rest of the observations. Of the various statistical depths, the spatial depth is especially appealing because of its computational efficiency and mathematical tractability. In this article, we propose a novel statistical depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. By choosing a proper kernel, the KSD can capture the local structure of a data set where the spatial depth fails. We demonstrate this on the half-moon data and the ring-shaped data. Based on the KSD, we propose a novel outlier detection algorithm, by which an observation with a depth value less than a threshold is declared an outlier. The proposed algorithm is simple in structure: the threshold is the only parameter for a given kernel. It applies to a one-class learning setting, in which normal observations are given as the training data, as well as to a missing label scenario, where the training set consists of a mixture of normal observations and outliers with unknown labels. We give upper bounds on the false alarm probability of a depth-based detector. These upper bounds can be used to determine the threshold. We perform extensive experiments on synthetic data and data sets from real applications. The proposed outlier detector is compared with existing methods. The KSD outlier detector demonstrates a competitive performance.
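The plain (non-kernelized) spatial depth that the KSD generalizes has a direct formula: one minus the norm of the average unit vector pointing from the data towards the query point. A 2-D sketch (assuming distinct points; the kernelized version would replace these Euclidean unit vectors with kernel-induced ones):

```python
import math

def spatial_depth(x, data):
    """Plain spatial depth of x: 1 minus the norm of the mean unit vector
    from the data points towards x (points equal to x are skipped)."""
    diffs = [(x[0] - p[0], x[1] - p[1]) for p in data if p != x]
    units = [(dx / math.hypot(dx, dy), dy / math.hypot(dx, dy))
             for dx, dy in diffs]
    mx = sum(u[0] for u in units) / len(units)
    my = sum(u[1] for u in units) / len(units)
    return 1.0 - math.hypot(mx, my)

data = [(1, 0), (0, 1), (-1, 0), (0, -1), (0, 0), (5, 5)]
```

A central point sees the unit vectors cancel (depth near 1), while an extreme point sees them align (depth near 0); declaring points with depth below a threshold as outliers gives the detector described above.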

Journal ArticleDOI
TL;DR: The choice of statistical test has a profound impact on the interpretation of data; robustness, in one sense, refers to the insensitivity of the estimator to outliers or to violations of underlying assumptions.

Journal ArticleDOI
TL;DR: The MC method inherently provides a feasible way to detect different kinds of outliers through the establishment of many cross-predictive models; with the help of the distribution of predictive residuals thus obtained, it is able to reduce the risk caused by the masking effect.
Abstract: The crucial step in building a high-performance QSAR/QSPR model is the detection of outliers in the model. Detecting outliers in a multivariate point cloud is not trivial, especially when several outliers coexist in the model. The classical identification methods do not always identify them, because they are based on the sample mean and covariance matrix, which are themselves influenced by the outliers. Moreover, existing methods emphasize only some types of outliers, not all of them. To avoid these problems and detect all kinds of outliers simultaneously, we provide a new strategy based on Monte-Carlo cross-validation, termed the MC method. The MC method inherently provides a feasible way to detect different kinds of outliers through the establishment of many cross-predictive models. With the help of the distribution of predictive residuals thus obtained, it is able to reduce the risk caused by the masking effect. In addition, a new display is proposed, in which the absolute values of the mean of the predictive residuals are plotted versus the standard deviations of the predictive residuals. The plot divides the data into normal samples, y-direction outliers, and X-direction outliers. Several examples are used to demonstrate the detection ability of the MC method through the comparison of different diagnostic methods.
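The residual-collection machinery can be sketched for a univariate linear model: repeatedly split the data at random, fit on the training part, and record each sample's predictive residual whenever it falls in the test part. This is a simplified illustration of Monte-Carlo cross-validation (the MC method additionally uses the standard deviation of these residuals and multivariate models); function names, the split fraction, and the summary by the mean alone are assumptions for the example:

```python
import random

def fit_ols(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def mc_residuals(xs, ys, n_splits=200, test_frac=0.3, seed=1):
    """Collect each sample's predictive residuals over many random
    train/test splits and return the mean residual per sample."""
    rng = random.Random(seed)
    residuals = [[] for _ in xs]
    idx = list(range(len(xs)))
    for _ in range(n_splits):
        rng.shuffle(idx)
        cut = int(len(idx) * (1 - test_frac))
        train, test = idx[:cut], idx[cut:]
        a, b = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            residuals[i].append(ys[i] - (a + b * xs[i]))
    # a large mean predictive residual flags a y-direction outlier
    return [sum(r) / len(r) for r in residuals]

xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[4] += 10          # plant a single y-direction outlier
r = mc_residuals(xs, ys)
```

Whenever the planted point is held out, the model is fitted on clean data and its residual is large; averaging over many random splits separates it cleanly from the rest, which is how the approach resists masking when several outliers coexist.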

Journal ArticleDOI
TL;DR: A robust PCA method is developed which is suitable for skewed data and illustrated on real data from economics, engineering, and finance, and confirmed by a simulation study.

Journal ArticleDOI
TL;DR: In this work a novel distance-based outlier detection algorithm, named DOLPHIN, working on disk-resident datasets and whose I/O cost corresponds to the cost of sequentially reading the input dataset file twice, is presented.
Abstract: In this work a novel distance-based outlier detection algorithm, named DOLPHIN, working on disk-resident datasets and whose I/O cost corresponds to the cost of sequentially reading the input dataset file twice, is presented. It is both theoretically and empirically shown that the main memory usage of DOLPHIN amounts to a small fraction of the dataset and that DOLPHIN has linear time performance with respect to the dataset size. DOLPHIN gains efficiency by naturally merging together in a unified schema three strategies, namely the selection policy of objects to be maintained in main memory, usage of pruning rules, and similarity search techniques. Importantly, similarity search is accomplished by the algorithm without the need of preliminarily indexing the whole dataset, as other methods do. The algorithm is simple to implement and it can be used with any type of data, belonging to either metric or nonmetric spaces. Moreover, a modification to the basic method allows DOLPHIN to deal with the scenario in which the available buffer of main memory is smaller than its standard requirements. DOLPHIN has been compared with state-of-the-art distance-based outlier detection algorithms, showing that it is much more efficient.
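DOLPHIN targets the classic distance-based outlier definition: an object is an outlier if fewer than k other objects lie within distance R of it. A naive in-memory reference for that definition (quadratic, without DOLPHIN's pruning rules or two-scan I/O schema; names and parameters are illustrative) is:

```python
import math

def distance_based_outliers(points, radius, k):
    """Flag each point that has fewer than k neighbours within the given
    radius; this is the definition DOLPHIN evaluates I/O-efficiently,
    computed here by brute force for illustration."""
    flags = []
    for i, p in enumerate(points):
        neighbours = sum(1 for j, q in enumerate(points)
                         if j != i and math.dist(p, q) <= radius)
        flags.append(neighbours < k)
    return flags

points = [(0, 0), (0.5, 0), (0, 0.5), (0.4, 0.4), (9, 9)]
```

DOLPHIN's contribution is computing exactly this answer on disk-resident data with two sequential scans and a small in-memory summary, instead of the quadratic comparison above.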

Journal ArticleDOI
TL;DR: A label-conditional classifier is developed, which turns out to be an alternative approach to the cost-sensitive learning problem that relies on label-wise predefined confidence levels; the target of minimizing the risk of misclassification is achieved.
Abstract: Most machine-learning classifiers output label predictions for new instances without indicating how reliable the predictions are. The applicability of these classifiers is limited in critical domains where incorrect predictions have serious consequences, like medical diagnosis. Further, the default assumption of equal misclassification costs is most likely violated in medical diagnosis. In this paper, we present a modified random forest classifier which is incorporated into the conformal predictor scheme. A conformal predictor is a transductive learning scheme, using Kolmogorov complexity to test the randomness of a particular sample with respect to the training sets. Our method has the well-calibrated property that the performance can be set prior to classification and the accuracy is exactly equal to the predefined confidence level. Further, to address the cost-sensitive problem, we extend our method to a label-conditional predictor which takes into account different costs for misclassifications in different classes and allows a different confidence level to be specified for each class. Intensive experiments on benchmark datasets and real-world applications show that the resultant classifier is well calibrated and able to control the specific risk of each class. The use of the RF outlier measure to design a nonconformity measure benefits the resultant predictor. Further, a label-conditional classifier is developed, which turns out to be an alternative approach to the cost-sensitive learning problem that relies on label-wise predefined confidence levels. The target of minimizing the risk of misclassification is achieved by specifying a different confidence level for each class.

Journal ArticleDOI
TL;DR: An expectation-maximization algorithm is presented for the maximum likelihood estimation of the model parameters in the presence of missing data and a contribution analysis method is proposed to identify which variables contribute the most to the occurrence of outliers, providing valuable information regarding the source of outlying data.

Proceedings ArticleDOI
26 May 2009
TL;DR: One-class support vector machine-based outlier detection techniques that sequentially update the model representing normal behavior of the sensed data and take advantage of spatial and temporal correlations that exist between sensor data to cooperatively identify outliers are proposed.
Abstract: Outlier detection in wireless sensor networks is essential to ensure data quality, secure monitoring and reliable detection of interesting and critical events. A key challenge for outlier detection in wireless sensor networks is to adaptively identify outliers in an online manner with a high accuracy while maintaining the resource consumption of the network to a minimum. In this paper, we propose one-class support vector machine-based outlier detection techniques that sequentially update the model representing normal behavior of the sensed data and take advantage of spatial and temporal correlations that exist between sensor data to cooperatively identify outliers. Experiments with both synthetic and real data show that our online outlier detection techniques achieve high detection accuracy and low false alarm rate.

Journal ArticleDOI
TL;DR: This paper introduces a novel hidden Markov model where the hidden state distributions are considered to be finite mixtures of multivariate Student's t-densities, and derives an algorithm for the model parameters estimation under a maximum likelihood framework, assuming full, diagonal, and factor-analyzed covariance matrices.
Abstract: Hidden Markov (chain) models using finite Gaussian mixture models as their hidden state distributions have been successfully applied in sequential data modeling and classification applications. Nevertheless, Gaussian mixture models are well known to be highly intolerant to the presence of untypical data within the fitting data sets used for their estimation. Finite Student's t-mixture models have recently emerged as a heavier-tailed, robust alternative to Gaussian mixture models, overcoming these hurdles. To exploit these merits of Student's t-mixture models in the context of a sequential data modeling setting, we introduce, in this paper, a novel hidden Markov model where the hidden state distributions are considered to be finite mixtures of multivariate Student's t-densities. We derive an algorithm for the model parameters estimation under a maximum likelihood framework, assuming full, diagonal, and factor-analyzed covariance matrices. The advantages of the proposed model over conventional approaches are experimentally demonstrated through a series of sequential data modeling applications.
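The heavier tails of the Student's t-density are easy to verify numerically. The sketch below is an illustration of that property only (not the paper's estimation algorithm): it compares the multivariate Student-t and Gaussian log densities at an untypical point, and checks that the t-density recovers the Gaussian as the degrees of freedom grow.

```python
import numpy as np
from math import lgamma, log, pi

def mvt_logpdf(x, mu, Sigma, nu):
    """Log density of the d-variate Student's t with nu degrees of freedom;
    tends to the Gaussian with the same mu, Sigma as nu -> infinity."""
    d = len(mu)
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    maha = diff @ np.linalg.solve(Sigma, diff)        # squared Mahalanobis
    _, logdet = np.linalg.slogdet(Sigma)
    return (lgamma((nu + d) / 2) - lgamma(nu / 2)
            - 0.5 * (d * log(nu * pi) + logdet)
            - (nu + d) / 2 * log(1 + maha / nu))

def gauss_logpdf(x, mu, Sigma):
    d = len(mu)
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    maha = diff @ np.linalg.solve(Sigma, diff)
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * log(2 * pi) + logdet + maha)

mu, Sigma = np.zeros(2), np.eye(2)
outlier = [8.0, 8.0]
# The Gaussian penalizes the untypical point far more heavily than the
# heavy-tailed t, which is why a few outliers can dominate a Gaussian fit.
lp_t = mvt_logpdf(outlier, mu, Sigma, 3.0)
lp_g = gauss_logpdf(outlier, mu, Sigma)
```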

Proceedings Article
07 Dec 2009
TL;DR: This work discusses the properties of a Gaussian process regression model with the Student-t likelihood and utilizes the Laplace approximation for approximate inference and compares the approach to a variational approximation and a Markov chain Monte Carlo scheme, which utilize the commonly used scale mixture representation of the Student-t distribution.
Abstract: In Gaussian process regression, the observation model is commonly assumed to be Gaussian, which is convenient from a computational perspective. However, the drawback is that the predictive accuracy of the model can be significantly compromised if the observations are contaminated by outliers. A robust observation model, such as the Student-t distribution, reduces the influence of outlying observations and improves the predictions. The problem, however, is the analytically intractable inference. In this work, we discuss the properties of a Gaussian process regression model with the Student-t likelihood and utilize the Laplace approximation for approximate inference. We compare our approach to a variational approximation and a Markov chain Monte Carlo scheme, which utilize the commonly used scale mixture representation of the Student-t distribution.
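The "reduced influence of outlying observations" can be made concrete through the score function: differentiating the negative log likelihood with respect to a residual r shows that the Gaussian's pull on the fit grows linearly in r, while the Student-t's is bounded and decays for gross errors. A small illustrative sketch under an assumed parameterization (not the paper's code):

```python
import numpy as np

def gauss_score(r, sigma=1.0):
    """d/dr of the Gaussian negative log likelihood: grows without bound,
    so a single outlier can drag the fit arbitrarily far."""
    return r / sigma**2

def student_t_score(r, nu=4.0, sigma=1.0):
    """Same derivative for a Student-t likelihood:
    (nu + 1) r / (nu sigma^2 + r^2), which peaks near sqrt(nu) * sigma
    and then decays, bounding an outlier's influence."""
    return (nu + 1) * r / (nu * sigma**2 + r**2)

residuals = np.array([0.5, 2.0, 10.0, 100.0])
gauss_influence = gauss_score(residuals)      # grows linearly with r
t_influence = student_t_score(residuals)      # rises, then falls toward 0
```

For small residuals the two scores nearly agree, so the robust model behaves like the Gaussian on clean data and only discounts the gross errors.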

Journal ArticleDOI
TL;DR: It was found that no method could correctly exclude outliers 100% of the time; however, for a single outlier the outlier test achieved the highest rates of correct exclusion, followed by the MM-estimator and the L1-norm.
Abstract: With more satellite systems becoming available there is currently a need for Receiver Autonomous Integrity Monitoring (RAIM) to exclude multiple outliers. While the single outlier test can be applied iteratively, in the field of statistics robust methods are preferred when multiple outliers exist. This study compares the outlier test and numerous robust methods with simulated GPS measurements to identify which methods have the greatest ability to correctly exclude outliers. It was found that no method could correctly exclude outliers 100% of the time. However, for a single outlier the outlier test achieved the highest rates of correct exclusion, followed by the MM-estimator and the L1-norm. As the number of outliers increased, the MM-estimators and the L1-norm obtained the highest rates of correct exclusion, which were up to ten percent higher than those of the outlier test.
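As a rough illustration of why the L1-norm tolerates gross errors that distort least squares, the sketch below fits a line to toy data containing one blunder, computing the L1 estimate by iteratively reweighted least squares. This is a generic textbook scheme on invented data, not the RAIM simulation used in the study.

```python
import numpy as np

def least_squares(A, y):
    return np.linalg.lstsq(A, y, rcond=None)[0]

def l1_irls(A, y, iters=50, eps=1e-6):
    """L1-norm (least absolute deviations) fit via iteratively reweighted
    least squares: weights 1/|residual| progressively downweight the
    observations that fit worst."""
    x = least_squares(A, y)
    for _ in range(iters):
        w = 1.0 / np.maximum(np.abs(y - A @ x), eps)  # eps guards divide-by-0
        sw = np.sqrt(w)
        x = least_squares(A * sw[:, None], y * sw)
    return x

# Toy linear model y = 2 + 3 t with one gross blunder injected.
t = np.arange(10, dtype=float)
A = np.column_stack([np.ones_like(t), t])
y = 2 + 3 * t
y[4] += 50.0                        # the outlier
x_ls, x_l1 = least_squares(A, y), l1_irls(A, y)
```

The least-squares slope is pulled noticeably away from the true value of 3 by the single blunder, while the L1 fit essentially interpolates the nine clean points, which mirrors the robustness ranking reported above.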