
Showing papers on "Outlier published in 2003"


Proceedings ArticleDOI
05 Mar 2003
TL;DR: Experiments show that LOCI and aLOCI can automatically detect outliers and micro-clusters, without user-required cut-offs, and that they quickly spot both expected and unexpected outliers.
Abstract: Outlier detection is an integral part of data mining and has attracted much attention recently [M. Breunig et al., (2000)], [W. Jin et al., (2001)], [E. Knorr et al., (2000)]. We propose a new method for evaluating outlierness, which we call the local correlation integral (LOCI). As with the best previous methods, LOCI is highly effective for detecting outliers and groups of outliers (a.k.a. micro-clusters). In addition, it offers the following advantages and novelties: (a) It provides an automatic, data-dictated cutoff to determine whether a point is an outlier; in contrast, previous methods force users to pick cut-offs, without any hints as to what cut-off value is best for a given dataset. (b) It can provide a LOCI plot for each point; this plot summarizes a wealth of information about the data in the vicinity of the point, determining clusters, micro-clusters, their diameters and their inter-cluster distances. None of the existing outlier-detection methods can match this feature, because they output only a single number for each point: its outlierness score. (c) Our LOCI method can be computed as quickly as the best previous methods. (d) Moreover, LOCI leads to a practically linear approximate method, aLOCI (for approximate LOCI), which provides fast, highly accurate outlier detection. To the best of our knowledge, this is the first work to use approximate computations to speed up outlier detection. Experiments on synthetic and real-world data sets show that LOCI and aLOCI can automatically detect outliers and micro-clusters, without user-required cut-offs, and that they quickly spot both expected and unexpected outliers.
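For intuition, here is a minimal single-radius Python sketch of LOCI-style scoring (the paper's method scans a range of radii and uses exact or approximate neighborhood counts); the function name, radius and sampling factor are illustrative choices, not the authors' implementation:

```python
import numpy as np
from scipy.spatial.distance import cdist

def loci_scores(X, r=6.0, alpha=0.5, k_sigma=3.0):
    """Single-radius LOCI sketch: compare each point's alpha*r neighbourhood
    count with the average count over its r-neighbours, and flag it when the
    multi-granularity deviation factor (MDEF) exceeds k_sigma times its
    normalised deviation -- the 'data-dictated' cut-off."""
    D = cdist(X, X)
    n_alpha = (D <= alpha * r).sum(axis=1)          # counts in the alpha*r ball
    mdefs, flags = [], []
    for i in range(len(X)):
        neighbours = np.where(D[i] <= r)[0]         # sampling neighbourhood of point i
        counts = n_alpha[neighbours]
        n_hat, sigma = counts.mean(), counts.std()
        mdef = 1.0 - n_alpha[i] / n_hat
        mdefs.append(mdef)
        flags.append(mdef > k_sigma * (sigma / n_hat))
    return np.array(mdefs), np.array(flags)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (200, 2)), [[4.0, 4.0]]])   # one planted outlier
mdefs, flags = loci_scores(X)
print("highest-MDEF point:", int(np.argmax(mdefs)))         # the planted point should score highest
```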

903 citations


Journal ArticleDOI
TL;DR: A measure for identifying the physical significance of an outlier, called the cluster-based local outlier factor (CBLOF), is designed; it is meaningful and gives weight to local data behavior.

817 citations


Book ChapterDOI
10 Sep 2003
TL;DR: The locally optimized RANSAC makes no new assumptions about the data; on the contrary, it makes the above-mentioned assumption valid by applying local optimization to the solution estimated from the random sample.
Abstract: A new enhancement of RANSAC, the locally optimized RANSAC (LO-RANSAC), is introduced. It has been observed that, to find an optimal solution (with a given probability), the number of samples drawn in RANSAC is significantly higher than predicted from the mathematical model. This is due to the incorrect assumption that a model with parameters computed from an outlier-free sample is consistent with all inliers. The assumption rarely holds in practice. The locally optimized RANSAC makes no new assumptions about the data; on the contrary, it makes the above-mentioned assumption valid by applying local optimization to the solution estimated from the random sample.
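As a rough illustration of the idea (not the authors' implementation, which runs an inner RANSAC and iterative refinement in the local optimization step), the sketch below adds a simple local optimization, refitting on the consensus set, to a plain RANSAC line fit; all names and thresholds are illustrative:

```python
import numpy as np

def fit_line(pts):
    """Total least-squares line fit: returns unit normal n and offset d (points satisfy n.x = d)."""
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)
    n = vt[-1]
    return n, float(n @ c)

def lo_ransac_line(pts, iters=200, thresh=0.05, seed=0):
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, np.array([], dtype=int)
    for _ in range(iters):
        n, d = fit_line(pts[rng.choice(len(pts), 2, replace=False)])
        inliers = np.where(np.abs(pts @ n - d) < thresh)[0]
        if len(inliers) > len(best_inliers):
            # local optimisation: refit on the consensus set and re-score, so the
            # sample-based hypothesis is upgraded towards an all-inlier solution
            n, d = fit_line(pts[inliers])
            inliers = np.where(np.abs(pts @ n - d) < thresh)[0]
            best_model, best_inliers = (n, d), inliers
    return best_model, best_inliers
```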

722 citations


Proceedings ArticleDOI
24 Aug 2003
TL;DR: This work shows that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used.
Abstract: Defining outliers by their distance to neighboring examples is a popular approach to finding unusual examples in a data set. Recently, much work has been conducted with the goal of finding fast algorithms for this task. We show that a simple nested loop algorithm that in the worst case is quadratic can give near linear time performance when the data is in random order and a simple pruning rule is used. We test our algorithm on real high-dimensional data sets with millions of examples and show that the near linear scaling holds over several orders of magnitude. Our average case analysis suggests that much of the efficiency is because the time to process non-outliers, which are the majority of examples, does not depend on the size of the data set.
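A minimal Python sketch of this randomised nested-loop search with the pruning rule is given below, assuming Euclidean distance and the "distance to the k-th nearest neighbour" outlier score; the names and the block-free structure are simplifications of the authors' algorithm:

```python
import numpy as np

def knn_distance_outliers(X, k=5, n_out=5, seed=0):
    """Randomised nested-loop search for the n_out points with the largest
    distance to their k-th nearest neighbour.  The pruning rule abandons a
    candidate as soon as its running k-NN distance drops below the score of
    the weakest outlier found so far, which is what yields the near-linear
    behaviour on randomly ordered data."""
    rng = np.random.default_rng(seed)
    X = X[rng.permutation(len(X))]
    top, cutoff = [], 0.0                      # top outliers kept as (score, index)
    for i, p in enumerate(X):
        dists = np.full(k, np.inf)             # k smallest distances seen so far
        for j, q in enumerate(X):
            if i == j:
                continue
            d = np.linalg.norm(p - q)
            if d < dists[-1]:
                dists[-1] = d
                dists.sort()
                if dists[-1] < cutoff:         # prune: cannot enter the top list
                    break
        else:
            top.append((dists[-1], i))
            top.sort(reverse=True)
            top = top[:n_out]
            if len(top) == n_out:
                cutoff = top[-1][0]            # score of the weakest top outlier
    return top
```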

682 citations


Journal ArticleDOI
TL;DR: The theory of Robust Subspace Learning (RSL) for linear models within a continuous optimization framework based on robust M-estimation is developed and applies to a variety of linear learning problems in computer vision including eigen-analysis and structure from motion.
Abstract: Many computer vision, signal processing and statistical problems can be posed as problems of learning low-dimensional linear or multi-linear models. These models have been widely used for the representation of shape, appearance, motion, etc., in computer vision applications. Methods for learning linear models can be seen as a special case of subspace fitting. One drawback of previous learning methods is that they are based on least squares estimation techniques and hence fail to account for “outliers” which are common in realistic training sets. We review previous approaches for making linear learning methods robust to outliers and present a new method that uses an intra-sample outlier process to account for pixel outliers. We develop the theory of Robust Subspace Learning (RSL) for linear models within a continuous optimization framework based on robust M-estimation. The framework applies to a variety of linear learning problems in computer vision including eigen-analysis and structure from motion. Several synthetic and natural examples are used to develop and illustrate the theory and applications of robust subspace learning in computer vision.
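The following is a highly simplified IRLS-style sketch of robust subspace fitting with sample-level Huber weights; the paper's RSL formulation works at the pixel (intra-sample) level with an explicit outlier process, so this only conveys the down-weighting idea:

```python
import numpy as np

def irls_robust_subspace(X, n_components=2, n_iter=25, c=2.0):
    """Alternate a weighted PCA with Huber-style weights on each sample's
    reconstruction residual, so outlying samples are progressively down-weighted."""
    w = np.ones(len(X))
    for _ in range(n_iter):
        mu = np.average(X, axis=0, weights=w)
        _, _, vt = np.linalg.svd((X - mu) * np.sqrt(w)[:, None], full_matrices=False)
        B = vt[:n_components]                           # orthonormal basis of the subspace
        resid = np.linalg.norm((X - mu) - (X - mu) @ B.T @ B, axis=1)
        scale = np.median(resid) / 0.6745 + 1e-12       # robust scale estimate
        ratio = resid / scale
        w = np.where(ratio <= c, 1.0, c / ratio)        # Huber weights
    return mu, B, w                                      # low weights point at outliers
```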

673 citations


Proceedings Article
01 Jan 2003
TL;DR: A novel scheme is proposed that uses a robust principal component classifier for intrusion detection problems where the training data may be unsupervised; it outperforms the nearest neighbor method, the density-based local outlier (LOF) approach, and the outlier detection algorithm based on the Canberra metric.
Abstract: This paper proposes a novel scheme that uses a robust principal component classifier in intrusion detection problems where the training data may be unsupervised. Assuming that anomalies can be treated as outliers, an intrusion predictive model is constructed from the major and minor principal components of the normal instances. A measure of the difference of an anomaly from the normal instance is the distance in the principal component space. The distance based on the major components that account for 50% of the total variation and the minor components whose eigenvalues are less than 0.20 is shown to work well. The experiments with KDD Cup 1999 data demonstrate that the proposed method achieves 98.94% in recall and 97.89% in precision with a false alarm rate of 0.92% and outperforms the nearest neighbor method, the density-based local outliers (LOF) approach, and the outlier detection algorithm based on the Canberra metric.
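A sketch of the scoring idea, using classical (non-robust) PCA rather than the robust estimator of the paper, might look like the following; the 50% variance and 0.20 eigenvalue cut-offs follow the abstract, everything else is an illustrative assumption:

```python
import numpy as np

def pca_anomaly_scores(train_normal, test, major_var=0.50, minor_eig=0.20):
    """Score test samples by their distance in principal-component space:
    sum of squared standardised PC scores over the major components
    (those explaining ~50% of the variance) and over the minor components
    (those with eigenvalue below ~0.20)."""
    mu, sd = train_normal.mean(0), train_normal.std(0) + 1e-12
    Z = (train_normal - mu) / sd
    eigval, eigvec = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigval)[::-1]                  # sort components by variance
    eigval, eigvec = eigval[order], eigvec[:, order]
    cum = np.cumsum(eigval) / eigval.sum()
    q = int(np.searchsorted(cum, major_var)) + 1      # number of major components
    minor = eigval < minor_eig                        # mask of minor components
    Y = ((test - mu) / sd) @ eigvec                   # scores in PC space
    major_score = (Y[:, :q] ** 2 / eigval[:q]).sum(axis=1)
    minor_score = (Y[:, minor] ** 2 / eigval[minor]).sum(axis=1)
    return major_score, minor_score                   # threshold each to flag anomalies
```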

574 citations


Journal ArticleDOI
TL;DR: This paper summarizes the main results of Cazals et al. (2002) on robust nonparametric frontier estimators, proposes a methodology implementing the tool, and shows how it can be used for detecting outliers when using the classical DEA/FDH estimators or any parametric technique.
Abstract: In frontier analysis, most of the nonparametric approaches (DEA, FDH) are based on envelopment ideas which suppose that, with probability one, all the observed units belong to the attainable set. In these "deterministic" frontier models, statistical theory is now mostly available (Simar and Wilson, 2000a). In the presence of superefficient outliers, envelopment estimators could behave dramatically since they are very sensitive to extreme observations. Some recent results from Cazals et al. (2002) on robust nonparametric frontier estimators may be used in order to detect outliers by defining a new DEA/FDH "deterministic" type estimator which does not envelop all the data points and so is more robust to extreme data points. In this paper, we summarize the main results of Cazals et al. (2002) and we show how this tool can be used for detecting outliers when using the classical DEA/FDH estimators or any parametric technique. We propose a methodology implementing the tool and we illustrate it through some numerical examples with simulated and real data. The method should be used in a first step, as an exploratory data analysis, before using any frontier estimation.

356 citations


Proceedings ArticleDOI
20 Jul 2003
TL;DR: A new algorithm for time-series novelty detection based on one-class support vector machines (SVMs) is proposed and a technique to combine intermediate results at different phase spaces is proposed in order to obtain robust detection results.
Abstract: Time-series novelty detection, or anomaly detection, refers to the automatic identification of novel or abnormal events embedded in normal time-series points. Although it is a challenging topic in data mining, it has been attracting increasing attention due to its huge potential for immediate applications. In this paper, a new algorithm for time-series novelty detection based on one-class support vector machines (SVMs) is proposed. The concepts of phase and projected phase spaces are first introduced, which allow us to convert a time series into a set of vectors in the (projected) phase spaces. Then we interpret novel events in the time series as outliers of the "normal" distribution of the converted vectors in the (projected) phase spaces. One-class SVMs are employed as the outlier detectors. In order to obtain robust detection results, a technique to combine intermediate results at different phase spaces is also proposed. Experiments on both synthetic and measured data are presented to demonstrate the promising performance of the new algorithm.
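A minimal sketch of the pipeline, using scikit-learn's OneClassSVM and a single time-delay (phase-space) embedding rather than the paper's combination over several projected phase spaces; the embedding dimension, lag and nu are illustrative:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def phase_space_embed(series, dim=4, lag=2):
    """Time-delay embedding: each length-`dim` lagged window becomes one vector."""
    n = len(series) - (dim - 1) * lag
    return np.array([series[i:i + dim * lag:lag] for i in range(n)])

rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.05 * rng.normal(size=1000)
x[500:505] += 2.0                                    # inject a short abnormal event

V = phase_space_embed(x)                             # vectors in the phase space
detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.01).fit(V)
novel = np.where(detector.predict(V) == -1)[0]       # -1 marks outlier vectors
print("windows flagged as novel:", novel)
```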

317 citations


Journal ArticleDOI
TL;DR: A new algorithm for the independent components analysis (ICA) problem based on an efficient entropy estimator that is simple, computationally efficient, intuitively appealing, and outperforms other well known algorithms.
Abstract: This paper presents a new algorithm for the independent components analysis (ICA) problem based on an efficient entropy estimator. Like many previous methods, this algorithm directly minimizes the measure of departure from independence according to the estimated Kullback-Leibler divergence between the joint distribution and the product of the marginal distributions. We pair this approach with efficient entropy estimators from the statistics literature. In particular, the entropy estimator we use is consistent and exhibits rapid convergence. The algorithm based on this estimator is simple, computationally efficient, intuitively appealing, and outperforms other well known algorithms. In addition, the estimator's relative insensitivity to outliers translates into superior performance by our ICA algorithm on outlier tests. We present favorable comparisons to the Kernel ICA, FAST-ICA, JADE, and extended Infomax algorithms in extensive simulations. We also provide public domain source code for our algorithms.

283 citations


Journal ArticleDOI
TL;DR: In this paper, a structural health monitoring methodology for a wing box is presented. Butler et al. used novelty detection based on measured transmissibilities from the structure of the wing.

237 citations


Journal ArticleDOI
TL;DR: A general definition of S-outliers for spatial outliers is provided and the computation structure of spatial outlier detection methods is characterized and scalable algorithms are presented.
Abstract: Spatial outliers represent locations which are significantly different from their neighborhoods even though they may not be significantly different from the entire population. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and implicit knowledge, such as local instability. In this paper, we first provide a general definition of S-outliers for spatial outliers. This definition subsumes the traditional definitions of spatial outliers. Second, we characterize the computation structure of spatial outlier detection methods and present scalable algorithms. Third, we provide a cost model of the proposed algorithms. Finally, we experimentally evaluate our algorithms using a Minneapolis-St. Paul (Twin Cities) traffic data set.
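In the same spirit, a minimal neighbourhood-difference test (not the paper's exact S-outlier statistic or its scalable algorithms) can be sketched as follows; the neighbour structure and the z-score threshold are up to the user:

```python
import numpy as np

def spatial_z_scores(values, neighbors):
    """For each location i, compare its attribute with the mean over neighbors[i]
    (a list of neighbouring indices), then standardise the differences; |z| above
    roughly 2-3 is a common flag for a spatial outlier."""
    diffs = np.array([values[i] - np.mean([values[j] for j in neighbors[i]])
                      for i in range(len(values))])
    return (diffs - diffs.mean()) / diffs.std()

# toy 1-D example: five locations in a row, each neighbouring the adjacent ones
vals = np.array([1.0, 1.1, 9.0, 1.2, 0.9])
nbrs = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(spatial_z_scores(vals, nbrs))                  # location 2 stands out
```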

Proceedings ArticleDOI
19 Nov 2003
TL;DR: This work formulates the spatial outlier detection problem in a general way, designs algorithms that can accurately detect spatial outliers, and demonstrates that these approaches can not only avoid detecting false spatial outliers but also find true spatial outliers ignored by existing methods.
Abstract: A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from the values of its neighborhood. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and useful spatial patterns for further analysis. One drawback of existing methods is that normal objects tend to be falsely detected as spatial outliers when their neighborhood contains true spatial outliers. We propose a suite of spatial outlier detection algorithms to overcome this disadvantage. We formulate the spatial outlier detection problem in a general way and design algorithms which can accurately detect spatial outliers. In addition, using a real-world census data set, we demonstrate that our approaches can not only avoid detecting false spatial outliers but also find true spatial outliers ignored by existing methods.

Journal ArticleDOI
TL;DR: Closest distance to the center (CDC) is proposed in this paper as an alternative for outlier detection; better performance was obtained when CDC was incorporated with MVT, compared to using CDC or MVT alone.

Journal ArticleDOI
TL;DR: A simple but useful statistical model of the room transfer function is developed for analyzing acoustical source localization methods when room reverberation is present, and the so-called PHAT time-delay estimator is shown to be optimal among a class of cross-correlation-based time-delay estimators.
Abstract: Room reverberation is typically the main obstacle for designing robust microphone-based source localization systems. The purpose of the paper is to analyze the achievable performance of acoustical source localization methods when room reverberation is present. To facilitate the analysis, we apply well known results from room acoustics to develop a simple but useful statistical model for the room transfer function. The properties of the statistical model are found to correlate well with results from real data measurements. The room transfer function model is further applied to analyze the statistical properties of some existing methods for source localization. In this respect we consider especially the asymptotic error variance and the probability of an anomalous estimate. A noteworthy outcome of the analysis is that the so-called PHAT time-delay estimator is shown to be optimal among a class of cross-correlation based time-delay estimators. To verify our results on the error variance and the outlier probability we apply the image method for simulation of the room transfer function.
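For reference, a standard GCC-PHAT time-delay estimator, the estimator the paper analyses, can be sketched as follows (no interpolation or lag limiting, which practical implementations usually add):

```python
import numpy as np

def gcc_phat_delay(x, y, fs):
    """PHAT-weighted generalised cross-correlation: whiten the cross-spectrum so
    only phase is kept, then take the peak lag.  Returns the delay of x relative
    to the reference y, in seconds (positive when x arrives later)."""
    n = len(x) + len(y)
    X, Y = np.fft.rfft(x, n), np.fft.rfft(y, n)
    R = X * np.conj(Y)
    R /= np.maximum(np.abs(R), 1e-12)                # PHAT weighting
    cc = np.fft.irfft(R, n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# toy check: a copy of a noise signal delayed by 25 samples
fs = 8000
rng = np.random.default_rng(0)
s = rng.normal(size=4096)
x = np.concatenate((np.zeros(25), s))[:4096]
print(gcc_phat_delay(x, s, fs) * fs)                 # approximately 25
```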

Journal ArticleDOI
TL;DR: A robust analysis method is developed for the understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws.
Abstract: In microarray data there are a number of biological samples, each assessed for the level of gene expression for a typically large number of genes. There is a need to examine these data with statistical techniques to help discern possible patterns in the data. Our technique applies a combination of mathematical and statistical methods to progressively take the data set apart so that different aspects can be examined for both general patterns and very specific effects. Unfortunately, these data tables are often corrupted with extreme values (outliers), missing values, and non-normal distributions that preclude standard analysis. We develop a robust analysis method to address these problems. The benefits of this robust analysis will be both the understanding of large-scale shifts in gene effects and the isolation of particular sample-by-gene effects that might be either unusual interactions or the result of experimental flaws. Our method requires a single pass and does not resort to complex "cleaning" or imputation of the data table before analysis. We illustrate the method with a commercial data set.

Journal ArticleDOI
TL;DR: It is shown that population trimmed L-moments assign zero weight to extreme observations, they are easy to compute, their sample variances and covariances can be obtained in closed form, and they are more robust to the presence of outliers than L-moments.

Book
21 Oct 2003
TL;DR: This book presents a meta-modelling framework for estimating the level of uncertainty in the results of cDNA Microarray experiments, as well as some of the techniques used to assess the quality of these experiments.
Abstract: Preface.1 A Brief Introduction.1.1 A Note on Exploratory Data Analysis.1.2 Computing Considerations and Software.1.3 A Brief Outline of the Book.2 Genomics Basics.2.1 Genes.2.2 DNA.2.3 Gene Expression.2.4 Hybridization Assays and Other Laboratory Techniques.2.5 The Human Genome.2.6 Genome Variations and Their Consequences.2.7 Genomics.2.8 The Role of Genomics in Pharmaceutical Research.2.9 Proteins.2.10 Bioinformatics.Supplementary Reading.Exercises.3 Microarrays.3.1 Types of Microarray Experiments.3.1.1 Experiment Type 1: Tissue-Specific Gene Expression.3.1.2 Experiment Type 2: Developmental Genetics.3.1.3 Experiment Type 3: Genetic Diseases.3.1.4 Experiment Type 4: Complex Diseases.3.1.5 Experiment Type 5: Pharmacological Agents.3.1.6 Experiment Type 6: Plant Breeding.3.1.7 Experiment Type 7: Environmental Monitoring.3.2 A Very Simple Hypothetical Microarray Experiment.3.3 A Typical Microarray Experiment.3.3.1 Microarray Preparation.3.3.2 Sample Preparation.3.3.3 The Hybridization Step.3.3.4 Scanning the Microarray.3.3.5 Interpreting the Scanned Image.3.4 Multichannel cDNA Microarrays.3.5 Oligonucleotide Arrays.3.6 Bead-Based Arrays.3.7 Confirmation of Microarray Results.Supplementary Reading and Electronic References.Exercises.4 Processing the Scanned Image.4.1 Converting the Scanned Image to the Spotted Image.4.1.1 Gridding.4.1.2 Segmentation.4.1.3 Quantification.4.2 Quality Assessment.4.2.1 Visualizing the Spotted Image.4.2.2 Numerical Evaluation of Array Quality.4.2.3 Spatial Problems.4.2.4 Spatial Randomness.4.2.5 Quality Control of Arrays.4.2.6 Assessment of Spot Quality.4.3 Adjusting for Background.4.3.1 Estimating the Background.4.3.2 Adjusting for the Estimated Background.4.4 Expression Level Calculation for Two-Channel cDNA Microarrays.4.5 Expression Level Calculation for Oligonucleotide Arrays.4.5.1 The Average Difference.4.5.2 A Weighted Average Difference.4.5.3 Perfect Matches Only.4.5.4 Background Adjustment Approach.4.5.5 Model-Based Approach.4.5.6 Absent-Present Calls.Supplementary Reading.Exercises.5 Preprocessing Microarray Data.5.1 Logarithmic Transformation.5.2 Variance Stabilizing Transformations.5.3 Sources of Bias.5.4 Normalization.5.5 Intensity-Dependent Normalization.5.5.1 Smooth Function Normalization.5.5.2 Quantile Normalization.5.5.3 Normalization of Oligonucleotide Arrays.5.5.4 Normalization of Two-Channel Arrays.5.5.5 Spatial Normalization.5.5.6 Stagewise Normalization.5.6 Judging the Success of a Normalization.5.7 Outlier Identification.5.7.1 Nonresistant Rules for Outlier Identification.5.7.2 Resistant Rules for Outlier Identification.5.8 Assessing Replicate Array Quality.Exercises.6 Summarization.6.1 Replication.6.2 Technical Replicates.6.3 Biological Replicates.6.4 Experiments with Both Technical and Biological Replicates.6.5 Multiple Oligonucleotide Arrays.6.6 Estimating Fold Change in Two-Channel Experiments.6.7 Bayes Estimation of Fold Change.Exercises.7 Two-Group Comparative Experiments.7.1 Basics of Statistical Hypothesis Testing.7.2 Fold Changes.7.3 The Two-Sample t Test.7.4 Diagnostic Checks.7.5 Robust t Tests.7.6 Randomization Tests.7.7 The Mann-Whitney-Wilcoxon Rank Sum Test.7.8 Multiplicity.7.8.1 A Pragmatic Approach to the Issue of Multiplicity.7.8.2 Simple Multiplicity Adjustments.7.8.3 Sequential Multiplicity Adjustments.7.9 The False Discovery Rate.7.9.1 The Positive False Discovery Rate.7.10 Small Variance-Adjusted t Tests and SAM.7.10.1 Modifying the t Statistic.7.10.2 Assesing Significance with the SAM t Statistic.7.10.3 
Strategies for Using SAM.7.10.4 An Empirical Bayes Framework.7.10.5 Understanding the SAM Adjustment.7.11 Conditional t.7.12 Borrowing Strength across Genes.7.12.1 Simple Methods.7.12.2 A Bayesian Model.7.13 Two-Channel Experiments.7.13.1 The Paired Sample t Test and SAM.7.13.2 Borrowing Strength via Hierarchical Modeling.Supplementary Reading.Exercises.8 Model-Based Inference and Experimental Design Considerations.8.1 The F Test.8.2 The Basic Linear Model.8.3 Fitting the Model in Two Stages.8.4 Multichannel Experiments.8.5 Experimental Design Considerations.8.5.1 Comparing Two Varieties with Two-Channel Microarrays.8.5.2 Comparing Multiple Varieties with Two-Channel Microarrays.8.5.3 Single-Channel Microarray Experiments.8.6 Miscellaneous Issues.Supplementary Reading.Exercises.9 Pattern Discovery.9.1 Initial Considerations.9.2 Cluster Analysis.9.2.1 Dissimilarity Measures and Similarity Measures.9.2.2 Guilt by Association.9.2.3 Hierarchical Clustering.9.2.4 Partitioning Methods.9.2.5 Model-Based Clustering.9.2.6 Chinese Restaurant Clustering.9.2.7 Discussion.9.3 Seeking Patterns Visually.9.3.1 Principal Components Analysis.9.3.2 Factor Analysis.9.3.3 Biplots.9.3.4 Spectral Map Analysis.9.3.5 Multidimensional Scaling.9.3.6 Projection Pursuit.9.3.7 Data Visualization with the Grand Tour and Projection Pursuit.9.4 Two-Way Clustering.9.4.1 Block Clustering.9.4.2 Gene Shaving.9.4.3 The Plaid Model.Software Notes.Supplementary Reading.Exercises.10 Class Prediction.10.1 Initial Considerations.10.1.1 Misclassification Rates.10.1.2 Reducing the Number of Classifiers.10.2 Linear Discriminant Analysis.10.3 Extensions of Fisher's LDA.10.4 Nearest Neighbors.10.5 Recursive Partitioning.10.5.1 Classification Trees.10.5.2 Activity Region Finding.10.6 Neural Networks.10.7 Support Vector Machines.10.8 Integration of Genomic Information.10.8.1 Integration of Gene Expression Data and Molecular Structure Data.10.8.2 Pathway Inference.Software Notes.Supplementary Reading.Exercises.11 Protein Arrays.11.1 Introduction.11.2 Protein Array Experiments.11.3 Special Issues with Protein Arrays.11.4 Analysis.11.5 Using Antibody Antigen Arrays to Measure Protein Concentrations.Exercises.References.Author Index.Subject Index.

Journal ArticleDOI
TL;DR: The convergence rate of SVIRNs is faster than that of conventional networks with BP learning algorithms or with robust BP learning algorithms for interval regression analysis, and a traditional back-propagation (BP) learning algorithm can be used to adjust the initial structure networks of SVIRNs under training data sets with or without outliers.

Journal ArticleDOI
TL;DR: This letter argues that many visual scenes are based on a Manhattan three-dimensional grid that imposes regularities on the image statistics, and constructs a Bayesian model that implements this assumption and estimates the viewer orientation relative to the Manhattan grid.
Abstract: This letter argues that many visual scenes are based on a "Manhattan" three-dimensional grid that imposes regularities on the image statistics. We construct a Bayesian model that implements this assumption and estimates the viewer orientation relative to the Manhattan grid. For many images, these estimates are good approximations to the viewer orientation (as estimated manually by the authors). These estimates also make it easy to detect outlier structures that are unaligned to the grid. To determine the applicability of the Manhattan world model, we implement a null hypothesis model that assumes that the image statistics are independent of any three-dimensional scene structure. We then use the log-likelihood ratio test to determine whether an image satisfies the Manhattan world assumption. Our results show that if an image is estimated to be Manhattan, then the Bayesian model's estimates of viewer direction are almost always accurate (according to our manual estimates), and vice versa.

Proceedings ArticleDOI
19 Nov 2003
TL;DR: It is proved that additive Gaussian distribution is not a proper model for super-resolution noise and it is shown that Lp norm minimization results in a pixelwise weighted mean algorithm which requires the least possible amount of computation time and memory and produces a maximum likelihood solution.
Abstract: In the last two decades, many papers have been published, proposing a variety of methods for multi-frame resolution enhancement. These methods, which have a wide range of complexity, memory and time requirements, are usually very sensitive to their assumed model of data and noise, often limiting their utility. Different implementations of the non-iterative Shift and Add concept have been proposed as very fast and effective super-resolution algorithms. The paper of Elad & Hel-Or 2001 provided an adequate mathematical justification for the Shift and Add method for the simple case of an additive Gaussian noise model. In this paper we prove that an additive Gaussian distribution is not a proper model for super-resolution noise. Specifically, we show that Lp norm minimization (1≤p≤2) results in a pixelwise weighted mean algorithm which requires the least possible amount of computation time and memory and produces a maximum likelihood solution. We also justify the use of a robust prior information term based on the bilateral filter idea. Finally, for the underdetermined case, where the number of non-redundant low-resolution frames is less than the square of the resolution enhancement factor, we propose a method for detection and removal of outlier pixels. Our experiments using commercial digital cameras show that our proposed super-resolution method provides significant improvements in both accuracy and efficiency.
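A toy sketch of the fusion step consistent with this analysis is shown below: fusing co-located samples with a pixelwise median corresponds to the robust L1 choice, a mean to the L2 choice. Integer sub-pixel shifts and a known enhancement factor are assumed; this is not the authors' full pipeline (no deblurring, no outlier-pixel removal step):

```python
import numpy as np

def shift_and_add(frames, shifts, r, robust=True):
    """Place each low-resolution frame on an r-times finer grid at its integer
    sub-pixel shift (0 <= dy, dx < r), then fuse the samples landing on each
    high-resolution pixel with a pixelwise median (robust, L1) or mean (L2)."""
    h, w = frames[0].shape
    stacks = [[[] for _ in range(w * r)] for _ in range(h * r)]
    for frame, (dy, dx) in zip(frames, shifts):
        for i in range(h):
            for j in range(w):
                stacks[i * r + dy][j * r + dx].append(frame[i, j])
    fuse = np.median if robust else np.mean
    hr = np.zeros((h * r, w * r))
    for i in range(h * r):
        for j in range(w * r):
            if stacks[i][j]:
                hr[i, j] = fuse(stacks[i][j])
    return hr
```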

Journal ArticleDOI
TL;DR: This study applies the so-called jack-knife technique to PARAFAC in order to find the standard errors associated with the parameter estimates from the PARAFAC model, and shows the applicability of the method.

Journal ArticleDOI
TL;DR: A novel mixture model is proposed which treats as observed data not only the feature vector and the class label, but also the fact of label presence/absence for each sample, to address problems involving both the known, and unknown classes.
Abstract: Several authors have shown that, when labeled data are scarce, improved classifiers can be built by augmenting the training set with a large set of unlabeled examples and then performing suitable learning. These works assume each unlabeled sample originates from one of the (known) classes. Here, we assume each unlabeled sample comes from either a known or from a heretofore undiscovered class. We propose a novel mixture model which treats as observed data not only the feature vector and the class label, but also the fact of label presence/absence for each sample. Two types of mixture components are posited. "Predefined" components generate data from known classes and assume class labels are missing at random. "Nonpredefined" components only generate unlabeled data, i.e., they capture exclusively unlabeled subsets, consistent with an outlier distribution or new classes. The predefined/nonpredefined natures are data-driven, learned along with the other parameters via an extension of the EM algorithm. Our modeling framework addresses problems involving both the known and unknown classes: (1) robust classifier design, (2) classification with rejections, and (3) identification of the unlabeled samples (and their components) from unknown classes. Case 3 is a step toward new class discovery. Experiments are reported for each application, including topic discovery for the Reuters domain. Experiments also demonstrate the value of label presence/absence data in learning accurate mixtures.

Book ChapterDOI
01 Jan 2003
TL;DR: In this paper, the authors propose several new measures of skewness which are more robust against outlying values and compare their properties using both real and simulated data.
Abstract: Asymmetry of a univariate continuous distribution is commonly described as skewness. The well-known classical skewness coefficient is based on the first three moments of the data set, and hence it is strongly affected by the presence of one or more outliers. In this paper we propose several new measures of skewness which are more robust against outlying values. Their properties are compared using both real and simulated data.
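As a point of comparison, one long-established robust alternative to the moment coefficient is Bowley's quartile skewness, which depends only on the quartiles and so is insensitive to a few extreme values (the paper proposes further measures beyond this):

```python
import numpy as np

def quartile_skewness(x):
    """Bowley's quartile skewness ((Q3 - Q2) - (Q2 - Q1)) / (Q3 - Q1): depends only
    on the quartiles, so a few wild observations cannot dominate it the way they
    dominate the classical moment-based coefficient."""
    q1, q2, q3 = np.percentile(x, [25, 50, 75])
    return ((q3 - q2) - (q2 - q1)) / (q3 - q1)

rng = np.random.default_rng(0)
sample = np.concatenate([rng.normal(size=1000), [50.0]])   # one gross outlier
print(quartile_skewness(sample))                           # stays close to 0
```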

Journal ArticleDOI
TL;DR: It is shown that Vogelsang's iterative method to detect outliers is incorrect, and an alternative method based on first-differenced data is proposed that has considerably more power and leads to unit root tests with more accurate finite-sample size and robustness to departures from a unit root.
Abstract: Recently, Vogelsang (1999) proposed a method to detect outliers which explicitly imposes the null hypothesis of a unit root. It works in an iterative fashion to select multiple outliers in a given series. We show, via simulations, that, under the null hypothesis of no outliers, it has the right size in finite samples to detect a single outlier but, when applied in an iterative fashion to select multiple outliers, it exhibits severe size distortions towards finding an excessive number of outliers. We show that his iterative method is incorrect and derive the appropriate limiting distribution of the test at each step of the search. Whether corrected or not, we also show that the outliers need to be very large for the method to have any decent power. We propose an alternative method based on first-differenced data that has considerably more power. We also show that our method to identify outliers leads to unit root tests with more accurate finite sample size and robustness to departures from a unit root. The issues are illustrated using two US/Finland real-exchange rate series.

Journal ArticleDOI
TL;DR: In this article, the authors present an algorithm that integrates image-feature tracking and 3D motion estimation into a closed loop, while detecting and rejecting outlier regions that do not fit the model.
Abstract: The problem of structure from motion is often decomposed into two steps: feature correspondence and three-dimensional reconstruction. This separation often causes gross errors when establishing correspondence fails. Therefore, we advocate the necessity to integrate visual information not only in time (i.e. across different views), but also in space, by matching regions --- rather than points --- using explicit photometric deformation models. We present an algorithm that integrates image-feature tracking and three-dimensional motion estimation into a closed loop, while detecting and rejecting outlier regions that do not fit the model. Due to occlusions and the causal nature of our algorithm, a drift in the estimates accumulates over time. We describe a method to perform global registration of local estimates of motion and structure by matching the appearance of feature regions stored over long time periods. We use image intensities to construct a score function that takes into account changes in brightness and contrast. Our algorithm is recursive and suitable for real-time implementation.

Proceedings ArticleDOI
16 Jul 2003
TL;DR: This paper focuses on the density-based notion that discovers local outliers by means of the local outlier factor (LOF) formulation, and introduces three enhancement schemes over LOF, namely LOF', LOF", and GridLOF.
Abstract: Outliers, commonly referred to as exceptional cases, exist in many real-world databases. Detection of such outliers is important for many applications. In this paper, we focus on the density-based notion that discovers local outliers by means of the local outlier factor (LOF) formulation. Three enhancement schemes over LOF are introduced, namely LOF', LOF", and GridLOF. Thorough explanation and analysis are given to demonstrate the abilities of LOF' in providing a simpler and more intuitive meaning of local outlier-ness; LOF" in handling cases where LOF fails to work appropriately; and GridLOF in improving the efficiency and accuracy.
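The baseline these schemes build on is plain LOF, which is available off the shelf; a minimal scikit-learn example is shown below (LOF', LOF" and GridLOF themselves are not part of the library):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (300, 2)),            # dense cluster
               rng.uniform(-6, 6, (10, 2))])          # scattered points
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)                            # -1 marks local outliers
scores = -lof.negative_outlier_factor_                 # larger => more outlying
print("flagged indices:", np.where(labels == -1)[0])
```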

Journal ArticleDOI
TL;DR: In this article, the authors propose a robust principal component regression (RPCR) method for the multivariate calibration model; classical PCR combines principal component analysis (PCA) on the regressors with least squares regression, and RPCR replaces both stages with robust counterparts.
Abstract: We consider the multivariate calibration model which assumes that the concentrations of several constituents of a sample are linearly related to its spectrum. Principal component regression (PCR) is widely used for the estimation of the regression parameters in this model. In the classical approach it combines principal component analysis (PCA) on the regressors with least squares regression. However, both stages yield very unreliable results when the data set contains outlying observations. We present a robust PCR (RPCR) method which also consists of two parts. First we apply a robust PCA method for high-dimensional data on the regressors, then we regress the response variables on the scores using a robust regression method. A robust RMSECV value and a robust R² value are proposed as exploratory tools to select the number of principal components. The prediction error is also estimated in a robust way. Moreover, we introduce several diagnostic plots which are helpful to visualize and classify the outliers. The robustness of RPCR is demonstrated through simulations and the analysis of a real data set.
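A rough stand-in for the two-stage idea, classical PCA scores followed by a robust (Huber) regression, is sketched below; the paper's RPCR additionally robustifies the PCA step, handles several responses, and uses robust RMSECV/R² diagnostics, all of which this sketch omits:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import HuberRegressor

def simple_robust_pcr(X, y, n_components=3):
    """Classical PCA for the scores, then a robust (Huber) regression of a single
    response y on those scores -- only the skeleton of the two-stage approach."""
    pca = PCA(n_components=n_components).fit(X)
    scores = pca.transform(X)
    reg = HuberRegressor().fit(scores, y)
    return pca, reg

# prediction for new spectra X_new: reg.predict(pca.transform(X_new))
```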

Journal ArticleDOI
TL;DR: The hidden logistic regression (HLS) model, as discussed by the authors, is a generalization of the LSTM model, where the unobservable true responses are comparable to a hidden layer in a feed-forward neural network.

Journal ArticleDOI
01 Dec 2003-Icarus
TL;DR: In this paper, the authors describe a method for carefully assessing the statistical performance of the various observatories that have produced asteroid astrometry, with the ultimate goal of using this statistical characterization to improve asteroid orbit determination.

Proceedings ArticleDOI
03 Nov 2003
TL;DR: This paper proposes two approaches to discover spatial outliers with multiple attributes, formulates the multi-attribute spatial outlier detection problem in a general way, provides two effective detection algorithms, and analyzes their computational complexity.
Abstract: A spatial outlier is a spatially referenced object whose non-spatial attribute values are significantly different from the values of its neighborhood. Identification of spatial outliers can lead to the discovery of unexpected, interesting, and useful spatial patterns for further analysis. Previous work in spatial outlier detection focuses on detecting spatial outliers with a single attribute. In this paper, we propose two approaches to discover spatial outliers with multiple attributes. We formulate the multi-attribute spatial outlier detection problem in a general way, provide two effective detection algorithms, and analyze their computational complexity. In addition, using a real-world census data set, we demonstrate that our approaches can effectively identify local abnormality in large spatial data sets.