scispace - formally typeset


About: Outlier is a(n) research topic. Over the lifetime, 13823 publication(s) have been published within this topic receiving 291287 citation(s). more


Open accessBook
01 Jan 1987-
Abstract: 1. Introduction. 2. Simple Regression. 3. Multiple Regression. 4. The Special Case of One-Dimensional Location. 5. Algorithms. 6. Outlier Diagnostics. 7. Related Statistical Techniques. References. Table of Data Sets. Index. more

Topics: Outlier (64%), Robust regression (61%), Influential observation (58%) more

6,948 Citations

Journal ArticleDOI: 10.1145/335191.335388
16 May 2000-
Abstract: For many KDD applications, such as detecting criminal activities in E-commerce, finding the rare instances or the outliers, can be more interesting than finding the common patterns. Existing work in outlier detection regards being an outlier as a binary property. In this paper, we contend that for many scenarios, it is more meaningful to assign to each object a degree of being an outlier. This degree is called the local outlier factor (LOF) of an object. It is local in that the degree depends on how isolated the object is with respect to the surrounding neighborhood. We give a detailed formal analysis showing that LOF enjoys many desirable properties. Using real-world datasets, we demonstrate that LOF can be used to find outliers which appear to be meaningful, but can otherwise not be identified with existing approaches. Finally, a careful performance evaluation of our algorithm confirms we show that our approach of finding local outliers can be practical. more

Topics: Local outlier factor (64%), Outlier (58%), Anomaly detection (52%)

4,117 Citations

Open accessJournal ArticleDOI: 10.1023/B:AIRE.0000045502.10941.A9
Victoria J. Hodge1, Jim Austin1Institutions (1)
Abstract: Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise due to mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error or simply through natural deviations in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences. It can identify errors and remove their contaminating effect on the data set and as such to purify the data for processing. The original outlier detection methods were arbitrary but now, principled and systematic techniques are used, drawn from the full gamut of Computer Science and Statistics. In this paper, we introduce a survey of contemporary techniques for outlier detection. We identify their respective motivations and distinguish their advantages and disadvantages in a comparative review. more

Topics: Outlier (62%), Anomaly detection (58%)

2,897 Citations

Open accessJournal ArticleDOI: 10.1109/TPAMI.2012.88
Guangcan Liu1, Zhouchen Lin2, Shuicheng Yan3, Ju Sun4  +2 moreInstitutions (5)
Abstract: In this paper, we address the subspace clustering problem. Given a set of data samples (vectors) approximately drawn from a union of multiple subspaces, our goal is to cluster the samples into their respective subspaces and remove possible outliers as well. To this end, we propose a novel objective function named Low-Rank Representation (LRR), which seeks the lowest rank representation among all the candidates that can represent the data samples as linear combinations of the bases in a given dictionary. It is shown that the convex program associated with LRR solves the subspace clustering problem in the following sense: When the data is clean, we prove that LRR exactly recovers the true subspace structures; when the data are contaminated by outliers, we prove that under certain conditions LRR can exactly recover the row space of the original data and detect the outlier as well; for data corrupted by arbitrary sparse errors, LRR can also approximately recover the row space with theoretical guarantees. Since the subspace membership is provably determined by the row space, these further imply that LRR can perform robust subspace clustering and error correction in an efficient and effective way. more

Topics: Linear subspace (60%), Subspace topology (57%), Outlier (50%)

2,441 Citations

Open accessBook
01 Jan 1988-
Abstract: General Introduction Introduction History of Mixture Models Background to the General Classification Problem Mixture Likelihood Approach to Clustering Identifiability Likelihood Estimation for Mixture Models via EM Algorithm Start Values for EMm Algorithm Properties of Likelihood Estimators for Mixture Models Information Matrix for Mixture Models Tests for the Number of Components in a Mixture Partial Classification of the Data Classification Likelihood Approach to Clustering Mixture Models with Normal Components Likelihood Estimation for a Mixture of Normal Distribution Normal Homoscedastic Components Asymptotic Relative Efficiency of the Mixture Likelihood Approach Expected and Observed Information Matrices Assessment of Normality for Component Distributions: Partially Classified Data Assessment of Typicality: Partially Classified Data Assessment of Normality and Typicality: Unclassified Data Robust Estimation for Mixture Models Applications of Mixture Models to Two-Way Data Sets Introduction Clustering of Hemophilia Data Outliers in Darwin's Data Clustering of Rare Events Latent Classes of Teaching Styles Estimation of Mixing Proportions Introduction Likelihood Estimation Discriminant Analysis Estimator Asymptotic Relative Efficiency of Discriminant Analysis Estimator Moment Estimators Minimum Distance Estimators Case Study Homogeneity of Mixing Proportions Assessing the Performance of the Mixture Likelihood Approach to Clustering Introduction Estimators of the Allocation Rates Bias Correction of the Estimated Allocation Rates Estimated Allocation Rates of Hemophilia Data Estimated Allocation Rates for Simulated Data Other Methods of Bias Corrections Bias Correction for Estimated Posterior Probabilities Partitioning of Treatment Means in ANOVA Introduction Clustering of Treatment Means by the Mixture Likelihood Approach Fitting of a Normal Mixture Model to a RCBD with Random Block Effects Some Other Methods of Partitioning Treatment Means Example 1 Example 2 Example 3 Example 4 Mixture Likelihood Approach to the Clustering of Three-Way Data Introduction Fitting a Normal Mixture Model to Three-Way Data Clustering of Soybean Data Multidimensional Scaling Approach to the Analysis of Soybean Data References Appendix more

Topics: Mixture model (63%), Expectation–maximization algorithm (60%), Cluster analysis (59%) more

2,394 Citations

No. of papers in the topic in previous years

Top Attributes

Show by:

Topic's top 5 most impactful authors

Peter Filzmoser

48 papers, 3.6K citations

Peter J. Rousseeuw

44 papers, 11K citations

Marco Riani

30 papers, 807 citations

Christophe Croux

28 papers, 1K citations

Mia Hubert

24 papers, 1.5K citations

Network Information
Related Topics (5)
Prior probability

14.8K papers, 428.9K citations

90% related
Bayesian probability

26.5K papers, 817.9K citations

89% related
Posterior probability

13.7K papers, 475K citations

89% related
Mixture model

18.1K papers, 588.3K citations

89% related
Statistical hypothesis testing

19.5K papers, 1M citations

89% related