
Showing papers on "Outlier published in 2007"


Proceedings ArticleDOI
04 Jun 2007
TL;DR: The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of its closest neighbors, so the number of updates per insertion/deletion does not depend on the total number of points in the data set.
Abstract: Outlier detection has recently become an important problem in many industrial and financial applications. This problem is further complicated by the fact that in many cases, outliers have to be detected from data streams that arrive at an enormous pace. In this paper, an incremental LOF (local outlier factor) algorithm, appropriate for detecting outliers in data streams, is proposed. The proposed incremental LOF algorithm provides detection performance equivalent to the iterated static LOF algorithm (applied after insertion of each data record), while requiring significantly less computational time. In addition, the incremental LOF algorithm dynamically updates the profiles of data points. This is a very important property, since data profiles may change over time. The paper provides theoretical evidence that insertion of a new data point, as well as deletion of an old data point, influences only a limited number of its closest neighbors, and thus the number of updates per insertion/deletion does not depend on the total number of points N in the data set. Our experiments, performed on several simulated and real-life data sets, have demonstrated that the proposed incremental LOF algorithm is computationally efficient, while at the same time very successful in detecting outliers and changes of distributional behavior in various data stream applications.
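
As a point of reference, the "iterated static LOF" baseline that the incremental algorithm is compared against can be sketched with scikit-learn, which ships a static LOF implementation; the incremental variant itself is not part of standard libraries, and all parameter choices below are illustrative rather than taken from the paper.

```python
# Naive "iterated static LOF" baseline: recompute LOF after every insertion.
# The paper's incremental algorithm avoids exactly this full recomputation.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
stream = rng.normal(size=(500, 2))          # simulated data stream
stream[250] = [8.0, 8.0]                    # one injected outlier

window = []
for t, x in enumerate(stream):
    window.append(x)
    if len(window) < 30:                    # need enough points for k-NN
        continue
    lof = LocalOutlierFactor(n_neighbors=20)
    labels = lof.fit_predict(np.asarray(window))
    if labels[-1] == -1:                    # newest point flagged as outlier
        print(f"t={t}: point {x} flagged, LOF score "
              f"{-lof.negative_outlier_factor_[-1]:.2f}")
```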

397 citations


Journal ArticleDOI
TL;DR: A general-purpose method called conditional anomaly detection is proposed for taking differences among attributes into account, together with three different expectation-maximization algorithms for learning the model used in conditional anomaly detection.
Abstract: When anomaly detection software is used as a data analysis tool, finding the hardest-to-detect anomalies is not the most critical task. Rather, it is often more important to make sure that those anomalies that are reported to the user are in fact interesting. If too many unremarkable data points are returned to the user labeled as candidate anomalies, the software can soon fall into disuse. One way to ensure that returned anomalies are useful is to make use of domain knowledge provided by the user. Often, the data in question includes a set of environmental attributes whose values a user would never consider to be directly indicative of an anomaly. However, such attributes cannot be ignored because they have a direct effect on the expected distribution of the result attributes whose values can indicate an anomalous observation. This paper describes a general-purpose method called conditional anomaly detection for taking such differences among attributes into account, and proposes three different expectation-maximization algorithms for learning the model that is used in conditional anomaly detection. Experiments with more than 13 different data sets compare our algorithms with several other more standard methods for outlier or anomaly detection.
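
The conditioning idea, scoring how surprising a result attribute is given the environmental attributes rather than marginally, can be illustrated with a deliberately simplified stand-in; the paper's actual model is a Gaussian mixture learned by EM, which this linear sketch does not reproduce.

```python
# Simplified stand-in for conditional anomaly detection: score points by how
# surprising the result attribute is *given* the environmental attributes.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
env = rng.uniform(0, 30, size=(1000, 1))            # e.g., outside temperature
env[10] = 25.0
result = 2.0 * env[:, 0] + rng.normal(0, 1.0, 1000) # e.g., energy usage
result[10] = 5.0   # an ordinary value marginally, anomalous given env[10] = 25

model = LinearRegression().fit(env, result)
residual = result - model.predict(env)
z = (residual - residual.mean()) / residual.std()
print(np.where(np.abs(z) > 4)[0])   # index 10 appears despite its ordinary value
```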

331 citations


Journal ArticleDOI
TL;DR: In this paper, some basic concepts of robust techniques are presented and their usefulness in chemometric data analysis is stressed.

321 citations


Journal ArticleDOI
TL;DR: Results from controlled testing show that the proposed model yields significant improvement, both in reducing the magnitude of observational residuals and in the three-dimensional positioning accuracy of signalised points.
Abstract: A rigorous method for terrestrial laser scanner self-calibration using a network of signalised points is presented. Exterior orientation, object point co-ordinates and additional parameters are estimated simultaneously by free network adjustment. Spherical co-ordinate observation equations are augmented with a set of additional parameters that model systematic errors in range, horizontal direction and elevation angle. The error models include both physically interpretable and empirically identified components. Though the focus is on one particular make and model of AM–CW scanner system, the Faro 880, the mathematical models are formulated in a general framework so their application to other instruments only requires selection of an appropriate set of additional parameters. Results from controlled testing show that significant improvement is achieved by using the proposed model, both in reducing the magnitude of observational residuals and in improving the three-dimensional positioning accuracy of signalised points. Ten self-calibration datasets captured over the course of 13 months are used to examine short- and long-term additional parameter stability via standard hypothesis testing techniques. Detailed investigations into correlation mechanisms between model parameters accompany the self-calibration solution analyses. Other contributions include an observation model for incorporation of integrated inclinometer observations into the self-calibration solution and an effective a priori outlier removal method. The benefit of the former is demonstrated to be reduced correlation between exterior orientation and additional parameters, even if inclinometer precision is low. The latter is arrived at by detailed analysis of the influence of incidence angle on range.

233 citations


Proceedings ArticleDOI
06 Nov 2007
TL;DR: A method for detecting distance-based outliers in data streams under the sliding window model is presented, where outlier queries are performed in order to detect anomalies in the current window.
Abstract: In this work a method for detecting distance-based outliers in data streams is presented. We deal with the sliding window model, where outlier queries are performed in order to detect anomalies in the current window. Two algorithms are presented. The first one exactly answers outlier queries, but has larger space requirements. The second algorithm is directly derived from the exact one, has limited memory requirements and returns an approximate answer based on accurate estimations with a statistical guarantee. Several experiments have been conducted, confirming the effectiveness of the proposed approach and the high quality of the approximate solutions.
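
The query semantics can be made concrete with a naive O(W²) sketch: a point in the current window is a distance-based outlier if fewer than k other window points lie within radius R. The paper's algorithms answer the same queries with far better bookkeeping; W, R, and k below are illustrative.

```python
# Naive exact distance-based outlier query over a sliding window.
from collections import deque
import numpy as np

W, R, k = 200, 0.5, 3          # window size, radius, neighbor threshold

def window_outliers(window: deque) -> list[int]:
    pts = np.asarray(window)
    d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
    counts = (d <= R).sum(axis=1) - 1      # neighbors within R, excluding self
    return [i for i, c in enumerate(counts) if c < k]

window = deque(maxlen=W)
for x in np.random.default_rng(2).normal(size=(1000, 2)):
    window.append(x)
    outliers = window_outliers(window)     # indices of current outliers
```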

228 citations


Book ChapterDOI
18 Jul 2007
TL;DR: A novel unsupervised algorithm for outlier detection with a solid statistical foundation is proposed, modifying a nonparametric density estimate with a variable kernel to yield a robust local density estimation.
Abstract: Outlier detection has recently become an important problem in many industrial and financial applications. In this paper, a novel unsupervised algorithm for outlier detection with a solid statistical foundation is proposed. First we modify a nonparametric density estimate with a variable kernel to yield a robust local density estimation. Outliers are then detected by comparing the local density of each point to the local density of its neighbors. Our experiments performed on several simulated data sets have demonstrated that the proposed approach can outperform two widely used outlier detection algorithms (LOF and LOCI).
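
A hedged sketch of this recipe, with a per-point bandwidth taken from the k-th neighbor distance and an outlier score comparing each point's density with that of its neighbors, might look as follows; the paper's exact bandwidth rule and score normalization are not reproduced here.

```python
# Variable-bandwidth Gaussian KDE, then a neighbor-vs-own density ratio.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def density_scores(X: np.ndarray, k: int = 10) -> np.ndarray:
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)            # column 0 is the point itself
    h = dist[:, -1] + 1e-12                 # variable bandwidth: k-th NN distance
    d = X.shape[1]

    # leave-one-out KDE with per-kernel bandwidth h_j
    diff = X[:, None, :] - X[None, :, :]
    sq = (diff ** 2).sum(-1) / h[None, :] ** 2
    K = np.exp(-0.5 * sq) / ((2 * np.pi) ** (d / 2) * h[None, :] ** d)
    np.fill_diagonal(K, 0.0)
    dens = K.mean(axis=1)

    # outlier score: mean neighbor density over own density (large => outlier)
    return dens[idx[:, 1:]].mean(axis=1) / dens

X = np.vstack([np.random.default_rng(3).normal(size=(200, 2)), [[6, 6]]])
print(density_scores(X).argmax())           # index 200: the planted outlier
```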

225 citations


Journal ArticleDOI
TL;DR: In this paper, the authors developed a procedure to overcome the problem of non-identifiability of distributed parameters by introducing aggregate parameters and using Bayesian inference, and they demonstrated the good performance of this approach to uncertainty analysis, particularly with respect to the fulfilment of statistical assumptions of the error model.

221 citations


Journal ArticleDOI
TL;DR: Two variations of a method that uses the median from a neighborhood of a data point and a threshold value to compare the difference between the median and the observed data value are proposed.
Abstract: In this article we consider the problem of detecting unusual values or outliers from time series data where the process by which the data are created is difficult to model. The main consideration is the fact that data closer in time are more correlated to each other than those farther apart. We propose two variations of a method that uses the median from a neighborhood of a data point and a threshold value to compare the difference between the median and the observed data value. Both variations of the method are fast and can be used for data streams that occur in quick succession such as sensor data on an airplane.
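
One plausible reading of the basic scheme, flagging an observation when it deviates from the median of a two-sided neighborhood by more than a threshold, is sketched below; the window size, threshold, and robust scale estimate are illustrative choices, and the paper's two variations differ in details not reproduced here.

```python
# Moving-median outlier detection for time series.
import numpy as np

def median_outliers(x: np.ndarray, half_window: int = 5, tau: float = 4.0):
    flags = np.zeros(len(x), dtype=bool)
    # robust noise scale from the series' first differences
    dx = np.diff(x)
    sigma = 1.4826 * np.median(np.abs(dx - np.median(dx)))
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        med = np.median(np.concatenate([x[lo:i], x[i + 1:hi]]))
        flags[i] = abs(x[i] - med) > tau * sigma
    return flags

t = np.linspace(0, 10, 500)
x = np.sin(t) + np.random.default_rng(4).normal(0, 0.05, 500)
x[100] += 2.0                                  # spike
print(np.where(median_outliers(x))[0])         # should include 100
```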

219 citations


Journal ArticleDOI
TL;DR: In this paper, a robust projection-pursuit-based method for principal component analysis (PCA) is proposed for the analysis of chemical data, where the number of variables is typically large.

207 citations


Proceedings ArticleDOI
09 Sep 2007
TL;DR: A histogram-based method for outlier detection is proposed that reduces communication cost by collecting hints (in the form of a histogram) about the data distribution and using them to filter out unnecessary data and identify potential outliers.
Abstract: Outlier detection has many important applications in sensor networks, e.g., detecting abnormal events or changes in animal behavior. It is a difficult problem since global information about data distributions must be known to identify outliers. In this paper, we use a histogram-based method for outlier detection to reduce communication cost. Rather than collecting all the data in one location for centralized processing, we propose collecting hints (in the form of a histogram) about the data distribution, and using the hints to filter out unnecessary data and identify potential outliers. We show that this method can be used for detecting outliers under two different definitions. Our simulation results show that the histogram method can dramatically reduce the communication cost.
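
A toy version of the two-phase idea, in which nodes ship fixed-bin histograms instead of raw readings and the sink requests raw values only for sparsely populated bins, is sketched below; the bin count and sparsity threshold are illustrative.

```python
# Histogram "hints": sensors send counts, the sink asks only for sparse bins.
import numpy as np

rng = np.random.default_rng(5)
sensors = [rng.normal(20, 2, 300) for _ in range(10)]   # 10 sensor nodes
sensors[3][0] = 45.0                                    # anomalous reading

edges = np.linspace(0, 50, 26)                          # shared 25-bin grid
merged = sum(np.histogram(s, bins=edges)[0] for s in sensors)

# a value can only have few neighbors if its bin is globally sparse, so the
# sink requests raw data only for bins with few readings overall
sparse = np.where(merged < 5)[0]
for sid, s in enumerate(sensors):
    bins = np.digitize(s, edges) - 1
    for v in s[np.isin(bins, sparse)]:
        print(f"sensor {sid}: candidate outlier {v:.1f}")
```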

195 citations


Proceedings ArticleDOI
12 Aug 2007
TL;DR: An alternative definition of anomalies is presented, together with an approach that compares records against marginal distributions of attribute subsets, which performs better on semi-synthetic as well as real-world datasets.
Abstract: We consider the problem of detecting anomalies in high-arity categorical datasets. In most applications, anomalies are defined as data points that are "abnormal". Quite often we have access to data which consists mostly of normal records, along with a small percentage of unlabelled anomalous records. We are interested in the problem of unsupervised anomaly detection, where we use the unlabelled data for training, and detect records that do not follow the definition of normality. A standard approach is to create a model of normal data, and compare test records against it. A probabilistic approach builds a likelihood model from the training data. Records are tested for anomalies based on the complete record likelihood given the probability model. For categorical attributes, Bayes nets give a standard representation of the likelihood. While this approach is good at finding outliers in the dataset, it often tends to detect records with attribute values that are rare. Sometimes, just detecting rare values of an attribute is not desired, and such outliers are not considered anomalies in that context. We present an alternative definition of anomalies, and propose an approach of comparing against marginal distributions of attribute subsets. We show that this is a more meaningful way of detecting anomalies, and that it has better performance over semi-synthetic as well as real-world datasets.
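
The contrast with full-record likelihood can be illustrated by scoring a record with the empirical marginal frequency of its rarest attribute pair; the paper's definition and its choice of attribute subsets are richer than this sketch.

```python
# Score categorical records by the rarest pairwise attribute combination.
from collections import Counter
from itertools import combinations

train = [("linux", "http", "low"), ("linux", "http", "low"),
         ("windows", "smtp", "low"), ("linux", "ssh", "high")] * 50

pair_counts = {
    (i, j): Counter((r[i], r[j]) for r in train)
    for i, j in combinations(range(3), 2)
}
n = len(train)

def score(record):
    # low score = some pair of values essentially never co-occurs
    return min(pair_counts[ij][(record[ij[0]], record[ij[1]])] / n
               for ij in pair_counts)

print(score(("linux", "http", "low")))    # common combination
print(score(("windows", "ssh", "low")))   # each value common, the pair unseen
```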

Journal ArticleDOI
TL;DR: Experimental results show that the proposed TAF-SVM is superior to SVM in terms of the face-recognition accuracy and can achieve smaller error variances than SVM over a number of tests such that better recognition stability can be obtained.
Abstract: This paper presents a new classifier called total margin-based adaptive fuzzy support vector machines (TAF-SVM) that deals with several problems that may occur in support vector machines (SVMs) when applied to face recognition. The proposed TAF-SVM not only solves the overfitting problem resulting from outliers, by fuzzifying the penalty, but also corrects the skew of the optimal separating hyperplane caused by very imbalanced data sets by using a different-cost algorithm. In addition, by introducing the total margin algorithm to replace the conventional soft margin algorithm, a lower generalization error bound can be obtained. These three functions are embodied in the traditional SVM, and the TAF-SVM is formulated for both linear and nonlinear cases. Using two databases, the Chung Yuan Christian University (CYCU) multiview and the facial recognition technology (FERET) face databases, and using the kernel Fisher's discriminant analysis (KFDA) algorithm to extract discriminating face features, experimental results show that the proposed TAF-SVM is superior to SVM in terms of face-recognition accuracy. The results also indicate that the proposed TAF-SVM can achieve smaller error variances than SVM over a number of tests, so that better recognition stability can be obtained.

Journal ArticleDOI
TL;DR: Experimental results indicate that the proposed method reduces the effect of outliers and yields a higher classification rate than standard SVM when outliers exist in the training data set.
Abstract: This paper presents a weighted support vector machine (WSVM) to improve the outlier sensitivity problem of the standard support vector machine (SVM) for two-class data classification. The basic idea is to assign different weights to different data points such that the WSVM training algorithm learns the decision surface according to the relative importance of data points in the training data set. The weights used in WSVM are generated by a robust fuzzy clustering algorithm, the kernel-based possibilistic c-means (KPCM) algorithm, whose partition generates relatively high values for important data points but low values for outliers. Experimental results indicate that the proposed method reduces the effect of outliers and yields a higher classification rate than standard SVM when outliers exist in the training data set.
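
The weighted-training mechanism is easy to try with scikit-learn, whose SVC accepts per-sample weights at fit time; the crude robust-distance weighting below merely stands in for the paper's KPCM weights.

```python
# Weighted SVM training: downweight suspicious points before fitting.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.r_[np.zeros(100), np.ones(100)]
X[0] = [6, 6]                               # outlying class-0 point

# proxy weights: downweight points far from their own class median
w = np.empty(len(X))
for c in (0, 1):
    m = np.median(X[y == c], axis=0)
    d = np.linalg.norm(X[y == c] - m, axis=1)
    w[y == c] = 1.0 / (1.0 + d / np.median(d))

clf = SVC(kernel="rbf", C=10.0)
clf.fit(X, y, sample_weight=w)              # weighted training, as in WSVM
```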

Journal ArticleDOI
TL;DR: This article considers the problem of building a linear prediction model when the number of candidate predictors is large and the data possibly contain anomalies that are difficult to visualize and clean and introduces two different approaches to robustify LARS.
Abstract: In this article we consider the problem of building a linear prediction model when the number of candidate predictors is large and the data possibly contain anomalies that are difficult to visualize and clean. We want to predict the nonoutlying cases; therefore, we need a method that is simultaneously robust and scalable. We consider the stepwise least angle regression (LARS) algorithm which is computationally very efficient but sensitive to outliers. We introduce two different approaches to robustify LARS. The plug-in approach replaces the classical correlations in LARS by robust correlation estimates. The cleaning approach first transforms the data set by shrinking the outliers toward the bulk of the data (which we call multivariate Winsorization) and then applies LARS to the transformed data. We show that the plug-in approach is time-efficient and scalable and that the bootstrap can be used to stabilize its results. We recommend using bootstrapped robustified LARS to sequence a number of candidate pred...
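
The cleaning approach can be sketched as follows, with a simple coordinatewise median/MAD winsorization standing in for the paper's multivariate Winsorization, followed by scikit-learn's LARS on the cleaned data.

```python
# "Cleaning" approach: shrink outliers toward the bulk, then run LARS.
import numpy as np
from sklearn.linear_model import Lars

def winsorize(X: np.ndarray, c: float = 2.5) -> np.ndarray:
    med = np.median(X, axis=0)
    mad = 1.4826 * np.median(np.abs(X - med), axis=0) + 1e-12
    z = (X - med) / mad
    shrink = np.minimum(1.0, c / np.maximum(np.abs(z), 1e-12))
    return med + z * shrink * mad           # points with |z| > c move inward

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 50))
X[0, :5] = 40.0                             # leverage point
y = X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 200)

# clean both predictors and response before sequencing predictors
model = Lars(n_nonzero_coefs=5).fit(winsorize(X), winsorize(y[:, None]).ravel())
print(np.nonzero(model.coef_)[0])           # ideally selects columns 0 and 1
```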

Journal ArticleDOI
TL;DR: A method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples, which can be particularly useful in cancer studies, where mutations that can amplify or turn off gene expression often occur in only a minority of samples.
Abstract: We propose a method for detecting genes that, in a disease group, exhibit unusually high gene expression in some but not all samples. This can be particularly useful in cancer studies, where mutations that can amplify or turn off gene expression often occur in only a minority of samples. In real and simulated examples, the new method often exhibits lower false discovery rates than simple t-statistic thresholding. We also compare our approach to the recent cancer profile outlier analysis proposal of Tomlins and others (2005).
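
An outlier-sum-type statistic of the kind compared in this literature can be sketched as below; the standardization and cutoff constants are reconstructed from memory of related work, not copied from the paper.

```python
# Sketch of an outlier-sum-type statistic for one gene: median/MAD-standardize
# across all samples, then sum the disease-group values beyond q75 + IQR.
import numpy as np

def outlier_sum(control: np.ndarray, disease: np.ndarray) -> float:
    x = np.concatenate([control, disease])
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))
    z = (x - med) / mad
    q25, q75 = np.percentile(z, [25, 75])
    cutoff = q75 + (q75 - q25)
    zd = z[len(control):]                   # standardized disease samples
    return float(zd[zd > cutoff].sum())

rng = np.random.default_rng(8)
ctrl, dis = rng.normal(0, 1, 30), rng.normal(0, 1, 30)
dis[:5] += 4.0                              # gene activated in a 5-sample subset
print(outlier_sum(ctrl, dis))               # large positive value
```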

Journal ArticleDOI
TL;DR: An EM-based algorithm is developed for the fitting of mixtures of t-factor analyzers and its application is demonstrated in the clustering of some microarray gene-expression data.

Proceedings ArticleDOI
21 Aug 2007
TL;DR: A new distance measure, the fractional root mean squared distance (FRMSD), is formalized, which incorporates the fraction of inliers into the distance function; the resulting algorithm is guaranteed to converge to a locally optimal solution.
Abstract: We describe a variation of the iterative closest point (ICP) algorithm for aligning two point sets under a set of transformations. Our algorithm is superior to previous algorithms because (1) in determining the optimal alignment, it identifies and discards likely outliers in a statistically robust manner, and (2) it is guaranteed to converge to a locally optimal solution. To this end, we formalize a new distance measure, fractional root mean squared distance (FRMSD), which incorporates the fraction of inliers into the distance function. Our framework can easily incorporate most techniques and heuristics from modern registration algorithms. We experimentally validate our algorithm against previous techniques on 2 and 3 dimensional data exposed to a variety of outlier types.
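
The flavor of FRMSD can be conveyed by a small sketch: for a candidate inlier fraction f, take the f·n smallest residuals and penalize small fractions by 1/f^λ. The value of the trade-off parameter λ used below is an assumption, and the full FICP loop (alternating correspondence, transform, and fraction updates) is omitted.

```python
# FRMSD-style scoring of a candidate inlier fraction.
import numpy as np

def frmsd(residuals: np.ndarray, f: float, lam: float = 3.0) -> float:
    r = np.sort(residuals)
    m = max(1, int(round(f * len(r))))
    return (1.0 / f**lam) * np.sqrt(np.mean(r[:m] ** 2))

def best_fraction(residuals, grid=np.linspace(0.2, 1.0, 81)):
    # inner step of the algorithm: pick the fraction minimizing FRMSD
    return min(grid, key=lambda f: frmsd(residuals, f))

rng = np.random.default_rng(9)
res = np.abs(np.r_[rng.normal(0, 0.01, 90), rng.uniform(1, 2, 10)])
print(f"estimated inlier fraction ≈ {best_fraction(res):.2f}")   # near 0.9
```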

Proceedings ArticleDOI
10 Dec 2007
TL;DR: A modified Kalman filter is introduced that can perform robust, real-time outlier detection in the observations, without the need for manual parameter tuning by the user, using a weighted least squares-like approach.
Abstract: In this paper, we introduce a modified Kalman filter that can perform robust, real-time outlier detection in the observations, without the need for manual parameter tuning by the user. Robotic systems that rely on high quality sensory data can be sensitive to data containing outliers. Since the standard Kalman filter is not robust to outliers, other variations of the Kalman filter have been proposed to overcome this issue, but these methods may require manual parameter tuning, use of heuristics or complicated parameter estimation. Our Kalman filter uses a weighted least squares-like approach by introducing weights for each data sample. A data sample with a smaller weight has a weaker contribution when estimating the current time step's state. We learn the weights and system dynamics using a variational Expectation-Maximization framework. We evaluate our Kalman filter algorithm on data from a robotic dog.
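
Mechanically, a per-sample weight w in (0, 1] can be realized in a Kalman update by inflating the measurement noise to R/w, so low-weight observations barely move the state. The sketch below uses a fixed 3-sigma weighting rule purely for illustration, whereas the paper learns the weights with variational EM.

```python
# Scalar Kalman filter with a per-sample weight on the measurement.
import numpy as np

def kalman_step(x, P, z, A=1.0, Q=0.01, H=1.0, R=0.1):
    x_pred, P_pred = A * x, A * P * A + Q
    innov = z - H * x_pred
    S = H * P_pred * H + R
    # downweight measurements beyond ~3 sigma (illustrative rule, not the paper's)
    w = min(1.0, 9.0 * S / innov**2) if innov != 0 else 1.0
    S_w = H * P_pred * H + R / w            # weighted measurement noise
    K = P_pred * H / S_w
    return x_pred + K * innov, (1 - K * H) * P_pred

x, P = 0.0, 1.0
for z in [0.1, 0.05, 5.0, 0.12, 0.08]:      # 5.0 is an outlier
    x, P = kalman_step(x, P, z)
    print(f"z={z:5.2f} -> state {x:.3f}")   # state barely reacts to z=5.0
```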

Journal ArticleDOI
TL;DR: Recent developments on order statistics arising from independent and non-identically distributed random variables are synthesised, and the robustness properties of several linear estimators are evaluated when multiple outliers are possibly present in the sample.
Abstract: In this paper, we consider order statistics and outlier models, and focus primarily on multiple-outlier models and associated robustness issues. We first synthesise recent developments on order statistics arising from independent and non-identically distributed random variables based primarily on the theory of permanents. We then highlight various applications of these results in evaluating the robustness properties of several linear estimators when multiple outliers are possibly present in the sample.

Journal ArticleDOI
TL;DR: This paper proposes a global semiparametric quantile regression model that can estimate conditional quantiles without the usual distributional assumptions, and develops a new model assessment tool for longitudinal growth data.
Abstract: Growth charts are often more informative when they are customized per subject, taking into account prior measurements and possibly other covariates of the subject. We study a global semiparametric quantile regression model that has the ability to estimate conditional quantiles without the usual distributional assumptions. The model can be estimated from longitudinal reference data with irregular measurement times and with some level of robustness against outliers, and it is also flexible for including covariate information. We propose a rank score test for large sample inference on covariates, and develop a new model assessment tool for longitudinal growth data. Our research indicates that the global model has the potential to be a very useful tool in conditional growth chart analysis.
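
Conditional quantile estimation without distributional assumptions is available in statsmodels; the sketch below fits cross-sectional reference centiles only and ignores the longitudinal correlation and irregular measurement times that the paper's semiparametric model handles.

```python
# Reference centiles via linear quantile regression (cross-sectional only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
age = rng.uniform(2, 18, 500)
height = 80 + 5.5 * age + rng.normal(0, 4 + 0.3 * age)   # heteroscedastic

X = sm.add_constant(age)
for q in (0.05, 0.50, 0.95):
    fit = sm.QuantReg(height, X).fit(q=q)
    print(f"{int(q*100):2d}th centile: height ≈ "
          f"{fit.params[0]:.1f} + {fit.params[1]:.2f} * age")
```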

Journal ArticleDOI
15 Apr 2007-Talanta
TL;DR: It is shown that the proposed strategy for dealing with missing values and outlying observations simultaneously in principal component analysis works well for highly contaminated data containing different amounts of missing elements.

Journal ArticleDOI
TL;DR: In this article, a continuous time autoregressive error model is proposed for statistical inference and uncertainty analysis in hydrologic modeling and applied to the Thur River basin in Switzerland, which is subject to completely different climatic conditions from the basin for which the model was originally developed.
Abstract: Calibration and uncertainty analysis in hydrologic modeling are affected by measurement errors in input and response and errors in model structure. Recently, extending similar approaches in discrete time, a continuous time autoregressive error model was proposed for statistical inference and uncertainty analysis in hydrologic modeling. The major advantages over discrete time formulation are the use of a continuous time error model for describing continuous processes, the possibility of accounting for seasonal variations of parameters in the error model, the easier treatment of missing data or omitted outliers, and the opportunity for continuous time predictions. The model was developed for the Chaohe Basin in China and had some features specific for this semiarid climatic region (in particular, the seasonal variation of parameters in the error model in response to seasonal variation in precipitation). This paper tests and extends this approach with an application to the Thur River basin in Switzerland, which is subject to completely different climatic conditions. This application corroborates the general applicability of the approach but also demonstrates the necessity of accounting for the heavy tails in the distributions of residuals and innovations. This is done by replacing the normal distribution of the innovations by a Student t distribution, the degrees of freedom of which are adapted to best represent the shape of the empirical distribution of the innovations. We conclude that with this extension, the continuous time autoregressive error model is applicable and flexible for hydrologic modeling under different climatic conditions. The major remaining conceptual disadvantage is that this class of approaches does not lead to a separate identification of model input and model structural errors. The major practical disadvantage is the high computational demand characteristic for all Markov chain Monte Carlo techniques.

Journal ArticleDOI
TL;DR: The aim of this work is to study robust regression techniques in the fixed effects linear panel data framework by means of breakdown point computations and simulation experiments, and to show the potential of robust panel data methods.
Abstract: Panel data estimators can be strongly biased in the presence of outlying observations. Although most researchers are aware of this problem, little literature exists on robust estimation of the parameters in a panel data model. In this paper, robust versions of the classical Within Group estimator are considered. The robustness of these estimators with respect to outliers will be investigated. The presence of outliers can lead to erroneous estimates in regression models. Indeed, the classical least-squares (LS) approach is known to be very sensitive to outliers. Moreover, outliers are not always detectable by looking at residuals from an LS fit, since the latter suffers from the masking effect. Masking means here that outliers affect the LS estimator in such a way that outlier diagnostics based on LS are no longer capable of detecting them. Note also that diagnostic measures like Cook's distance suffer from the masking effect as soon as multiple outliers are present. More robust alternatives to LS are the Least Absolute Deviation estimator and M-estimators. Unfortunately, these estimators are not robust with respect to leverage points, i.e. outliers in the space of the covariates. Thus, regression estimators having a high breakdown point are considered.
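
The sensitivity point is easy to reproduce with statsmodels by comparing ordinary least squares with a Huber M-estimator on contaminated data; this is plain cross-sectional regression, not the robust Within Group panel estimators studied in the paper.

```python
# OLS versus a Huber M-estimator on data with vertical outliers.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(0, 0.5, 200)
y[:10] += 25.0                          # vertical outliers

X = sm.add_constant(x)
print("OLS   slope:", sm.OLS(y, X).fit().params[1])
print("Huber slope:", sm.RLM(y, X, M=sm.robust.norms.HuberT()).fit().params[1])
```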

Posted Content
TL;DR: The authors conducted an extensive robustness analysis of the relationship between trust and growth by investigating a later time period and a bigger sample than in previous studies, and found that when outliers (especially China) are removed, the trust-growth relationship is no longer robust.
Abstract: We conduct an extensive robustness analysis of the relationship between trust and growth by investigating a later time period and a bigger sample than in previous studies. In addition to robustness tests that focus on model uncertainty, we systematize the investigation of outlier influence on the results by using the robust estimation technique Least Trimmed Squares. We find that when outliers (especially China) are removed, the trust-growth relationship is no longer robust. On average, the trust coefficient is half as large as in previous findings.

Book ChapterDOI
09 Sep 2007
TL;DR: This method is shown to outperform a current state-of-the-art incremental one-class learning algorithm (Incremental SVDD) on a variety of datasets, while requiring only an upper limit on model complexity to be specified.
Abstract: An incremental one-class learning algorithm is proposed for the purpose of outlier detection. Outliers are identified by estimating - and thresholding - the probability distribution of the training data. In the early stages of training a non-parametric estimate of the training data distribution is obtained using kernel density estimation. Once the number of training examples reaches the maximum computationally feasible limit for kernel density estimation, we treat the kernel density estimate as a maximally-complex Gaussian mixture model, and keep the model complexity constant by merging a pair of components for each new kernel added. This method is shown to outperform a current state-of-the-art incremental one-class learning algorithm (Incremental SVDD [5]) on a variety of datasets, while requiring only an upper limit on model complexity to be specified.
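
The core operation once the kernel budget is exhausted is merging two Gaussian components into one that preserves the pair's combined mean and covariance; the moment-matching formula below is standard, while the rule for choosing which pair to merge is the paper's contribution and is not shown.

```python
# Moment-preserving merge of two weighted Gaussian mixture components.
import numpy as np

def merge(w1, m1, S1, w2, m2, S2):
    w = w1 + w2
    m = (w1 * m1 + w2 * m2) / w
    S = (w1 * (S1 + np.outer(m1 - m, m1 - m))
         + w2 * (S2 + np.outer(m2 - m, m2 - m))) / w
    return w, m, S

w, m, S = merge(0.5, np.array([0.0, 0.0]), np.eye(2),
                0.5, np.array([1.0, 0.0]), np.eye(2))
print(m)        # [0.5, 0.]
print(S)        # identity plus extra spread along the first axis
```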

Proceedings ArticleDOI
Shipeng Yu, Volker Tresp, Kai Yu
20 Jun 2007
TL;DR: A robust framework for Bayesian multi-task learning, t-processes (TP), a generalization of Gaussian processes for multi-task learning, is introduced, allowing the system to effectively distinguish good tasks from noisy or outlier tasks.
Abstract: Most current multi-task learning frameworks ignore the robustness issue, which means that the presence of "outlier" tasks may greatly reduce overall system performance. We introduce a robust framework for Bayesian multitask learning, t-processes (TP), which are a generalization of Gaussian processes (GP) for multi-task learning. TP allows the system to effectively distinguish good tasks from noisy or outlier tasks. Experiments show that TP not only improves overall system performance, but can also serve as an indicator for the "informativeness" of different tasks.

Journal ArticleDOI
TL;DR: This work proposes the outlier robust t-statistic (ORT), intuitively motivated from the t-statistic (the most commonly used differential gene expression detection method), for detecting cancer genes that are over- or down-expressed in some but not all samples in a disease group.
Abstract: We study statistical methods to detect cancer genes that are over- or down-expressed in some but not all samples in a disease group. This has proven useful in cancer studies where oncogenes are activated only in a small subset of samples. We propose the outlier robust t-statistic (ORT), which is intuitively motivated from the t-statistic, the most commonly used differential gene expression detection method. Using real and simulation studies, we compare the ORT to the recently proposed cancer outlier profile analysis (Tomlins and others, 2005) and the outlier sum statistic of Tibshirani and Hastie (2006). The proposed method often has more detection power and smaller false discovery rates. Supplementary information can be found at http://www.biostat.umn.edu/∼baolin/research/ort.html.

Journal ArticleDOI
TL;DR: A nonparametric approach for neuroimaging data analysis based on the rank order of the data, which may offer a small benefit for datasets where the assumptions of the t-test are violated, for example where data from one of the groups exhibit a skewed distribution due to floor or ceiling effects.

Journal ArticleDOI
TL;DR: This paper proposes an efficient scheme that uses image examples to drive a powerful regularization, applied to the image scale-up (super-resolution) problem, and demonstrates the algorithm on several scanned documents with promising results.
Abstract: Regularization plays a vital role in inverse problems, and especially in ill-posed ones. Along with classical regularization techniques based on smoothness, entropy, and sparsity, an emerging powerful regularization is one that leans on image examples. In this paper, we propose an efficient scheme that uses image examples to drive a powerful regularization, applied to the image scale-up (super-resolution) problem. In this work, we specifically target scanned documents containing written text, graphics, and equations. Our algorithm starts by assigning to each location in the degraded image several candidate high-quality patches. Those are found as the nearest neighbors (NN) in an image database that contains pairs of corresponding low- and high-quality image patches. The found examples are used for the definition of an image prior expression, merged into a global MAP penalty function. We use this penalty function both for rejecting some of the irrelevant outlier examples, and then for reconstructing the desired image. We demonstrate our algorithm on several scanned documents with promising results.

Proceedings ArticleDOI
Kyung-A Yoon, Ohsung Kwon, Doo-Hwan Bae
20 Sep 2007
TL;DR: An approach to outlier detection in software measurement data using the k-means clustering method is proposed, helping to detect the outliers that reduce data quality during software measurement implementation.
Abstract: The quality of software measurement data affects the accuracy of a project manager's decision making using estimation or prediction models, and the understanding of real project status. During software measurement implementation, outliers that reduce data quality are collected; however, their detection is not easy. To cope with this problem, we propose an approach to outlier detection in software measurement data using the k-means clustering method.
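
A minimal version of this idea with scikit-learn: cluster the measurement data with k-means, then flag records unusually far from their assigned centroid. The number of clusters and the cutoff rule are illustrative choices, not the paper's.

```python
# k-means-based outlier flagging via distance to the assigned centroid.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(12)
X = np.vstack([rng.normal(0, 1, (100, 2)),
               rng.normal(8, 1, (100, 2)),
               [[4.0, 20.0]]])                 # one bad measurement record

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
d = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
cut = d.mean() + 3 * d.std()
print(np.where(d > cut)[0])                    # index 200
```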