
Showing papers on "Outlier" published in 1998


Journal ArticleDOI
TL;DR: In this article, the effects of a break under the null hypothesis and the choice of break date are considered, and new limiting distributions are derived, including the case where a shift in trend occurs under the unit root null hypothesis.
Abstract: The authors consider unit root tests that allow a shift in trend at an unknown time. They focus on the additive outlier approach but also give results for the innovational outlier approach. Various methods of choosing the break date are considered. New limiting distributions are derived, including the case where a shift in trend occurs under the unit root null hypothesis. Limiting distributions are invariant to mean shifts but not to slope shifts. Simulations are used to assess finite sample size and power. The authors focus on the effects of a break under the null and the choice of break date. Copyright 1998 by Economics Department of the University of Pennsylvania and the Osaka University Institute of Social and Economic Research Association.

490 citations


Journal ArticleDOI
TL;DR: It was concluded that careful design of a selection algorithm should include consideration of spectral noise distributions in the input data to increase the likelihood of successful and appropriate selection for data with noise distributions resulting in large outliers.
Abstract: The mathematical basis of improved calibration through selection of informative variables for partial least-squares calibration has been identified. A theoretical investigation of calibration slopes indicates that including uninformative wavelengths negatively affects calibrations by producing both a large relative bias toward zero and a small additive bias away from the origin. These theoretical results are found regardless of the noise distribution in the data. Studies are performed to confirm this result by comparing a previously used selection method with a new method designed to perform more appropriately on data having large outlying points by including estimates of spectral residuals. Three different data sets with varying noise distributions are tested. In the first data set, Gaussian and log-normal noise was added to simulated data which included a single peak. Second, near-infrared spectra of glucose in cell culture media taken with an FT-IR spectrometer were analyzed. Finally, dispersive Raman Stokes spectra of glucose dissolved in water were assessed. In every case considered here, improved prediction is produced through selection, but data with different noise characteristics showed varying degrees of improvement depending on the selection method used. The practical results showed that including residuals in the ranking criteria does indeed improve selection for data with noise distributions resulting in large outliers. It was concluded that careful design of a selection algorithm should include consideration of spectral noise distributions in the input data to increase the likelihood of successful and appropriate selection.
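To make the ranking idea concrete, here is a minimal Python sketch (assuming scikit-learn's PLSRegression and NumPy) of wavelength selection that penalizes noisy channels. The criterion |b_j| / s_j, the residual-based noise estimate, and all data and settings below are illustrative assumptions, not the authors' exact method.

```python
# Sketch: rank wavelengths for PLS calibration, penalizing noisy channels.
# Illustrative only -- the score |b_j| / s_j is an assumed stand-in for the
# paper's residual-augmented ranking criterion.
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n_samples, n_wavelengths = 60, 200
X = rng.normal(size=(n_samples, n_wavelengths))
y = X[:, 50:60].sum(axis=1) + rng.normal(scale=0.1, size=n_samples)  # informative channels 50-59

pls = PLSRegression(n_components=3).fit(X, y)
b = np.ravel(pls.coef_)                         # one regression coefficient per wavelength

# Residual-based noise estimate per wavelength: how much of each (standardized)
# channel the PLS scores fail to reconstruct.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
resid = Xz - pls.transform(X) @ pls.x_loadings_.T
s = resid.std(axis=0) + 1e-12

score = np.abs(b) / s                           # large coefficient, small residual -> keep
keep = np.argsort(score)[::-1][:20]             # select the 20 best-ranked wavelengths
pls_sel = PLSRegression(n_components=3).fit(X[:, keep], y)
print(sorted(keep.tolist()))                    # selected wavelength indices
```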

256 citations


Journal ArticleDOI
TL;DR: It is proposed that robust statistical analysis can be of great use for determinations of reference intervals from limited or possibly unreliable data.
Abstract: We propose a new methodology for the estimation of reference intervals for data sets with small numbers of observations or for those with substantial numbers of outliers. We propose a prediction interval that uses robust estimates of location and scale. The SAS software can be readily modified to do these calculations. We compared four reference interval procedures (nonparametric, transformed, robust with a nonparametric lower limit, and transformed robust) for sample sizes of 20, 40, 60, 80, 100, and 120 drawn from chi-square distributions with 1, 4, 7, and 10 df. Chi-square distributions were chosen because they simulate the skewness of distributions often found in clinical chemistry populations. We used the root mean square error as the measure of performance and used computer simulation to calculate this measure. The robust estimator showed the best performance for small sample sizes. As the sample size increased, the performance values converged. The robust method for calculating upper reference interval values yields reasonable results. In two examples using real data for haptoglobin and glucose, the robust estimator provides slightly smaller upper reference limits than the other procedures. Lastly, the robust estimator was compared with the other procedures in a population in which 5% of the values were multiplied by a factor of 5. The reference intervals were calculated with and without outlier detection. In this case, the robust approach consistently yielded upper reference interval values that were closer to those of the true underlying distributions. We propose that robust statistical analysis can be of great use for determinations of reference intervals from limited or possibly unreliable data.
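A minimal sketch of the idea, assuming NumPy/SciPy: a normal-theory prediction interval whose mean and SD are replaced by robust location and scale estimates. The median/MAD pair and the contaminated example below are illustrative stand-ins for the paper's exact robust estimators and SAS implementation.

```python
# Minimal sketch of a robust reference interval: replace the mean/SD in a
# normal-theory prediction interval with robust location/scale estimates.
# The median/MAD pair is an illustrative assumption, not the paper's exact procedure.
import numpy as np
from scipy import stats

def robust_reference_interval(x, coverage=0.95):
    x = np.asarray(x, dtype=float)
    n = len(x)
    loc = np.median(x)
    scale = stats.median_abs_deviation(x, scale="normal")  # MAD rescaled to SD units
    # Prediction interval for one future observation, normal-theory form
    t = stats.t.ppf(0.5 + coverage / 2, df=n - 1)
    half_width = t * scale * np.sqrt(1 + 1 / n)
    return loc - half_width, loc + half_width

rng = np.random.default_rng(1)
sample = np.concatenate([rng.normal(5.0, 1.0, 95), [25, 30, 28, 40, 35]])  # 5% gross outliers
print(robust_reference_interval(sample))        # barely moved by the outliers
print(np.mean(sample) - 1.96 * np.std(sample),  # classical limits, dragged outward
      np.mean(sample) + 1.96 * np.std(sample))
```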

233 citations


Proceedings ArticleDOI
23 Jun 1998
TL;DR: This paper employs a simple and efficient outlier rejection rule, called X84, and proves that its theoretical assumptions are satisfied in the feature tracking scenario, and shows a quantitative example of the benefits introduced by the algorithm for the case of fundamental matrix estimation.
Abstract: This paper addresses robust feature tracking. We extend the well-known Shi-Tomasi-Kanade tracker by introducing an automatic scheme for rejecting spurious features. We employ a simple and efficient outlier rejection rule, called X84, and prove that its theoretical assumptions are satisfied in the feature tracking scenario. Experiments with real and synthetic images confirm that our algorithm makes good features track better; we show a quantitative example of the benefits introduced by the algorithm for the case of fundamental matrix estimation. The complete code of the robust tracker is available via ftp.
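A minimal sketch of the X84 rejection rule in Python/NumPy: values more than k median absolute deviations (MADs) from the median are rejected. The threshold of 5.2 MADs (roughly 3.5 Gaussian standard deviations) is the one usually quoted for X84; treat the constant and the toy residuals below as assumptions.

```python
# Sketch of the X84 rejection rule used to discard spurious features:
# keep only values within k MADs of the median.
import numpy as np

def x84_inliers(residuals, k=5.2):
    r = np.asarray(residuals, dtype=float)
    med = np.median(r)
    mad = np.median(np.abs(r - med))
    if mad == 0:                      # degenerate case: all residuals identical
        return np.abs(r - med) == 0
    return np.abs(r - med) <= k * mad

# Example: per-feature tracking residuals, the last two of which are spurious
res = np.array([0.8, 1.1, 0.9, 1.0, 1.2, 0.7, 9.5, 12.0])
print(x84_inliers(res))   # the two large residuals are rejected
```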

197 citations


Journal ArticleDOI
TL;DR: Experiments on both 2-D and 3-D data sets show that convergence is possible even for very rough initial positionings, and that the final registration accuracy typically approaches less than one quarter of the interpoint sampling resolution of the images.

169 citations


Journal ArticleDOI
TL;DR: This article developed several techniques for data exploration for outliers and outlier analysis and then applied these to the detailed analysis of outliers in two large scale multilevel data sets from educational contexts.
Abstract: This paper offers the data analyst a range of practical procedures for dealing with outliers in multilevel data. It first develops several techniques for data exploration for outliers and outlier analysis and then applies these to the detailed analysis of outliers in two large scale multilevel data sets from educational contexts. The techniques include the use of deviance reduction, measures based on residuals, leverage values, hierarchical cluster analysis and a measure called DFITS. Outlier analysis is more complex in a multilevel data set than in, say, a univariate sample or a set of regression data, where the concept of an outlying value is straightforward. In the multilevel situation one has to consider, for example, at what level or levels a particular response is outlying, and in respect of which explanatory variables; furthermore, the treatment of a particular response at one level may affect its status or the status of other units at other levels in the model.

155 citations


Journal ArticleDOI
TL;DR: Both the proposed resampling by the half-means method and the smallest half-volume method are simple to use, are conceptually clear, and provide results superior to MVT and the current best-performing technique, MCD.
Abstract: The unreliability of multivariate outlier detection techniques such as Mahalanobis distance and hat matrix leverage has been known in the statistical community for well over a decade. However, only within the past few years has a serious effort been made to introduce robust methods for the detection of multivariate outliers into the chemical literature. Techniques such as the minimum volume ellipsoid (MVE), multivariate trimming (MVT), and M-estimators (e.g., PROP), and others similar to them, such as the minimum covariance determinant (MCD), rely upon algorithms that are difficult to program and may require significant processing times. While MCD and MVE have been shown to be statistically sound, we found MVT unreliable due to the method's use of the Mahalanobis distance measure in its initial step. We examined the performance of MCD and MVT on selected data sets and in simulations and compared the results with two methods of our own devising. Both the proposed resampling by the half-means method and the smallest half-volume method are simple to use, are conceptually clear, and provide results superior to MVT and the current best-performing technique, MCD. Either proposed method is recommended for the detection of multiple outliers in multivariate data.
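A minimal sketch of resampling by the half-means, under one reading of the procedure and assuming NumPy: the data are repeatedly autoscaled using the mean/SD of a random half-sample, each observation's scaled vector length is recorded, and points whose lengths are repeatedly extreme are flagged. The number of resamples and the "extreme" cutoff are assumptions, not the authors' settings.

```python
# Sketch of resampling by the half-means for multivariate outlier detection.
import numpy as np

def half_means_outlier_counts(X, n_resamples=500, top_fraction=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(n, dtype=int)
    n_flag = max(1, int(np.ceil(top_fraction * n)))
    for _ in range(n_resamples):
        half = rng.choice(n, size=n // 2, replace=False)
        mu = X[half].mean(axis=0)                     # location/scale from the half-sample only
        sd = X[half].std(axis=0, ddof=1) + 1e-12
        lengths = np.linalg.norm((X - mu) / sd, axis=1)
        counts[np.argsort(lengths)[-n_flag:]] += 1    # tally the most extreme points
    return counts

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))
X[:3] += 6.0                                          # three multivariate outliers
print(np.argsort(half_means_outlier_counts(X))[-5:])  # the outliers dominate the tally
```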

139 citations


Posted Content
TL;DR: This paper showed that the substitution of the quartiles with the median leads to a better performance in the non-Gaussian case in terms of resistance and efficiency, and an outside rate that is less affected by the sample size.
Abstract: The techniques of exploratory data analysis include a resistant rule, based on a linear combination of quartiles, for the identification of outliers. This paper shows that substituting the median for the quartiles leads to better performance in the non-Gaussian case. The improvement occurs in terms of resistance and efficiency, and in an outside rate that is less affected by the sample size. The paper also studies issues of practical importance in the spirit of robustness by considering moderately skewed and fat-tailed distributions obtained as special cases of the Generalized Lambda Distribution.
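The two rules can be sketched in a few lines of Python/NumPy: the familiar quartile-based boxplot fence and the median-centred variant discussed above. The constants 1.5 and 2.3 are illustrative assumptions; the paper tunes the median-rule constant against a target outside rate.

```python
# Sketch of the quartile-based resistant rule and a median-centred variant.
import numpy as np

def quartile_rule(x, k=1.5):
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x < q1 - k * iqr) | (x > q3 + k * iqr)   # classical boxplot fences

def median_rule(x, k=2.3):
    q1, q3 = np.percentile(x, [25, 75])
    med, iqr = np.median(x), q3 - q1
    return np.abs(x - med) > k * iqr                 # fences centred on the median

x = np.concatenate([np.random.default_rng(3).lognormal(size=200), [40.0, 55.0]])
print(quartile_rule(x).sum(), median_rule(x).sum())  # counts of flagged points on skewed data
```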

126 citations


Journal ArticleDOI
TL;DR: Two test statistics are presented in judging the adequacy of a hypothesized model; both are asymptotically distribution free if using distribution free weight matrices and possess both finite sample and large sample robustness.
Abstract: Covariance structure analysis is used to evaluate hypothesized influences among unmeasured latent and observed variables. As implemented, it is not robust to outliers and bad data. Several robust methods in model fitting and testing are proposed. These include direct estimation of M-estimators of structured parameters and a two-stage procedure based on robust M- and S-estimators of population covariances. The large sample properties of these estimators are obtained. The equivalence between a direct M-estimator and a two-stage estimator based on an M-estimator of population covariance is established when sampling from an elliptical distribution. Two test statistics are presented in judging the adequacy of a hypothesized model; both are asymptotically distribution free if using distribution free weight matrices. So these test statistics possess both finite sample and large sample robustness. The two-stage procedures can be easily adapted into standard software packages by modifying existing asymptotically distribution free procedures. To demonstrate the two-stage procedure, S-estimator and M-estimators under different weight functions are calculated for some real data sets.

121 citations


Journal ArticleDOI
TL;DR: Two new methods for robust parameter estimation of mixtures in the context of magnetic resonance (MR) data segmentation are presented, one based on genetic algorithms and the other on an extension of the expectation-maximization algorithm that estimates parameters of Gaussian mixtures but incorporates an outlier rejection scheme.
Abstract: This paper presents two new methods for robust parameter estimation of mixtures in the context of magnetic resonance (MR) data segmentation. The head is composed of different types of tissue that can be modeled by a finite mixture of multivariate Gaussian distributions. The authors' goal is to estimate accurately the statistics of the desired tissues in the presence of others of lesser interest. The latter can be considered outliers and can severely bias the estimates of the former. For this purpose, the authors introduce a first method, an extension of the expectation-maximization (EM) algorithm, that estimates the parameters of Gaussian mixtures but incorporates an outlier rejection scheme, making it possible to compute the properties of the desired tissues in the presence of atypical data. The second method is based on genetic algorithms and is well suited for estimating the parameters of mixtures of different kinds of distributions. The authors exploit this property by adding a uniform distribution to the Gaussian mixture to model the outliers. The proposed genetic algorithm can efficiently estimate the parameters of this extended mixture for various initial settings. Also, by changing the minimization criterion, estimates of the parameters can be obtained by histogram fitting, which considerably reduces the computational cost. Experiments on synthetic and real MR data show that accurate estimates of the gray and white matter parameters are computed.
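A minimal one-dimensional sketch in Python/NumPy/SciPy of one ingredient above: a Gaussian mixture augmented with a uniform component that absorbs outliers, fitted here by a small EM loop (the paper's second method uses a genetic algorithm; its first method uses a different EM-based rejection scheme). The initialisation, number of components, and uniform support are assumptions for illustration.

```python
# 1-D EM for a Gaussian mixture plus a uniform "outlier" component, so that
# atypical intensities do not bias the Gaussian (tissue) parameters.
import numpy as np
from scipy.stats import norm

def em_gauss_plus_uniform(x, k=2, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(x)
    lo, hi = x.min(), x.max()
    u_dens = 1.0 / (hi - lo)                        # uniform outlier density on the data range
    mu = rng.choice(x, k)
    sigma = np.full(k, x.std())
    w = np.full(k + 1, 1.0 / (k + 1))               # last weight = outlier component
    for _ in range(n_iter):
        # E-step: responsibilities for the k Gaussians and the uniform component
        dens = np.column_stack([norm.pdf(x, mu[j], sigma[j]) for j in range(k)]
                               + [np.full(n, u_dens)])
        r = w * dens
        r /= r.sum(axis=1, keepdims=True)
        # M-step: Gaussian parameters updated with outliers carrying little weight
        for j in range(k):
            nj = r[:, j].sum()
            mu[j] = (r[:, j] * x).sum() / nj
            sigma[j] = np.sqrt((r[:, j] * (x - mu[j]) ** 2).sum() / nj) + 1e-6
        w = r.mean(axis=0)
    return mu, sigma, w

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(0, 1, 500), rng.normal(6, 1, 500), rng.uniform(-20, 30, 50)])
print(em_gauss_plus_uniform(x, k=2))   # means near 0 and 6 despite the contamination
```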

120 citations


01 Jan 1998
TL;DR: A covariance estimator is developed using a Bayesian formulation, which is advantageous when the training set size varies and reflects the prior of each class, and a binary tree design is proposed to deal with the problem of varying training sample size.
Abstract: An important problem in pattern recognition is the effect of limited training samples on classification performance. When the ratio of the number of training samples to the dimensionality is small, parameter estimates become highly variable, causing the deterioration of classification performance. This problem has become more prevalent in remote sensing with the emergence of a new generation of sensors. While the new sensor technology provides higher spectral and spatial resolution, enabling a greater number of spectrally separable classes to be identified, the labeled samples needed for designing the classifier remain difficult and expensive to acquire. In this thesis, several issues concerning the classification of high dimensional data with limited training samples are addressed. First of all, better parameter estimates can be obtained using a large number of unlabeled samples in addition to training samples under the mixture model. However, the estimation method is sensitive to the presence of statistical outliers. In remote sensing data, classes with few samples are difficult to identify and may constitute statistical outliers. Therefore, a robust parameter estimation method for the mixture model is introduced. Motivated by the fact that covariance estimates become highly variable with limited training samples, a covariance estimator is developed using a Bayesian formulation. The proposed covariance estimator is advantageous when the training set size varies and reflects the prior of each class. Finally, a binary tree design is proposed to deal with the problem of varying training sample size. The proposed binary tree can function as both a classifier and a feature extraction method. The benefits and limitations of the proposed methods are discussed and demonstrated with experiments. Work leading to the report was supported in part by NASA Grant NAG5-3975. This support is gratefully acknowledged.

Book
01 Jan 1998
TL;DR: The algorithm used to compute the weighted likelihood estimates is competitive with EM, and it is similar to EM when the components are not well separated, and a new statistical stopping rule is proposed for the termination of the algorithm.
Abstract: Problems associated with the analysis of data from a mixture of distributions include the presence of outliers in the sample, the fact that a component may not be well represented in the data, and the problem of biases that occur when the model is slightly misspecified. We study the performance of weighted likelihood in this context. The method produces estimates with low bias and mean squared error, and it is useful in that it unearths data substructures in the form of multiple roots. This in turn indicates multiple potential mixture model fits due to the presence of more components than originally specified in the model. To compute the weighted likelihood estimates, we use as starting values the method of moment estimates computed on bootstrap subsamples drawn from the data. We address a number of important practical issues involving bootstrap sample size selection, the role of starting values, and the behavior of the roots. The algorithm used to compute the weighted likelihood estimates is competitive with EM, and it is similar to EM when the components are not well separated. Moreover, we propose a new statistical stopping rule for the termination of the algorithm. An example and a small simulation study illustrate the above points.

Journal ArticleDOI
TL;DR: In this paper, a simple way of constructing a bivariate boxplot based on convex hull peeling and B-spline smoothing is proposed, which leads to defining a natural inner region which is completely nonparametric and smooth.

Posted Content
TL;DR: A framework to predict the full probability distribution is presented as a mixture model: the dynamics of the individual states is modeled with so-called "experts" (potentially nonlinear neural networks), and the dynamics between the states is modeled using a hidden Markov approach.
Abstract: Most approaches in forecasting merely try to predict the next value of the time series. In contrast, this paper presents a framework to predict the full probability distribution. It is expressed as a mixture model: the dynamics of the individual states is modeled with so-called "experts" (potentially nonlinear neural networks), and the dynamics between the states is modeled using a hidden Markov approach. The full density predictions are obtained by a weighted superposition of the individual densities of each expert. This model class is called "hidden Markov experts". Results are presented for daily S&P500 data. While the predictive accuracy of the mean does not improve over simpler models, evaluating the prediction of the full density shows a clear out-of-sample improvement both over a simple GARCH(1,1) model (which assumes Gaussian distributed returns) and over a "gated experts" model (which expresses the weighting for each state non-recursively as a function of external inputs). Several interpretations are given: the blending of supervised and unsupervised learning, the discovery of hidden states, the combination of forecasts, the specialization of experts, the removal of outliers, and the persistence of volatility.

Journal ArticleDOI
TL;DR: Simulations show that in the second-order case, the threshold performance of the technique is close to that of the WSF method and stochastic/deterministic ML methods, which are known today as the most powerful and computationally expensive DOA estimation techniques.
Abstract: A new powerful tool for improving the threshold performance of direction finding is considered. The main idea of our approach is to reduce the number of outliers in the DOA estimates using a previously proposed joint estimation strategy (JES). For this purpose, multiple different DOA estimators are calculated in a parallel manner for the same batch of data (i.e. for a single data record). Employing these estimators simultaneously, the JES improves the threshold performance because it removes outliers and exploits only "successful" estimators that are sorted out using a hypothesis testing procedure. We consider an efficient modification of the JES with application to pseudo-randomly generated eigenstructure estimator banks based on second- and higher-order statistics. Weighted MUSIC estimators based on the covariance and contracted quadricovariance matrices are chosen as appropriate underlying techniques for the second- and fourth-order estimator banks, respectively. Computer simulations with uncorrelated sources verify the dramatic improvements of threshold performance as compared with the conventional second- and fourth-order MUSIC algorithms. Simulations also show that in the second-order case, the threshold performance of our technique is close to that of the WSF method and stochastic/deterministic ML methods, which are known today as the most powerful (in the sense of estimation performance) and, at the same time, the most computationally expensive DOA estimation techniques. The computational cost of our algorithm is much lower than that of the WSF and ML techniques because no multidimensional optimization is required.

Journal ArticleDOI
TL;DR: This article uses outlier-robust estimation techniques to examine the impact of atypical events on cointegration analysis and proposes a new diagnostic tool for signaling when standard coIntegration results might be driven by a few aberrant observations.
Abstract: Standard unit-root and cointegration tests are sensitive to atypical events such as outliers and structural breaks. In this article, we use outlier-robust estimation techniques to examine the impact of these events on cointegration analysis. Our outlier-robust cointegration test provides a new diagnostic tool for signaling when standard cointegration results might be driven by a few aberrant observations. A main feature of our approach is that the proposed robust estimator can be used to compute weights for all observations, which in turn can be used to identify the approximate dates of atypical events. We evaluate our method using simulated data and a Monte Carlo experiment. We also present an empirical example showing the usefulness of the proposed analysis.

Journal ArticleDOI
01 Dec 1998-Test
TL;DR: Focusing on generalized linear mixed models, this work explores the questions of when hierarchical model stages are separable and checkable and illustrates the approach with both real and simulated data.
Abstract: Recent computational advances have made it feasible to fit hierarchical models in a wide range of serious applications. In the process, the question of model adequacy arises. While model checking usually addresses the entire model specification, model failures can occur at each hierarchical stage. Such failures include outliers, mean structure errors, dispersion misspecification, and inappropriate exchangeabilities. We propose an approach which is entirely simulation based. Given a model specification and a dataset, we need only be able to simulate draws from the resultant posterior. By replicating a posterior of interest using data obtained under the model we can "see" the extent of variability in such a posterior. Then, we can compare the posterior obtained under the observed data with this medley of posterior replicates to ascertain whether the former is in agreement with them and, accordingly, whether it is plausible that the observed data came from the proposed model. Many such comparisons can be run, each focusing on a different potential model failure. Focusing on generalized linear mixed models, we explore the questions of when hierarchical model stages are separable and checkable and illustrate the approach with both real and simulated data.

Journal ArticleDOI
TL;DR: This work considers econometric modeling of weekly observed scanning data on a fast moving consumer good with a specific focus on the relationship between market share, distribution, advertising, price, and promotion, and uses cointegration techniques to quantify the long-run effects of marketing efforts.

Journal ArticleDOI
TL;DR: The concepts of polarity and quality of the training data are introduced and two robust learning algorithms for determining a robust nonlinear interval regression model are proposed, derived in a manner similar to the back-propagation (BP) algorithm.

Journal ArticleDOI
TL;DR: The discrimination power of the classical and/or robust diagnostics for the 34 regression data sets with multiple outliers is compared in order to construct models (or logical rules) for identifying outliers in new data sets.

Proceedings ArticleDOI
12 May 1998
TL;DR: This paper addresses a new framework for designing robust neural network classifiers and suggests adapting the outlier probability and regularization parameters by minimizing the error on a validation set, for which a simple gradient descent scheme is derived.
Abstract: This paper addresses a new framework for designing robust neural network classifiers. The network is optimized using the maximum a posteriori technique, i.e., the cost function is the sum of the log-likelihood and a regularization term (prior). In order to perform robust classification, we present a modified likelihood function which incorporates the potential risk of outliers in the data. This leads to the introduction of a new parameter, the outlier probability. Designing the neural classifier involves optimization of network weights as well as the outlier probability and regularization parameters. We suggest adapting the outlier probability and regularization parameters by minimizing the error on a validation set, and a simple gradient descent scheme is derived. In addition, the framework allows for constructing a simple outlier detector. Experiments with artificial data demonstrate the potential of the suggested framework.
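One common form of such a modified likelihood treats each observed label as coming from the network's softmax with probability (1 - eps) and from a uniform distribution over classes otherwise; the NumPy sketch below shows the resulting loss as a standalone function, without a training loop. Whether this matches the paper's exact formulation is an assumption; eps would be tuned on a validation set, as the abstract suggests.

```python
# Sketch of an outlier-aware classification loss: with outlier probability eps,
# the label distribution is (1 - eps) * softmax + eps * uniform over classes.
import numpy as np

def robust_nll(logits, labels, eps=0.05):
    """Negative log-likelihood under the outlier-aware label model."""
    z = logits - logits.max(axis=1, keepdims=True)        # numerically stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    n, n_classes = p.shape
    p_label = p[np.arange(n), labels]
    return -np.mean(np.log((1 - eps) * p_label + eps / n_classes))

logits = np.array([[4.0, 0.0], [3.0, 0.5], [-2.0, 5.0]])
labels = np.array([0, 1, 1])                  # the second label looks like an outlier
print(robust_nll(logits, labels, eps=0.0))    # standard cross-entropy
print(robust_nll(logits, labels, eps=0.1))    # outlier-aware version: bounded penalty
```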

Journal ArticleDOI
TL;DR: Inverse modeling has become a standard technique for estimating hydrogeologic parameters as discussed by the authors, and robustness of these estimators has been tested by means of Monte Carlo simulations of a synthetic experiment, in which both non-Gaussian random errors and systematic modeling errors have been introduced.
Abstract: Inverse modeling has become a standard technique for estimating hydrogeologic parameters. These parameters are usually inferred by minimizing the sum of the squared differences between the observed system state and the one calculated by a mathematical model. The robustness of the least squares criterion, however, has to be questioned because of the tendency of outliers in the measurements to strongly influence the outcome of the inversion. We have examined alternative approaches to the standard least squares formulation. The robustness of these estimators has been tested by means of Monte Carlo simulations of a synthetic experiment, in which both non-Gaussian random errors and systematic modeling errors have been introduced. The approach was then applied to data from an actual gas-pressure-pulse-decay experiment. The study demonstrates that robust estimators have the potential to reduce estimation bias in the presence of noisy data and minor systematic errors, which may be a significant advantage over the standard least squares method.
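The contrast between the least-squares criterion and a robust alternative can be sketched with SciPy's least_squares, which supports Huber-type losses. The exponential-decay forward model and noise settings below are illustrative assumptions, not the gas-pressure-pulse-decay model or the specific estimators examined in the paper.

```python
# Sketch: ordinary least squares vs. a Huber-type robust fit on a toy forward model.
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(5)
t = np.linspace(0, 10, 50)
true_params = np.array([2.0, 0.4])            # amplitude, decay rate

def model(params, t):
    return params[0] * np.exp(-params[1] * t)

y = model(true_params, t) + rng.normal(scale=0.02, size=t.size)
y[::10] += 0.8                                # a few gross outliers in the "observations"

residuals = lambda p: model(p, t) - y
fit_ls = least_squares(residuals, x0=[1.0, 1.0])                          # least squares
fit_huber = least_squares(residuals, x0=[1.0, 1.0], loss="huber", f_scale=0.05)
print(fit_ls.x)      # pulled away from (2.0, 0.4) by the outliers
print(fit_huber.x)   # much closer to the true parameters
```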

Journal ArticleDOI
TL;DR: In this paper, the problem of identifying examinees with aberrant response patterns in a computerized adaptive test (CAT) is considered, where the vector Y of responses of an examinee from a CAT is a multivariate response vector.
Abstract: We consider the problem of identifying examinees with aberrant response patterns in a computerized adaptive test (CAT). The vector Y of responses of an examinee from a CAT is a multivariate response vector. Multivariate observations may be outlying in many different directions, and we characterize specific directions as corresponding to outliers with different interpretations. We develop a class of outlier statistics to identify different types of outliers based on a control chart–type methodology. The outlier methodology is adaptable to general longitudinal discrete data structures. We consider several procedures to judge how extreme a particular outlier is. Data from a nationally administered CAT examination motivate our development and are used to illustrate the results.

Journal ArticleDOI
TL;DR: A new clustering-based approach for multiple outlier identification that utilizes the predicted and residual values obtained from a least squares fit of the data is proposed.
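A hedged sketch of the general idea, assuming NumPy/SciPy: cluster the (fitted value, residual) pairs from an ordinary least-squares fit and flag points that fall in small, well-separated clusters. The linkage method, cut height, and "small cluster" threshold are assumptions for illustration, not the paper's settings.

```python
# Sketch of a clustering-based screen for multiple regression outliers.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(6)
X = np.column_stack([np.ones(60), rng.uniform(0, 10, 60)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=60)
y[:4] += 15.0                                  # a small cluster of outliers

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least-squares fit
fitted = X @ beta
resid = y - fitted
F = np.column_stack([fitted, resid])
F = (F - F.mean(axis=0)) / F.std(axis=0)       # standardize before clustering

labels = fcluster(linkage(F, method="single"), t=2, criterion="distance")
sizes = np.bincount(labels)
flagged = np.where(sizes[labels] <= 0.1 * len(y))[0]   # members of small clusters
print(flagged)                                 # the first four indices should appear here
```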

Journal ArticleDOI
TL;DR: A cluster analysis technique is suggested as a way for discriminating outliers and normal observation data and it is shown that dynamic reconciliation can be carried out simultaneously with outlier detection.

Proceedings ArticleDOI
06 Jul 1998
TL;DR: A robust parameter estimation method for the mixture model is proposed that assigns full weight to samples from the main body of the data but automatically gives reduced weight to statistical outliers.
Abstract: An important problem in pattern recognition is the effect of limited training samples on classification performance. When the ratio of the number of training samples to the dimensionality is small, parameter estimates become highly variable, causing the deterioration of classification performance. This problem has become more prevalent in remote sensing with the emergence of a new generation of sensors. While the new sensor technology provides higher spectral and spatial resolution, enabling a greater number of spectrally separable classes to be identified, the labeled samples needed for designing the classifier remain difficult and expensive to acquire. Better parameter estimates can be obtained by exploiting a large number of unlabeled samples in addition to training samples using the expectation maximization (EM) algorithm under the mixture model. However, the estimation method is sensitive to the presence of statistical outliers. In remote sensing data, classes with few samples are difficult to identify and may constitute statistical outliers. Therefore, we propose a robust parameter estimation method for the mixture model. The proposed method assigns full weight to samples from the main body of the data, but automatically gives reduced weight to statistical outliers. Experimental results show that the robust method prevents performance deterioration due to statistical outliers in the data as compared to the estimates obtained from the EM approach.

Journal ArticleDOI
TL;DR: An application of the Artificial Neural Networks methodology was investigated using experimental data from a mixture properties study and compared to a classical modelling technique, both graphically and numerically, to quantitatively describe the achieved degree of data fitting and the robustness of the developed models.

01 Jan 1998
TL;DR: This project has examined the use of four newer techniques for visualising aspects of higher dimensional data sets: projection pursuit; Geographically Weighted Regression; RADVIZ; and Parallel Co-ordinates.
Abstract: Although visualisation has become a ‘hot topic’ in the social sciences, the majority of visualisation studies and techniques apply only to one- or two-dimensional datasets. Relatively little headway has been made into visualising higher dimensional data although, paradoxically, most social science datasets are highly multivariate. Investigating multivariate data, whether it be done visually or not, in just one or two dimensions can be highly misleading. Two well-known examples of this are the use of a correlation coefficient instead of a regression parameter as an indicator of the relationship between two variables and the use of scatterplots instead of leverage plots as indicators of relationships. This project has therefore investigated several methods for visualising aspects of higher dimensional (i.e. multivariate) datasets. Although some techniques are quite well established for this purpose, such as Andrews plots and Chernoff faces, we have ignored these because of their well-known problems. In the case of Andrews plots the functions used are subjective and the plots become very difficult to read when the number of observations rises beyond 30. In the case of Chernoff faces, variables which are attached to certain attributes of the face, for example the eyes, receive more weight in the subjective determination of ‘unusual’ cases. Instead we have examined the use of four newer techniques for visualising aspects of higher dimensional data sets: projection pursuit; Geographically Weighted Regression; RADVIZ; and parallel co-ordinates. In projection pursuit the objective is to project an m-dimensional set of points onto a two-dimensional plane (or a three-dimensional volume) by constrained optimisation. The choice of function to be optimised depends on which aspect of the data is the focus of investigation. The technique therefore offers a great deal of flexibility, from identifying clusters of similar cases to identifying outliers in multivariate space. A problem with projection pursuit, though, is that it is difficult to interpret because the projection plots produced are of indices formed from linear combinations of variables which might not have any obvious meaning. The technique of Geographically Weighted Regression usefully allows the visualisation of spatial non-stationarity in regression parameter estimates. The output from the technique consists of maps of the spatial drift in parameter estimates which can be used to investigate spatial variations in relationships or for model development, because the maps can indicate the effects of missing variables. Relatively little mention is made of Geographically Weighted Regression here because the authors have developed this technique and have written about it in a number of other sources. The RADVIZ approach essentially involves calculating, for each case, the resultant vector of a series of m forces which are the m variables measured for that case. A plot of the locations of these resultants depicts the similarity in the overall measurements across the cases. It is particularly useful for compositional data, such as percentage shares of votes in elections. One drawback of the technique is that it is possible to get similar-looking projections from quite different basic data properties, so the interpretation of RADVIZ needs some caution. Finally, the parallel co-ordinates approach is perhaps the most intuitive of the four techniques we examined in that it is essentially a multidimensional variation on the scatterplot.
Instead of two axes, though, in parallel co-ordinates relationships are drawn between m axes which are depicted as parallel lines. However, the choice of ordering of the axes strongly influences the depiction of relationships within the dataset, so care must be taken in selecting a particular ordering, and the depiction of the data in parallel co-ordinates can get rather messy when large numbers of cases are involved.
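As a concrete illustration of the parallel co-ordinates idea (not taken from the project itself), the pandas sketch below draws each case as a polyline across the variable axes; cases that are outlying in several variables stand out as stray lines, and re-ordering the columns changes the picture, as the text warns. The dataset and colour choices are assumptions.

```python
# Minimal parallel-coordinates view of multivariate data with a few outlying cases.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates

rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(50, 4)), columns=["v1", "v2", "v3", "v4"])
df.loc[:2, ["v2", "v4"]] += 5.0                    # three cases outlying in two variables
df["group"] = np.where(df.index < 3, "outlier", "bulk")

ax = parallel_coordinates(df, class_column="group", color=["crimson", "lightgrey"])
ax.set_title("Parallel co-ordinates: outlying cases appear as stray polylines")
plt.show()
```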

Journal ArticleDOI
TL;DR: Experimental results on several benchmarks indicate that the second approach, which pairs MSE or CE training with an empirical classification criterion for testing, leads to an improved generalisation performance.
Abstract: This paper presents a study of two learning criteria and two approaches to using them for training neural network classifiers, specifically Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) networks. The first approach, which is a traditional one, relies on the use of two popular learning criteria, i.e. learning via minimising a Mean Squared Error (MSE) function or a Cross Entropy (CE) function. It is shown that the two criteria have different characteristics in learning speed and outlier effects, and that this approach does not necessarily result in a minimal classification error. To be suitable for classification tasks, in our second approach an empirical classification criterion is introduced for the testing process while using the MSE or CE function for the training. Experimental results on several benchmarks indicate that the second approach, compared with the first, leads to an improved generalisation performance, and that the use of the CE function, compared with the MSE function, gives a faster training speed and improved or equal generalisation performance.

Journal ArticleDOI
TL;DR: In this paper, the authors considered likelihood ratio tests for detecting a single outlier in multivariate linear models, where an observation is called an outlier if there has been a shift in the mean.