
Showing papers on "Outlier published in 1985"


Journal ArticleDOI
TL;DR: In this article, six methods for combining forecasts are compared under statistical circumstances reflecting varying conditions of relative forecast errors, error correlations and outliers; the six methods, all advocated in various publications, consist of equal weighting, the pooled average, the optimal combination under an independence assumption, and three variations on the formulation of a Bayesian combination based upon posterior probabilities.

105 citations


Journal ArticleDOI
TL;DR: In this paper, a restricted maximum likelihood (REML) approach is used to estimate the variance parameters of an inflated-variance outlier model in fixed effects models, and it is shown that the residual variance and outlier position are the same under both the mean-slippage and inflated-variance models.
Abstract: SUMMARY For single outliers in normal theory fixed effects models a mean slippage model is commonly used. An alternative is to model the outlier as arising from an unknown observation with inflated variance. Maximum likelihood estimates for the position of the outlier under the two models need not agree. This paper considers maximizing a restricted part of the likelihood to estimate the variance parameters and characterizes these estimates in terms of standard least squares parameters. It is shown that the residual variance and outlier position are the same under both models. In a recent paper Cook, Holschuh and Weisberg (1982) consider two models for single outliers in fixed effects linear models. One is based on the assumption that contamination gives rise to slippage in the expected values of the observations (Weisberg, 1980, Section 5.3). Cook et al. point out the key role of Studentized residuals in this model, for instance in estimating the position of an outlier or testing for the presence of an outlier. Alternatively one can assume that an outlier arises from an error term with an increased variance. Cook et al. argue intuitively that it seems reasonable that Studentized residuals might play a similar role under this alternative model but show that maximum likelihood estimates of outlier position can differ under these two models. Cook et al. note that their models can be fitted into a linear model framework discussed by Harville (1977). They do not note that Harville recommends a restricted maximum likelihood (REML) approach using only a restricted part of the likelihood to estimate the variance parameters. Patterson and Thompson (1971) noted that this REML approach takes account of loss of degrees of freedom in estimating fixed effects. In this note, REML estimates for the alternative model are derived and expressed in standard least squares statistics.
It is shown that the observation having the largest Studentized residual is the one picked out as the outlier.
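The role of Studentized residuals is easy to see numerically. Below is a minimal sketch (not the paper's REML derivation) that computes externally Studentized residuals for an ordinary least squares fit and flags the case with the largest one; the design matrix and planted outlier are hypothetical.

```python
import numpy as np

def studentized_residuals(X, y):
    """Externally Studentized residuals for an OLS fit."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)          # hat matrix
    e = y - H @ y                                  # ordinary residuals
    h = np.diag(H)                                 # leverages
    s2 = e @ e / (n - p)                           # full-sample residual variance
    # deletion identity: (n-p-1) * s2_(i) = (n-p) * s2 - e_i^2 / (1 - h_i)
    s2_del = ((n - p) * s2 - e**2 / (1 - h)) / (n - p - 1)
    return e / np.sqrt(s2_del * (1 - h))

rng = np.random.default_rng(0)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(scale=0.5, size=n)
y[7] += 5.0                                        # plant a single outlier
t = studentized_residuals(X, y)
print(int(np.argmax(np.abs(t))))                   # the planted case stands out
```

The deletion identity gives each leave-one-out variance without refitting, which is why the largest Studentized residual can be read off from one fit.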

94 citations


Journal ArticleDOI
TL;DR: In this article, two approaches to robust estimation for the Box-Cox power-transformation model were considered, one approach maximizes weighted, modified likelihoods, and the other approach bounds a measure of gross-error sensitivity.
Abstract: We consider two approaches to robust estimation for the Box–Cox power-transformation model. One approach maximizes weighted, modified likelihoods. The second bounds a measure of gross-error sensitivity. Among our primary concerns is the performance of these estimators on actual data. In the examples that we study, there seem to be only minor differences between these two robust estimators, but they behave rather differently than the maximum likelihood estimator or estimators that bound only the influence of the residuals. These examples show that model selection, determination of the transformation parameter, and outlier identification are fundamentally interconnected.

81 citations


Journal ArticleDOI
TL;DR: In this paper, a robust estimation procedure is proposed to identify and reduce the influence of extreme locations for the bivariate normal home-range method, together with tests of the goodness-of-fit of the assumed probability distribution.
Abstract: A robust estimation procedure is proposed to identify and reduce the influence of extreme locations for the bivariate normal home-range method. Tests are proposed for validating the underlying probability distribution from observed animal locations. Location data from a black bear (Ursus americanus) are used to demonstrate the effect of outliers on size and orientation of home-range estimates and to illustrate the goodness-of-fit tests. J. WILDL. MANAGE. 49(2):513-519 Burt (1943:351) defined the home range as "that area traversed by the individual in its normal activities of food gathering, mating, and caring for young." He believed that occasional sallies and exploratory moves outside the area should not be included as part of the home range. However, a lack of standard conventions for identifying such extreme locations has resulted in potentially arbitrary home-range estimates (Schoener 1981). Hayne (1949) recognized that biological understanding of an animal's home range required information about the intensity of use within the area. Furthermore, he believed that knowledge of the use pattern was important to define the limit of the home range. This pattern of use has subsequently been termed the utilization distribution or UD (Jennrich and Turner 1969, Van Winkle 1975, Anderson 1982). When the observed UD matches a simple probability distribution, we can readily obtain estimates of home-range size, shape, and orientation. These parameters provide a basis for important behavioral and ecological interpretations. Three commonly used methods to estimate home range are the minimum convex polygon, circular bivariate normal, and general bivariate normal methods. These approaches and their underlying probability distributions are discussed by Metzgar (1973); their biological assumptions, sample size biases, and sensitivity to extreme locations have been criticized by several authors (Jennrich and Turner 1969, Dixon and Chapman 1980, MacDonald et al. 
1980, Schoener 1981, Anderson 1982). Typically, home-range methods are applied without evaluating the fit of the assumed probability distribution to the observed data. The method selected seems to be based on tradition rather than underlying properties of the data. In this paper we describe a robust estimator to minimize the problems of outliers for the general bivariate normal method and propose procedures for testing the goodness-of-fit of the assumed probability distributions. Location data from a black bear are used to illustrate the goodness-of-fit test and to demonstrate the effect of outliers on size and orientation for bivariate normal and minimum convex polygon home ranges. We wish to thank R. K. Steinhorst for valuable assistance during the development of the statistical methods. Data for our example were provided by J. Unsworth and the Idaho Dep. of Fish and Game. L. J. Nelson, D. F. Stauffer, and D. H. Johnson provided comments on the manuscript. Computer time for this project was provided by the Computing Cent., Univ. of Idaho. This is Contrib. No. 270 from the For., Wildl. and Range Exp. Stn., Univ. of Idaho. UNIFORM DISTRIBUTION Metzgar (1973) described the frequency distribution of locations for an animal with equal probability of occurrence per unit of area throughout its home range. This bivariate uniform distribution assumes that the animal has no area of highest activity (center of activity), although an arithmetic center exists. A uniform UD may be appropriate for animals that perceive the environment in a fine-grained fashion (Schoener 1981), whose home ranges are uniform, and that lack a center of activity. The minimum convex polygon (MCP) appears to be an appropriate method to represent the home-range boundary of a uniform UD.
The MCP method defines a distinct boundary as Metzgar (1973) suggested for the uniform UD, and Stickel (1954) found that home ranges of uniform use were accurately represented by similar polygon methods. However, because the MCP method does not consider the distribution of use within the home range (Macdonald et al. 1980, Voigt and Tinline 1980), the calculation of potential interaction measures (Macdonald et al. 1980, Voigt and Tinline 1980) by proportional home-range overlap (Owings et al. 1977, Nelson and Mech 1981, Seegmiller and Ohmart 1981) assumes an underlying uniform use pattern and may produce dramatically different values from that of a bivariate normal (Macdonald et al. 1980).
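The sensitivity of the MCP to a single exploratory sally can be demonstrated directly. The sketch below uses hypothetical coordinates (not the bear data): it builds the minimum convex polygon with the monotone-chain algorithm, measures its area with the shoelace formula, and shows that one distant location inflates the home-range estimate substantially.

```python
import numpy as np

def mcp_area(points):
    """Area of the minimum convex polygon (monotone chain + shoelace)."""
    pts = sorted(map(tuple, points))
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    def chain(seq):
        h = []
        for p in seq:
            while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                h.pop()
            h.append(p)
        return h[:-1]
    hull = chain(pts) + chain(pts[::-1])           # counter-clockwise hull
    n = len(hull)
    return 0.5 * abs(sum(hull[i][0] * hull[(i + 1) % n][1]
                         - hull[(i + 1) % n][0] * hull[i][1] for i in range(n)))

rng = np.random.default_rng(1)
locs = rng.normal(size=(40, 2))                    # hypothetical locations
with_sally = np.vstack([locs, [[8.0, 8.0]]])       # one exploratory sally
print(mcp_area(locs), mcp_area(with_sally))        # the sally inflates the area
```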

74 citations


Journal ArticleDOI
TL;DR: The authors explored the effects of outlier-induced collinearities on the estimation of regression coefficients and showed that these effects can be similar in many respects to those resulting from approximate linear dependencies among the columns of predictor-variable values.
Abstract: When an observation in a regression analysis has very large values on two or more predictor variables, artificial collinearities can be induced. The effects of such collinearities on a regression analysis are not well documented, although they can be shown to be similar in many respects to those resulting from approximate linear dependencies among the columns of predictor-variable values. The purpose of this article is to explore the effects of outlier-induced collinearities on the estimation of regression coefficients.
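A small numerical illustration of the effect: assuming two nearly uncorrelated predictors, adding a single case that is very large on both of them drives up the condition number of the standardized cross-product (correlation) matrix, a usual collinearity diagnostic. The data are simulated, not from the article.

```python
import numpy as np

def condition_number(X):
    """Condition number of the standardized cross-product (correlation) matrix."""
    Z = (X - X.mean(0)) / X.std(0)
    w = np.linalg.eigvalsh(Z.T @ Z / len(Z))
    return w.max() / w.min()

rng = np.random.default_rng(2)
n = 50
X = np.column_stack([rng.normal(size=n), rng.normal(size=n)])  # ~uncorrelated
X_out = np.vstack([X, [12.0, 12.0]])      # one case large on BOTH predictors
print(condition_number(X), condition_number(X_out))
```

The single extreme case dominates the cross products, manufacturing a correlation (hence a near-dependency) that the other fifty cases do not show.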

43 citations


Journal ArticleDOI
TL;DR: A method of detecting potentially bad data cases, or outliers, is presented which is based on the average squared deviation of a given subject's cross product of standard scores from the average over all correlations in the matrix.
Abstract: Bad data due to faked responses, errors, and other difficulties can distort correlations among variables leading to poor factor analytic results based on matrices of such correlations. A method of detecting potentially bad data cases, or outliers, is presented which is based on the average squared deviation of a given subject's cross product of standard scores from the average over all correlations in the matrix. Results of applying both this program and the BMD 10M outlier program to the same data examples are given. About 40 to 60 percent of the cases identified as outliers by the two programs were the same cases. Many cases identified as outliers proved not to be "bad data", however, so these programs should be used to identify cases that need scrutiny rather than as the sole basis for eliminating data.
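The statistic is straightforward to compute. The sketch below is a reconstruction from the description, not the original program or BMD 10M: each subject's standard-score cross products are compared with the corresponding sample correlations and the squared deviations averaged over all variable pairs.

```python
import numpy as np

def crossprod_outlier_scores(data):
    """Average squared deviation of each case's standard-score cross products
    from the corresponding sample correlations, over all variable pairs."""
    Z = (data - data.mean(0)) / data.std(0, ddof=1)
    R = np.corrcoef(data, rowvar=False)
    j, k = np.triu_indices(data.shape[1], k=1)
    cp = Z[:, j] * Z[:, k]                         # case-wise cross products
    return ((cp - R[j, k]) ** 2).mean(axis=1)

rng = np.random.default_rng(3)
cov = [[1.0, 0.5, 0.5], [0.5, 1.0, 0.5], [0.5, 0.5, 1.0]]
data = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=100)
data[4] = [4.0, -4.0, 4.0]                         # a faked-response pattern
scores = crossprod_outlier_scores(data)
print(int(np.argmax(scores)))                      # case 4 needs scrutiny
```

As the abstract cautions, a high score marks a case for scrutiny, not automatic deletion.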

27 citations



Journal ArticleDOI
TL;DR: In this article, a Bayesian estimator that incorporates prior information in a flexible way was developed to recover the prior beliefs that an investigator imposes by pre-analyzing the data via various sample inclusion rules.
Abstract: A frequent practice in empirical work is to "preanalyze" the data via various sample inclusion rules. Truncation of "outliers" is common. These procedures are a form of sample censoring imposed by the investigator. Such censoring produces effects familiar from the sample selection literature. This paper investigates the question why an investigator might want to censor a sample and what the costs are. In an empirical example, using a variance components model of a wage equation, potential inconsistency problems are highlighted. The results indicate that while the slope coefficients, $\hat\beta$ , may typically be less sensitive to censoring than the variance components, some common forms of censoring also markedly affect $\hat\beta$ . Finally, a Bayesian estimator that incorporates prior information in a flexible way was developed. The usual Bayesian procedure was reversed, by using the Bayesian estimator to recover the prior beliefs that an investigator imposes by ...

24 citations


Journal ArticleDOI
TL;DR: The masking effect in cases of tests for outlier(s) is defined and quantified by the loss in power due to the presence of more than the anticipated number of discordant observations in the sample as mentioned in this paper.
Abstract: The masking effect in cases of tests for outlier(s) is defined and quantified by the loss in power due to the presence of more than the anticipated number of discordant observations in the sample. This effect is illustrated in cases of some commonly used outlier tests for exponential samples—namely, Dixon-type tests and the Cochran test. A comparison between the performances of the modified Dixon-type test and the Cochran test is made. Tables for powers of these tests are presented for different sample sizes and different values of discordancy parameter. An illustrative example is presented to support the conclusion that Cochran and modified Dixon-type tests do not suffer from the masking effect in the presence of two outliers.

15 citations



Journal ArticleDOI
TL;DR: In this paper, an empirical performance study of outlier detection procedures for Weibull or extreme value distributions using a mixture model in which a known number of randomly chosen observations are contaminated was carried out.
Abstract: We carried out an empirical performance study of outlier detection procedures for Weibull or extreme-value distributions using a mixture model in which a known number of randomly chosen observations are contaminated. Procedures studied were: L(L') based on leaps (differences of adjacent observations divided by expectation), V, Q and W (Mann, 1982), R1(R'1), R2(R'2), R3(R'3) (Dixon, 1950) and G(G') (Grubbs, 1950). Percentage points for the statistics L(L'), R1(R'1), R2(R'2), R3(R'3) and G(G') were computed empirically for the extreme-value distribution and are tabulated. The procedures L(L') (or, equivalently in power, V) performed best, with few exceptions, for the contaminated model tested. The Grubbs statistic G' performed well in testing for lower outliers. Mann's W, which was best for the labelled-slippage model, was substantially poorer than the others for the mixture model. Dixon's R1(R'1) is recommended as a generally useful test for sample sizes in the range investigated (n = 5 to 20).
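As a hedged illustration of the Dixon-type approach, the sketch below computes the r10 statistic for an upper outlier, calibrates an empirical 95% point under a Gumbel (extreme-value) null by simulation, and estimates power against samples with one observation slipped upward. The critical value and slippage amount are illustrative choices, not taken from the paper's tables.

```python
import numpy as np

def dixon_r1_upper(x):
    """Dixon's r10 statistic for a single upper outlier."""
    s = np.sort(x)
    return (s[-1] - s[-2]) / (s[-1] - s[0])

rng = np.random.default_rng(4)
n = 10
# empirical 95% point under a standard Gumbel (extreme-value) null
null = np.array([dixon_r1_upper(rng.gumbel(size=n)) for _ in range(20000)])
crit = np.quantile(null, 0.95)

# power against samples with one observation slipped upward by 10
hits = [dixon_r1_upper(np.append(rng.gumbel(size=n - 1), rng.gumbel() + 10.0)) > crit
        for _ in range(2000)]
power = float(np.mean(hits))
print(round(float(crit), 3), round(power, 2))
```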

Journal ArticleDOI
TL;DR: It is shown that alternative robust estimators of the covariance matrix are appealing in analyzing VCG data when outliers are present in the sample, and that the study should be greatly expanded in order to validate the asymptotic properties of S.

Journal ArticleDOI
TL;DR: A BASIC program is provided that implements Mosteller and Tukey’s (1977) technique for weighting observations less as they depart from the middle of a distribution, resulting in a “bisquare-weighted mean” that is compared with more traditional measures of central tendency.
Abstract: Observations that depart considerably from the center of a distribution demand special consideration. They may be retained, trimmed, or weighted less than other data. This article provides a BASIC program that implements Mosteller and Tukey’s (1977) technique for weighting observations less as they depart from the middle of a distribution. Influence curves for this “bisquare-weighted mean,” or “bimean,” are displayed and compared with more traditional measures of central tendency.
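A Python counterpart of the BASIC program's idea might look as follows; the tuning constant c = 6 with the MAD as scale is one conventional choice and is an assumption here, not necessarily the article's settings.

```python
import numpy as np

def bimean(x, c=6.0, iters=10):
    """Bisquare-weighted mean ('bimean'), iterated from the median."""
    x = np.asarray(x, dtype=float)
    t = np.median(x)
    for _ in range(iters):
        s = np.median(np.abs(x - t)) or 1.0        # MAD scale (guard against 0)
        u = (x - t) / (c * s)
        w = np.where(np.abs(u) < 1, (1 - u**2) ** 2, 0.0)  # bisquare weights
        t = np.sum(w * x) / np.sum(w)
    return float(t)

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 35.0]    # one wild observation
print(round(bimean(data), 2), round(float(np.mean(data)), 2))
```

The wild observation receives weight zero, so the bimean stays near 10 while the ordinary mean is dragged well above it.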

Book ChapterDOI
01 Jan 1985
TL;DR: This chapter describes a few methods for discovering potential sources of poor fit to fixed effects models by providing methods for recognizing one or more estimates that deviate greatly from their expected values if the model were correct.
Abstract: This chapter describes a few methods for discovering potential sources of poor fit to fixed effects models. These diagnostic procedures provide methods for recognizing one or more estimates that deviate greatly from their expected values if the model were correct. These procedures often point to studies that differ from others in ways that are remediable; for example, they may represent mistakes in coding or calculation. Sometimes the diagnostic procedures point to sets of studies that differ in a collective way that suggests a new explanatory variable. Sometimes a study is an outlier that cannot be explained by an obvious characteristic of the study. The analysis of data containing a few observations that are outliers is a complicated task. It invariably requires the use of good judgment and decisions that are, in some sense, compromises. There are two extreme positions on dealing with outliers: (1) data are sacred, and no datum point (study) should ever be set aside for any reason and (2) data should be tested for outliers, and data points (studies) that fail to conform to the hypothesized model should be removed.

Journal ArticleDOI
TL;DR: In this paper, a methodology based on an application of the implicit function theorem to derive an approximation to the maximum likelihood estimator is presented for gaining insight into properties such as outlier influence, bias, and width of confidence intervals.
Abstract: A methodology is presented for gaining insight into properties — such as outlier influence, bias, and width of confidence intervals — of maximum likelihood estimates from nonidentically distributed Gaussian data. The methodology is based on an application of the implicit function theorem to derive an approximation to the maximum likelihood estimator. This approximation, unlike the maximum likelihood estimator, is expressed in closed form and thus it can be used in lieu of costly Monte Carlo simulation to study the properties of the maximum likelihood estimator.

Journal ArticleDOI
TL;DR: This paper showed that the median is the most bias-resistant estimator, in the class of L-statistics with symmetric nonnegative coefficients that add up to one, for a class of distributions which includes the normal, double-exponential and logistic distributions.
Abstract: The effect is studied of an outlier which has the same symmetric distribution as the other observations except for a change in location and a possible increase in scale. We show that the median is the most bias-resistant estimator, in the class of L-statistics with symmetric nonnegative coefficients that add up to one, for a class of distributions which includes the normal, double-exponential and logistic distributions.
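The bias-resistance claim can be checked by simulation. Under a mean-slippage contamination of one observation (sample size and shift chosen arbitrarily here), the median's bias is far smaller than the mean's, whose bias is exactly shift/n:

```python
import numpy as np

rng = np.random.default_rng(5)
n, shift, reps = 19, 4.0, 4000

bias_mean = bias_median = 0.0
for _ in range(reps):
    x = rng.normal(size=n)
    x[0] += shift                                  # one observation slipped in location
    bias_mean += np.mean(x)
    bias_median += np.median(x)
bias_mean /= reps
bias_median /= reps
print(round(float(bias_mean), 2), round(float(bias_median), 2))
```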

Journal ArticleDOI
TL;DR: The results of a population genetic study of several Polynesian Outlier and Melanesian populations are compared with recent findings from archaeology to offer independent confirmation of particular inter-island contacts and prehistoric population movements.
Abstract: The results of a population genetic study of several Polynesian Outlier and Melanesian populations are compared with recent findings from archaeology. Certain remarkable correspondences offer independent confirmation of particular inter-island contacts and prehistoric population movements.

Journal ArticleDOI
TL;DR: In this article, the performance of least absolute deviations (LAD) estimators for the parameters of the first-order autoregressive model in the presence of outliers is examined.
Abstract: In general linear modeling, an alternative to the method of least squares (LS) is the least absolute deviations (LAD) procedure. Although LS is more widely used, the LAD approach yields better estimates in the presence of outliers. In this paper, we examine the performance of LAD estimators for the parameters of the first-order autoregressive model in the presence of outliers. A simulation study compared these estimates with those given by LS. The general conclusion is that LAD does not deal successfully with additive outliers. A simple procedure is proposed which allows exception reporting when outliers occur.
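A minimal sketch of the comparison (a toy, single-replication illustration rather than the paper's simulation design): estimate the AR(1) coefficient by grid search under squared and absolute error loss, with one additive outlier planted mid-series. The LS estimate is visibly attenuated; the effect on LAD in this one draw is milder and should not be read as summarizing the paper's Monte Carlo findings.

```python
import numpy as np

def fit_ar1(x, loss):
    """Grid-search estimate of the AR(1) coefficient under a given loss."""
    grid = np.linspace(-0.99, 0.99, 397)
    errs = [loss(x[1:] - phi * x[:-1]) for phi in grid]
    return float(grid[int(np.argmin(errs))])

rng = np.random.default_rng(6)
phi_true, n = 0.6, 300
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi_true * x[t - 1] + rng.normal()      # clean AR(1) series
x_out = x.copy()
x_out[150] += 15.0                                 # one additive outlier

ls = lambda e: float(np.sum(e ** 2))
lad = lambda e: float(np.sum(np.abs(e)))
print(fit_ar1(x, ls), fit_ar1(x_out, ls), fit_ar1(x_out, lad))
```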

Journal ArticleDOI
TL;DR: The robust method of analysis is described and its potential usefulness is illustrated by applying the technique to two data sets.
Abstract: Recent advances in statistical estimation theory have resulted in the development of new procedures, called robust methods, that can be used to estimate the coefficients of a regression model. Because such methods take into account the impact of discrepant data points during the initial estimation process, they offer a number of advantages over ordinary least squares and other analytical procedures (such as the analysis of outliers or regression diagnostics). This paper describes the robust method of analysis and illustrates its potential usefulness by applying the technique to two data sets. The first application uses artificial data; the second uses a data set analyzed previously by Tufte [15] and, more recently, by Chatterjee and Wiseman [6]

Journal ArticleDOI
TL;DR: The authors showed that Fisher's Z, Arcsine and Ruben's transformation are robust in the presence of an outlier as long as b=3 and N ≥ 20, and that they are seriously affected for b=9 even when N=40.
Abstract: N independent bivariate observations are presumed to be generated from a bivariate normal distribution, but in fact one of them is from . Four tests, t, Fisher's Z, Arcsine and Ruben's transformation, are considered, and it is shown that they are robust in the presence of an outlier as long as b=3 and N ≥ 20, and that they are seriously affected for b=9 even when N=40.

Journal ArticleDOI
TL;DR: In this article, an outlier-generating model of mean-slippage type was used to characterise four different forms of outlier manifestation, and it was found that the unidentifiability problem provided no obstacle for detecting or testing the outliers for three of the four forms.
Abstract: Summary The linear structural model provides one way of modelling a linear relationship between two random variables. It is well known that problems of unidentifiability arise for unreplicated observations and normal error structure. As in all data sets, outliers can arise and methods are needed for detecting and testing them. An outlier-generating model of mean–slippage type can be used to characterise four different forms of outlier manifestation. It is interesting to find that the unidentifiability problem provides no obstacle for detecting or testing the outliers for three of the four forms. Detection principles, and specific discordancy tests, are derived and illustrated by application to some data on physical measurements of Pacific squid.

Journal ArticleDOI
TL;DR: In this paper, the authors give an extension of this concept to multivariate distributions by ordering multidimensional random variables in terms of the corresponding maximum norm, and show that conditions on the marginal distribution functions completely determine outlier-behaviour.
Abstract: QUEEN (1976) has considered the ideas of outlier-resistance and outlier-proneness of an individual distribution. We give an extension of this concept to multivariate distributions by ordering multidimensional random variables in terms of the corresponding maximum-norms. It is shown that conditions on the marginal distribution functions completely determine outlier-behaviour.

01 Jun 1985
TL;DR: In this article, the authors present some adaptive estimates for regression models, which can be adapted with respect to the data in such a way that the resulting estimates are in some sense optimal.
Abstract: Regression models belong to those statistical models which are applied to extremely diverse types of data in many fields of quantitative relationships. Normally distributed errors are usually assumed and least squares estimates are applied. It is known that for normally distributed errors the least squares estimates are optimal in several respects, while for nonnormally distributed errors these estimates are ineffective and, moreover, they are sensitive to outlying observations. Classes of estimators were developed which show a reasonable behavior for comparatively large families of error distributions and which are not too sensitive to the outliers. Such estimators are usually called robust. Some of these estimators can be adapted with respect to the data in such a way that the resulting estimates are in some sense optimal; these estimators are called adaptive. The aim of this paper is to present some adaptive estimates for regression models.

Journal ArticleDOI
TL;DR: In this article, the authors present three diagnostic measures for judging the influence that single or multiple observations exert on the estimate s2, the mean square error, and discuss distributional properties of these measures and conclude with an example.
Abstract: We present three intuitive diagnostic measures for judging the influence that single or multiple observations exert on the estimate s2, the mean square error. The three measures are shown to be equivalent statistics. We discuss distributional properties of these measures and conclude with an example.
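One natural measure of this kind, sketched below under standard OLS assumptions, is the ratio of each leave-one-out mean square error to the full-sample one, available in closed form through the usual deletion identity. This is an illustration of the general idea, not necessarily any of the authors' three statistics.

```python
import numpy as np

def s2_influence(X, y):
    """Ratio of each leave-one-out MSE to the full-sample MSE."""
    n, p = X.shape
    H = X @ np.linalg.solve(X.T @ X, X.T)
    e = y - H @ y
    h = np.diag(H)
    s2 = e @ e / (n - p)
    # deletion identity: (n-p-1) * s2_(i) = (n-p) * s2 - e_i^2 / (1 - h_i)
    s2_del = ((n - p) * s2 - e ** 2 / (1 - h)) / (n - p - 1)
    return s2_del / s2

rng = np.random.default_rng(7)
n = 40
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(size=n)
y[10] += 6.0                                       # inflate one residual
r = s2_influence(X, y)
print(int(np.argmin(r)))                           # deleting case 10 shrinks s2 most
```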

Journal ArticleDOI
TL;DR: In this paper, the measures of dispersion for ungrouped data proposed by Gini and Lienert, which are defined as the mean of the ranges of pairs and triplets of n values, are generalised.
Abstract: The measures of dispersion for ungrouped data proposed by Gini and Lienert, which are defined as the mean of the ranges of pairs and triplets of n values, are generalised. A family of measures of dispersion emerges with weights based on ranks instead of the measurements themselves.
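The pair/triplet construction, and its equivalent form as an L-statistic with rank-based weights, can be sketched as follows; the weight formula used here is the standard combinatorial identity for the mean range of k-subsets, which is the form being generalised.

```python
import numpy as np
from itertools import combinations
from math import comb

def mean_range(x, k):
    """Mean range over all k-subsets: k=2 is Gini's measure, k=3 Lienert's."""
    return float(np.mean([max(s) - min(s) for s in combinations(x, k)]))

def mean_range_ranks(x, k):
    """Same quantity as an L-statistic with rank-based (combinatorial) weights."""
    s = sorted(x)
    n = len(s)
    return sum((comb(i, k - 1) - comb(n - 1 - i, k - 1)) * s[i]
               for i in range(n)) / comb(n, k)

x = [1.0, 2.0, 4.0, 8.0]
print(mean_range(x, 2), mean_range_ranks(x, 2), mean_range(x, 3))
```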

Journal ArticleDOI
Ben F. Houston1
TL;DR: In this paper, the same marginal density for an inverted student density function was used with a different numerical integration procedure and the tables of normalized residual critical values for a single regression outlier prepared by Lund (1975) were extended from 100 to 500 data points for up to 24 independent variable terms.
Abstract: Tables of normalized residual critical values for a single regression outlier prepared by Lund (1975) are extended from 100 to 500 data points for up to 24 independent variable terms. The same marginal density for an inverted Student density function was used with a different numerical integration procedure.