Journal ArticleDOI

Regression Diagnostics: Identifying Influential Data and Sources of Collinearity

01 May 1981 - Vol. 144, Iss. 3, pp. 367-368
TL;DR: This book develops methods for detecting influential observations and outliers and for detecting and assessing collinearity, and discusses applications and remedies.
Abstract: 1. Introduction and Overview. 2. Detecting Influential Observations and Outliers. 3. Detecting and Assessing Collinearity. 4. Applications and Remedies. 5. Research Issues and Directions for Extensions. Bibliography. Author Index. Subject Index.
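
The book's core influence diagnostics are easy to reproduce with standard numerical tooling. Below is a minimal Python sketch (numpy only; the data are synthetic and illustrative, not taken from the book's examples) of hat-value leverage, externally studentized residuals, and DFFITS, three of the measures developed in chapter 2:

```python
import numpy as np

# Illustrative data only; the book's own examples are not reproduced here.
rng = np.random.default_rng(0)
n, p = 50, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# OLS fit and residuals.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
s2 = resid @ resid / (n - p)

# Hat (leverage) values: the diagonal of X (X'X)^{-1} X'.
h = np.diag(X @ np.linalg.solve(X.T @ X, X.T))

# Externally studentized residuals, then DFFITS.
s2_del = ((n - p) * s2 - resid**2 / (1 - h)) / (n - p - 1)
t = resid / np.sqrt(s2_del * (1 - h))
dffits = t * np.sqrt(h / (1 - h))

# Size-adjusted cutoff suggested by Belsley, Kuh and Welsch: |DFFITS| > 2*sqrt(p/n).
print("high-influence rows:", np.where(np.abs(dffits) > 2 * np.sqrt(p / n))[0])
```
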
Citations
Journal ArticleDOI
TL;DR: This article gives an introduction to the subject of classification and regression trees by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples.
Abstract: Classification and regression trees are machine-learning methods for constructing prediction models from data. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. As a result, the partitioning can be represented graphically as a decision tree. Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost. Regression trees are for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values. This article gives an introduction to the subject by reviewing some widely available algorithms and comparing their capabilities, strengths, and weaknesses in two examples. © 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1:14-23. DOI: 10.1002/widm.8
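
As a concrete illustration of the recursive partitioning the abstract describes, here is a minimal sketch using scikit-learn's tree implementation (an assumption for illustration: the article itself surveys several algorithms, of which this is only one):

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic continuous response: regression trees measure prediction
# error by squared differences, as described above.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# max_depth bounds the recursive partitioning; each leaf holds a simple
# prediction (the mean response of the training points that reach it).
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_tr, y_tr)
print("test MSE:", mean_squared_error(y_te, tree.predict(X_te)))
```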

16,974 citations


Additional excerpts

  • ...Linear regression: 1970 Boston housing data (Harrison and Rubinfeld, 1978; Belsley et al., 1980)...

    [...]

Book
21 Mar 2002
TL;DR: An essential textbook for any student or researcher in biology needing to design experiments, sampling programs or analyse the resulting data, covering both classical and Bayesian philosophies before advancing to linear and generalized linear models; topics include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot, repeated-measures and covariance designs), log-linear models, and multivariate techniques including classification and ordination.
Abstract: An essential textbook for any student or researcher in biology needing to design experiments, sampling programs or analyse the resulting data. The text begins with a revision of estimation and hypothesis testing methods, covering both classical and Bayesian philosophies, before advancing to the analysis of linear and generalized linear models. Topics covered include linear and logistic regression, simple and complex ANOVA models (for factorial, nested, block, split-plot and repeated measures and covariance designs), and log-linear models. Multivariate techniques, including classification and ordination, are then introduced. Special emphasis is placed on checking assumptions, exploratory data analysis and presentation of results. The main analyses are illustrated with many examples from published papers and there is an extensive reference list to both the statistical and biological literature. The book is supported by a website that provides all data sets, questions for each chapter and links to software.
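
For a flavour of the factorial ANOVA models the book covers, here is a minimal Python sketch with statsmodels (the data frame and factor names are invented for illustration; the book's own examples use published biological data sets):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Invented two-factor design for illustration (not one of the book's data sets).
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "growth": rng.normal(size=40),
    "treatment": np.repeat(["control", "fertilized"], 20),
    "site": np.tile(["north", "south"], 20),
})

# Factorial ANOVA with an interaction term, fitted as a linear model.
fit = smf.ols("growth ~ treatment * site", data=df).fit()
print(anova_lm(fit, typ=2))
```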

9,509 citations


Cites background from "Regression Diagnostics: Identifying..."

  • ...There will be a condition index for each principal component and values greater than 30 indicate collinearities that require attention (Belsley et al. 1980, Chatterjee & Price 1991). The second is the condition number, which is simply the largest condition index. Third, Hocking (1996) proposed an indicator of collinearity that is simply λmin and suggested values less than 0....

    [...]

  • ...Belsley et al. (1980) and Cook & Weisberg (1982) are the standard references, and other good discussions and illustrations include Bollen & Jackman (1990), Chatterjee & Price (1991) and Neter et al....

    [...]
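
The condition-index rule quoted in the excerpt above is straightforward to check numerically. A minimal sketch in the Belsley et al. (1980) style, scaling design-matrix columns to unit length and taking ratios of singular values (the data are synthetic and illustrative):

```python
import numpy as np

# Two nearly collinear predictors plus an intercept (synthetic data).
rng = np.random.default_rng(2)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=0.01, size=100)
X = np.column_stack([np.ones(100), x1, x2])

# Belsley-style condition indices: scale columns to unit length, then
# divide the largest singular value by each singular value.
Xs = X / np.linalg.norm(X, axis=0)
sv = np.linalg.svd(Xs, compute_uv=False)
cond_indices = sv.max() / sv

# The largest index is the condition number; values above 30 signal
# collinearity needing attention, per the excerpt above.
print("condition indices:", np.round(cond_indices, 1))
```
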

Journal ArticleDOI
TL;DR: In this paper, a new general class of local indicators of spatial association (LISA) is proposed, which allows for the decomposition of global indicators, such as Moran's I, into the contribution of each observation.
Abstract: The capabilities for visualization, rapid data retrieval, and manipulation in geographic information systems (GIS) have created the need for new techniques of exploratory data analysis that focus on the “spatial” aspects of the data. The identification of local patterns of spatial association is an important concern in this respect. In this paper, I outline a new general class of local indicators of spatial association (LISA) and show how they allow for the decomposition of global indicators, such as Moran's I, into the contribution of each observation. The LISA statistics serve two purposes. On one hand, they may be interpreted as indicators of local pockets of nonstationarity, or hot spots, similar to the Gi and G*i statistics of Getis and Ord (1992). On the other hand, they may be used to assess the influence of individual locations on the magnitude of the global statistic and to identify “outliers,” as in Anselin's Moran scatterplot (1993a). An initial evaluation of the properties of a LISA statistic is carried out for the local Moran, which is applied in a study of the spatial pattern of conflict for African countries and in a number of Monte Carlo simulations.
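
The decomposition property described here is compact enough to verify directly. A minimal numpy sketch of the local Moran statistic on a toy one-dimensional chain of regions (weights and data are illustrative; Anselin's paper treats general spatial weights):

```python
import numpy as np

# Toy one-dimensional chain of 20 regions; neighbours are adjacent cells.
rng = np.random.default_rng(3)
n = 20
x = rng.normal(size=n)

W = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            W[i, j] = 1.0
W /= W.sum(axis=1, keepdims=True)  # row-standardize the weights

# Local Moran: I_i = (z_i / m2) * sum_j w_ij z_j, with z the deviations.
z = x - x.mean()
m2 = (z ** 2).sum() / n
I_local = z / m2 * (W @ z)

# Decomposition: with row-standardized weights, global Moran's I equals
# the mean of the local statistics.
I_global = (n / W.sum()) * (z @ W @ z) / (z ** 2).sum()
assert np.isclose(I_local.mean(), I_global)
print("global Moran's I:", round(float(I_global), 3))
```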

8,933 citations

Journal ArticleDOI
TL;DR: In this article, the authors examined the effect of the variance inflation factor (VIF) on the results of regression analyses, and found that threshold values of the VIF need to be evaluated in the context of several other factors that influence the variance of regression coefficients.
Abstract: The Variance Inflation Factor (VIF) and tolerance are both widely used measures of the degree of multi-collinearity of the ith independent variable with the other independent variables in a regression model. Unfortunately, several rules of thumb – most commonly the rule of 10 – associated with VIF are regarded by many practitioners as a sign of severe or serious multi-collinearity (this rule appears in both scholarly articles and advanced statistical textbooks). When VIF reaches these threshold values, researchers often attempt to reduce the collinearity by eliminating one or more variables from their analysis; using ridge regression to analyze their data; or combining two or more independent variables into a single index. These techniques for curing problems associated with multi-collinearity can create problems more serious than those they solve. Because of this, we examine these rules of thumb and find that threshold values of the VIF (and tolerance) need to be evaluated in the context of several other factors that influence the variance of regression coefficients. Values of the VIF of 10, 20, 40, or even higher do not, by themselves, discount the results of regression analyses, call for the elimination of one or more independent variables from the analysis, suggest the use of ridge regression, or require combining of independent variables into a single index.
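
Computing VIFs is a one-liner with statsmodels; a minimal sketch (synthetic data; the 0.95 coefficient is an illustrative choice to induce collinearity):

```python
import numpy as np
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic predictors; x2 is built to be highly correlated with x1.
rng = np.random.default_rng(4)
n = 200
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=n)
x3 = rng.normal(size=n)
X = np.column_stack([np.ones(n), x1, x2, x3])  # include the constant

# VIF_j = 1 / (1 - R_j^2), where R_j^2 regresses predictor j on the rest;
# tolerance is the reciprocal, 1 - R_j^2. Column 0 (the constant) is skipped.
for j in range(1, X.shape[1]):
    print(f"VIF x{j}: {variance_inflation_factor(X, j):.2f}")
```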

7,165 citations


Cites background from "Regression Diagnostics: Identifying..."

  • ...Belsley et al. (1980) also note the asymmetry between these situations....

    [...]

  • ...…is large; produce parameter estimates of the “incorrect sign” and of implausible magnitude; create situations in which small changes in the data produce wide swings in parameter estimates; and, in truly extreme cases, prevent the numerical solution of a model (Belsley et al., 1980; Greene, 1993)....

    [...]

Journal ArticleDOI
TL;DR: In this paper, the authors integrate theory developed in several disciplines to determine five cognitive processes through which industrial buyers can develop trust of a supplier firm and its salesperson.
Abstract: The authors integrate theory developed in several disciplines to determine five cognitive processes through which industrial buyers can develop trust of a supplier firm and its salesperson. These p...

6,637 citations
