
Showing papers on "Proper linear model published in 1977"


Book
01 Jan 1977
TL;DR: A textbook covering simple linear regression, multiple linear regression, regression diagnostics (detection of model violations), qualitative variables as predictors, transformation of variables, weighted least squares, the problem of correlated errors, analysis of collinear data, biased estimation of regression coefficients, variable selection procedures, and logistic regression.
Abstract: Contents: Simple Linear Regression; Multiple Linear Regression; Regression Diagnostics: Detection of Model Violations; Qualitative Variables as Predictors; Transformation of Variables; Weighted Least Squares; The Problem of Correlated Errors; Analysis of Collinear Data; Biased Estimation of Regression Coefficients; Variable Selection Procedures; Logistic Regression; Appendix; References; Index.

3,721 citations


Book
01 Jan 1977
TL;DR: This new edition takes seriously the continuing development of regression computer programs that are efficient, accurate, and an important part of statistical research, and provides up-to-date accounts of computational methods and algorithms currently in use without getting entrenched in minor computing details.
Abstract: Description: Regression analysis is an often-used tool in the statistician's toolbox. This new edition takes into serious consideration the continuing development of regression computer programs that are efficient, accurate, and considered an important part of statistical research. The book provides up-to-date accounts of computational methods and algorithms currently in use without getting entrenched in minor computing details.

2,811 citations


Book
11 Jan 1977
TL;DR: This volume on data analysis and regression assumes the student has had a first course in statistics; the attitudes and approaches it conveys are more important than the techniques it can teach.
Abstract: The assumption is made in this volume devoted to data analysis and regression that the student has had a first course in statistics. Attitudes and approaches are more important than the techniques this book can teach. Readers can learn to identify at least the following attitudes, understandings, and approaches: an approach to the formulation of statistical and data-analytical problems such that, for example, the student's shortcut to inference can be properly understood and the role of vague concepts becomes clear; the role of indications (of pointers to behavior, not necessarily on prechosen scales) in contrast to conclusions or decisions about prechosen quantities or alternatives; the importance of displays and the value of graphs in forcing the unexpected upon the reader; the importance of re-expression; the need to seek out the real uncertainty as a nontrivial task; the importance of iterated calculation; how the ideas of robustness and resistance can change both what one does and what one thinks; what regression is all about; what regression coefficients can and cannot do; that the behavior of one's data can often be used to guide its analysis; the importance of looking at and drawing information from residuals; and the idea that data analysis can profit from repeated starts and fresh approaches, and that there is not just a single analysis for a substantial problem. The 16 chapters of this book include the following: some practical philosophy for data analysis; a background for simple linear regression; the nature and importance of re-expression; a method of direct assessment; the direct and flexible approach to two-way tables; a review of resistant/robust techniques in the simpler applications; standardization; regression and regression coefficients; a mathematical approach to understanding regression; and guided regression and examining regression residuals. Among the special features of this volume are the following: an introduction to stem-and-leaf displays; use of running medians for smoothing; the ladder of re-expression for straightening curves; methods of re-expression for analysis; special tables to make re-expression easy in hand calculations; robust and resistant measures of location and scale; and regression with errors of measurement.

1,430 citations
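
One of the special features listed above, the use of running medians for smoothing, is easy to illustrate. A minimal NumPy sketch follows; the window length of 3 and the toy series are illustrative assumptions, not prescriptions from the book:

```python
import numpy as np

def running_median(y, window=3):
    """Smooth a sequence by replacing each value with the median of a
    centered window; endpoints are left unsmoothed for simplicity."""
    y = np.asarray(y, dtype=float)
    half = window // 2
    smoothed = y.copy()
    for i in range(half, len(y) - half):
        smoothed[i] = np.median(y[i - half:i + half + 1])
    return smoothed

# A noisy series with one wild value: the running median resists the spike,
# which is the "resistance" the book emphasizes.
series = [2.0, 2.1, 9.7, 2.3, 2.2, 2.4, 2.5]
print(running_median(series))  # the 9.7 is replaced by a nearby typical value
```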


Journal ArticleDOI
Ronald D. Snee
TL;DR: It is concluded that data splitting is an effective method of model validation when it is not practical to collect new data to test the model.
Abstract: Methods to determine the validity of regression models include comparison of model predictions and coefficients with theory, collection of new data to check model predictions, comparison of results with theoretical model calculations, and data splitting or cross-validation, in which a portion of the data is used to estimate the model coefficients and the remainder of the data is used to measure the prediction accuracy of the model. An expository review of these methods is presented. It is concluded that data splitting is an effective method of model validation when it is not practical to collect new data to test the model. The DUPLEX algorithm, developed by R. W. Kennard, is recommended for dividing the data into the estimation set and prediction set when there is no obvious variable, such as time, to use as a basis to split the data. Several examples are included to illustrate the various methods of model validation.

1,165 citations
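
To make the data-splitting idea concrete, here is a minimal NumPy sketch. The simulated data and the random 50/50 split are illustrative stand-ins; Snee recommends Kennard's DUPLEX algorithm for the split itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data; in practice X and y come from the study at hand.
n, p = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
y = X @ np.array([1.0, 2.0, -1.0, 0.5]) + rng.normal(size=n)

# Divide the data into an estimation set and a prediction set
# (random here; DUPLEX would choose the sets more carefully).
idx = rng.permutation(n)
est, pred = idx[: n // 2], idx[n // 2:]

# Fit by least squares on the estimation set only.
beta, *_ = np.linalg.lstsq(X[est], y[est], rcond=None)

# Measure prediction accuracy on the held-out set.
resid = y[pred] - X[pred] @ beta
print("RMS prediction error:", np.sqrt(np.mean(resid**2)))
```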


Journal ArticleDOI
TL;DR: In this paper, the authors compared OREG with 56 alternatives that pull some or all of the estimated regression coefficients some or all of the way to zero, and exhibited substantial improvements over OREG when collinearity effects are present, noncentrality in the original model is small, and selected true regression coefficients are small.
Abstract: Estimated regression coefficients and errors in these estimates are computed for 160 artificial data sets drawn from 160 normal linear models structured according to factorial designs. Ordinary multiple regression (OREG) is compared with 56 alternatives which pull some or all estimated regression coefficients some or all the way to zero. Substantial improvements over OREG are exhibited when collinearity effects are present, noncentrality in the original model is small, and selected true regression coefficients are small. Ridge regression emerges as an important tool, while a Bayesian extension of variable selection proves valuable when the true regression coefficients vary widely in importance.

287 citations
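
Of the alternatives studied, ridge regression is the most familiar way to pull coefficients toward zero. A minimal sketch follows; the shrinkage constant k = 0.1 and the simulated collinear data are illustrative assumptions, not values from the study:

```python
import numpy as np

def ridge(X, y, k):
    """Ridge estimator (X'X + kI)^{-1} X'y; k = 0 recovers OLS (OREG)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + k * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n = 50
# Collinear predictors: x2 is nearly a copy of x1.
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)

print("OLS:  ", ridge(X, y, 0.0))   # unstable under collinearity
print("ridge:", ridge(X, y, 0.1))   # coefficients pulled toward zero
```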


Journal ArticleDOI
TL;DR: The authors investigated the power of two methodologies, the tests of Brown, Durbin and Evans [2] and variable parameter regression, to detect several varieties of instability in the coefficients of a linear regression model.
Abstract: This paper investigates the power of two methodologies, the tests of Brown, Durbin, and Evans [2] and variable parameter regression, to detect several varieties of instability in the coefficients of a linear regression model. The study reported by Khan [10] on the stability of the demand for money is replicated with variable parameter regression, and his results are in part questioned and in part sharpened.

228 citations
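
The Brown, Durbin, and Evans tests are built on recursive residuals, whose scaled cumulative sum (CUSUM) drifts away from zero when the coefficients are unstable. A minimal sketch of that building block follows; the simulated coefficient break is an illustrative assumption, and the formal CUSUM critical bounds are omitted:

```python
import numpy as np

def recursive_residuals(X, y):
    """Standardized one-step-ahead prediction errors, the building
    block of the Brown-Durbin-Evans CUSUM test."""
    n, p = X.shape
    w = []
    for t in range(p, n):
        Xt, yt = X[:t], y[:t]                    # data up to time t-1
        beta = np.linalg.lstsq(Xt, yt, rcond=None)[0]
        xt = X[t]
        h = xt @ np.linalg.solve(Xt.T @ Xt, xt)  # leverage of the new point
        w.append((y[t] - xt @ beta) / np.sqrt(1.0 + h))
    return np.array(w)

rng = np.random.default_rng(2)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=n)])
slope = np.where(np.arange(n) < 40, 1.0, 3.0)    # coefficient shifts midway
y = X[:, 0] + slope * X[:, 1] + rng.normal(size=n)

w = recursive_residuals(X, y)
cusum = np.cumsum(w) / w.std(ddof=1)             # drifts after the break
print(cusum[-5:])
```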


Journal ArticleDOI
TL;DR: In this paper, the authors considered the RESET test, which is intended to detect a nonzero mean of the disturbance in a linear regression model, and found that the power of the test may decline as the size of the disturbance mean increases.
Abstract: This article considers the RESET test, which is intended to detect a nonzero mean of the disturbance in a linear regression model. Analysis of an approximation to the test statistic's distribution and Monte Carlo experiments reveal that the power of the test may decline as the size of the disturbance mean increases. However, the possibility is remote and declines with increasing sample size. Alternative sets of test variables are considered, and their effect on the power of the test is studied in Monte Carlo experiments. The best set seems to be composed of powers of the explanatory variables.

200 citations
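
A minimal sketch of a RESET-style F-test using the article's preferred test variables, powers of the explanatory variables. The simulated data and the choice of powers up to the cube are illustrative assumptions:

```python
import numpy as np
from scipy import stats

def reset_f_test(X, y, max_power=3):
    """F-test of a linear model against one augmented with powers of
    the explanatory variables (the test set the article found best)."""
    n = len(y)
    Z = np.column_stack([X[:, 1:] ** k for k in range(2, max_power + 1)])
    Xa = np.column_stack([X, Z])
    rss = lambda A: np.sum((y - A @ np.linalg.lstsq(A, y, rcond=None)[0]) ** 2)
    q = Z.shape[1]                      # number of added test variables
    df2 = n - Xa.shape[1]
    F = ((rss(X) - rss(Xa)) / q) / (rss(Xa) / df2)
    return F, stats.f.sf(F, q, df2)    # statistic and p-value

rng = np.random.default_rng(3)
n = 100
x = rng.uniform(1, 3, size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + x**2 + 0.3 * rng.normal(size=n)  # true relation is nonlinear in x
print(reset_f_test(X, y))                  # small p-value flags misspecification
```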


Posted Content
TL;DR: In this article, the user of linear multiple regression is provided with a battery of diagnostic tools to determine which data points have high leverage or influence on the estimation process and how these possibly discrepant data points differ from the patterns set by the majority of the data.
Abstract: This paper attempts to provide the user of linear multiple regression with a battery of diagnostic tools to determine which, if any, data points have high leverage or influence on the estimation process, and how these possibly discrepant data points differ from the patterns set by the majority of the data. The point of view taken is that when diagnostics indicate the presence of anomalous data, the choice is open as to whether these data are in fact unusual and helpful, or possibly harmful and thus in need of modification or deletion. The methodology developed depends on differences, derivatives, and decompositions of basic regression statistics. There is also a discussion of how these techniques can be used with robust and ridge estimators. An example is given showing the use of diagnostic methods in the estimation of a cross-country savings rate model.

138 citations
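
Leverage, one of the quantities such diagnostics are built on, comes from the diagonal of the hat matrix. A minimal sketch follows; the 2p/n cutoff is the conventional rule of thumb, not necessarily the paper's:

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix H = X (X'X)^{-1} X'; h_ii measures
    how much pull observation i has on its own fitted value."""
    H = X @ np.linalg.solve(X.T @ X, X.T)
    return np.diag(H)

rng = np.random.default_rng(4)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
X[0, 1] = 8.0                                # one point far out in x-space
h = leverages(X)
print(np.where(h > 2 * X.shape[1] / n)[0])   # flags the high-leverage point
```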


Journal ArticleDOI
TL;DR: In this paper, the authors proposed the minimization of the sum of relative errors (MSRE) as an alternative criterion to the minimization of the sum of squared errors (MSSE) and the minimization of the sum of absolute errors (MSAE).
Abstract: When linear regression is used for prediction purposes, the minimization of the sum of relative errors (MSRE) is proposed as an alternative criterion to the minimization of the sum of squared errors (MSSE) and the minimization of the sum of absolute errors (MSAE). The problem is formulated as a linear programming problem and a solution procedure is given. The problem of subset selection with the MSRE criterion is also considered, and the results are illustrated with an example.

96 citations
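
The LP formulation is straightforward to sketch: write each residual as the difference of two nonnegative variables and weight their cost by 1/|y_i|. A minimal version using scipy's generic solver; the paper gives its own solution procedure, so this is only an illustration:

```python
import numpy as np
from scipy.optimize import linprog

def msre_fit(X, y):
    """Minimize sum_i |y_i - x_i'beta| / |y_i| as a linear program.
    Variables: beta (free), then u >= 0, v >= 0 with u - v = residuals."""
    n, p = X.shape
    w = 1.0 / np.abs(y)                           # relative-error weights
    c = np.concatenate([np.zeros(p), w, w])       # cost on u and v only
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])  # X beta + u - v = y
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

rng = np.random.default_rng(5)
n = 40
x = rng.uniform(1, 10, size=n)
X = np.column_stack([np.ones(n), x])
y = 2.0 + 3.0 * x + rng.normal(size=n)
print(msre_fit(X, y))   # roughly recovers the intercept 2 and slope 3
```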


Journal ArticleDOI
TL;DR: In this paper, the authors proved the asymptotic linearity of the regression parameter of a class of linear rank statistics when errors in the regression model are strictly stationary and strongly mixing.
Abstract: This paper proves the asymptotic linearity in the regression parameter of a class of linear rank statistics when errors in the regression model are strictly stationary and strongly mixing. Besides this, several other weak convergence results are proved which yield the asymptotic normality of $L$ and $M$ estimators of the regression parameter under the above dependence structure. All these results are useful in studying the effect of the above dependence on the asymptotic behavior of $R$, $M$ and $L$ estimators vis-a-vis the least squares estimator. An example of a linear model with Gaussian errors is given where it is shown that the asymptotic efficiency of certain classes of $R$, $M$ and $L$ estimators relative to the least squares estimator is greater than or equal to its value under the usual independent-errors model.

54 citations


Journal ArticleDOI
TL;DR: A regression analysis is performed with data on spinal cord injuries to demonstrate the benefits of determining which, if any, multicollinearities are present in prediction data.
Abstract: Summary. In this paper a regression analysis is performed with data on spinal cord injuries in order to demonstrate the benefits of determining which, if any, multicollinearities are present in prediction data. Existing multicollinearities are shown to be useful both in determining characteristics of the sampled population and in explaining possible erratic behavior of variable selection procedures. Latent root regression is performed on the data to illustrate one method of using biased regression techniques to incorporate knowledge of multicollinearities in developing prediction equations. Medical experimentation frequently involves the collection of a large amount of data and the utilization of that data to predict one or more response variables. In this process the data analyst commonly employs a linear regression model. The adequacy of the assumed model (or a reduced one) is investigated with summary statistics such as SSE, R2, or residual plots. Note that these statistics reflect how well the fitted model estimates the observed sample but not necessarily the adequacy of prediction or the validity of the model for a population of values. The predictor variables themselves are often ignored in any assessment of the fitted model. In particular, interrelationships among the predictor variables are rarely investigated by the typical data analyst. Yet existing interrelationships can severely restrict the effective use of the prediction equation, since the fitted model may only be an adequate predictor for limited regions of the predictor variables. This can occur, and go unnoticed, even in conjunction with a large R2 for the fitted model. The purpose of this article is to demonstrate, through the analysis of a set of data on spinal cord injuries, the benefits that can be derived from an examination of linear relationships (multicollinearities) among predictor variables. The source of the data is the National Spinal Cord Injury Registry, which was organized at the Medical University of South Carolina in order to collect data on spinal cord injuries. One of the goals of this data collection is to provide information to physicians regarding potential gains in patient mobility following spinal cord injuries. We analyze a small portion of the available data, not with the goal of determining a fitted model to be used as a final solution to the problem ...

Journal ArticleDOI
TL;DR: In this paper, alternative variable selection procedures for principal components regression are investigated, motivated by the fact that multicollinearity among the regressor variables can have important effects on the quality of least squares parameter estimates.
Abstract: Multicollinearity or near exact linear dependence among the vectors of regressor variables in a multiple linear regression analysis can have important effects on the quality of least squares parameter estimates. One frequently suggested approach for these problems is principal components regression. This paper investigates alternative variable selection procedures and their implications for such an analysis.
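
A minimal sketch of principal components regression itself. How many components to retain is precisely the selection question the paper studies, so keeping only the leading component here is an arbitrary illustration:

```python
import numpy as np

def pcr(X, y, k):
    """Regress y on the first k principal components of the centered X,
    then map the coefficients back to the original variables."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:k].T                    # scores on the leading k components
    gamma = np.linalg.lstsq(Z, y - y.mean(), rcond=None)[0]
    beta = Vt[:k].T @ gamma              # back to the original coordinates
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta

rng = np.random.default_rng(6)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)      # near exact linear dependence
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(size=n)
print(pcr(X, y, k=1))                    # stable despite the collinearity
```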

Journal ArticleDOI
TL;DR: A simple generalization of the usual ridge regression estimator for the linear regression model is given that avoids the need to center all variables and is proved to be location invariant.
Abstract: A simple generalization of the usual ridge regression estimator for the linear regression model is given which avoids the need to center all variables. The estimator is proved to be location invariant. This estimator is of pedagogical interest and in forecasting also of practical importance.
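
The abstract does not reproduce the estimator. One familiar way to obtain a location-invariant ridge fit without centering is to include an explicit intercept column and leave it unpenalized; the sketch below illustrates that flavor of generalization as an assumption, not the paper's exact estimator:

```python
import numpy as np

def ridge_uncentered(X, y, k):
    """Ridge with an explicit intercept column that is left unpenalized,
    so no variable needs to be centered beforehand."""
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])
    D = np.eye(p + 1)
    D[0, 0] = 0.0                        # no shrinkage on the intercept
    return np.linalg.solve(Xa.T @ Xa + k * D, Xa.T @ y)

rng = np.random.default_rng(7)
X = rng.normal(size=(50, 2)) + 100.0     # far from the origin, uncentered
y = 5.0 + X @ np.array([1.0, -2.0]) + rng.normal(size=50)

b = ridge_uncentered(X, y, k=1.0)
b_shift = ridge_uncentered(X + 10.0, y, k=1.0)
print(b[1:], b_shift[1:])                # slopes agree: location invariant
```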

Journal ArticleDOI
TL;DR: In this paper, the applications of Gabriel's procedures to multivariate linear regression are presented and illustrated as generalizations of Aitkin's technique, which is used in the MANOVA context.
Abstract: SUMMARY Simultaneous procedures for variable selection in multiple linear regression have recently been given by Aitkin. One of these procedures, proposed for the case when the regression equation is to be used for descriptive purposes, is an application of each of a number of simultaneous procedures concerned with the multivariate general linear model and given by Gabriel with applications in the MANOVA context. The applications of Gabriel's procedures to multivariate linear regression are presented here and illustrated as generalizations of Aitkin's technique.

Journal ArticleDOI
TL;DR: In this paper, the authors present a numerical criterion to help the experimenter evaluate the adequacy of the regression model, in light of both the range of values to be estimated by the equation and the size of the error term.
Abstract: A decision rule often used for the acceptance or rejection of a fitted regression equation is whether or not the regression F ratio exceeds the critical F value. In fact, all this tells us is whether the fitted equation is better than the mean as a predictor. Many times the experimenter's primary interest is how well the fitted equation represents the true model. This paper presents a numerical criterion to help the experimenter evaluate the adequacy of the regression model, in light of both the range of values to be estimated by the equation and the size of the error term. This criterion can be tested utilizing the ordinary regression F ratio but referring to special critical values. An example from the rubber industry illustrates the use of the criterion.


Journal ArticleDOI
Rudolf Dutter
TL;DR: The algorithms described here are modified versions of the "sophisticated method" given by Huber (1973, [8]), which sometimes fails to converge; the new algorithms are formulated and convergence proofs are given.
Abstract: Several iterative procedures have been proposed and developed to solve numerically the problem of robust regression, in particular, of robust linear regression. The algorithms described here are modified versions of the "sophisticated method" given by Huber (1973, [8]), which sometimes fails to converge. In this paper, the new algorithms are formulated and convergence proofs are given. The behavior of the procedures is illustrated by a numerical example and is compared to another ("simple") algorithm.
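
The modified algorithms themselves are not spelled out in the abstract. For orientation, here is a generic iteratively reweighted least squares sketch of Huber M-estimation; it is not Dutter's procedure, and the tuning constant c = 1.345 is a conventional choice:

```python
import numpy as np

def huber_irls(X, y, c=1.345, iters=50):
    """Generic iteratively reweighted least squares for Huber's
    M-estimator: observations with large residuals get weight c/|r/s|."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]    # start from OLS
    for _ in range(iters):
        r = y - X @ beta
        s = np.median(np.abs(r)) / 0.6745          # robust scale (MAD)
        u = np.abs(r) / max(s, 1e-12)
        w = c / np.maximum(u, c)                   # Huber weights in (0, 1]
        sw = np.sqrt(w)
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

rng = np.random.default_rng(8)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
y[:5] += 15.0                                      # gross outliers
print(huber_irls(X, y))                            # near (1, 2) despite them
```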

Journal ArticleDOI
TL;DR: In this article, the authors considered the problem of choosing the best subset of independent variables to predict a future value of the dependent variable, and to control the level of a dependent variable at a preassigned value.
Abstract: SIUMMARY A Bayesian decision theory approach to the choice of regression design is considered when it is intended to use the regression to help control the dependent variable at a chosen value, the control to be effected by choosing certain of the independent variables and fixing them at selected values. Some key word8: Bayesian decision theory; Control; D-optimum; Linear regression; Preposterior analysis; Regression design. 1. INTRODIUCTION This paper is concerned with the design of a regression experiment in which observations of a dependent variable are obtained at selected values of a set of independent variables. Using Bayesian decision theory, Lindley (1968) considered the analysis of data from a regression experiment in two situations, namely when it is desired to choose the best subset of independent variables (a) to predict a future value of the dependent variable, (b) to control the level of the dependent variable at a preassigned value. The first situation he termed a 'prediction problem' and the second a 'control problem'. The optimal choice of a designed regression experiment for Lindley's approach to the prediction problem was considered by Brooks (1972). In the present paper, the optimal choice of designed experiment for the control problem is considered, whe-n the regression is linear in the independent variables. To do this, a decision-theoretic analysis of the following sequence of decisions and events is considered. First, the size n of the experiment is decided, and then n sets of values X1, ..., X,

Journal ArticleDOI
TL;DR: In this paper, the unknown constants in a multiple linear regression model are estimated under the minimum sum of weighted absolute errors (MSWAE) criterion by a generalization of an earlier algorithm, which is compared to a bounded variable algorithm.
Abstract: We propose an algorithm to estimate the unknown constants in a multiple linear regression model under the minimum sum of weighted absolute errors (MSWAE) criterion. The proposed algorithm, a generalization of an earlier algorithm, is compared to a bounded variable algorithm. Some computational experience is reported.

Journal ArticleDOI
TL;DR: In this article, a special purpose linear programming algorithm for obtaining least-absolute value estimators in a linear model with dummy variables is presented, which employs a compact basis inverse procedure and incorporates the advanced basis exchange techniques available in specialized algorithms for the general linear least absolute value problem.
Abstract: Dummy (0, 1) variables are frequently used in statistical modeling to represent the effect of certain extraneous factors. This paper presents a special purpose linear programming algorithm for obtaining least-absolute-value estimators in a linear model with dummy variables. The algorithm employs a compact basis inverse procedure and incorporates the advanced basis exchange techniques available in specialized algorithms for the general linear least-absolute-value problem. Computational results with a computer code version of the algorithm are given.

Journal ArticleDOI
K.C. Yeh
TL;DR: A new method is presented for the determination of kinetic parameters based on a functional relationship among experimental data derived from the postulated model, which does not require initial estimates or repetitive iteration for linear systems and can be applied to nonlinear models.


Journal ArticleDOI
TL;DR: The semistandardized regression coefficient is presented for combining unstandardized and standardized variables into a single regression equation.
Abstract: The semistandardized regression coefficient is presented for combining unstandardized and standardized variables into a single regression equation. Equations, interpretations, and an example are given for applying the coefficients to multivariate data.

01 Sep 1977
TL;DR: In this paper, sensitivity coefficients are proposed to measure the effects of errors in the observed values of the independent variables of a linear regression, and it is shown that they can be easily computed from quantities ordinarily calculated in performing the regression.
Abstract: This paper is concerned with errors in the observed values of the independent variables of a linear regression. Sensitivity coefficients are proposed to measure the effects of these errors, and it is shown that they can easily be computed from quantities ordinarily calculated in performing the regression.

Journal ArticleDOI
TL;DR: The Barrodale and Roberts algorithm for least absolute value (LAV) regression and the Bartels-Conn algorithm are often able to skip across points at which conventional simplex-method algorithms for LAV regression would be required to carry out pivot operations; this advantage is shown to extend to the wider class of piecewise linear minimization problems.
Abstract: The Barrodale and Roberts algorithm for least absolute value (LAV) regression and the algorithm proposed by Bartels and Conn both have the advantage that they are often able to skip across points at which the conventional simplex-method algorithms for LAV regression would be required to carry out an (expensive) pivot operation. We indicate here that this advantage holds in the Bartels-Conn approach for a wider class of problems: the minimization of piecewise linear functions. We show how LAV regression, restricted LAV regression, general linear programming and least maximum absolute value regression can all be easily expressed as piecewise linear minimization problems.

Journal ArticleDOI
TL;DR: In this paper, a procedure that utilizes the sample multiple correlation to form a lower bound for the level of predictive precision of a fitted regression equation is suggested, which yields probability statements which are true at least 100(1−α)% of the time.
Abstract: A procedure that utilizes the sample multiple correlation to form a lower bound for the level of predictive precision of a fitted regression equation is suggested. The procedure is shown to yield probability statements which are true at least 100(1−α)% of the time.

Journal ArticleDOI
TL;DR: In this article, the authors examined the effect of applying classical linear regression when measurement error is present in x and the appropriate model is functional linear regression; the bias in the classical slope estimate is characterized in terms of a single parameter, τ.
Abstract: We examine the effect of applying classical linear regression when measurement error is present in x and the appropriate model is functional linear regression. The bias in the classical slope estimate is characterized in terms of a single parameter, τ. An unbiased estimator is obtained, assuming a prior estimate of σx is available. A confidence interval for τ is derived which is also a confidence interval for the noncentrality parameter of the noncentral F. Then the jackknife method is used to set confidence limits for the true slope. Results are applied to the comparison of assay methods in the clinical chemistry laboratory.
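
The attenuation effect is easy to simulate under standard measurement-error assumptions. The correction below uses a known error variance in place of the paper's prior estimate of σx, so it is a textbook illustration rather than the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(9)
n, beta = 5000, 2.0
sigma_x, sigma_u = 1.0, 0.5          # spread of true x and of measurement error
x_true = rng.normal(0, sigma_x, size=n)
x_obs = x_true + rng.normal(0, sigma_u, size=n)   # x is measured with error
y = beta * x_true + rng.normal(0, 0.3, size=n)

# The classical slope is attenuated toward zero by the reliability ratio.
b_naive = np.cov(x_obs, y)[0, 1] / np.var(x_obs, ddof=1)
reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)
b_corrected = b_naive / reliability  # unbiased under these assumptions

print(b_naive, beta * reliability, b_corrected)   # ~1.6, 1.6, ~2.0
```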



Journal ArticleDOI
TL;DR: The problem of constructing a confidence region, for the expected value of a response vector corresponding to a regressor vector, in a multivariate multiple regression model is considered in this paper.
Abstract: The problem of constructing a confidence region for the expected value of a response vector y0 corresponding to a regressor vector x0, in a multivariate multiple regression model, is considered. The prediction region for y0 is also presented. The corres..