
Showing papers on "Proper linear model published in 1996"


Proceedings Article
03 Dec 1996
TL;DR: This work compares support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and with ridge regression done in feature space, and expects SVR to have advantages in high-dimensional spaces because SVR optimization does not depend on the dimensionality of the input space.
Abstract: A new regression technique based on Vapnik's concept of support vectors is introduced. We compare support vector regression (SVR) with a committee regression technique (bagging) based on regression trees and ridge regression done in feature space. On the basis of these experiments, it is expected that SVR will have advantages in high-dimensional spaces because SVR optimization does not depend on the dimensionality of the input space.
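This kind of head-to-head comparison is easy to reproduce in miniature. The sketch below uses scikit-learn on invented synthetic data; the models, hyperparameters, and data are illustrative assumptions, not the paper's experimental setup:

```python
# Hypothetical comparison of SVR, bagged regression trees, and ridge
# regression on made-up data (not the paper's benchmarks).
import numpy as np
from sklearn.svm import SVR
from sklearn.ensemble import BaggingRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(400, 5))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 400)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "SVR": SVR(kernel="rbf", C=10.0, epsilon=0.1),
    "bagged trees": BaggingRegressor(DecisionTreeRegressor(),
                                     n_estimators=50, random_state=0),
    "ridge": Ridge(alpha=1.0),
}
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
for name, r2 in scores.items():
    print(f"{name}: test R^2 = {r2:.3f}")
```

The kernel SVR and the tree committee can pick up the nonlinear signal here, while the linear ridge fit cannot; the paper's experiments quantify exactly this kind of contrast on real benchmarks.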

4,009 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of combining a collection of general regression fit vectors to obtain a better predictive model and develop a general framework for this problem and examine a cross-validation-based proposal called "model mix" or "stacking" in this context.
Abstract: We consider the problem of how to combine a collection of general regression fit vectors to obtain a better predictive model. The individual fits may be from subset linear regression, ridge regression, or something more complex like a neural network. We develop a general framework for this problem and examine a cross-validation-based proposal called “model mix” or “stacking” in this context. We also derive combination methods based on the bootstrap and analytic methods and compare them in examples. Finally, we apply these ideas to classification problems where the estimated combination weights can yield insight into the structure of the problem.
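A bare-bones version of the stacking idea fits in a few lines of NumPy. The two base fits, the fold scheme, and the unconstrained least-squares combiner below are illustrative assumptions (the article also studies bootstrap-based and constrained variants):

```python
# Minimal "model mix"/stacking sketch: combination weights are estimated on
# cross-validated predictions from two base regressions, here a subset
# regression and the full regression. Toy data, invented for illustration.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.5, n)

def cv_predictions(X, y, cols, k=5):
    """Out-of-fold predictions from OLS on a column subset."""
    pred = np.empty(len(y))
    folds = np.array_split(np.arange(len(y)), k)
    for idx in folds:
        train = np.setdiff1d(np.arange(len(y)), idx)
        beta = np.linalg.lstsq(X[train][:, cols], y[train], rcond=None)[0]
        pred[idx] = X[idx][:, cols] @ beta
    return pred

P = np.column_stack([cv_predictions(X, y, [0]),         # x1 only
                     cv_predictions(X, y, [0, 1, 2])])  # all predictors
w = np.linalg.lstsq(P, y, rcond=None)[0]                # stacking weights
print("stacking weights:", np.round(w, 2))
```

Because the full regression is the correct model here, nearly all the weight lands on its predictions; with misspecified base fits the weights split, which is where stacking earns its keep.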

318 citations



Journal ArticleDOI
TL;DR: In this article, the authors argue that orthogonal regression is often misused in errors-in-variables linear regression because of a failure to account for equation errors, and that the typical result is to overcorrect for measurement error, that is, overestimate the slope, because equation error is ignored.
Abstract: Orthogonal regression is one of the standard linear regression methods to correct for the effects of measurement error in predictors. We argue that orthogonal regression is often misused in errors-in-variables linear regression because of a failure to account for equation errors. The typical result is to overcorrect for measurement error, that is, overestimate the slope, because equation error is ignored. The use of orthogonal regression must include a careful assessment of equation error, and not merely the usual (often informal) estimation of the ratio of measurement error variances. There are rarer instances, for example, an example from geology discussed here, where the use of orthogonal regression without proper attention to modeling may lead to either overcorrection or undercorrection, depending on the relative sizes of the variances involved. Thus our main point, which does not seem to be widely appreciated, is that orthogonal regression, just like any measurement error analysis, requires ...
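The overcorrection phenomenon is easy to see numerically. In this sketch (toy variances assumed, not the paper's geology example), the predictor carries measurement error with sd 0.5, while y has only a small measurement error (sd 0.1) plus an equation error (sd 0.5) that the naive orthogonal fit ignores:

```python
# OLS attenuates the slope; Deming (orthogonal) regression using only the
# measurement-error variance ratio, ignoring equation error, overcorrects.
# All variances below are invented toy values.
import numpy as np

rng = np.random.default_rng(2)
n = 20000
x_true = rng.normal(0, 1, n)
x = x_true + rng.normal(0, 0.5, n)        # predictor measured with error
# y = x_true plus equation error (sd 0.5) and small measurement error (sd 0.1)
y = x_true + rng.normal(0, 0.5, n) + rng.normal(0, 0.1, n)

sxx, syy = x.var(), y.var()
sxy = np.cov(x, y)[0, 1]

ols_slope = sxy / sxx                     # attenuated below the true slope 1

# The misuse: the error-variance ratio uses only measurement errors.
delta = 0.1 ** 2 / 0.5 ** 2               # ignores the equation error in y
deming_slope = (syy - delta * sxx +
                np.sqrt((syy - delta * sxx) ** 2 + 4 * delta * sxy ** 2)
                ) / (2 * sxy)

print(f"OLS: {ols_slope:.2f}, Deming: {deming_slope:.2f} (true slope 1.0)")
```

The OLS slope lands below 1 and the naive Deming slope lands above 1, which is exactly the over- versus under-correction trade-off the authors analyze.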

233 citations


Journal ArticleDOI
TL;DR: It turns out that statistical linear regression is superior to fuzzy linear regression in terms of predictive capability, whereas their comparative descriptive performance depends on various factors associated with the data set and proper specificity of the model.

182 citations


Journal ArticleDOI
TL;DR: In this paper, the authors investigate whether seasonal adjustment procedures are, at least approximately, linear data transformations and define a set of properties for the adequacy of a linear approximation to a seasonal-adjustment filter.
Abstract: We investigate whether seasonal-adjustment procedures are, at least approximately, linear data transformations. This question was initially addressed by Young and is important with respect to many issues including estimation of regression models with seasonally adjusted data. We focus on the X-11 program and rely on simulation evidence, involving linear unobserved component autoregressive integrated moving average models. We define a set of properties for the adequacy of a linear approximation to a seasonal-adjustment filter. These properties are examined through statistical tests. Next, we study the effect of X-11 seasonal adjustment on regression statistics assessing the statistical significance of the relationship between economic variables. Several empirical results involving economic data are also reported.

88 citations


Journal ArticleDOI
TL;DR: Geometry is a very useful tool for illustrating regression analysis as mentioned in this paper, however, despite its merits the geometric approach is seldom used, and one reason for this might be that there are very few applications at an elementary level.
Abstract: Geometry is a very useful tool for illustrating regression analysis. Despite its merits the geometric approach is seldom used. One reason for this might be that there are very few applications at an elementary level. This article gives a brief introduction to the geometric approach in regression analysis, and then geometry is used to shed some light on the problem of comparing the “importance” of the independent variables in a multiple regression model. Even though no final answer of how to assess variable importance is given, it is still useful to illustrate the different measures geometrically to gain a better understanding of their properties.

74 citations


01 Mar 1996
TL;DR: The authors propose a modified linear regression-based case-weighting method which ensures positive weights via a ridging procedure, and model-misspecification robustness via a nonparametric regression bias-correction factor.
Abstract: Case-weighting, or assigning a unique weight to each sample unit, is a popular method of sample weighting when internal consistency of the survey estimates is paramount. If in addition external constraints on key variables (the survey benchmarks) must also be met, then case-weights computed via generalised least squares, based on an assumed linear regression model for the survey variables, can be used. Unfortunately, this method of weighting can lead to negative case-weights. It is also susceptible to bias if the linear model is misspecified. This article proposes a modified method of linear regression-based case-weighting which ensures positive weights via use of a ridging procedure, and model misspecification robustness via the inclusion of a nonparametric regression bias correction factor. Empirical results which illustrate the gains from the new method of weighting are presented.

73 citations


Journal ArticleDOI
TL;DR: In this paper, the authors focus on the simple linear regression parameters, the slope (a) and the y-intercept (b), and define four possible outcomes which allow, as a first step, the quality of a simulation to be assessed without considering the coefficient of determination, r2.

73 citations


Journal ArticleDOI
TL;DR: The purpose of this paper is to examine the strengths and weaknesses of the fuzzy linear regression formulation, suggest possible improvements, and discuss fundamental questions about fuzzy linear regression.

67 citations


Journal ArticleDOI
TL;DR: A simulation study of logistic regression in which hierarchical regression fitted by a two-stage procedure is compared with ordinary maximum likelihood, indicating that hierarchical modelling of continuous covariates offers worthwhile improvement over ordinary maximum likelihood.
Abstract: Hierarchical regression – which attempts to improve standard regression estimates by adding a second-stage ‘prior’ regression to an ordinary model – provides a practical approach to evaluating multiple exposures. We present here a simulation study of logistic regression in which we compare hierarchical regression fitted by a two-stage procedure to ordinary maximum likelihood. The simulations were based on case-control data on diet and breast cancer, where the hierarchical model uses a second-stage regression to pull conventional dietary-item estimates toward each other when they have similar levels of food constituents. Our results indicate that hierarchical modelling of continuous covariates offers worthwhile improvement over ordinary maximum-likelihood, provided one does not underspecify the second-stage standard deviations.
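The two-stage idea can be sketched compactly if a linear first stage is substituted for the paper's logistic ML fit (a deliberate simplification; all numbers below are invented): first-stage coefficient estimates are pulled toward a second-stage regression on prior covariates Z, with the amount of shrinkage set by an assumed second-stage standard deviation:

```python
# Stylized two-stage hierarchical regression in NumPy. Stage 1: OLS and its
# sampling variances. Stage 2: regress the coefficients on prior covariates Z
# and shrink toward the fitted prior means. Linear first stage and tau are
# simplifying assumptions, not the paper's logistic setup.
import numpy as np

rng = np.random.default_rng(3)
n, p = 300, 6
Z = rng.normal(size=(p, 2))                  # second-stage ("prior") covariates
beta_true = Z @ np.array([0.8, -0.4]) + rng.normal(0, 0.1, p)
X = rng.normal(size=(n, p))
y = X @ beta_true + rng.normal(0, 1.0, n)

# Stage 1: ordinary least squares with diagonal sampling variances.
XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
sigma2 = np.sum((y - X @ beta_hat) ** 2) / (n - p)
V = sigma2 * np.diag(XtX_inv)

# Stage 2: second-stage regression, then per-coefficient shrinkage.
pi_hat = np.linalg.lstsq(Z, beta_hat, rcond=None)[0]
prior_mean = Z @ pi_hat
tau2 = 0.1 ** 2                              # assumed second-stage variance
B = V / (V + tau2)                           # shrinkage factors in (0, 1)
beta_hier = (1 - B) * beta_hat + B * prior_mean

print(f"MSE ML: {np.mean((beta_hat - beta_true) ** 2):.4f}, "
      f"hierarchical: {np.mean((beta_hier - beta_true) ** 2):.4f}")
```

The paper's warning carries over directly: if tau2 is set too small (the second-stage standard deviations underspecified), the shrinkage toward the prior means becomes too aggressive and the hierarchical estimates can do worse than ML.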

Journal ArticleDOI
TL;DR: In this paper, a case study involving the optimization of the sealing function of an automobile door weatherstrip is modelled using multiple regression models and it is demonstrated that the experimental statistics which are needed to calculate signal-to-noise performance statistics may be readily obtained with the use of a linear statistical model.
Abstract: The data from a case study involving the optimization of the sealing function of an automobile door weatherstrip is modelled using multiple regression models. It is demonstrated that the experimental statistics which are needed to calculate signal-to-noise performance statistics may be readily obtained with the use of a linear statistical model.

Journal ArticleDOI
TL;DR: In this article, the problem of simultaneous prediction of actual and average values of the study variable in a linear regression model when a set of linear restrictions binding the regression coefficients is available, and analyzes the performance properties of predictors arising from the methods of restricted regression and mixed regression besides least squares is considered.
Abstract: This article considers the problem of simultaneous prediction of actual and average values of the study variable in a linear regression model when a set of linear restrictions binding the regression coefficients is available, and analyzes the performance properties of predictors arising from the methods of restricted regression and mixed regression besides least squares.

Journal ArticleDOI
Qi Li1
TL;DR: In this paper, it was shown that a √n-consistent estimator for the coefficients of the parametric part of the regression function can be obtained by using a non-negative second-order kernel function as long as the dimension of the variables in the non-parametric part is less than or equal to five.

Book ChapterDOI
01 Jan 1996
TL;DR: In this article, the authors introduce a new approach for properly scaling redescending M-estimating equations and for obtaining high breakdown point solutions to the equations via the constrained M-estimates of regression, or CM-estimates of regression for short.
Abstract: When using redescending M-estimates of regression, one must choose not only an estimate of scale, but since the redescending M-estimating equations may admit multiple solutions, of which all of them may not be a desired solution, one must also have a method for choosing a desirable solution to the estimating equations. We introduce here a new approach for properly scaling redescending M-estimating equations and for obtaining high breakdown point solutions to the equations by the introduction of the constrained M-estimates of regression, or the CM-estimates of regression for short. Unlike the S-estimates of regression, the CM-estimates of regression can be tuned to obtain good local robustness properties while maintaining a breakdown point of 1/2.

Book
12 Nov 1996
TL;DR: Describing Patterns in Data. Organizing Data: Association and Relationships. Analyzing Count Data.
Abstract: Describing Patterns in Data. Organizing Data: Association and Relationships. Collecting Data. Probability. Random Variables and Probability Distributions. Continuous Random Variables and Sampling Distributions. From Samples to Populations: Inferences About Means. Comparing Means. Analyzing Count Data. Simple Linear Regression. Multiple Linear Regression and Time Series Models. Management and Statistics. Appendices. Answers to Selected Exercises. Index.

Journal ArticleDOI
TL;DR: In this paper, the exact Bahadur-Kiefer representation of the Lp estimator of θ0 is given for each p ≥ 1.
Abstract: We consider the linear regression model Yi = θ0′Zi + Ui, where the (Zi, Ui) are independent and identically distributed random variables, Yi is real, Zi has values in Rm, Ui is independent of Zi, and θ0 is an m-dimensional parameter to be estimated. The Lp estimator of θ0 is the value θ̂n that minimizes Σi |Yi − θ′Zi|p. Here, we give the exact Bahadur-Kiefer representation of θ̂n for each p ≥ 1. Explicitly, we show that, under regularity conditions, the remainder term of the representation has an exact order governed by a positive constant c, which depends on p and on the random variable X.
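As a purely numerical companion (the paper concerns the estimator's asymptotics, not its computation), the Lp estimate, taking the standard definition as the minimizer of Σi |Yi − θ′Zi|^p, can be found with a general-purpose optimizer; the data, dimensions, and p = 1.5 below are invented:

```python
# Compute an L_p regression estimate by directly minimizing the L_p loss
# sum |Y_i - theta' Z_i|^p with scipy. Toy data; p = 1.5 chosen arbitrarily.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
n, m, p = 500, 2, 1.5
Z = rng.normal(size=(n, m))
theta0 = np.array([1.0, -2.0])
Y = Z @ theta0 + rng.standard_t(df=5, size=n)   # heavy-ish tailed errors

def lp_loss(theta):
    return np.sum(np.abs(Y - Z @ theta) ** p)   # convex for p >= 1

theta_hat = minimize(lp_loss, x0=np.zeros(m), method="Nelder-Mead").x
print("theta_hat:", np.round(theta_hat, 2))
```

For 1 < p < 2 the loss interpolates between the median-type robustness of L1 and the efficiency of least squares, which is one reason the exact order of the remainder as a function of p is of interest.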

Journal ArticleDOI
TL;DR: In this article, the idea of using regression quantiles to test symmetry in a linear regression model is generalized to the non-parametric regression setting, and properties of the Lp-quantiles, defined through an asymmetric Lp loss function, are derived.

Journal ArticleDOI
TL;DR: In this paper, the authors study the properties of the preliminary test, restricted and unrestricted ridge regression estimators of the linear regression model with non-normal disturbances, and derive the biases and the mean square error (MSE) of the estimators under the null and alternative hypotheses and compared with the usual estimators.
Abstract: In this paper, we study the properties of the preliminary test, restricted and unrestricted ridge regression estimators of the linear regression model with non-normal disturbances. We present the estimators of the regression coefficients combining the idea of preliminary test and ridge regression methodology, when it is suspected that the regression coefficients may be restricted to a subspace and the regression error is distributed as multivariate t. Accordingly we consider three estimators, namely the Unrestricted Ridge Regression Estimator (URRRE), the Restricted Ridge Regression Estimator (RRRE) and finally the Preliminary test Ridge Regression Estimator (PTRRE). The biases and the mean square error (MSE) of the estimators are derived under the null and alternative hypotheses and compared with the usual estimators. By studying the MSE criterion, the regions of optimality of the estimators are determined.

Journal ArticleDOI
TL;DR: In this article, the authors considered the general linear regression model {y, Xβ, V | R2β2 = r}, where the block-partitioned regressor matrix X = (X1, X2) may be deficient in column rank, the dispersion matrix V is possibly singular, the vector of unknown regression coefficients is partitioned conformably as β′ = (β1′, β2′), and β2 is possibly subject to consistent linear constraints R2β2 = r.

Journal ArticleDOI
TL;DR: It is shown that dominant component analysis and the standard multiple linear regression method are directly related to each other and it is demonstrated that an earlier proposed iterative procedure for the orthogonalization of a correlated variable can be efficiently replaced by one-step regression.
Abstract: Several topics in connection with a recently proposed method for the orthogonalization of predictor variables (dominant component analysis) are considered. Applying the sequential regression procedure, it is shown that dominant component analysis and the standard multiple linear regression method are directly related to each other. In addition, it is demonstrated that an earlier proposed iterative procedure for the orthogonalization of a correlated variable can be efficiently replaced by one-step regression. It is also shown that the coefficient of determination for an orthogonal descriptor coincides with the corresponding squared semipartial correlation coefficient. Finally, the origin of extra information in an orthogonalized predictor variable is discussed.
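The identity between the coefficient of determination for an orthogonalized descriptor and the squared semipartial correlation can be checked numerically in a few lines; the data below are invented:

```python
# Orthogonalize x2 against x1 by one-step regression, then verify that the
# R^2 of y on the orthogonalized descriptor equals the squared semipartial
# correlation of y with x2 (x1 partialled out of x2 only). Toy data.
import numpy as np

rng = np.random.default_rng(5)
n = 100
x1 = rng.normal(size=n)
x2 = 0.7 * x1 + rng.normal(0, 0.5, n)       # correlated predictors
y = x1 + 0.5 * x2 + rng.normal(0, 1.0, n)

# One-step regression: residual of x2 after regressing on x1 (with intercept).
B = np.column_stack([np.ones(n), x1])
x2_orth = x2 - B @ np.linalg.lstsq(B, x2, rcond=None)[0]

# For a single predictor, R^2 is the squared correlation with y.
r2_orth = np.corrcoef(y, x2_orth)[0, 1] ** 2

# Squared semipartial correlation from the textbook formula.
r12 = np.corrcoef(x1, x2)[0, 1]
ry1 = np.corrcoef(y, x1)[0, 1]
ry2 = np.corrcoef(y, x2)[0, 1]
sr2 = ((ry2 - ry1 * r12) / np.sqrt(1 - r12 ** 2)) ** 2

print(f"R^2 (orthogonalized descriptor) = {r2_orth:.6f}")
print(f"squared semipartial correlation = {sr2:.6f}")
```

The two quantities agree to machine precision, since the residual of x2 on x1 is exactly the component of x2 that the semipartial correlation measures.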

Journal ArticleDOI
TL;DR: In this article, robustness properties of the ML-II posterior mean are studied under the assumption of a mixture of g-prior distributions for the parameters, together with the ML-II posterior density for the coefficient vector.

01 Feb 1996
TL;DR: Local polynomial regression as discussed by the authors is proving to be a particularly simple and effective method of nonparametric regression, which can be used to estimate the distribution of airborne mercury about an incinerator using biomonitoring data.
Abstract: Nonparametric regression estimates a conditional expectation of a response given a predictor variable without requiring parametric assumptions about this conditional expectation. There are many methods of nonparametric regression including kernel estimation, smoothing splines, regression splines, and orthogonal series. Local regression fits parametric models locally by using kernel weights. Local regression is proving to be a particularly simple and effective method of nonparametric regression. This talk reviews recent work on local polynomial regression including estimation of derivatives, multivariate predictors, and bandwidth selection. Three applications to environmental science are discussed: 1. Estimation of the distribution of airborne mercury about an incinerator using biomonitoring data. 2. Estimation of airborne pollutants from LIDAR (LIght Detection And Ranging) data. Because of substantial heteroskedasticity, this example requires estimation of the conditional variance function as well as the conditional expectation function. 3. Estimation of gradients from elevation data. The estimated gradients are used in a model to predict soil movement during earthquakes. Data from Noshiro, Japan are used. The first and third examples use two-dimensional spatial data. Though these problems could be analyzed by geostatistics, local polynomial regression has the advantage of modeling the heteroskedasticity in the first and second examples and being able to estimate gradients in the third.
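A compact sketch of the local linear case (degree p = 1) shows both features the review emphasizes: the local intercept is the fit and the local slope estimates the derivative. The Gaussian kernel, fixed bandwidth h, and sine test function below are illustrative assumptions:

```python
# Local linear regression in NumPy: at each target point x0, solve a
# kernel-weighted least-squares problem in the shifted basis (1, x - x0).
# The intercept is the fit f(x0); the slope estimates f'(x0).
import numpy as np

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 2 * np.pi, 200))
y = np.sin(x) + rng.normal(0, 0.2, 200)

def local_linear(x0, x, y, h=0.4):
    """Return (fit, derivative estimate) at x0 via kernel-weighted WLS."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)          # Gaussian kernel weights
    B = np.column_stack([np.ones_like(x), x - x0])  # local linear basis
    a, b = np.linalg.lstsq(B * np.sqrt(w)[:, None],
                           y * np.sqrt(w), rcond=None)[0]
    return a, b

fit, deriv = local_linear(np.pi / 2, x, y)
print(f"f(pi/2) ~ {fit:.2f} (true 1.0), f'(pi/2) ~ {deriv:.2f} (true 0.0)")
```

The bandwidth h is fixed by hand here; the review's discussion of bandwidth selection is about choosing it from the data instead.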

Journal ArticleDOI
TL;DR: The fuzzy rule-based model performs better than the regression model and has potential for monthly precipitation forecasting; an adaptive fuzzy rule-based framework is also described so that the model can be used under climate change.
Abstract: In order to link the monthly areal precipitation to large-scale circulation patterns, a fuzzy indexing technique is used in conjunction with a fuzzy rule-based technique and also a standard linear regression. After clustering the lag-correlation centers, fuzziness is introduced, and several representative indices of the monthly areal precipitation in Arizona are calculated and interpreted. The relation between the indices and the precipitation is analyzed to develop the fuzzy model and then a multivariate linear regression model. To measure the forecasting capability of the models, the data are divided into a calibration period (1947–79) and a validation period (1980–1988). A comparison of the results shows that the fuzzy rule-based model performs better than the regression model and has potential for monthly precipitation forecasting. Moreover, an adaptive fuzzy rule-based framework is described so that the model can be used under climate change.

Journal ArticleDOI
TL;DR: In this article, an extensive simulation study, using different models and different error random variables, was conducted to choose the proper penalty function.

Journal ArticleDOI
TL;DR: In this paper, a series of computer simulations were used to demonstrate the need to employ appropriate statistical weighting factors in carrying out regression analysis of experimental data, and were analyzed using unweighted and weighted, nonlinear and linear regression methods.
Abstract: Henry's law constants, H, of volatile organic compounds are often determined using static headspace gas chromatography. From the methodology of this approach, the experimental data are expected to conform to a nonlinear function and can, accordingly, be analyzed by a nonlinear regression procedure. Alternatively, the data can be transformed to a linear function and subsequently analyzed by linear regression. A series of computer simulations was used to demonstrate the need to employ appropriate statistical weighting factors in carrying out regression analysis of experimental data. These simulations were based on error-dispersed, realistic data sets for a wide range of H input values, and were analyzed using unweighted and weighted, nonlinear and linear regression methods. For a given data set, nonlinear and linear regression analyses, when both unweighted, return different H values. This disparity is sharply reduced when weighted regression methods are used. The regression results for H (and standard devi...
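The weighting issue generalizes beyond headspace chromatography. The simulation below is a generic stand-in (an exponential decay rather than the Henry's-law model, with invented parameters): additive constant-variance error on y becomes strongly heteroscedastic after the ln transform, and the usual w = y^2 weights largely repair the linearized fit:

```python
# Unweighted vs weighted linear regression on log-transformed data with
# constant additive error. Model y = A * exp(-k * x) is a toy stand-in.
import numpy as np

rng = np.random.default_rng(7)
A, k = 10.0, 0.8
x = np.linspace(0, 5, 40)
k_unw, k_wt = [], []
for _ in range(500):                          # repeat to see bias/variance
    y = A * np.exp(-k * x) + rng.normal(0, 0.15, x.size)
    keep = y > 0                              # ln() needs positive values
    xs, ly, w = x[keep], np.log(y[keep]), y[keep] ** 2
    B = np.column_stack([np.ones_like(xs), xs])
    # unweighted linear regression on the transformed data
    k_unw.append(-np.linalg.lstsq(B, ly, rcond=None)[0][1])
    # weighted: w_i = y_i^2, the variance correction induced by ln(y)
    sw = np.sqrt(w)
    k_wt.append(-np.linalg.lstsq(B * sw[:, None], ly * sw, rcond=None)[0][1])

print(f"mean k-hat: unweighted {np.mean(k_unw):.3f}, "
      f"weighted {np.mean(k_wt):.3f} (true k = {k})")
```

The weighted estimates cluster tightly around the true k, while the unweighted ones are far more variable because the noisy small-y tail dominates the log scale, mirroring the disparity the authors report between unweighted and weighted regression.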

Journal ArticleDOI
TL;DR: This article extended the scope of the corrected-score method studied by Nakamura and Stefanski (1989) to a large class of generalized linear measurement error models, including rare-event logistic regression and extreme-value binary regression.

Journal ArticleDOI
TL;DR: A general method for use in the chemical industry is presented for eliciting and quantifying an expert's subjective opinion concerning a normal linear regression model and for using the elicited values to determine a probability distribution on the regression parameters that quantifies and expresses the expert's opinion.

Journal ArticleDOI
TL;DR: In this article, the authors examine several techniques that have been developed to ameliorate the effects of colinearity or to make use of prior information in satellite meteorology.
Abstract: Least squares or regression techniques have been used for many problems in satellite meteorology. Because of the large number of variables and the linear dependence among these variables, colinearity causes significant problems in the application of standard regression techniques. In some of the applications there is prior knowledge about the values of the regression parameters. Since there are errors in the predictor variables as well as the predictand variables, the standard assumptions for ordinary least squares are not valid. In this paper the authors examine several techniques that have been developed to ameliorate the effects of colinearity or to make use of prior information. These include ridge regression, shrinkage estimators, rotated regression, and orthogonal regression. In order to illustrate the techniques and their properties, the authors apply them to two simple examples. These techniques are then applied to a real problem in satellite meteorology: that of estimating theoretical co...
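Of the remedies listed, ridge regression is the simplest to write down. This toy NumPy sketch (not one of the authors' satellite examples; data and penalty are invented) shows the closed form (X'X + λI)^(-1) X'y stabilizing coefficients under near-collinearity:

```python
# Ridge vs OLS under near-collinearity: OLS coefficients are individually
# wild (only their sum is well determined), while ridge pulls both back
# toward the stable answer. Toy data.
import numpy as np

rng = np.random.default_rng(8)
n = 100
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(0, 0.01, n)        # nearly collinear predictors
X = np.column_stack([x1, x2])
y = x1 + x2 + rng.normal(0, 0.5, n)     # true coefficients (1, 1)

def ridge(X, y, lam):
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

b_ols = ridge(X, y, 0.0)                # lam = 0 recovers OLS
b_ridge = ridge(X, y, 1.0)
print("OLS:", np.round(b_ols, 1), " ridge:", np.round(b_ridge, 2))
```

The sum b1 + b2 is well estimated either way; it is the individual coefficients that collinearity leaves undetermined and that the ridge penalty pins down.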

Journal ArticleDOI
Oliver Linton1
TL;DR: In this article, the second moments of the truncated expansion were calculated to compare two competing estimators and to define a method of bandwidth choice for linear regression models with heteroskedasticity of unknown form.
Abstract: We develop stochastic expansions with remainder, where 0 < μ < 1/2, for a standardised semiparametric GLS estimator, a standard error, and a studentized statistic, in the linear regression model with heteroskedasticity of unknown form. We calculate the second moments of the truncated expansion, and use these approximations to compare two competing estimators and to define a method of bandwidth choice.