
Showing papers on "Proper linear model published in 2007"


Journal ArticleDOI
TL;DR: A brief tutorial introduction to the R package relaimpo, which implements six different metrics for assessing the relative importance of regressors in the linear model, among them averaging over orderings of regressors and a newly proposed metric (Feldman 2005) called pmvd.
Abstract: Relative importance is a topic that has seen a lot of interest in recent years, particularly in applied work. The R package relaimpo implements six different metrics for assessing relative importance of regressors in the linear model, two of which are recommended - averaging over orderings of regressors and a newly proposed metric (Feldman 2005) called pmvd. Apart from delivering the metrics themselves, relaimpo also provides (exploratory) bootstrap confidence intervals. This paper offers a brief tutorial introduction to the package. The methods and relaimpo's functionality are illustrated using the data set swiss that is generally available in R. The paper targets readers who have a basic understanding of multiple linear regression. For the background of more advanced aspects, references are provided.
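
The averaging-over-orderings metric that relaimpo recommends can be sketched outside R. The following Python sketch (helper names are mine, and the brute-force enumeration is only practical for a handful of regressors) decomposes R² by averaging each regressor's sequential contribution over all orderings:

```python
import numpy as np
from itertools import permutations

def r_squared(X, y, cols):
    """R^2 of regressing y on an intercept plus the given columns of X."""
    if not cols:
        return 0.0
    Z = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ beta
    tss = (y - y.mean()) @ (y - y.mean())
    return 1.0 - resid @ resid / tss

def lmg(X, y):
    """Relative importance by averaging each regressor's sequential
    R^2 contribution over all orderings (brute force, small p only)."""
    p = X.shape[1]
    imp = np.zeros(p)
    orders = list(permutations(range(p)))
    for order in orders:
        entered = []
        for k in order:
            before = r_squared(X, y, entered)
            entered.append(k)
            imp[k] += r_squared(X, y, entered) - before
    return imp / len(orders)
```

By construction the shares are nonnegative and sum to the full-model R², one of the properties that makes this metric attractive.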

1,908 citations



Book
06 Sep 2007
TL;DR: A book-length treatment of robust regression: it defines robustness and resistance, examines the implications of unusual cases for OLS estimates, surveys robust estimators for the linear model (L-, R-, M-, GM-, S-, generalized S-, and MM-estimators) and compares them, and revisits diagnostics with robust-regression-related methods for detecting outliers.
Abstract: List of Figures List of Tables Series Editor's Introduction Acknowledgments 1. Introduction Defining Robustness Defining Robust Regression A Real-World Example: Coital Frequency of Married Couples in the 1970s 2. Important Background Bias and Consistency Breakdown Point Influence Function Relative Efficiency Measures of Location Measures of Scale M-Estimation Comparing Various Estimates Notes 3. Robustness, Resistance, and Ordinary Least Squares Regression Ordinary Least Squares Regression Implications of Unusual Cases for OLS Estimates and Standard Errors Detecting Problematic Observations in OLS Regression Notes 4. Robust Regression for the Linear Model L-Estimators R-Estimators M-Estimators GM-Estimators S-Estimators Generalized S-Estimators MM-Estimators Comparing the Various Estimators Diagnostics Revisited: Robust Regression-Related Methods for Detecting Outliers Notes 5. Standard Errors for Robust Regression Asymptotic Standard Errors for Robust Regression Estimators Bootstrapped Standard Errors Notes 6. Influential Cases in Generalized Linear Models The Generalized Linear Model Detecting Unusual Cases in Generalized Linear Models Robust Generalized Linear Models Notes 7. Conclusions Appendix: Software Considerations for Robust Regression References Index About the Author
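
The M-estimation machinery covered in the book is commonly computed by iteratively reweighted least squares. A minimal Python sketch of the Huber M-estimator (function name and defaults are illustrative, not taken from the book):

```python
import numpy as np

def huber_irls(X, y, c=1.345, tol=1e-8, max_iter=200):
    """Huber M-estimator for linear regression via iteratively
    reweighted least squares. `c` is the usual tuning constant in
    units of a MAD-based robust residual scale."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)   # OLS start
    for _ in range(max_iter):
        r = y - X1 @ beta
        scale = np.median(np.abs(r - np.median(r))) / 0.6745
        scale = scale if scale > 0 else 1.0
        u = np.abs(r) / scale
        w = np.where(u <= c, 1.0, c / u)            # Huber weights
        sw = np.sqrt(w)
        beta_new, *_ = np.linalg.lstsq(X1 * sw[:, None], y * sw, rcond=None)
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta
```

Unlike OLS, the downweighting bounds the influence of gross outliers on the fitted coefficients, which is the resistance property the book emphasizes.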

322 citations


Journal ArticleDOI
TL;DR: A change point approach based on the segmented regression technique for testing the constancy of the regression parameters in a linear profile data set, demonstrated using data from a calibration application at the National Aeronautics and Space Administration (NASA) Langley Research Center.
Abstract: We propose a change point approach based on the segmented regression technique for testing the constancy of the regression parameters in a linear profile data set. Each sample collected over time in the historical data set consists of several bivariate observations for which a simple linear regression model is appropriate. The change point approach is based on the likelihood ratio test for a change in one or more regression parameters. We compare the performance of this method to that of the most effective Phase I linear profile control chart approaches using a simulation study. The advantages of the change point method over the existing methods are greatly improved detection of sustained step changes in the process parameters and improved diagnostic tools to determine the sources of profile variation and the location(s) of the change point(s). Also, we give an approximation for appropriate thresholds for the test statistic. The use of the change point method is demonstrated using a data set from a calibration application at the National Aeronautics and Space Administration (NASA) Langley Research Center. Copyright © 2006 John Wiley & Sons, Ltd.
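
The likelihood-ratio change point idea can be illustrated with a simplified sketch: concatenate the historical samples and, for each candidate split, fit separate simple linear regressions before and after it. This Python sketch (not the authors' implementation, and it ignores the threshold approximation they derive) scores splits by n·log(RSS0/RSS1):

```python
import numpy as np

def _rss(x, y):
    """Residual sum of squares of a simple linear regression fit."""
    Z = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
    r = y - Z @ beta
    return r @ r

def change_point_lrt(samples):
    """samples: list of (x, y) pairs collected over time, each fit by
    a simple linear regression. Returns (tau, stat): the split 'first
    tau samples vs the rest' maximizing n*log(RSS_null/RSS_split)."""
    xs = np.concatenate([x for x, _ in samples])
    ys = np.concatenate([y for _, y in samples])
    n = len(ys)
    rss_null = _rss(xs, ys)
    cuts = np.cumsum([len(y) for _, y in samples])
    best_tau, best_stat = None, -np.inf
    for tau in range(1, len(samples)):
        c = cuts[tau - 1]
        rss_split = _rss(xs[:c], ys[:c]) + _rss(xs[c:], ys[c:])
        stat = n * np.log(rss_null / rss_split)
        if stat > best_stat:
            best_tau, best_stat = tau, stat
    return best_tau, best_stat
```

The location of the maximizing split doubles as the diagnostic for where the sustained step change occurred.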

297 citations


Proceedings ArticleDOI
20 Jun 2007
TL;DR: The equivalence relationship between the proposed least squares formulation and LDA for multi-class classifications is rigorously established under a mild condition, which is shown empirically to hold in many applications involving high-dimensional data.
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification. LDA in the binary-class case has been shown to be equivalent to linear regression with the class label as the output. This implies that LDA for binary-class classifications can be formulated as a least squares problem. Previous studies have shown a certain relationship between multivariate linear regression and LDA for the multi-class case. Many of these studies show that multivariate linear regression with a specific class indicator matrix as the output can be applied as a preprocessing step for LDA. However, directly casting LDA as a least squares problem is challenging for the multi-class case. In this paper, a novel formulation for multivariate linear regression is proposed. The equivalence relationship between the proposed least squares formulation and LDA for multi-class classifications is rigorously established under a mild condition, which is shown empirically to hold in many applications involving high-dimensional data. Several LDA extensions based on the equivalence relationship are discussed.
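
The binary-class equivalence cited above is easy to verify numerically: the LDA direction S_w^{-1}(mu1 - mu0) is proportional to the least squares coefficient vector from regressing the centered class label on the centered features. A small numpy check on simulated data (the data-generating choices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1, p = 60, 40, 3
X0 = rng.normal(0.0, 1.0, (n0, p))       # class 0
X1 = rng.normal(1.0, 1.0, (n1, p))       # class 1, shifted mean
X = np.vstack([X0, X1])
y = np.r_[np.zeros(n0), np.ones(n1)]

# LDA direction: inverse pooled within-class scatter times mean gap
Sw = ((X0 - X0.mean(0)).T @ (X0 - X0.mean(0))
      + (X1 - X1.mean(0)).T @ (X1 - X1.mean(0)))
w_lda = np.linalg.solve(Sw, X1.mean(0) - X0.mean(0))

# Least squares direction: regress centered labels on centered features
Xc, yc = X - X.mean(0), y - y.mean()
w_ols, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

# The two directions coincide up to a positive scalar
cosine = w_lda @ w_ols / (np.linalg.norm(w_lda) * np.linalg.norm(w_ols))
```

The proportionality follows from the Sherman–Morrison identity applied to the total scatter matrix, which is why the cosine is 1 up to floating-point error rather than merely close.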

291 citations


Journal ArticleDOI
TL;DR: The use of the T2 control chart is extended to monitor the coefficients resulting from a parametric nonlinear regression model fit to profile data, and three general approaches to the formulation of the T2 statistics and determination of the associated upper control limits for Phase I applications are given.
Abstract: In many quality control applications, use of a single (or several distinct) quality characteristic(s) is insufficient to characterize the quality of a produced item. In an increasing number of cases, a response curve (profile) is required. Such profiles can frequently be modeled using linear or nonlinear regression models. In recent research others have developed multivariate T2 control charts and other methods for monitoring the coefficients in a simple linear regression model of a profile. However, little work has been done to address the monitoring of profiles that can be represented by a parametric nonlinear regression model. Here we extend the use of the T2 control chart to monitor the coefficients resulting from a parametric nonlinear regression model fit to profile data. We give three general approaches to the formulation of the T2 statistics and determination of the associated upper control limits for Phase I applications. We also consider the use of non-parametric regression methods and the use of metrics to measure deviations from a baseline profile. These approaches are illustrated using the vertical board density profile data presented in Walker and Wright (Comparing curves using additive models. Journal of Quality Technology 2002; 34:118–129). Copyright © 2007 John Wiley & Sons, Ltd.
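
The Phase I T2 statistic itself is straightforward once per-profile coefficient vectors are in hand. The sketch below substitutes a quadratic (linear-in-parameters) profile model for the paper's general nonlinear fit, injects a curvature shift into one profile, and computes T2_i = (b_i - bbar)' S^{-1} (b_i - bbar) using the sample covariance of the coefficients:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 25)
D = np.column_stack([np.ones_like(x), x, x ** 2])   # quadratic profile model
m = 30                                              # Phase I profiles
coefs = np.empty((m, 3))
for i in range(m):
    y = 1.0 + 2.0 * x - 1.5 * x ** 2 + rng.normal(0, 0.1, x.size)
    if i == m - 1:
        y += 5.0 * x ** 2        # inject a curvature shift in one profile
    beta, *_ = np.linalg.lstsq(D, y, rcond=None)
    coefs[i] = beta

bbar = coefs.mean(axis=0)
S = np.cov(coefs.T)              # sample covariance of coefficient vectors
Sinv = np.linalg.inv(S)
t2 = np.array([(b - bbar) @ Sinv @ (b - bbar) for b in coefs])
```

The shifted profile stands out with by far the largest T2; the paper's three formulations differ mainly in how the covariance matrix and control limits are estimated.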

285 citations


Book
10 Dec 2007
TL;DR: A textbook treatment of the simple and multiple linear regression models and their extensions, the generalized linear regression model, linear restrictions, prediction, sensitivity analysis, incomplete data, robust regression, and models for categorical response variables.
Abstract: The Simple Linear Regression Model.- The Multiple Linear Regression Model and Its Extensions.- The Generalized Linear Regression Model.- Exact and Stochastic Linear Restrictions.- Prediction in the Generalized Regression Model.- Sensitivity Analysis.- Analysis of Incomplete Data Sets.- Robust Regression.- Models for Categorical Response Variables.

268 citations


Journal ArticleDOI
TL;DR: In this article, the authors present diagnostic tools for local collinearity in geographically weighted regression (GWR), integrate ridge regression into GWR to constrain and stabilize regression coefficients and lower prediction error, and demonstrate the utility of these techniques with an example using the Columbu...
Abstract: Geographically weighted regression (GWR) is drawing attention as a statistical method to estimate regression models with spatially varying relationships between explanatory variables and a response variable. Local collinearity in weighted explanatory variables leads to GWR coefficient estimates that are correlated locally and across space, have inflated variances, and are at times counterintuitive and contradictory in sign to the global regression estimates. The presence of local collinearity in the absence of global collinearity necessitates the use of diagnostic tools in the local regression model building process to highlight areas in which the results are not reliable for statistical inference. The method of ridge regression can also be integrated into the GWR framework to constrain and stabilize regression coefficients and lower prediction error. This paper presents numerous diagnostic tools and ridge regression in GWR and demonstrates the utility of these techniques with an example using the Columbu...
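
Integrating a ridge penalty into GWR amounts to adding lam*I to each locally weighted normal-equations matrix. A minimal sketch with a Gaussian kernel (function and parameter names are illustrative; real GWR software also selects the bandwidth and penalty by cross-validation):

```python
import numpy as np

def gwr_ridge(coords, X, y, bandwidth, lam=0.0):
    """Geographically weighted regression with an optional ridge
    penalty `lam`. Gaussian kernel weights by distance to each
    calibration site; returns one coefficient vector per site."""
    X1 = np.column_stack([np.ones(len(y)), X])
    p = X1.shape[1]
    betas = np.empty((len(y), p))
    for i, site in enumerate(coords):
        d = np.linalg.norm(coords - site, axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)
        XtW = X1.T * w
        betas[i] = np.linalg.solve(XtW @ X1 + lam * np.eye(p), XtW @ y)
    return betas
```

As the bandwidth grows the local fits collapse to the global OLS fit; shrinking it (or raising `lam`) trades variance for bias, which is how the ridge term stabilizes locally collinear coefficient estimates.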

241 citations


Journal ArticleDOI
TL;DR: It is shown that case-crossover using conditional logistic regression is a special case of time series analysis when there is a common exposure, as in air pollution studies; this equivalence provides computational convenience for case-crossover analyses and a better understanding of time series models.
Abstract: The case-crossover design was introduced in epidemiology 15 years ago as a method for studying the effects of a risk factor on a health event using only cases. The idea is to compare a case's exposure immediately prior to or during the case-defining event with that same person's exposure at otherwise similar "reference" times. An alternative approach to the analysis of daily exposure and case-only data is time series analysis. Here, log-linear regression models express the expected total number of events on each day as a function of the exposure level and potential confounding variables. In time series analyses of air pollution, smooth functions of time and weather are the main confounders. Time series and case-crossover methods are often viewed as competing methods. In this paper, we show that case-crossover using conditional logistic regression is a special case of time series analysis when there is a common exposure such as in air pollution studies. This equivalence provides computational convenience for case-crossover analyses and a better understanding of time series models. Time series log-linear regression accounts for overdispersion of the Poisson variance, while case-crossover analyses typically do not. This equivalence also permits model checking for case-crossover data using standard log-linear model diagnostics.

221 citations


Journal ArticleDOI
TL;DR: The R package flexmix provides flexible modelling of finite mixtures of regression models using the EM algorithm; several new features of the software are introduced, such as fixed and nested varying effects for mixtures of generalized linear models and multinomial regression for a priori probabilities given concomitant variables.

171 citations


Journal ArticleDOI
TL;DR: A partial least squares (PLS) approach is proposed for linear discriminant analysis (LDA) with functional predictors (curves), based on the equivalence between LDA and multiple linear regression (binary response) and between LDA and canonical correlation analysis (more than two groups).
Abstract: A partial least squares (PLS) approach is proposed for linear discriminant analysis (LDA) when predictors are data of functional type (curves). Based on the equivalence between LDA and multiple linear regression (binary response) and between LDA and canonical correlation analysis (more than two groups), PLS regression on functional data is used to estimate the discriminant coefficient functions. A simulation study as well as an application to kneading data compare the PLS model results with those given by other methods.
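
For finite-dimensional data the underlying PLS regression step can be sketched with the NIPALS algorithm; the functional case replaces these vectors with basis-expanded curves. Illustrative Python (a handy sanity check: with k equal to the number of predictors and a full-rank design, PLS1 reproduces the OLS fit):

```python
import numpy as np

def pls1(X, y, k):
    """PLS1 regression via NIPALS with k components.
    Returns (beta, intercept) on the original variable scale."""
    xm, ym = X.mean(0), y.mean()
    Xk, yc = X - xm, y - ym
    W, P, q = [], [], []
    for _ in range(k):
        w = Xk.T @ yc
        w = w / np.linalg.norm(w)          # weight vector
        t = Xk @ w                         # component score
        tt = t @ t
        P.append(Xk.T @ t / tt)            # X loading
        q.append(yc @ t / tt)              # y loading
        W.append(w)
        Xk = Xk - np.outer(t, P[-1])       # deflate X
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)
    return beta, ym - xm @ beta
```

With fewer components than predictors, PLS regularizes the fit, which is what makes it usable when the "predictors" are discretized curves with far more dimensions than observations.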

Proceedings ArticleDOI
Deepak Agarwal1, Srujana Merugu1
12 Aug 2007
TL;DR: A novel statistical method to predict large scale dyadic response variables in the presence of covariate information that simultaneously incorporates the effect of covariates and estimates local structure that is induced by interactions among the dyads through a discrete latent factor model.
Abstract: We propose a novel statistical method to predict large scale dyadic response variables in the presence of covariate information. Our approach simultaneously incorporates the effect of covariates and estimates local structure that is induced by interactions among the dyads through a discrete latent factor model. The discovered latent factors provide a predictive model that is both accurate and interpretable. We illustrate our method by working in a framework of generalized linear models, which include commonly used regression techniques like linear regression, logistic regression and Poisson regression as special cases. We also provide scalable generalized EM-based algorithms for model fitting using both "hard" and "soft" cluster assignments. We demonstrate the generality and efficacy of our approach through large scale simulation studies and analysis of datasets obtained from certain real-world movie recommendation and internet advertising applications.

Journal ArticleDOI
TL;DR: In this paper, the authors investigated the rate of convergence of estimating the regression weight function in a functional linear regression model, where the predictor and the weight function are smooth and periodic in the sense that the derivatives are equal at the boundary points.

Journal ArticleDOI
TL;DR: An iterative algorithm for multiple regression with fuzzy variables is proposed using the standard least-squares criterion as a performance index and the regression problem is posed as a gradient-descent optimisation.

Journal ArticleDOI
TL;DR: The neuro-fuzzy model is recommended as an alternative tool for modeling of flow dynamics in the study area and was able to improve the root mean square error (RMSE) and mean absolute percentage error (MAPE) values of the multiple linear regression forecasts by about 13.52% and 10.73%, respectively.

Book ChapterDOI
30 Nov 2007

Journal ArticleDOI
TL;DR: It turns out that recovery of regression weights in situations with collinearity is often very poor by all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables; in those cases, typically PLS and PCR give the best recoveries of regression weights.
Abstract: Regression tends to give very unstable and unreliable regression weights when predictors are highly collinear. Several methods have been proposed to counter this problem. A subset of these do so by finding components that summarize the information in the predictors and the criterion variables. The present paper compares six such methods (two of which are almost completely new) to ordinary regression: partial least squares (PLS), principal component regression (PCR), principal covariates regression, reduced-rank regression, and two variants of what is called power regression. The comparison is mainly done by means of a series of simulation studies, in which data are constructed in various ways, with different degrees of collinearity and noise, and the methods are compared in terms of their capability of recovering the population regression weights, as well as their prediction quality for the complete population. It turns out that recovery of regression weights in situations with collinearity is often very poor by all methods, unless the regression weights lie in the subspace spanned by the first few principal components of the predictor variables. In those cases, typically PLS and PCR give the best recoveries of regression weights. The picture is inconclusive, however, because, especially in the study with more real-life-like simulated data, PLS and PCR gave the poorest recoveries of regression weights in conditions with relatively low noise and collinearity. It seems that PLS and PCR are particularly indicated in cases with much collinearity, whereas in other cases it is better to use ordinary regression. As far as prediction is concerned, prediction suffers far less from collinearity than recovery of the regression weights does.
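
Of the component methods compared, PCR is the simplest to sketch: regress the criterion on the first k principal components of the predictors, then map the coefficients back to the original variables. Illustrative numpy (with k equal to the number of predictors it reduces to ordinary least squares, which the test below exploits):

```python
import numpy as np

def pcr(X, y, k):
    """Principal component regression: regress centered y on the first
    k principal components of centered X, then map the coefficients
    back to the original predictor space."""
    xm, ym = X.mean(0), y.mean()
    Xc, yc = X - xm, y - ym
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Vk = Vt[:k].T                      # loadings of the first k PCs
    scores = Xc @ Vk                   # component scores
    gamma, *_ = np.linalg.lstsq(scores, yc, rcond=None)
    beta = Vk @ gamma                  # back to predictor space
    return ym - xm @ beta, beta        # (intercept, slopes)
```

Choosing k < p discards the low-variance principal directions, which is precisely where collinear designs make the OLS weights unstable; the paper's finding is that this helps only when the true weights live in the retained subspace.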

Journal ArticleDOI
TL;DR: In this paper, asymptotic properties of M-estimates of regression parameters in linear models with dependent errors are studied; weak and strong Bahadur representations are derived and a central limit theorem is established.
Abstract: We study asymptotic properties of M-estimates of regression parameters in linear models in which errors are dependent. Weak and strong Bahadur representations of the M-estimates are derived and a central limit theorem is established. The results are applied to linear models with errors being short-range dependent linear processes, heavy-tailed linear processes and some widely used nonlinear time series.

Book ChapterDOI
TL;DR: This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome.
Abstract: This chapter describes multiple linear regression, a statistical approach used to describe the simultaneous associations of several variables with one continuous outcome. Important steps in using this approach include estimation and inference, variable selection in model building, and assessing model fit. The special cases of regression with interactions among the variables, polynomial regression, regressions with categorical (grouping) variables, and separate slopes models are also covered. Examples in microbiology are used throughout.
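
A minimal worked example of the chapter's themes — estimation with an interaction term and a grouping variable — on simulated data (the coefficient values and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 1000
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
g = rng.integers(0, 2, n)                       # grouping variable
y = (1.0 + 2.0 * x1 - 1.0 * x2 + 0.5 * x1 * x2 + 1.5 * g
     + rng.normal(0, 0.5, n))

# Design: intercept, main effects, interaction, group indicator
X = np.column_stack([np.ones(n), x1, x2, x1 * x2, g])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

The fitted coefficients recover the generating values to within sampling error; inference and variable selection, which the chapter covers next, would then decide which of these terms to keep.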

Journal ArticleDOI
TL;DR: It is demonstrated that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model.
Abstract: Stratifying and matching by the propensity score are increasingly popular approaches to deal with confounding in medical studies investigating effects of a treatment or exposure. A more traditional alternative technique is the direct adjustment for confounding in regression models. This paper discusses fundamental differences between the two approaches, with a focus on linear regression and propensity score stratification, and identifies points to be considered for an adequate comparison. The treatment estimators are examined for unbiasedness and efficiency. This is illustrated in an application to real data and supplemented by an investigation on properties of the estimators for a range of underlying linear models. We demonstrate that in specific circumstances the propensity score estimator is identical to the effect estimated from a full linear model, even if it is built on coarser covariate strata than the linear model. As a consequence the coarsening property of the propensity score-adjustment for a one-dimensional confounder instead of a high-dimensional covariate-may be viewed as a way to implement a pre-specified, richly parametrized linear model. We conclude that the propensity score estimator inherits the potential for overfitting and that care should be taken to restrict covariates to those relevant for outcome.
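
The comparison can be sketched on simulated data: stratify on propensity-score quintiles and pool the within-stratum treated-control differences, versus adjusting for the confounder directly in a linear model. Illustrative Python using the true propensity score for simplicity (a real analysis would estimate it, e.g. by logistic regression):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)                    # confounder
ps = 1.0 / (1.0 + np.exp(-x))             # true propensity score
t = rng.binomial(1, ps)                   # treatment assignment
y = 2.0 * t + 1.0 * x + rng.normal(size=n)

# (a) Stratify on propensity-score quintiles; pool the within-stratum
#     treated-minus-control mean differences, weighted by stratum size
edges = np.quantile(ps, [0.2, 0.4, 0.6, 0.8])
strata = np.digitize(ps, edges)
num, den = 0.0, 0
for s in range(5):
    m = strata == s
    diff = y[m & (t == 1)].mean() - y[m & (t == 0)].mean()
    num += m.sum() * diff
    den += m.sum()
est_ps = num / den

# (b) Direct linear-model adjustment for the confounder
Z = np.column_stack([np.ones(n), t, x])
est_lm = np.linalg.lstsq(Z, y, rcond=None)[0][1]
```

Both recover the true effect of 2; the stratified estimate carries a small residual bias from the coarsening into five strata, illustrating the paper's point that propensity stratification acts like a coarsened adjustment model.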

Journal ArticleDOI
TL;DR: This work derives least squares estimators for the simple linear regression model and examines them from a theoretical perspective; for the multiple linear regression model, a stepwise algorithm is developed to find the estimates.
Abstract: Simple and multiple linear regression models are considered between variables whose "values" are convex compact random sets in R^p (that is, hypercubes, spheres, and so on). We analyze such models within a set-arithmetic approach. Contrary to what happens for random variables, the least squares optimal solutions for the basic affine transformation model do not produce suitable estimates for the linear regression model. First, we derive least squares estimators for the simple linear regression model and examine them from a theoretical perspective. Moreover, the multiple linear regression model is dealt with and a stepwise algorithm is developed in order to find the estimates in this case. The particular problem of linear regression with interval-valued data is also considered and illustrated by means of a real-life example.

Journal ArticleDOI
TL;DR: The authors introduce a computationally simple estimator that uses linear regression to estimate the distribution of random coefficients in discrete choice demand models, compare it to several alternatives in a Monte Carlo exercise, and find that it predicts out-of-sample market shares well.
Abstract: Random coefficient discrete choice models are a popular method for estimating demand in differentiated product markets. We introduce a computationally simple estimator that uses linear regression to estimate the distribution of random coefficients. The estimator is nonparametric for the distribution of the random coefficients. We compare our estimator to several alternatives in a Monte Carlo exercise, and find the estimator predicts out-of-sample market shares well. We discuss extensions to panel data and dynamic programming.

Journal ArticleDOI
TL;DR: A fuzzy nonparametric model with crisp input and LR fuzzy output is considered and the local linear smoothing technique in statistics with the cross-validation procedure for selecting the optimal value of the smoothing parameter is fuzzified to fit this model.

Journal ArticleDOI
TL;DR: Methods are constructed to test whether there is some 'linear' relationship between imprecise predictor and response variables in a regression analysis and a suitable equivalence for the hypothesis of linear independence in this model is obtained in terms of the mid-spread representations of the interval-valued variables.

Journal ArticleDOI
TL;DR: The proposed statistical scheme is demonstrated by the analysis of experimental data on internal waves, in which the results illustrate what has been investigated in laboratory experiments and may be applicable to the naturally occurring reflection of internal waves from sloping b...
Abstract: Purpose – This study seeks to develop a systematic means of identifying regression models using a complex regression model with a statistical method. Design/methodology/approach – As a widely adopted statistical scheme for analyzing multifactor data, regression analysis provides a conceptually simple algorithm for examining functional relationships among variables. This investigation assesses the proposed relationship using a sample of data in regression analysis and then estimates the fit using statistics. Furthermore, several algorithms and added-variable plots are presented to obtain an appropriate regression model and the relationship between the response variable y and explanatory variables x0, x1, x2, …, xp. Findings – The proposed statistical scheme is demonstrated by the analysis of experimental data on internal waves, in which the results illustrate what has been investigated in laboratory experiments and may be applicable to the naturally occurring reflection of internal waves from sloping b...

DOI
10 Dec 2007
TL;DR: The present article applies the condition number as a collinearity diagnostic to linear regression models with categorical explanatory variables and analyzes how the dummy variables and the choice of reference category can affect the degree of multicollinearity.
Abstract: The present article discusses the role of categorical variables in the problem of multicollinearity in the linear regression model. It applies the condition number as a diagnostic tool to linear regression models with categorical explanatory variables and analyzes how the dummy variables and the choice of reference category can affect the degree of multicollinearity. This effect is analyzed analytically as well as numerically through simulation and a real data application.
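
The reference-category effect is easy to reproduce: with unbalanced category frequencies, choosing a rare level as the reference makes the dummy for the common level nearly collinear with the intercept, inflating the condition number. A small numpy sketch (the scaling convention and category frequencies are illustrative):

```python
import numpy as np

def condition_number(X):
    """Condition number of the column-scaled design matrix, a
    standard collinearity diagnostic."""
    Xs = X / np.linalg.norm(X, axis=0)
    s = np.linalg.svd(Xs, compute_uv=False)
    return s[0] / s[-1]

rng = np.random.default_rng(3)
n = 300
cat = rng.choice(3, size=n, p=[0.8, 0.1, 0.1])   # unbalanced 3-level factor
z = rng.normal(size=n)                           # a continuous regressor

def design(ref):
    """Intercept + dummy coding with `ref` as the reference category + z."""
    dummies = [(cat == k).astype(float) for k in range(3) if k != ref]
    return np.column_stack([np.ones(n)] + dummies + [z])

kappas = [condition_number(design(ref)) for ref in range(3)]
```

Using the common level (category 0) as the reference gives the smallest condition number; referencing either rare level leaves an 80%-ones dummy in the design, which is nearly parallel to the intercept column.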

Journal ArticleDOI
TL;DR: A tree-structured method that fits a simple but nontrivial model to each partition of the variable space that ensures that each piece of the fitted regression function can be visualized with a graph or a contour plot.
Abstract: Many methods can fit models with a higher prediction accuracy, on average, than the least squares linear regression technique. But the models, including linear regression, are typically impossible to interpret or visualize. We describe a tree-structured method that fits a simple but nontrivial model to each partition of the variable space. This ensures that each piece of the fitted regression function can be visualized with a graph or a contour plot. For maximum interpretability, our models are constructed with negligible variable selection bias and the tree structures are much more compact than piecewise-constant regression trees. We demonstrate, by means of a large empirical study involving 27 methods, that the average prediction accuracy of our models is almost as high as that of the most accurate “black-box” methods from the statistics and machine learning literature.

Journal ArticleDOI
TL;DR: In this paper, a change point detection approach for multivariate linear regression models is presented, which can account for missing data in the response variables and/or in the explicative variables; it also improves on recently published change point detection methodologies by allowing a more flexible, and thus more realistic, prior specification for the existence of a change and the date of change, as well as for the regression parameters.
Abstract: Multivariate linear regression is one of the most popular modeling tools in hydrology and climate sciences for explaining the link between key variables. Piecewise linear regression is not always appropriate since the relationship may experience sudden changes due to climatic, environmental, or anthropogenic perturbations. To address this issue, a practical and general approach to the Bayesian analysis of the multivariate regression model is presented. The approach allows simultaneous single change point detection in a multivariate sample and can account for missing data in the response variables and/or in the explicative variables. It also improves on recently published change point detection methodologies by allowing a more flexible and thus more realistic prior specification for the existence of a change and the date of change, as well as for the regression parameters. The estimation of all unknown parameters is achieved by Markov chain Monte Carlo simulations. It is shown that the developed approach is able to reproduce the results of Rasmussen (2001) as well as those of Perreault et al. (2000a, 2000b). Furthermore, two of the examples provided in the paper show that the proposed methodology can readily be applied to some problems that cannot be addressed by any of the above-mentioned approaches because of limiting model structure and/or restrictive prior assumptions. The first of these examples deals with single change point detection in the multivariate linear relationship between mean basin-scale precipitation at different periods of the year and the summer–autumn flood peaks of the Broadback River located in northern Quebec, Canada. The second one addresses the problem of missing data estimation with uncertainty assessment in multisite streamflow records with a possible simultaneous shift in mean streamflow values that occurred at an unknown date.

Book ChapterDOI
TL;DR: This chapter presents the general linear model as an extension to the two-sample t-test, analysis of variance (ANOVA), and linear regression, and the F test is introduced as a means to test for the strength of group effect.
Abstract: This chapter presents the general linear model as an extension to the two-sample t-test, analysis of variance (ANOVA), and linear regression. We illustrate the general linear model using two-way ANOVA as a prime example. The underlying principle of ANOVA, which is based on the decomposition of the value of an observed variable into grand mean, group effect and random noise, is emphasized. Further into this chapter, the F test is introduced as a means to test for the strength of group effect. The procedure of F test for identifying a parsimonious set of factors in explaining an outcome of interest is also described.
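
The F test for a group effect compares nested linear models: drop the factor's dummy columns and measure how much the residual sum of squares grows. A Python sketch (the two-way layout and effect sizes are invented for illustration):

```python
import numpy as np

def f_test(X_reduced, X_full, y):
    """F statistic for comparing nested linear models: do the extra
    columns of X_full explain significantly more variation?"""
    def rss(Z):
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        r = y - Z @ beta
        return r @ r
    n = len(y)
    df1 = X_full.shape[1] - X_reduced.shape[1]
    df2 = n - X_full.shape[1]
    return ((rss(X_reduced) - rss(X_full)) / df1) / (rss(X_full) / df2)

# Two-way layout: factors A (3 levels) and B (2 levels), no interaction
rng = np.random.default_rng(7)
n = 120
a = rng.integers(0, 3, n)
b = rng.integers(0, 2, n)
y = 1.0 + np.array([0.0, 1.5, -1.0])[a] + 0.5 * b + rng.normal(0, 1.0, n)

ones = np.ones((n, 1))
A = np.column_stack([(a == 1), (a == 2)]).astype(float)   # dummies for A
B = (b == 1).astype(float)[:, None]
F_A = f_test(np.hstack([ones, B]), np.hstack([ones, A, B]), y)
```

Comparing `F_A` against the F distribution with (df1, df2) degrees of freedom gives the p-value; the large simulated group effect here yields an F statistic far above any conventional critical value.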

Journal ArticleDOI
TL;DR: In this paper, a stochastic expansion for a residual-based estimator of the error distribution function in a partially linear regression model is proved, which implies a functional central limit theorem.
Abstract: We prove a stochastic expansion for a residual-based estimator of the error distribution function in a partly linear regression model. It implies a functional central limit theorem. As special cases we cover nonparametric, nonlinear and linear regression models.
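
A residual-based estimate of the error distribution function is simply the empirical CDF of the fitted residuals. A small numpy sketch using a plain linear model as a stand-in for the paper's partly linear setting (the sup-distance to the true error CDF shrinks at roughly the n^(-1/2) rate the expansion implies):

```python
import numpy as np
from math import erf

rng = np.random.default_rng(4)
n = 2000
x = rng.uniform(-1.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n)   # N(0, 1) errors

# Fit the regression, then estimate the error distribution function
# by the empirical CDF of the residuals
Z = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
resid = y - Z @ beta

grid = np.linspace(-3.0, 3.0, 121)
ecdf = (resid[:, None] <= grid).mean(axis=0)
true_cdf = np.array([0.5 * (1.0 + erf(t / np.sqrt(2.0))) for t in grid])
sup_dist = np.max(np.abs(ecdf - true_cdf))
```

The paper's stochastic expansion controls the extra error introduced by using residuals rather than the unobservable true errors, which is what justifies the functional central limit theorem for this estimator.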