
Showing papers on "Proper linear model published in 2005"


Journal ArticleDOI
TL;DR: This paper uses a stagewise fitting process to construct the logistic regression models, which can select relevant attributes in the data in a natural way, and shows how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree.
Abstract: Tree induction methods and linear models are popular techniques for supervised learning tasks, both for the prediction of nominal classes and numeric values. For predicting numeric quantities, there has been work on combining these two schemes into `model trees', i.e. trees that contain linear regression functions at the leaves. In this paper, we present an algorithm that adapts this idea for classification problems, using logistic regression instead of linear regression. We use a stagewise fitting process to construct the logistic regression models that can select relevant attributes in the data in a natural way, and show how this approach can be used to build the logistic regression models at the leaves by incrementally refining those constructed at higher levels in the tree. We compare the performance of our algorithm to several other state-of-the-art learning schemes on 36 benchmark UCI datasets, and show that it produces accurate and compact classifiers.
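
The stagewise idea can be sketched as a LogitBoost-style loop that repeatedly adds a simple one-attribute weighted regression to an additive log-odds model, so attribute selection happens implicitly. This is a minimal illustration on synthetic data, not the authors' LMT algorithm; the function name, step count, and shrinkage value are assumptions.

```python
import numpy as np

def stagewise_logistic(X, y, n_steps=30, shrink=0.5):
    """Stagewise logistic fit: each step adds a simple (one-attribute)
    weighted least-squares regression to the additive log-odds model,
    selecting relevant attributes implicitly (LogitBoost-style sketch)."""
    n, p = X.shape
    F = np.zeros(n)                                  # additive log-odds
    for _ in range(n_steps):
        prob = 1.0 / (1.0 + np.exp(-F))
        w = np.clip(prob * (1 - prob), 1e-6, None)   # working weights
        z = np.clip((y - prob) / w, -4.0, 4.0)       # clipped working response
        best = None
        for j in range(p):                           # try each attribute
            xj = X[:, j]
            xm = np.average(xj, weights=w)
            zm = np.average(z, weights=w)
            b = np.sum(w * (xj - xm) * (z - zm)) / np.sum(w * (xj - xm) ** 2)
            a = zm - b * xm
            err = np.sum(w * (z - a - b * xj) ** 2)
            if best is None or err < best[0]:
                best = (err, a, b, j)
        _, a, b, j = best
        F = F + shrink * (a + b * X[:, j])           # incremental refinement
    return F

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.normal(size=300) > 0).astype(float)
F = stagewise_logistic(X, y)
acc = np.mean((F > 0) == (y == 1))
print(round(acc, 2))
```

Only the two informative attributes should end up driving the fitted log-odds; the remaining attributes are rarely selected.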

1,200 citations


Journal ArticleDOI

1,131 citations


Book
08 Mar 2005
TL;DR: This handbook on nondeterministic (probabilistic) approaches assembles 45 chapters by 76 authors; the mathematical level of a number of these chapters is at or above the content of Meeker and Escobar (1998), the most comprehensive statistical reliability textbook.
Abstract: nondeterministic methods are probabilistic methods “that replace traditional deterministic approaches for making design decisions with a new risk-based approach that uses rigorous models to quantify uncertainty and assess safety” (Preface, p. ix). Applications in Part 3 of the handbook are limited to the first three industries listed above. Participants as authors include engineers and scientists, as well as statisticians, from industry and from government agencies such as NASA, plus the usual coterie of academic participants. Altogether 76 authors and co-authors have assembled 45 different chapters. The first seven chapters constitute Part 1, “Status and Future of Nondeterministic Approaches” (NDA’s). These chapters focus on methods for product design and the research that supports the creation and development of new reliability methodologies. The next 20 chapters have been gathered under the heading for Part 2, “Nondeterministic Modeling: Critical Issues and Recent Advances.” Some of the statistical topics that can be found here are uncertainty, information theory, evidence theory, interval methods, expert knowledge, reliability models, system reliability, probability models, response surfaces, Bayesian modeling and updating, accelerated life testing, and variance-reduction techniques, where the foregoing laundry list follows the order of the chapters. The mathematical level of a number of these chapters is at or above the content of Meeker and Escobar (1998), the most comprehensive statistical reliability textbook. This is certainly not the technical level of a book for reliability engineers, such as O’Connor (2002). I did not expect to find a lot of interesting chapters among the applications in Part III, since BP does not design and deliver products in very many of its businesses. There are a number of chapters that relate to design in aerospace or machinery. 
There are also chapters on reliability assessment, such as corrosion-fatigue damage, a common problem in refineries, and analysis for composite structures, which are everywhere in oil and gas production. There is a chapter on reliability assessment for ships, a big issue for a company like BP that owns a fleet of ships that transport liquefied natural gas. There is another chapter on reliability assessment and maintenance for large pipelines. BP owns a few thousand miles of those, whose reliability is critical to the operation of the entire business. Some of these chapters have some mathematical content, but generally they are more typical of the engineering reliability textbook genre. This is probably not a book that should be on every reliability engineer’s desktop. Too many papers have a research orientation to their content. However, any organization in industry, government, or academia that has an interest in reliability technology should make sure to acquire copies for their associated libraries. Every scientist or engineer with an interest in reliability should be able to find something of interest in this handbook.

765 citations


Journal ArticleDOI
TL;DR: In this article, a functional linear regression (FLR) method is proposed for sparse longitudinal data, where both the predictor and response are functions of a covariate such as time.
Abstract: We propose nonparametric methods for functional linear regression which are designed for sparse longitudinal data, where both the predictor and response are functions of a covariate such as time. Predictor and response processes have smooth random trajectories, and the data consist of a small number of noisy repeated measurements made at irregular times for a sample of subjects. In longitudinal studies, the number of repeated measurements per subject is often small and may be modeled as a discrete random number and, accordingly, only a finite and asymptotically nonincreasing number of measurements are available for each subject or experimental unit. We propose a functional regression approach for this situation, using functional principal component analysis, where we estimate the functional principal component scores through conditional expectations. This allows the prediction of an unobserved response trajectory from sparse measurements of a predictor trajectory. The resulting technique is flexible and allows for different patterns regarding the timing of the measurements obtained for predictor and response trajectories. Asymptotic properties for a sample of n subjects are investigated under mild conditions, as n → oo, and we obtain consistent estimation for the regression function. Besides convergence results for the components of functional linear regression, such as the regression parameter function, we construct asymptotic pointwise confidence bands for the predicted trajectories. A functional coefficient of determination as a measure of the variance explained by the functional regression model is introduced, extending the standard R 2 to the functional case. The proposed methods are illustrated with a simulation study, longitudinal primary biliary liver cirrhosis data and an analysis of the longitudinal relationship between blood pressure and body mass index.

696 citations


Book
John Geweke
14 Sep 2005
TL;DR: In this book, the author develops Bayesian inference from first principles, covering prior distributions, posterior simulation by Markov chain Monte Carlo, linear models, modeling with latent variables, time series, and formal model comparison.
Abstract: Preface. 1. Introduction. 1.1 Two Examples. 1.1.1 Public School Class Sizes. 1.1.2 Value at Risk. 1.2 Observables, Unobservables, and Objects of Interest. 1.3 Conditioning and Updating. 1.4 Simulators. 1.5 Modeling. 1.6 Decisionmaking. 2. Elements of Bayesian Inference. 2.1 Basics. 2.2 Sufficiency, Ancillarity, and Nuisance Parameters. 2.2.1 Sufficiency. 2.2.2 Ancillarity. 2.2.3 Nuisance Parameters. 2.3 Conjugate Prior Distributions. 2.4 Bayesian Decision Theory and Point Estimation. 2.5 Credible Sets. 2.6 Model Comparison. 2.6.1 Marginal Likelihoods. 2.6.2 Predictive Densities. 3. Topics in Bayesian Inference. 3.1 Hierarchical Priors and Latent Variables. 3.2 Improper Prior Distributions. 3.3 Prior Robustness and the Density Ratio Class. 3.4 Asymptotic Analysis. 3.5 The Likelihood Principle. 4. Posterior Simulation. 4.1 Direct Sampling,. 4.2 Acceptance and Importance Sampling. 4.2.1 Acceptance Sampling. 4.2.2 Importance Sampling. 4.3 Markov Chain Monte Carlo. 4.3.1 The Gibbs Sampler. 4.3.2 The Metropolis-Hastings Algorithm. 4.4 Variance Reduction. 4.4.1 Concentrated Expectations. 4.4.2 Antithetic Sampling. 4.5 Some Continuous State Space Markov Chain Theory. 4.5.1 Convergence of the Gibbs Sampler. 4.5.2 Convergence of the Metropolis-Hastings Algorithm. 4.6 Hybrid Markov Chain Monte Carlo Methods. 4.6.1 Transition Mixtures. 4.6.2 Metropolis within Gibbs. 4.7 Numerical Accuracy and Convergence in Markov Chain Monte Carlo. 5. Linear Models. 5.1 BACC and the Normal Linear Regression Model. 5.2 Seemingly Unrelated Regressions Models. 5.3 Linear Constraints in the Linear Model. 5.3.1 Linear Inequality Constraints. 5.3.2 Conjectured Linear Restrictions, Linear Inequality Constraints, and Covariate Selection. 5.4 Nonlinear Regression. 5.4.1 Nonlinear Regression with Smoothness Priors. 5.4.2 Nonlinear Regression with Basis Functions. 6. Modeling with Latent Variables. 6.1 Censored Normal Linear Models. 6.2 Probit Linear Models. 
6.3 The Independent Finite State Model. 6.4 Modeling with Mixtures of Normal Distributions. 6.4.1 The Independent Student-t Linear Model. 6.4.2 Normal Mixture Linear Models. 6.4.3 Generalizing the Observable Outcomes. 7. Modeling for Time Series. 7.1 Linear Models with Serial Correlation. 7.2 The First-Order Markov Finite State Model. 7.2.1 Inference in the Nonstationary Model. 7.2.2 Inference in the Stationary Model. 7.3 Markov Normal Mixture Linear Model. 8. Bayesian Investigation. 8.1 Implementing Simulation Methods. 8.1.1 Density Ratio Tests. 8.1.2 Joint Distribution Tests. 8.2 Formal Model Comparison. 8.2.1 Bayes Factors for Modeling with Common Likelihoods. 8.2.2 Marginal Likelihood Approximation Using Importance Sampling. 8.2.3 Marginal Likelihood Approximation Using Gibbs Sampling. 8.2.4 Density Ratio Marginal Likelihood Approximation. 8.3 Model Specification. 8.3.1 Prior Predictive Analysis. 8.3.2 Posterior Predictive Analysis. 8.4 Bayesian Communication. 8.5 Density Ratio Robustness Bounds. Bibliography. Author Index. Subject Index.

665 citations


Journal ArticleDOI
TL;DR: In this paper, the authors recommend that ecologists choose regression, especially replicated regression, over ANOVA when dealing with continuous factors for two reasons: (1) regression is generally a more powerful approach than ANOVA and (2) regression provides quantitative output that can be incorporated into ecological models more effectively than ANOVA output.
Abstract: Linear regression and analysis of variance (ANOVA) are two of the most widely used statistical techniques in ecology. Regression quantitatively describes the relationship between a response variable and one or more continuous independent variables, while ANOVA determines whether a response variable differs among discrete values of the independent variable(s). Designing experiments with discrete factors is straightforward because ANOVA is the only option, but what is the best way to design experiments involving continuous factors? Should ecologists prefer experiments with few treatments and many replicates analyzed with ANOVA, or experiments with many treatments and few replicates per treatment analyzed with regression? We recommend that ecologists choose regression, especially replicated regression, over ANOVA when dealing with continuous factors for two reasons: (1) regression is generally a more powerful approach than ANOVA and (2) regression provides quantitative output that can be incorporated into ecological models more effectively than ANOVA output.
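
The contrast between the two analyses can be sketched on simulated dose-response data: regression spends one degree of freedom on a slope and returns a quantitative estimate, while ANOVA treats the same levels as unordered groups. This is an illustrative sketch; the design (6 levels, 5 replicates) and effect sizes are assumptions, not from the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
doses = np.repeat(np.linspace(0, 10, 6), 5)   # 6 treatment levels, 5 replicates
y = 0.8 * doses + rng.normal(0, 2, size=doses.size)

# Regression: one slope parameter; output (the slope) feeds directly into models.
reg = stats.linregress(doses, y)

# ANOVA: treats the 6 levels as unordered groups (5 numerator df),
# and only tells us whether the group means differ.
groups = [y[doses == d] for d in np.unique(doses)]
f_anova, p_anova = stats.f_oneway(*groups)

print(round(reg.slope, 2), reg.pvalue < 0.05, p_anova < 0.05)
```

The regression slope (about 0.8 units of response per unit dose) is the kind of quantitative output the authors argue is more useful to ecological models than an ANOVA table.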

334 citations


Book
01 Jan 2005
TL;DR: In this book, the authors present an overview of statistical forecasting methods, from regression analysis and exponential smoothing through Box-Jenkins seasonal modeling and time series regression, with case studies and exercises.
Abstract: Part I: INTRODUCTION AND REVIEW OF BASIC STATISTICS. 1. An Introduction to Forecasting. Forecasting and Data. Forecasting Methods. Errors in Forecasting. Choosing a Forecasting Technique. An Overview of Quantitative Forecasting Techniques. 2. Basic Statistical Concepts. Populations. Probability. Random Samples and Sample Statistics. Continuous Probability Distributions. The Normal Probability Distribution. The t-Distribution, the F-Distribution, the Chi-Square Distribution. Confidence Intervals for a Population Mean. Hypothesis Testing for a Population Mean. Exercises. Part II: REGRESSION ANALYSIS. 3. Simple Linear Regression. The Simple Linear Regression Model. The Least Squares Point Estimates. Point Estimates and Point Predictions. Model Assumptions and the Standard Error. Testing the Significance of the Slope and y Intercept. Confidence and Prediction Intervals. Simple Coefficients of Determination and Correlation. An F Test for the Model. Exercises. 4. Multiple Linear Regression. The Linear Regression Model. The Least Squares Estimates, and Point Estimation and Prediction. The Mean Square Error and the Standard Error. Model Utility: R2, Adjusted R2, and the Overall F Test. Testing the Significance of an Independent Variable. Confidence and Prediction Intervals. The Quadratic Regression Model. Interaction. Using Dummy Variables to Model Qualitative Independent Variables. The Partial F Test: Testing the Significance of a Portion of a Regression Model. Exercises. 5. Model Building and Residual Analysis. Model Building and the Effects of Multicollinearity. Residual Analysis in Simple Regression. Residual Analysis in Multiple Regression. Diagnostics for Detecting Outlying and Influential Observations. Exercises. Part III: TIME SERIES REGRESSION, DECOMPOSITION METHODS, AND EXPONENTIAL SMOOTHING. 6. Time Series Regression. Modeling Trend by Using Polynomial Functions. Detecting Autocorrelation. Types of Seasonal Variation.
Modeling Seasonal Variation by Using Dummy Variables and Trigonometric Functions. Growth Curves. Handling First-Order Autocorrelation. Exercises. 7. Decomposition Methods. Multiplicative Decomposition. Additive Decomposition. The X-12-ARIMA Seasonal Adjustment Method. Exercises. 8. Exponential Smoothing. Simple Exponential Smoothing. Tracking Signals. Holt's Trend Corrected Exponential Smoothing. Holt-Winters Methods. Damped Trends and Other Exponential Smoothing Methods. Models for Exponential Smoothing and Prediction Intervals. Exercises. Part IV: THE BOX-JENKINS METHODOLOGY. 9. Nonseasonal Box-Jenkins Models and Their Tentative Identification. Stationary and Nonstationary Time Series. The Sample Autocorrelation and Partial Autocorrelation Functions: The SAC and SPAC. An Introduction to Nonseasonal Modeling and Forecasting. Tentative Identification of Nonseasonal Box-Jenkins Models. Exercises. 10. Estimation, Diagnostic Checking, and Forecasting for Nonseasonal Box-Jenkins Models. Estimation. Diagnostic Checking. Forecasting. A Case Study. Box-Jenkins Implementation of Exponential Smoothing. Exercises. 11. Box-Jenkins Seasonal Modeling. Transforming a Seasonal Time Series into a Stationary Time Series. Three Examples of Seasonal Modeling and Forecasting. Box-Jenkins Error Term Models in Time Series Regression. Exercises. 12. Advanced Box-Jenkins Modeling. The General Seasonal Model and Guidelines for Tentative Identification. Intervention Models. A Procedure for Building a Transfer Function Model. Exercises. Appendix A: Statistical Tables. Appendix B: Matrix Algebra for Regression Calculations. Matrices and Vectors. The Transpose of a Matrix. Sums and Differences of Matrices. Matrix Multiplication. The Identity Matrix. Linear Dependence and Linear Independence. The Inverse of a Matrix. The Least Squares Point Estimates. The Unexplained Variation and Explained Variation. The Standard Error of the Estimate b. The Distance Value. Using Squared Terms.
Using Interaction Terms. Using Dummy Variables. The Standard Error of the Estimate of a Linear Combination of Regression Parameters. Exercises. Appendix C: References.

253 citations


Journal ArticleDOI
TL;DR: The approach proposed for PLS generalised linear regression is simple and easy to implement and can be easily generalised to any model that is linear at the level of the explanatory variables.

252 citations


Journal ArticleDOI
TL;DR: In this paper, Parametric (Modified Least Squares) and non-parametric (Theil-Sen) consistent predictors are given for linear regression in the presence of measurement errors together with analytical approximations of their prediction confidence intervals.
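
The non-parametric Theil-Sen estimator named in this TL;DR has a very short sketch: the slope is the median of all pairwise slopes, which makes it robust to outliers and to moderate measurement error. This is a generic illustration, not the paper's Modified Least Squares method or its confidence-interval approximations; the data are synthetic.

```python
import numpy as np

def theil_sen(x, y):
    """Theil-Sen estimator: slope = median of all pairwise slopes,
    intercept = median residual offset."""
    n = len(x)
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i in range(n) for j in range(i + 1, n) if x[j] != x[i]]
    slope = np.median(slopes)
    intercept = np.median(y - slope * x)
    return slope, intercept

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 40)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 40)
y[:4] += 25.0                              # gross outliers
ts_slope, _ = theil_sen(x, y)
ols_slope = np.polyfit(x, y, 1)[0]
print(round(ts_slope, 2), round(ols_slope, 2))
```

With 10% gross outliers the median-of-slopes estimate stays near the true slope of 2, while ordinary least squares is pulled away.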

244 citations


Journal ArticleDOI
TL;DR: In this paper, a generalized functional linear regression model for a regression situation where the response variable is a scalar and the predictor is a random function is proposed, where a linear predictor is obtained by forming the scalar product of the predictor function with a smooth parameter function, and the expected value of the response is related to this linear predictor via a link function.
Abstract: We propose a generalized functional linear regression model for a regression situation where the response variable is a scalar and the predictor is a random function. A linear predictor is obtained by forming the scalar product of the predictor function with a smooth parameter function, and the expected value of the response is related to this linear predictor via a link function. If, in addition, a variance function is specified, this leads to a functional estimating equation which corresponds to maximizing a functional quasi-likelihood. This general approach includes the special cases of the functional linear model, as well as functional Poisson regression and functional binomial regression. The latter leads to procedures for classification and discrimination of stochastic processes and functional data. We also consider the situation where the link and variance functions are unknown and are estimated nonparametrically from the data, using a semiparametric quasi-likelihood procedure. An essential step in our proposal is dimension reduction by approximating the predictor processes with a truncated Karhunen-Loeve expansion. We develop asymptotic inference for the proposed class of generalized regression models. In the proposed asymptotic approach, the truncation parameter increases with sample size, and a martingale central limit theorem is applied to establish the resulting increasing dimension asymptotics. We establish asymptotic normality for a properly scaled distance between estimated and true functions that corresponds to a suitable L^2 metric and is defined through a generalized covariance operator.
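
The truncated Karhunen-Loève step can be sketched for the identity-link special case (the functional linear model) on a dense grid: estimate eigenfunctions from the sample covariance of the centered predictor curves, regress the scalar response on the leading FPC scores, and assemble the coefficient function. This is a minimal sketch under assumed simulation settings; the paper's quasi-likelihood machinery, link estimation, and increasing-truncation asymptotics are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 50, 300
t = np.linspace(0, 1, m)
dt = t[1] - t[0]

# Predictor curves from a 3-term Karhunen-Loeve expansion (orthonormal sines).
phi = np.sqrt(2) * np.sin(np.pi * np.outer(np.arange(1, 4), t))   # (3, m)
scores = rng.normal(size=(n, 3)) / np.array([1.0, 2.0, 3.0])
X = scores @ phi                                                   # (n, m)

beta_true = np.sqrt(2) * np.sin(np.pi * t)        # true coefficient function
y = X @ beta_true * dt + rng.normal(0, 0.1, n)    # y_i = int X_i(t) beta(t) dt + eps

# FPCA: eigenfunctions of the sample covariance of the centered curves.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / n * dt
_, vecs = np.linalg.eigh(cov)
psi = vecs[:, ::-1][:, :3].T / np.sqrt(dt)        # top-3 L2-orthonormal eigenfunctions

# Regress the response on the estimated FPC scores, then assemble beta(t).
S = Xc @ psi.T * dt                               # (n, 3) score matrix
coef, *_ = np.linalg.lstsq(S, y - y.mean(), rcond=None)
beta_hat = coef @ psi

err = float(np.sum((beta_hat - beta_true) ** 2) * dt)
print(round(err, 3))
```

Because `beta_hat` is a linear combination of the estimated eigenfunctions, the sign ambiguity of each eigenvector cancels against its regression coefficient.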

223 citations


Book
20 Jan 2005
TL;DR: This book discusses regression models for time series situations, generalized linear models and Poisson regression, and case studies in linear regression.
Abstract: 1. Introduction to Regression Models. 2. Simple Linear Regression. 3. A Review of Matrix Algebra and Important Results of Random Vectors. 4. Multiple Linear Regression Model. 5. Specification Issues in Regression Models. 6. Model Checking. 7. Model Selection. 8. Case Studies in Linear Regression. 9. Nonlinear Regression Models. 10. Regression Models for Time Series Situations. 11. Logistic Regression. 12. Generalized Linear Models and Poisson Regression. Brief Answers to Selected Exercises. Statistical Tables. References.

Book
01 Jan 2005
TL;DR: In this book, the authors present statistical methods for environmental data, including linear, nonlinear and generalized linear models, quantitative risk assessment, temporal and spatially correlated data, meta-analysis, and environmental sampling.
Abstract: Preface. 1 Linear regression. 1.1 Simple linear regression. 1.2 Multiple linear regression. 1.3 Qualitative predictors: ANOVA and ANCOVA models. 1.4 Random-effects models. 1.5 Polynomial regression. Exercises. 2 Nonlinear regression. 2.1 Estimation and testing. 2.2 Piecewise regression models. 2.3 Exponential regression models. 2.4 Growth curves. 2.5 Rational polynomials. 2.6 Multiple nonlinear regression. Exercises. 3 Generalized linear models. 3.1 Generalizing the classical linear model. 3.2 Theory of generalized linear models. 3.3 Specific forms of generalized linear models. Exercises. 4 Quantitative risk assessment with stimulus-response data. 4.1 Potency estimation for stimulus-response data. 4.2 Risk estimation. 4.3 Benchmark analysis. 4.4 Uncertainty analysis. 4.5 Sensitivity analysis. 4.6 Additional topics. Exercises. 5 Temporal data and autoregressive modeling. 5.1 Time series. 5.2 Harmonic regression. 5.3 Autocorrelation. 5.4 Autocorrelated regression models. 5.5 Simple trend and intervention analysis. 5.6 Growth curves revisited. Exercises. 6 Spatially correlated data. 6.1 Spatial correlation. 6.2 Spatial point patterns and complete spatial randomness. 6.3 Spatial measurement. 6.4 Spatial prediction. Exercises. 7 Combining environmental information. 7.1 Combining P-values. 7.2 Effect size estimation. 7.3 Meta-analysis. 7.4 Historical control information. Exercises. 8 Fundamentals of environmental sampling. 8.1 Sampling populations: simple random sampling. 8.2 Designs to extend simple random sampling. 8.3 Specialized techniques for environmental sampling. Exercises. A Review of probability and statistical inference. A.1 Probability functions. A.2 Families of distributions. A.3 Random sampling. A.4 Parameter estimation. A.5 Statistical inference. A.6 The delta method. B Tables. References. Author index. Subject index.

Journal ArticleDOI
TL;DR: In this article, inverse stochastic models are obtained by multiple linear regression (MLR) and, for nonlinear dynamics, by multiple polynomial regression (MPR); the dependence on the regression parameters is linear in both cases, and a novel multilevel generalization of the classic regression procedure is proposed. The basic concepts are illustrated using the Lorenz convection model, the classical double-well problem, and a three-well problem in two space dimensions.
Abstract: Predictive models are constructed to best describe an observed field’s statistics within a given class of nonlinear dynamics driven by a spatially coherent noise that is white in time. For linear dynamics, such inverse stochastic models are obtained by multiple linear regression (MLR). Nonlinear dynamics, when more appropriate, is accommodated by applying multiple polynomial regression (MPR) instead; the resulting model uses polynomial predictors, but the dependence on the regression parameters is linear in both MPR and MLR. The basic concepts are illustrated using the Lorenz convection model, the classical double-well problem, and a three-well problem in two space dimensions. Given a data sample that is long enough, MPR successfully reconstructs the model coefficients in the former two cases, while the resulting inverse model captures the three-regime structure of the system’s probability density function (PDF) in the latter case. A novel multilevel generalization of the classic regression proce...
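
The MPR idea, a model that is nonlinear in the state but linear in the regression parameters, can be sketched by recovering the cubic drift of a simulated double-well process from its increments. This is an illustrative sketch, not the authors' multilevel procedure; the simulation settings (Euler-Maruyama step, noise level) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
dt, n_steps, sigma = 0.01, 100_000, 1.0
x = np.empty(n_steps)
x[0] = 0.0
noise = sigma * np.sqrt(dt) * rng.normal(size=n_steps - 1)
for k in range(n_steps - 1):                   # Euler-Maruyama, double-well SDE
    x[k + 1] = x[k] + (x[k] - x[k] ** 3) * dt + noise[k]

# MPR: regress the empirical tendency on polynomial predictors of the state.
# The true drift is x - x^3, i.e. coefficients (0, 1, 0, -1).
tend = np.diff(x) / dt
P = np.vander(x[:-1], 4, increasing=True)      # columns [1, x, x^2, x^3]
coefs, *_ = np.linalg.lstsq(P, tend, rcond=None)
print(np.round(coefs, 2))
```

Given a long enough sample, the least-squares fit recovers the signs and rough magnitudes of the drift coefficients, mirroring the paper's point that MPR reconstructs model coefficients from data.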


Reference EntryDOI
15 Oct 2005
TL;DR: Generalized linear models (GLMs) as mentioned in this paper represent a class of regression models for several types of dependent variables where the linear predictor includes only fixed effects; incorporating random effects yields generalized linear mixed models (GLMMs), which are especially useful for the analysis of correlated nonnormal data, and the term GLMMs often refers to models for these kinds of data.
Abstract: Generalized linear models (GLMs); represent a class of regression models for several types of dependent variables where the linear predictor includes only fixed effects. Incorporation of random effects into GLMs yields the class of models known as generalized linear mixed models (GLMMs). Random effects are typically included for analysis of clustered and/or longitudinal data to account for the correlation of the data. GLMMs are especially useful for analysis of correlated nonnormal data, and the term GLMMs often refers to models for these kinds of data. Keywords: generalized linear models; multilevel models; hierarchical linear models; logistic regression; probit regression; Poisson regression; maximum likelihood estimation

Journal ArticleDOI
TL;DR: In this paper, the authors present some of the considerations that arise in ensuring that two-step estimators are consistent in hierarchical models, even when neither stage is a conventional linear regression model.
Abstract: Two-step estimators for hierarchical models can be constructed even when neither stage is a conventional linear regression model. For example, the first stage might consist of probit models, or duration models, or event count models. The second stage might be a nonlinear regression specification. This note sketches some of the considerations that arise in ensuring that two-step estimators are consistent in such cases.
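
A minimal instance of the setup can be sketched: stage one fits an intercept-only probit within each group (the MLE is then simply the inverse normal CDF of the group mean), and stage two regresses the stage-one estimates on a group-level covariate. This illustrates the two-step structure only; the note's treatment of consistency conditions and corrections is not reproduced, and all numbers here are assumptions.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
G, n_per = 25, 400
w = np.linspace(-1, 1, G)                 # group-level covariate
alpha = 0.2 + 0.5 * w                     # true group probit intercepts

# Stage 1: intercept-only probit per group; the MLE is Phi^{-1}(group mean).
alpha_hat = np.empty(G)
for g in range(G):
    yg = (rng.normal(size=n_per) < alpha[g]).astype(float)  # P(y=1) = Phi(alpha_g)
    alpha_hat[g] = norm.ppf(np.clip(yg.mean(), 1e-6, 1 - 1e-6))

# Stage 2: linear regression of the first-stage estimates on w.
b, a = np.polyfit(w, alpha_hat, 1)
print(round(a, 2), round(b, 2))           # close to the true (0.2, 0.5)
```

With equal group sizes, the first-stage sampling error is roughly homoscedastic across groups, which is one of the conditions the note discusses for the second-stage regression to behave well.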

01 Jan 2005
TL;DR: In this paper, the authors adopt a Bayesian approach with priors for the regression coefficients that are scale mixtures of normal distributions and embody a high prior probability of proximity to zero.
Abstract: The problem of variable selection in regression and the generalised linear model is addressed. We adopt a Bayesian approach with priors for the regression coefficients that are scale mixtures of normal distributions and embody a high prior probability of proximity to zero. By seeking modal estimates we generalise the lasso. Properties of the priors and their resultant posteriors are explored in the context of the linear and generalised linear model especially when there are more variables than observations. We develop EM algorithms that embrace the need to explore the multiple modes of the non log-concave posterior distributions. Finally we apply the technique to microarray data using a probit model to find the genetic predictors of osteo- versus rheumatoid arthritis. Keywords: Bayesian modal analysis, Variable selection in regression, Scale mixtures of normals, Improper Jeffreys prior, lasso, Penalised likelihood, EMalgorithm, Multiple modes, More variables than observations, Singular value decomposition, Latent variables, Probit regression.
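
For the Gaussian linear model with a Laplace prior, the scale-mixture EM idea can be sketched: each M-step is a ridge solve whose per-coefficient weights come from the previous iterate, and the fixed point is the lasso-type modal estimate. This is a minimal sketch under assumed settings (fixed penalty, unimodal case); the paper's generalised-linear-model extension and multiple-mode exploration are not reproduced.

```python
import numpy as np

def em_lasso(X, y, lam=20.0, n_iter=100, eps=1e-8):
    """Modal estimate under a Laplace prior via EM: the Laplace density is a
    scale mixture of normals, so each M-step is a weighted ridge regression
    with per-coefficient weights lam / |beta_j| from the previous iterate."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]      # start from OLS
    for _ in range(n_iter):
        D = np.diag(lam / (np.abs(beta) + eps))      # E-step weights
        beta = np.linalg.solve(X.T @ X + D, X.T @ y) # M-step: weighted ridge
    return beta

rng = np.random.default_rng(6)
n, p = 200, 10
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, n)
beta = em_lasso(X, y)
print(np.round(beta, 2))
```

At a fixed point the diagonal term contributes lam * sign(beta_j), which is exactly the subgradient of the l1 penalty, so irrelevant coefficients are driven to (numerical) zero while the two true signals survive with mild shrinkage.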

Journal ArticleDOI
TL;DR: In this article, a collection of functions implementing a robust analysis of a linear model based on weighted Wilcoxon (WW) estimates is presented; a regression model, a designed experiment, and an autoregressive time series model are analyzed for the sake of illustration.
Abstract: It is well-known that Wilcoxon procedures out perform least squares procedures when the data deviate from normality and/or contain outliers. These procedures can be generalized by introducing weights; yielding so-called weighted Wilcoxon (WW) techniques. In this paper we demonstrate how WW-estimates can be calculated using an L1 regression routine. More importantly, we present a collection of functions that can be used to implement a robust analysis of a linear model based on WW-estimates. For instance, estimation, tests of linear hypotheses, residual analyses, and diagnostics to detect differences in fits for various weighting schemes are discussed. We analyze a regression model, designed experiment, and autoregressive time series model for the sake of illustration. We have chosen to implement the suite of functions using the R statistical software package. Because R is freely available and runs on multiple platforms, WW-estimation and associated inference is now universally accessible.
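
The paper computes WW-estimates via an L1 regression routine in R; as a language-neutral sketch of the underlying criterion, the (unweighted) Wilcoxon fit minimizes the pairwise dispersion of the residuals, which can be done numerically on small data. The optimizer choice and data here are assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import minimize

def wilcoxon_fit(x, y):
    """Rank-based (Wilcoxon) regression sketch: minimize the pairwise
    dispersion sum |e_i - e_j|, which is convex in the slope and
    insensitive to outlying responses; the intercept is recovered
    afterwards as the median residual."""
    def dispersion(b):
        e = y - b[0] * x
        return np.abs(e[:, None] - e[None, :]).sum()
    b0 = np.polyfit(x, y, 1)[:1]              # start from the LS slope
    res = minimize(dispersion, b0, method="Nelder-Mead")
    slope = res.x[0]
    return slope, np.median(y - slope * x)

rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, 30)
y[:3] += 40.0                                  # gross outliers
slope, intercept = wilcoxon_fit(x, y)
print(round(slope, 2), round(intercept, 2))
```

The dispersion function is piecewise linear and convex in the slope, which is also why it can be recast as an L1 regression over pairwise differences, the route the paper actually takes.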

Proceedings ArticleDOI
24 Oct 2005
TL;DR: An experimental comparison of several statistical machine learning methods for short-term prediction of travel times on road segments suggests that novel iterative linear regression algorithms should be a preferred prediction method for large-scale travel time prediction.
Abstract: This paper presents an experimental comparison of several statistical machine learning methods for short-term prediction of travel times on road segments. The comparison includes linear regression, neural networks, regression trees, k-nearest neighbors, and locally-weighted regression, tested on the same historical data. In spite of the expected superiority of non-linear methods over linear regression, the only non-linear method that could consistently outperform linear regression was locally-weighted regression. This suggests that novel iterative linear regression algorithms should be a preferred prediction method for large-scale travel time prediction.
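
Locally-weighted regression, the one nonlinear method that consistently beat linear regression in this comparison, can be sketched in a few lines: at each query point, fit a weighted least-squares line with Gaussian kernel weights. This is a generic sketch on synthetic data, not the paper's travel-time setup; the bandwidth is an assumption.

```python
import numpy as np

def lwr_predict(x_train, y_train, x_query, h=0.5):
    """Locally-weighted linear regression: at each query point, solve a
    weighted least-squares line with Gaussian kernel weights."""
    preds = np.empty(len(x_query))
    A = np.column_stack([np.ones_like(x_train), x_train])
    for i, x0 in enumerate(x_query):
        w = np.exp(-0.5 * ((x_train - x0) / h) ** 2)
        W = A * w[:, None]
        coef = np.linalg.solve(A.T @ W, W.T @ y_train)
        preds[i] = coef[0] + coef[1] * x0
    return preds

rng = np.random.default_rng(8)
x = rng.uniform(0, 2 * np.pi, 200)
y = np.sin(x) + rng.normal(0, 0.2, 200)
xq = np.linspace(0.5, 2 * np.pi - 0.5, 50)

rmse_lwr = np.sqrt(np.mean((lwr_predict(x, y, xq) - np.sin(xq)) ** 2))
b, a = np.polyfit(x, y, 1)                      # global linear baseline
rmse_lin = np.sqrt(np.mean((a + b * xq - np.sin(xq)) ** 2))
print(round(rmse_lwr, 3), round(rmse_lin, 3))
```

On data with genuine curvature the local fits track the signal while the single global line cannot, which mirrors the paper's finding.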

Journal ArticleDOI
TL;DR: It is common practice to calculate large numbers of molecular descriptors, apply variable selection to reduce the numbers, and then construct multiple linear regression (MLR) models with biological activity; the usual statistical tests are not appropriate here because such models suffer from selection bias, and experiments with regression using random numbers have generated critical values (Fmax) with which to assess significance.
Abstract: It is common practice to calculate large numbers of molecular descriptors, apply variable selection procedures to reduce the numbers, and then construct multiple linear regression (MLR) models with biological activity. The significance of these models is judged using the usual statistical tests. Unfortunately, these tests are not appropriate under these circumstances since the MLR models suffer from “selection bias”. Experiments with regression using random numbers have generated critical values (Fmax) with which to assess significance.
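
The selection-bias point can be reproduced in miniature: generate an activity vector and descriptors that are pure noise, keep the few descriptors most correlated with the activity, fit MLR, and the naive R² and F test look impressive even though nothing is real. This is an illustrative sketch; the sample sizes are assumptions, and the paper's tabulated Fmax critical values are not reproduced here.

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(9)
n, p, k = 30, 50, 3                      # 30 compounds, 50 pure-noise descriptors
X = rng.normal(size=(n, p))
y = rng.normal(size=n)                   # "activity" unrelated to any descriptor

# Select the k descriptors most correlated with y, then fit MLR on them.
r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(p)])
best = np.argsort(-np.abs(r))[:k]
A = np.column_stack([np.ones(n), X[:, best]])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ coef
r2 = 1 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# The naive F test ignores the selection step and so overstates significance.
F = (r2 / k) / ((1 - r2) / (n - k - 1))
p_naive = f_dist.sf(F, k, n - k - 1)
print(round(r2, 2), round(p_naive, 3))
```

A substantial R² from pure noise is exactly why the paper argues for elevated (Fmax) critical values after variable selection.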

Book
07 Dec 2005
TL;DR: This book covers regression, inference in regression, attributes as explanatory variables, lagged variables, autoregressive models, the classification problem, and models of systems.
Abstract: TEXT. Regression. Inference in Regression. Attributes as Explanatory Variables. Nonlinear Relationships. Regression and Time Series. Lagged Variables. Regression Miscellanea. More on Inference in Regression. Autoregressive Models. The Classification Problem. More on Classification. Models of Systems. CASES. Appendices. Selected References. Index.

Proceedings ArticleDOI
01 Jan 2005
TL;DR: In this article, the authors proposed an adaptive time series model where the polynomial degree of each interval varies (constant, linear and so on) given a number of regressors.
Abstract: Time series are difficult to monitor, summarize and predict. Segmentation organizes time series into few intervals having uniform characteristics (flatness, linearity, modality, monotonicity and so on). For scalability, we require fast linear time algorithms. The popular piecewise linear model can determine where the data goes up or down and at what rate. Unfortunately, when the data does not follow a linear model, the computation of the local slope creates overfitting. We propose an adaptive time series model where the polynomial degree of each interval varies (constant, linear and so on). Given a number of regressors, the cost of each interval is its polynomial degree: constant intervals cost 1 regressor, linear intervals cost 2 regressors, and so on. Our goal is to minimize the Euclidean (l_2) error for a given model complexity. Experimentally, we investigate the model where intervals can be either constant or linear. Over synthetic random walks, historical stock market prices, and electrocardiograms, the adaptive model provides a more accurate segmentation than the piecewise linear model without increasing the cross-validation error or the running time, while providing a richer vocabulary to applications. Implementation issues, such as numerical stability and real-world performance, are discussed.
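
The regressor-budget idea can be sketched with a simple dynamic program over breakpoints, where each interval may be constant (cost 1 regressor) or linear (cost 2) and the total squared error is minimized within the budget. This quadratic-time sketch illustrates only the cost model; the paper's fast linear-time algorithm is not reproduced, and the test signal is an assumption.

```python
import numpy as np

def adaptive_segment(t, y, budget):
    """DP segmentation: each interval is constant (1 regressor) or linear
    (2 regressors); minimize total squared error within the budget."""
    n = len(y)
    def sse(i, j, deg):                       # fit y[i:j] with a degree-deg poly
        if j - i <= deg:
            return 0.0
        c = np.polyfit(t[i:j], y[i:j], deg)
        return float(np.sum((y[i:j] - np.polyval(c, t[i:j])) ** 2))
    INF = float("inf")
    dp = np.full((n + 1, budget + 1), INF)
    back = {}
    dp[0, 0] = 0.0
    for i in range(1, n + 1):
        for r in range(1, budget + 1):
            for j in range(i):
                for deg, cost in ((0, 1), (1, 2)):
                    if cost <= r and dp[j, r - cost] < INF:
                        cand = dp[j, r - cost] + sse(j, i, deg)
                        if cand < dp[i, r]:
                            dp[i, r] = cand
                            back[i, r] = (j, r - cost, deg)
    r = int(np.argmin(dp[n]))
    err, segs, i = dp[n, r], [], n
    while i > 0:                              # recover the chosen intervals
        j, r, deg = back[i, r]
        segs.append((j, i, deg))
        i = j
    return err, segs[::-1]

t = np.arange(60.0)
y = np.where(t < 30, 1.0, (t - 30) * 0.1)      # flat section, then a ramp
y += np.random.default_rng(10).normal(0, 0.05, 60)
err, segs = adaptive_segment(t, y, budget=3)
print(round(err, 3), segs)
```

With a budget of 3 regressors the optimum spends 1 on a constant interval for the flat part and 2 on a linear interval for the ramp, exactly the adaptive behavior the abstract describes.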

Journal ArticleDOI
TL;DR: In this article, a covariate adjustment method is proposed for situations where both predictors and response in a regression model are not directly observable, but are contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate.
Abstract: mue11er@wa1d.ucdavis.edu SUMMARY We introduce covariate-adjusted regression for situations where both predictors and response in a regression model are not directly observable, but are contaminated with a multiplicative factor that is determined by the value of an unknown function of an observable covariate. We demonstrate how the regression coefficients can be estimated by establishing a connection to varying-coefficient regression. The proposed covariate adjustment method is illustrated with an analysis of the regression of plasma fibrinogen concentration as response on serum transferrin level as predictor for 69 haemodialysis patients. In this example, both response and predictor are thought to be influenced in a multiplicative fashion by body mass index. A bootstrap hypothesis test enables us to test the significance of the regression parameters. We establish consistency and convergence rates of the parameter estimators for this new covariate-adjusted regression model. Simulation studies demonstrate the efficacy of the proposed method.

Journal ArticleDOI
TL;DR: This work introduces a modified version of the Frisch-Newton algorithm for quantile regression described in Portnoy and Koenker[28], which substantially reduces the storage (memory) requirements and increases computational speed.
Abstract: Recent experience has shown that interior-point methods using a log barrier approach are far superior to classical simplex methods for computing solutions to large parametric quantile regression problems. In many large empirical applications, the design matrix has a very sparse structure. A typical example is the classical fixed-effect model for panel data where the parametric dimension of the model can be quite large, but the number of non-zero elements is quite small. Adopting recent developments in sparse linear algebra, we introduce a modified version of the Frisch-Newton algorithm for quantile regression described in Portnoy and Koenker [28]. The new algorithm substantially reduces the storage (memory) requirements and increases computational speed. The modified algorithm also facilitates the development of nonparametric quantile regression methods. The pseudo design matrices employed in nonparametric quantile regression smoothing are inherently sparse in both the fidelity and roughness penalty components. Exploiting the sparse structure of these problems opens up a whole range of new possibilities for multivariate smoothing on large data sets via ANOVA-type decomposition and partial linear models.
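For readers unfamiliar with the objective these interior-point methods optimize: quantile regression minimizes the Koenker-Bassett check (pinball) loss. The toy below is not the Frisch-Newton algorithm; it only illustrates the loss in the intercept-only case, where the minimizer is a sample quantile and an exhaustive search over the data points suffices.

```python
def check_loss(tau, residuals):
    """Koenker-Bassett check function rho_tau(r) = r * (tau - 1{r < 0})."""
    return sum(r * (tau - (1.0 if r < 0 else 0.0)) for r in residuals)

# intercept-only quantile regression: some data point always minimizes the
# check loss, so brute force over the observations finds the tau-quantile fit
y = [1.0, 2.0, 3.0, 4.0, 100.0]
median_fit = min(y, key=lambda c: check_loss(0.5, [v - c for v in y]))
```

Note that the gross outlier 100.0 barely moves the tau = 0.5 fit, one reason quantile regression is attractive on large messy data sets.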

Journal ArticleDOI
TL;DR: In this paper, the authors propose a new class of robust estimators for the parameters of a regression model in which the distribution of the error terms belongs to a class of exponential families including the log-gamma distribution.
Abstract: The authors propose a new class of robust estimators for the parameters of a regression model in which the distribution of the error terms belongs to a class of exponential families including the log-gamma distribution. These estimates, which are a natural extension of the MM-estimates for ordinary regression, can simultaneously attain high asymptotic efficiency and a high breakdown point. The authors prove the consistency and derive the asymptotic normal distribution of these estimates. A Monte Carlo study allows them to assess the efficiency and robustness of these estimates for finite samples.
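The paper's MM-estimates combine a high-breakdown initial fit with a redescending M-step. As a much simpler, purely illustrative stand-in for why M-type estimates resist outliers, here is a Huber location estimate computed by iteratively reweighted least squares; the data, tuning constant, and iteration count are invented, and this is not the authors' procedure.

```python
def huber_location(y, c=1.345, iters=50):
    """Huber M-estimate of location via iteratively reweighted least squares."""
    mu = sorted(y)[len(y) // 2]        # start from the (upper) median
    for _ in range(iters):
        # observations within c of the fit get full weight; others are downweighted
        w = [1.0 if abs(v - mu) <= c else c / abs(v - mu) for v in y]
        mu = sum(wi * v for wi, v in zip(w, y)) / sum(w)
    return mu

data = [0.1, -0.2, 0.0, 0.3, -0.1, 50.0]   # five inliers, one gross outlier
```

The sample mean of `data` is pulled above 8 by the outlier, while the M-estimate stays near the bulk of the data.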

Journal ArticleDOI
TL;DR: A multi-objective fuzzy linear regression model that allows all data points to influence the estimated parameters, so that the spread of the estimated values becomes wider as more data are included in the model.

Journal ArticleDOI
TL;DR: In this paper, the authors consider the problem of estimating the regression coefficients in a competing risks model in which the relationship between the cause-specific hazard for the cause of interest and the covariates is described using linear transformation models, and in which the cause of failure is missing at random for a subset of individuals.
Abstract: SUMMARY We consider the problem of estimating the regression coefficients in a competing risks model, where the relationship between the cause-specific hazard for the cause of interest and covariates is described using linear transformation models and when cause of failure is missing at random for a subset of individuals. Using the theory of Robins et al. (1994) for missing data problems and the approach of Chen et al. (2002) for estimating regression coefficients for linear transformation models, we derive augmented inverse probability weighted complete-case estimators for the regression coefficients that are doubly robust. Simulations demonstrate the relevance of the theory in finite samples.
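The inverse-probability-weighting idea behind these estimators can be seen in a far simpler setting. The sketch below is not the authors' augmented complete-case estimator for transformation models; it is an invented simulation estimating a plain mean when observations are missing at random with a known propensity, showing why the naive complete-case analysis is biased while the weighted version is not.

```python
import random
random.seed(3)

records = []
for _ in range(20000):
    x = random.random()
    y = x + random.gauss(0.0, 0.1)      # so E[Y] = 0.5
    p = 0.2 + 0.6 * x                   # P(Y observed | x), known here
    r = 1 if random.random() < p else 0 # missing-at-random indicator
    records.append((y, r, p))

# complete-case mean: over-represents large-x (hence large-y) observations
naive = sum(y for y, r, _ in records if r) / sum(r for _, r, _ in records)

# inverse probability weighted (Hajek-style) mean: reweights each observed
# point by 1/p to restore the full-population distribution
ipw = (sum(y * r / p for y, r, p in records)
       / sum(r / p for _, r, p in records))
```

In practice the propensity p would itself be modeled, which is where the doubly robust augmentation of Robins et al. earns its keep.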

Book
22 Sep 2005
TL;DR: In this book, the authors present generalized linear models for regression analysis, covering classical multiple regression, logistic and Poisson regression, and survival models, and show how deviances and residual analysis are used to test hypotheses and assess goodness of fit.
Abstract: List of Figures and Tables. Series Editor's Introduction. Acknowledgments. 1. Generalized Linear Models. 2. Some Basic Modeling Concepts: Categorical Independent Variables; Essential Components of Regression Modeling. 3. Classical Multiple Regression: Model Assumptions and Modeling Approach; Results of Regression Analysis; Multiple Correlation; Testing Hypotheses. 4. Fundamentals of Generalized Linear Modeling: Exponential Family of Distributions; Classical Normal Regression; Logistic Regression; Poisson Regression; Proportional Hazards Survival Model. 5. Maximum Likelihood Estimation. 6. Deviance and Goodness of Fit: Using Deviances to Test Statistical Hypotheses; Goodness of Fit; Assessing Goodness of Fit by Residual Analysis. 7. Logistic Regression: Example of Logistic Regression. 8. Poisson Regression: Example of Poisson Regression Model. 9. Survival Analysis: Survival Time Distributions; Exponential Survival Model; Example of Exponential Survival Model; Conclusions. Appendix. References. Index. About the Authors.
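The maximum likelihood machinery covered in the chapters on estimation and logistic regression can be sketched compactly: Newton-Raphson (equivalently, iteratively reweighted least squares) for a one-predictor logistic model. The simulated data and coefficient values below are invented for illustration and do not come from the book.

```python
import math
import random
random.seed(4)

def logistic_irls(xs, ys, iters=25):
    """Newton-Raphson / IRLS for the model logit P(y=1|x) = b0 + b1*x."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p                 # score for the intercept
            g1 += (y - p) * x           # score for the slope
            h00 += w                    # observed information entries
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det   # Newton step, 2x2 inverse by hand
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

true_b0, true_b1 = 0.5, 1.0
xs = [random.uniform(-2.0, 2.0) for _ in range(2000)]
ys = [1 if random.random() < 1.0 / (1.0 + math.exp(-(true_b0 + true_b1 * x))) else 0
      for x in xs]
b0_hat, b1_hat = logistic_irls(xs, ys)
```

Because the logistic log-likelihood is concave, Newton's method converges quickly here; the same IRLS template, with a different link and variance function, fits the Poisson and survival models in the later chapters.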

Journal ArticleDOI
TL;DR: In this paper, a partially linear transformation model is proposed for current status data analysis, where the unknown quantities are the transformation function, a linear regression parameter and a nonparametric regression effect.
Abstract: We consider partly linear transformation models applied to current status data. The unknown quantities are the transformation function, a linear regression parameter and a nonparametric regression effect. It is shown that the penalized MLE for the regression parameter is asymptotically normal and efficient and converges at the parametric rate, although the penalized MLEs for the transformation function and the nonparametric regression effect are only n^(1/3)-consistent. Inference for the regression parameter based on a block jack-knife is investigated. We also study computational issues and demonstrate the proposed methodology with a simulation study. The transformation models and partly linear regression terms, coupled with new estimation and inference techniques, provide flexible alternatives to the Cox model for current status data analysis.

Journal ArticleDOI
TL;DR: In this article, a method for the construction of a simultaneous confidence band for the normal-error multiple linear regression model is presented, where the confidence bands considered have their width proportional to the standard error of the estimated regression function, and the predictor variables are allowed to be constrained in intervals.
Abstract: This article presents a method for the construction of a simultaneous confidence band for the normal-error multiple linear regression model. The confidence bands considered have their width proportional to the standard error of the estimated regression function, and the predictor variables are allowed to be constrained in intervals. Past articles in this area gave exact bands only for the simple regression model. When there is more than one predictor variable, only conservative bands are proposed in the statistics literature. This article advances this methodology by providing simulation-based confidence bands for regression models with any number of predictor variables. Additionally, a criterion is proposed to assess the sensitivity of a simultaneous confidence band. This criterion is defined to be the probability that a false linear regression model is excluded from the band at least at one point and hence this false linear regression model is correctly declared as a false model by the band. Finally, th...
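The simulation idea is easiest to see in the simple-regression case (the article's contribution is handling any number of predictors). A sketch, with an invented design, grid, and simulation size: simulate the supremum over a grid of the standardized deviation between the fitted and true lines, and take its 95th percentile as the critical constant, so the band's width stays proportional to the standard error of the fit.

```python
import math
import random
random.seed(2)

n = 30
xs = [i / (n - 1) for i in range(n)]      # fixed design on [0, 1] (invented)
grid = [i / 20 for i in range(21)]        # grid where the band is evaluated
mx = sum(xs) / n
sxx = sum((x - mx) ** 2 for x in xs)

def se_factor(x):
    """Standard error of the fitted mean at x, up to the sigma factor."""
    return math.sqrt(1.0 / n + (x - mx) ** 2 / sxx)

sups = []
for _ in range(2000):
    e = [random.gauss(0.0, 1.0) for _ in range(n)]   # true regression line is 0
    me = sum(e) / n
    b1 = sum((x - mx) * v for x, v in zip(xs, e)) / sxx
    b0 = me - b1 * mx
    rss = sum((v - (b0 + b1 * x)) ** 2 for x, v in zip(xs, e))
    sigma = math.sqrt(rss / (n - 2))
    # largest standardized deviation of the fit from the truth over the grid
    sups.append(max(abs(b0 + b1 * x) / (sigma * se_factor(x)) for x in grid))
sups.sort()
c = sups[int(0.95 * len(sups))]           # simulated 95% critical constant
```

The simulated constant should land between the pointwise t critical value (about 2.05 here) and the conservative Scheffe bound (about 2.6), which is exactly the gap the simulation-based band is designed to close.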