
Regression models with responses on the unit interval: specification, estimation and comparison

TL;DR: In this paper, a generic structure is used to define a set of regression models for restricted response variables, not only including the usually assumed formats but allowing for a wider range of models.
Abstract: Regression models are widely used in a diversity of application areas to describe associations between explanatory and response variables. The initially and frequently adopted Gaussian linear model was gradually extended to accommodate different kinds of response variables. These models were later described as particular cases of the generalized linear models (GLM). The GLM family allows for a diversity of formats for the response variable and functions linking the parameters of the distribution to a linear predictor. This model structure became a benchmark for several further extensions and developments in statistical modelling, such as generalized additive, overdispersed and zero-inflated models, among others. Response variables with values restricted to an interval, often (0, 1), are common in social sciences, agronomy and psychometrics, among other areas. Beta or Simplex distributions are often used, although other options are mentioned in the literature. In this paper, a generic structure is used to define a set of regression models for restricted response variables, not only including the usually assumed formats but allowing for a wider range of models. Individual models are defined by choosing three components: the probability distribution for the response; the function linking the parameter of the distribution of choice with the linear predictor; and the transformation function for the response. We report results of the analysis of four different data sets.
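
To make the three-component specification concrete, here is a minimal R sketch (assuming the betareg package is installed; the data frame dat, response y in (0, 1) and covariate x are hypothetical) contrasting two choices of distribution, link and response transformation. It illustrates the general idea rather than the authors' actual code.

    # Component choices: response distribution, link function, response transformation.
    library(betareg)

    set.seed(1)
    dat <- data.frame(x = runif(100))
    dat$y <- plogis(-1 + 2 * dat$x + rnorm(100, sd = 0.5))   # simulated response in (0, 1)

    # Choice A: beta distribution for the response, logit link, untransformed response
    fit_beta <- betareg(y ~ x, data = dat, link = "logit")

    # Choice B: Gaussian distribution after a logit transformation of the response
    fit_tg <- lm(qlogis(y) ~ x, data = dat)

    summary(fit_beta)
    summary(fit_tg)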


Citations
Journal ArticleDOI
04 Nov 2020
TL;DR: In this paper, the authors show that the unit-Rayleigh distribution is much more interesting than it might appear at first glance, revealing closed-form expressions of important functions and new desirable properties for application purposes.
Abstract: The unit-Rayleigh distribution is a one-parameter distribution with support on the unit interval. It is defined as the so-called unit-Weibull distribution with a shape parameter equal to two. Being one particular case among others, it seems not to have been given special attention. This paper shows that the unit-Rayleigh distribution is much more interesting than it might appear at first glance, revealing closed-form expressions of important functions and new desirable properties for application purposes. More precisely, on the theoretical level, we contribute to the following aspects: (i) we bring new characteristics on the form analysis of its main probabilistic and reliability functions, and show that the possible mode has a simple analytical expression, (ii) we prove new stochastic ordering results, (iii) we expose closed-form expressions of the incomplete and probability weighted moments at the basis of various probability functions and measures, (iv) we investigate distributional properties of the order statistics, (v) we show that the reliability coefficient can have a simple ratio expression, (vi) we provide a tractable expansion for the Tsallis entropy and (vii) we propose some bivariate unit-Rayleigh distributions. On a practical level, we show that the maximum likelihood estimate has a quite simple closed form. Three data sets are analyzed and fitted, revealing that the unit-Rayleigh distribution can be a better alternative to standard one-parameter unit distributions, such as the one-parameter Kumaraswamy, Topp–Leone, one-parameter beta, power and transmuted distributions.
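
To illustrate the closed-form maximum likelihood estimate mentioned above, the R sketch below assumes one common parameterization of the unit-Rayleigh (the unit-Weibull with shape fixed at two), f(y; a) = (2a/y)(-log y) exp(-a (log y)^2) on (0, 1); the parameterization, the estimator and the simulation are assumptions for illustration, not taken from the cited paper.

    # Under the assumed density, d/da of the log-likelihood is n/a - sum(log(y)^2),
    # so the MLE has the closed form a_hat = n / sum(log(y)^2).
    unit_rayleigh_mle <- function(y) length(y) / sum(log(y)^2)

    set.seed(2)
    a_true <- 1.5
    # If X ~ Weibull(shape = 2, scale = 1/sqrt(a)), then Y = exp(-X) has the assumed density.
    y <- exp(-rweibull(500, shape = 2, scale = 1 / sqrt(a_true)))
    unit_rayleigh_mle(y)   # close to a_true for large samples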

19 citations

Journal ArticleDOI
TL;DR: A flexible class of regression models for continuous bounded data based on second-moment assumptions that can easily handle data with exact zeroes and ones in a unified way and has the Bernoulli mean and variance relationship as a limiting case.
Abstract: We propose a flexible class of regression models for continuous bounded data based on second-moment assumptions. The mean structure is modelled by means of a link function and a linear predictor […]
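
As a hedged illustration of second-moment modelling for bounded responses, the R sketch below fits a quasi-likelihood GLM with a logit link and variance proportional to mu(1 - mu), i.e. the Bernoulli-like limiting case mentioned in the TL;DR; the data are simulated and this is not the estimator proposed in the cited paper.

    # Quasi-likelihood fit: only the mean (logit link) and variance (mu*(1 - mu)) are specified.
    set.seed(3)
    dat <- data.frame(x = runif(150))
    dat$y <- plogis(-0.5 + 1.5 * dat$x + rnorm(150, sd = 0.4))   # response in (0, 1)

    fit_quasi <- glm(y ~ x, family = quasibinomial(link = "logit"), data = dat)
    summary(fit_quasi)   # dispersion parameter estimated from Pearson residuals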

10 citations


Cites background from "Regression models with responses on..."

  • ...A typical example of this approach can be found in Bonat et al. (2012), where the authors compared the fit of four different distributions for the analysis of four datasets....


Journal ArticleDOI
TL;DR: In this paper, a quasi-beta longitudinal regression model is proposed to deal with longitudinal continuous bounded data, where the covariance structure is defined in terms of a matrix linear predictor composed by known matrices.
Abstract: We propose a new class of regression models to deal with longitudinal continuous bounded data. The model is specified using second-moment assumptions, and we employ an estimating function approach for parameter estimation and inference. The main advantage of the proposed approach is that it does not need to assume a multivariate probability distribution for the response vector. The fitting procedure is easily implemented using a simple and efficient Newton scoring algorithm. Thus, the quasi-beta longitudinal regression model can easily handle data in the unit interval, including exact zeros and ones. The covariance structure is defined in terms of a matrix linear predictor composed of known matrices. A simulation study was conducted to check the properties of the estimating function estimators of the regression and dispersion parameters. The NORTA (NORmal To Anything) algorithm was used to simulate correlated beta random variables. The results of this simulation study showed that the estimators are consistent and unbiased for large samples. The model is motivated by a data set concerning the water quality index, whose goal is to investigate the effect of dams on the water quality index measured on power plant reservoirs. Furthermore, diagnostic techniques were adapted to the proposed models, such as DFFITS, DFBETAS, Cook’s distance and half-normal plots with simulated envelope. The R code and data set are available in the supplementary material.
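
The NORTA step can be sketched in a few lines of R: draw correlated standard normals, map them to uniforms with pnorm, and apply the beta quantile function. The correlation and beta parameters below are arbitrary illustrative choices, not those of the cited simulation study.

    # NORTA (NORmal To Anything) sketch for a pair of correlated beta variables.
    library(MASS)   # for mvrnorm

    set.seed(4)
    n <- 1000
    Sigma <- matrix(c(1, 0.6, 0.6, 1), 2, 2)          # correlation of the latent normals
    z <- mvrnorm(n, mu = c(0, 0), Sigma = Sigma)      # step 1: correlated standard normals
    u <- pnorm(z)                                     # step 2: map to uniform margins
    y <- matrix(qbeta(u, shape1 = 2, shape2 = 5), n)  # step 3: beta quantile transform
    cor(y)   # induced correlation is close to, but not exactly, the latent 0.6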

7 citations

Posted Content
TL;DR: Results show that the INLA approach is suitable and faster to fit the proposed beta mixed models, producing results similar to alternative algorithms with easier handling of modelling alternatives.
Abstract: Generalized linear mixed models (GLMMs) encompass a large class of statistical models, with a vast range of application areas. GLMMs extend the linear mixed models by allowing for different types of response variable. The three most common data types are continuous, counts and binary, and the standard distributions for these types of response variables are Gaussian, Poisson and binomial, respectively. Despite that flexibility, there are situations where the response variable is continuous but bounded, such as rates, percentages, indexes and proportions. In such situations the usual GLMMs are not adequate because the bounds are ignored, and the beta distribution can be used instead. Likelihood and Bayesian inference for beta mixed models are not straightforward, demanding a computational overhead. Recently, a new algorithm for Bayesian inference called INLA (Integrated Nested Laplace Approximation) was proposed. INLA allows computation of many Bayesian GLMMs in a reasonable amount of time, allowing extensive comparison among models. We explore Bayesian inference for beta mixed models by INLA. We discuss the choice of prior distributions, sensitivity analysis and model selection measures through a real data set. The results obtained from INLA are compared with those obtained by an MCMC algorithm and likelihood analysis. We analyze data from a study on a quality-of-life index of industry workers collected according to a hierarchical sampling scheme. Results show that the INLA approach is suitable and faster to fit the proposed beta mixed models, producing results similar to alternative algorithms and with easier handling of modelling alternatives. Sensitivity analysis, measures of goodness of fit and model choice are discussed.
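
A hedged sketch of such a fit with the R-INLA package (distributed from its own repository rather than CRAN) is given below; the simulated data, variable names and formula are illustrative assumptions, not the specification used in the cited study.

    # Sketch: beta mixed model with a random intercept, fitted by INLA.
    library(INLA)   # available from https://inla.r-inla-download.org, not CRAN

    set.seed(5)
    dat <- data.frame(worker = rep(1:30, each = 5), x = runif(150))
    mu <- plogis(-0.5 + 1.5 * dat$x + rnorm(30, sd = 0.4)[dat$worker])
    dat$y <- rbeta(150, mu * 20, (1 - mu) * 20)   # simulated bounded index

    fit <- inla(y ~ x + f(worker, model = "iid"), family = "beta", data = dat,
                control.compute = list(dic = TRUE, waic = TRUE))   # model comparison measures
    summary(fit)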

6 citations


Cites methods from "Regression models with responses on..."

  • ...Recently, Bonat et al. (2012) contrasted beta regression models with other approaches for modelling a response variable on the unit interval, such as the Simplex, Kumaraswamy and Trans-Gaussian models....


References
More filters
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
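
The iterative weighted linear regression idea can be demonstrated with a short, self-contained R sketch for a Poisson log-linear model; this is a textbook-style illustration of the algorithm, not code from the book, and it simply checks that the hand-rolled iterations agree with glm().

    # IWLS sketch for a Poisson GLM with log link.
    set.seed(6)
    x <- runif(200)
    y <- rpois(200, lambda = exp(0.5 + 1.2 * x))
    X <- cbind(1, x)

    beta <- c(log(mean(y)), 0)                 # crude starting values
    for (i in 1:25) {
      eta <- drop(X %*% beta)
      mu  <- exp(eta)
      z   <- eta + (y - mu) / mu               # working response
      w   <- mu                                # working weights = (dmu/deta)^2 / Var(mu)
      beta <- solve(crossprod(X, w * X), crossprod(X, w * z))
    }
    cbind(IWLS = drop(beta), glm = coef(glm(y ~ x, family = poisson)))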

23,215 citations

Journal ArticleDOI
TL;DR: In this article, Lindley et al. make the less restrictive assumption that such a normal, homoscedastic, linear model is appropriate after some suitable transformation has been applied to the y's.
Abstract: [Read at a RESEARCH METHODS MEETING of the SOCIETY, April 8th, 1964, Professor D. V. LINDLEY in the Chair] SUMMARY In the analysis of data it is often assumed that observations Yl, Y2, *-, Yn are independently normally distributed with constant variance and with expectations specified by a model linear in a set of parameters 0. In this paper we make the less restrictive assumption that such a normal, homoscedastic, linear model is appropriate after some suitable transformation has been applied to the y's. Inferences about the transformation and about the parameters of the linear model are made by computing the likelihood function and the relevant posterior distribution. The contributions of normality, homoscedasticity and additivity to the transformation are separated. The relation of the present methods to earlier procedures for finding transformations is discussed. The methods are illustrated with examples.
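
For reference, the Box-Cox family transforms a positive response as y(lambda) = (y^lambda - 1)/lambda for lambda != 0 and log y for lambda = 0, with lambda chosen by profile likelihood; the short R sketch below uses MASS::boxcox on simulated data purely as an illustration (the data and model are invented, not from the paper).

    # Profile likelihood for the Box-Cox parameter lambda:
    # y(lambda) = (y^lambda - 1) / lambda  if lambda != 0,  log(y)  if lambda == 0.
    library(MASS)

    set.seed(7)
    x <- runif(100)
    y <- exp(1 + 2 * x + rnorm(100, sd = 0.3))   # positive response; log scale is appropriate
    bc <- boxcox(lm(y ~ x), lambda = seq(-1, 1, 0.05), plotit = FALSE)
    bc$x[which.max(bc$y)]   # lambda maximising the profile log-likelihood (near 0 here)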

12,158 citations


"Regression models with responses on..." refers background in this paper

  • ...The classical Box-Cox (Box & Cox, 1964) family defines power transformations which are not directly applicable to responses on restricted intervals....


Journal ArticleDOI
TL;DR: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences, and it is thoroughly enjoyable to read.
Abstract: This is the first book on generalized linear models written by authors not mostly associated with the biological sciences. Subtitled “With Applications in Engineering and the Sciences,” this book’s authors all specialize primarily in engineering statistics. The first author has produced several recent editions of Walpole, Myers, and Myers (1998), the last reported by Ziegel (1999). The second author has had several editions of Montgomery and Runger (1999), recently reported by Ziegel (2002). All of the authors are renowned experts in modeling. The first two authors collaborated on a seminal volume in applied modeling (Myers and Montgomery 2002), which had its recent revised edition reported by Ziegel (2002). The last two authors collaborated on the most recent edition of a book on regression analysis (Montgomery, Peck, and Vining 2001), reported by Gray (2002), and the first author has had multiple editions of his own regression analysis book (Myers 1990), the latest of which was reported by Ziegel (1991). A comparable book with similar objectives and a more specific focus on logistic regression, Hosmer and Lemeshow (2000), reported by Conklin (2002), presumed a background in regression analysis and began with generalized linear models. The Preface here (p. xi) indicates an identical requirement but nonetheless begins with 100 pages of material on linear and nonlinear regression. Most of this will probably be a review for the readers of the book. Chapter 2, “Linear Regression Model,” begins with 50 pages of familiar material on estimation, inference, and diagnostic checking for multiple regression. The approach is very traditional, including the use of formal hypothesis tests. In industrial settings, use of p values as part of a risk-weighted decision is generally more appropriate. The pedagogic approach includes formulas and demonstrations for computations, although computing by Minitab is eventually illustrated. Less-familiar material on maximum likelihood estimation, scaled residuals, and weighted least squares provides more specific background for subsequent estimation methods for generalized linear models. This review is not meant to be disparaging. The authors have packed a wealth of useful nuggets for any practitioner in this chapter. It is thoroughly enjoyable to read. Chapter 3, “Nonlinear Regression Models,” is arguably less of a review, because regression analysis courses often give short shrift to nonlinear models. The chapter begins with a great example on the pitfalls of linearizing a nonlinear model for parameter estimation. It continues with the effective balancing of explicit statements concerning the theoretical basis for computations versus the application and demonstration of their use. The details of maximum likelihood estimation are again provided, and weighted and generalized regression estimation are discussed. Chapter 4 is titled “Logistic and Poisson Regression Models.” Logistic regression provides the basic model for generalized linear models. The prior development for weighted regression is used to motivate maximum likelihood estimation for the parameters in the logistic model. The algebraic details are provided. As in the development for linear models, some of the details are pushed into an appendix. In addition to connecting to the foregoing material on regression on several occasions, the authors link their development forward to their following chapter on the entire family of generalized linear models.
They discuss score functions, the variance-covariance matrix, Wald inference, likelihood inference, deviance, and overdispersion. Careful explanations are given for the values provided in standard computer software, here PROC LOGISTIC in SAS. The value in having the book begin with familiar regression concepts is clearly realized when the analogies are drawn between overdispersion and nonhomogeneous variance, or analysis of deviance and analysis of variance. The authors rely on the similarity of Poisson regression methods to logistic regression methods and mostly present illustrations for Poisson regression. These use PROC GENMOD in SAS. The book does not give any of the SAS code that produces the results. Two of the examples illustrate designed experiments and modeling. They include discussion of subset selection and adjustment for overdispersion. The mathematical level of the presentation is elevated in Chapter 5, “The Family of Generalized Linear Models.” First, the authors unify the two preceding chapters under the exponential distribution. The material on the formal structure for generalized linear models (GLMs), likelihood equations, quasilikelihood, the gamma distribution family, and power functions as links is some of the most advanced material in the book. Most of the computational details are relegated to appendixes. A discussion of residuals returns one to a more practical perspective, and two long examples on gamma distribution applications provide excellent guidance on how to put this material into practice. One example is a contrast to the use of linear regression with a log transformation of the response, and the other is a comparison to the use of a different link function in the previous chapter. Chapter 6 considers generalized estimating equations (GEEs) for longitudinal and analogous studies. The first half of the chapter presents the methodology, and the second half demonstrates its application through five different examples. The basis for the general situation is first established using the case with a normal distribution for the response and an identity link. The importance of the correlation structure is explained, the iterative estimation procedure is shown, and estimation for the scale parameters and the standard errors of the coefficients is discussed. The procedures are then generalized for the exponential family of distributions and quasi-likelihood estimation. Two of the examples are standard repeated-measures illustrations from biostatistical applications, but the last three illustrations are all interesting reworkings of industrial applications. The GEE computations in PROC GENMOD are applied to account for correlations that occur with multiple measurements on the subjects or restrictions to randomizations. The examples show that accounting for correlation structure can result in different conclusions. Chapter 7, “Further Advances and Applications in GLM,” discusses several additional topics. These are experimental designs for GLMs, asymptotic results, analysis of screening experiments, data transformation, modeling for both a process mean and variance, and generalized additive models. The material on experimental designs is more discursive than prescriptive and as a result is also somewhat theoretical. Similar comments apply for the discussion on the quality of the asymptotic results, which wallows a little too much in reports on various simulation studies.
The examples on screening and data transformations experiments are again reworkings of analyses of familiar industrial examples and another obvious motivation for the enthusiasm that the authors have developed for using the GLM toolkit. One can hope that subsequent editions will similarly contain new examples that will have caused the authors to expand the material on generalized additive models and other topics in this chapter. Designating myself to review a book that I know I will love to read is one of the rewards of being editor. I read both of the editions of McCullagh and Nelder (1989), which was reviewed by Schuenemeyer (1992). That book was not fun to read. The obvious enthusiasm of Myers, Montgomery, and Vining and their reliance on their many examples as a major focus of their pedagogy make Generalized Linear Models a joy to read. Every statistician working in any area of applied science should buy it and experience the excitement of these new approaches to familiar activities.

10,520 citations


"Regression models with responses on..." refers background in this paper

  • ...Nelder & Wedderburn (1972) and McCullagh & Nelder (1989) are benchmarks for advances in regression models, unifying several model specifications under the flexible class of generalised linear models (GLM)....


Journal ArticleDOI
01 May 1972
TL;DR: In this paper, the authors used iterative weighted linear regression to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation.
Abstract: SUMMARY The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions: the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components). The implications of the approach in designing statistics courses are discussed.

8,793 citations