Journal ArticleDOI

Generalized additive models for location, scale and shape

01 Jun 2005 - Journal of the Royal Statistical Society: Series C (Applied Statistics) (Wiley-Blackwell, DOI prefix 10.1111) - Vol. 54, Iss. 3, pp. 507-554
TL;DR: The generalized additive model for location, scale and shape (GAMLSS) is a general class of statistical models for a univariate response variable; it assumes independent observations of the response y given the parameters, the explanatory variables and the values of the random effects.
Abstract: Summary. A general class of statistical models for a univariate response variable is presented which we call the generalized additive model for location, scale and shape (GAMLSS). The model assumes independent observations of the response variable y given the parameters, the explanatory variables and the values of the random effects. The distribution for the response variable in the GAMLSS can be selected from a very general family of distributions including highly skew or kurtotic continuous and discrete distributions. The systematic part of the model is expanded to allow modelling not only of the mean (or location) but also of the other parameters of the distribution of y, as parametric and/or additive nonparametric (smooth) functions of explanatory variables and/or random-effects terms. Maximum (penalized) likelihood estimation is used to fit the (non)parametric models. A Newton–Raphson or Fisher scoring algorithm is used to maximize the (penalized) likelihood. The additive terms in the model are fitted by using a backfitting algorithm. Censored data are easily incorporated into the framework. Five data sets from different fields of application are analysed to emphasize the generality of the GAMLSS class of models.
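The central idea of GAMLSS — modelling distribution parameters beyond the mean as functions of explanatory variables — can be sketched in a few lines. The following is a minimal hand-rolled illustration on synthetic data (joint maximum likelihood for a normal response whose mean and log-scale are both linear in one covariate), not the paper's RS/CG fitting algorithm:

```python
import numpy as np
from scipy.optimize import minimize

# Synthetic data: both the mean and the scale of y change with x.
rng = np.random.default_rng(0)
n = 200
x = np.linspace(0.0, 1.0, n)
y = 1.0 + 2.0 * x + rng.normal(scale=np.exp(-1.0 + 1.5 * x))

def neg_log_lik(theta):
    b0, b1, g0, g1 = theta
    mu = b0 + b1 * x                # location model
    sigma = np.exp(g0 + g1 * x)     # scale model; log link keeps sigma > 0
    return 0.5 * np.sum(np.log(2 * np.pi * sigma**2) + ((y - mu) / sigma) ** 2)

fit = minimize(neg_log_lik, x0=np.zeros(4), method="BFGS")
b0, b1, g0, g1 = fit.x
```

The log link for the scale mirrors the paper's use of link functions for each distribution parameter; GAMLSS extends the same idea to skewness and kurtosis parameters and to smooth, rather than linear, dependence.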
Citations
Posted Content
TL;DR: In this paper, the authors provide a unified and comprehensive theory of structural time series models, including a detailed treatment of the Kalman filter for modeling economic and social time series, and address the special problems which the treatment of such series poses.
Abstract: In this book, Andrew Harvey sets out to provide a unified and comprehensive theory of structural time series models. Unlike the traditional ARIMA models, structural time series models consist explicitly of unobserved components, such as trends and seasonals, which have a direct interpretation. As a result the model selection methodology associated with structural models is much closer to econometric methodology. The link with econometrics is made even closer by the natural way in which the models can be extended to include explanatory variables and to cope with multivariate time series. From the technical point of view, state space models and the Kalman filter play a key role in the statistical treatment of structural time series models. The book includes a detailed treatment of the Kalman filter. This technique was originally developed in control engineering, but is becoming increasingly important in fields such as economics and operations research. This book is concerned primarily with modelling economic and social time series, and with addressing the special problems which the treatment of such series poses. The properties of the models and the methodological techniques used to select them are illustrated with various applications. These range from the modelling of trends and cycles in US macroeconomic time series to an evaluation of the effects of seat belt legislation in the UK.
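The state space treatment can be illustrated with the simplest structural model, the local level (a random walk trend observed with noise). This is a generic textbook sketch with invented variance values, not code from the book:

```python
import numpy as np

# Local level model:
#   y_t  = mu_t + eps_t,      eps_t ~ N(0, sigma_eps^2)
#   mu_t = mu_{t-1} + eta_t,  eta_t ~ N(0, sigma_eta^2)
def local_level_filter(y, sigma_eps2, sigma_eta2, mu0=0.0, p0=1e6):
    mu, p = mu0, p0                       # state mean and variance (diffuse prior)
    filtered = []
    for obs in y:
        p = p + sigma_eta2                # predict: trend variance grows
        k = p / (p + sigma_eps2)          # Kalman gain
        mu = mu + k * (obs - mu)          # update toward the observation
        p = (1.0 - k) * p
        filtered.append(mu)
    return np.array(filtered)

y = np.array([1.0, 1.2, 0.9, 1.1, 5.0, 1.0])
est = local_level_filter(y, sigma_eps2=1.0, sigma_eta2=0.1)
```

The filter discounts the outlying fifth observation rather than following it, which is the smoothing behaviour the unobserved-components interpretation provides.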

4,252 citations

Journal ArticleDOI
TL;DR: This work considers approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non‐Gaussian response variables and can directly compute very accurate approximations to the posterior marginals.
Abstract: Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
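The core of the approach — replacing an intractable integral by a Gaussian matched to the mode and curvature of the integrand — can be shown in one dimension. This is a sketch of the Laplace idea only, not the integrated nested Laplace approximation itself, and the log-posterior below is invented:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.integrate import quad

def log_post(theta):
    # Unnormalized log-posterior: Gaussian prior times a Poisson-like likelihood.
    return -0.5 * theta**2 + 3.0 * theta - np.exp(theta)

# Find the mode, measure the curvature there, and integrate the matched Gaussian.
res = minimize_scalar(lambda t: -log_post(t), bounds=(-5.0, 5.0), method="bounded")
mode = res.x
h = 1e-4
curv = (log_post(mode + h) - 2 * log_post(mode) + log_post(mode - h)) / h**2
laplace = np.exp(log_post(mode)) * np.sqrt(2 * np.pi / -curv)

exact, _ = quad(lambda t: np.exp(log_post(t)), -10.0, 10.0)  # brute-force check
```

For this log-concave example the Laplace value agrees with numerical quadrature to well under a percent, which is why the nested scheme can replace hours of MCMC with seconds of optimization.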

4,164 citations

Journal ArticleDOI
TL;DR: Spirometric prediction equations for the 3–95-yr age range are now available that include appropriate age-dependent lower limits of normal for spirometric indices, which can be applied globally to different ethnic groups.
Abstract: The aim of the Task Force was to derive continuous prediction equations and their lower limits of normal for spirometric indices, which are applicable globally. Over 160,000 data points from 72 centres in 33 countries were shared with the European Respiratory Society Global Lung Function Initiative. Eliminating data that could not be used (mostly missing ethnic group, some outliers) left 97,759 records of healthy nonsmokers (55.3% females) aged 2.5–95 yrs. Lung function data were collated and prediction equations derived using the LMS method, which allows simultaneous modelling of the mean (mu), the coefficient of variation (sigma) and skewness (lambda) of a distribution family. After discarding 23,572 records, mostly because they could not be combined with other ethnic or geographic groups, reference equations were derived for healthy individuals aged 3–95 yrs for Caucasians (n=57,395), African-Americans (n=3,545), and North (n=4,992) and South East Asians (n=8,255). Forced expiratory volume in 1 s (FEV1) and forced vital capacity (FVC) in the other ethnic groups differed proportionally from those in Caucasians, such that FEV1/FVC remained virtually independent of ethnic group. For individuals not represented by these four groups, or of mixed ethnic origins, a composite equation taken as the average of the above equations is provided to facilitate interpretation until a more appropriate solution is developed. Spirometric prediction equations for the 3–95-yr age range are now available that include appropriate age-dependent lower limits of normal. They can be applied globally to different ethnic groups. Additional data from the Indian subcontinent and Arabic, Polynesian and Latin American countries, as well as Africa, will further improve these equations in the future.
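The LMS method referred to above converts a measurement into a z-score via the standard Cole–Green transform, z = ((y/M)^L − 1)/(L·S), with the limit log(y/M)/S as L → 0. A minimal implementation follows; the parameter values in the example are hypothetical, not GLI coefficients:

```python
import math

def lms_zscore(y, L, M, S):
    """z-score under the LMS (Box-Cox normal) model: L = skewness power,
    M = median, S = coefficient of variation."""
    if abs(L) < 1e-8:                       # limiting case L -> 0
        return math.log(y / M) / S
    return ((y / M) ** L - 1.0) / (L * S)

# A measurement equal to the predicted median M gives z = 0 by construction.
z_at_median = lms_zscore(4.0, L=0.9, M=4.0, S=0.12)
```

In the reference-equation setting, L, M and S are themselves smooth functions of age (and height), which is where the GAMLSS machinery enters.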

3,975 citations

Journal ArticleDOI
TL;DR: The Box‐Cox power exponential (BCPE) method, with curve smoothing by cubic splines, was used to construct the curves and the concordance between smoothed percentile curves and empirical percentiles was excellent and free of bias.
Abstract: Aim: To describe the methods used to construct the WHO Child Growth Standards based on length/height, weight and age, and to present resulting growth charts. Methods: The WHO Child Growth Standards were derived from an international sample of healthy breastfed infants and young children raised in environments that do not constrain growth. Rigorous methods of data collection and standardized procedures across study sites yielded very high-quality data. The generation of the standards followed methodical, state-of-the-art statistical methodologies. The Box-Cox power exponential (BCPE) method, with curve smoothing by cubic splines, was used to construct the curves. The BCPE accommodates various kinds of distributions, from normal to skewed or kurtotic, as necessary. A set of diagnostic tools was used to detect possible biases in estimated percentiles or z-score curves. Results: There was wide variability in the degrees of freedom required for the cubic splines to achieve the best model. Except for length/height-for-age, which followed a normal distribution, all other standards needed to model skewness but not kurtosis. Length-for-age and height-for-age standards were constructed by fitting a unique model that reflected the 0.7-cm average difference between these two measurements. The concordance between smoothed percentile curves and empirical percentiles was excellent and free of bias. Percentiles and z-score curves for boys and girls aged 0–60 mo were generated for weight-for-age, length/height-for-age, weight-for-length/height (45 to 110 cm and 65 to 120 cm, respectively) and body mass index-for-age. Conclusion: The WHO Child Growth Standards depict normal growth under optimal environmental conditions and can be used to assess children everywhere, regardless of ethnicity, socio-economic status and type of feeding.
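When kurtosis is not modelled — and the abstract notes that none of the WHO standards needed it — the BCPE reduces to the Box–Cox normal (LMS) form, and a percentile curve is obtained by inverting the z-score transform. A sketch with invented parameter values, not WHO coefficients:

```python
import math

def lms_centile(z, L, M, S):
    """Measurement at a given z-score under the LMS model:
    y = M * (1 + L*S*z)**(1/L), or M * exp(S*z) when L = 0."""
    if abs(L) < 1e-8:
        return M * math.exp(S * z)
    return M * (1.0 + L * S * z) ** (1.0 / L)

median = lms_centile(0.0, L=0.9, M=9.5, S=0.11)      # z = 0 recovers the median M
upper = lms_centile(2.0, L=0.9, M=9.5, S=0.11)       # roughly the 97.7th percentile
```

Evaluating this at a grid of ages, with L, M and S smoothed in age, traces out the familiar growth-chart centile curves.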

2,850 citations


Cites methods from "Generalized additive models for location, scale and shape"

  • ...This method is included in a broader methodology, the GAMLSS [29], which offers a general framework that includes a wide range of known methods for constructing growth curves....


  • ...The first four distributions were fitted using the GAMLSS (Generalized Additive Models for Location, Scale and Shape) software [27] and the last using the "xriml" module in the STATA software [28]....


  • ...Using GAMLSS, comparisons were carried out for length/height-for-age, weight-for-age and weight-for-length/height....


  • ...The GAMLSS allows for modelling the mean (or location) of the growth variable under consideration as well as other parameters of its distribution that determine scale and shape....


References
Journal ArticleDOI
TL;DR: In this article, a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), is introduced for the purpose of statistical identification; it is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure.
Abstract: The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly and it is pointed out that the hypothesis testing procedure is not adequately defined as the procedure for statistical model identification. The classical maximum likelihood estimation procedure is reviewed and a new estimate minimum information theoretical criterion (AIC) estimate (MAICE) which is designed for the purpose of statistical identification is introduced. When there are several competing models the MAICE is defined by the model and the maximum likelihood estimates of the parameters which give the minimum of AIC defined by AIC = (-2)log-(maximum likelihood) + 2(number of independently adjusted parameters within the model). MAICE provides a versatile procedure for statistical model identification which is free from the ambiguities inherent in the application of conventional hypothesis testing procedure. The practical utility of MAICE in time series analysis is demonstrated with some numerical examples.
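MAICE in miniature: fit a family of competing models and keep the one minimizing AIC = (−2) log(maximum likelihood) + 2 (number of parameters); for Gaussian errors this is n·log(RSS/n) + 2k up to an additive constant. The data and model family below are synthetic, not Akaike's examples:

```python
import numpy as np

# True model is quadratic; candidates are polynomials of order 0..5.
rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 100)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.3, size=x.size)

def aic(order):
    X = np.vander(x, order + 1)
    coef, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(rss[0]) if rss.size else float(np.sum((y - X @ coef) ** 2))
    k = order + 2                  # polynomial coefficients + error variance
    return x.size * np.log(rss / x.size) + 2.0 * k

best_order = min(range(6), key=aic)   # MAICE: the minimum-AIC model
```

Underfitting orders are heavily penalized through RSS, while the 2k term guards against the gratuitous extra terms a pure likelihood comparison would always accept.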

47,133 citations


"Generalized additive models for location, scale and shape" refers methods in this paper

  • ...(a) Procedure 1: estimate the hyperparameters λ by one of the methods (i) minimizing a profile generalized Akaike information criterion GAIC over λ, (ii) minimizing a profile generalized cross-validation criterion over λ, (iii) maximizing the approximate marginal density (or profile marginal likelihood) for λ by using a Laplace approximation or (iv) approximately maximizing the marginal likelihood for λ by using an (approximate) EM algorithm. (b) Procedure 2: for fixed current hyperparameters λ, use the GAMLSS (RS or CG) algorithm to obtain posterior mode (MAP) estimates of (β, γ). Procedure 2 is nested within procedure 1 and a numerical algorithm is used to estimate λ. We now consider the methods in more detail. A.2.1. Minimizing a profile generalized Akaike information criterion over λ. GAIC (Akaike, 1983) was considered by Hastie and Tibshirani (1990), pages 160 and 261, for hyperparameter estimation in GAMs. In GAMs a cubic smoothing spline function h(x) is used to model the dependence of a predictor on explanatory variable x. For a single smoothing spline term, since λ is related to the smoothing degrees of freedom df = tr(S) through equation (6), selection (or estimation) of λ may be achieved by minimizing GAIC(#), which is defined in Section 6.2, over λ. When the model contains p cubic smoothing spline functions in different explanatory variables, then the corresponding p smoothing hyperparameters λ = (λ1, λ2, ..., λp) can be jointly estimated by minimizing GAIC(#) over λ. However, with multiple smoothing splines Σ_{j=1}^p tr(Sj) is only an approximation to the full model complexity degrees of freedom. The GAIC(#) criterion can be applied more generally to estimate hyperparameters λ in the distribution of random-effects terms. The (model complexity) degrees of freedom df need to be obtained for models with random-effects terms. This has been considered by Hodges and Sargent (2001). The degrees of freedom of a model with a single random-effects term can be defined as the trace of the random-effect (shrinkage) smoother S, i.e. df = tr(S), where S is given by equation (6). As with smoothing terms, when there are other terms in the model Σ_{j=1}^p tr(Sj) is only an approximation to the full model complexity degrees of freedom. The full model complexity degrees of freedom for model (1) are given by df = tr(A⁻¹B), where A is defined in Appendix C and B is obtained from A by omitting the matrices Gjk for j = 1, 2, ..., Jk and k = 1, 2, ..., p. A.2.2. Minimizing a generalized cross-validation over λ. The generalized cross-validation criterion was considered by Hastie and Tibshirani (1990), pages 259–263, for hyperparameter estimation in GAMs....



  • ...The Akaike information criterion AIC (Akaike, 1974) and the Schwarz Bayesian criterion SBC (Schwarz, 1978) are special cases of the GAIC....



  • ...Using an Akaike information criterion, i.e. GAIC(2), for hyperparameter selection, as discussed in Section 6.2 and Appendix A.2.1, led to the conclusion that the random-effect parameters for ν and τ are not needed, i.e. σ3 = σ4 = 0. The remaining random-effect parameters were estimated by using the approximate marginal likelihood approach, which is described in Appendix A.2.3, giving fitted parameter values σ̂1 = 13.14 and σ̂2 = 0.0848 with corresponding fixed effects parameter values β̂1 = 164.8, β̂2 = −2.213, β̂3 = −0.0697 and β̂4 = 2.148 and an approximate marginal deviance of 3118.62 obtained from equation (14) in Appendix A.2.3. This was the chosen fitted model. Since ν̂ = β̂3 = −0.0697 is close to 0, the fitted conditional distribution of yij is approximately defined by σ̂ij^−1 log(yij/μ̂ij) ∼ t_τ̂, a t-distribution with τ̂ = exp(β̂4) = 8.57 degrees of freedom, for i = 1, 2, ..., nj and j = 1, 2, ..., J. Fig. 5 plots the sample and fitted medians (μ) of prind against state (ordered by the sample median). The fitted values of σ (which are not shown here) vary very little. The heterogeneity in the sample variances of prind between the states (in Fig. 4) seems to be primarily due to sampling variation caused by the high skewness and kurtosis in the conditional distribution of y (rather than either the variance–mean relationship or the random effect in σ). Fig. 6 provides marginal (Laplace-approximated) profile deviance plots, as described in Section 6.2, for each of ν and τ, for fixed hyperparameters, giving 95% intervals (−0.866, 0.788) for ν and (4.6, 196.9) for τ, indicating considerable uncertainty about these parameters. (The fitted model suggests a log-transformation for y, whereas the added variable plot that was used by Hodges (1998) suggested a Box–Cox transformation parameter ν = 0.67 which, although rather different, still lies within the 95% interval for ν. Furthermore the wide interval for τ suggests that a conditional distribution model for yij defined by σij^−1 log(yij/μij) ∼ N(0, 1) may provide a reasonable model. This model has σ̂1 = 13.07 and σ̂2 = 0.105.) Fig. 7(a) provides a normal QQ-plot for the (normalized quantile) residuals, which were defined in Section 6.2, for the chosen model. Fig. 7(a) indicates an adequate model for the conditional distribution of y. The outlier case for Washington state, identified by Hodges (1998), does not appear to be an outlier in this analysis....

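The quantities in these excerpts — a linear smoother S, effective degrees of freedom df = tr(S), and GAIC(#) = −2·log-likelihood + #·df — can be made concrete with a ridge-type shrinkage smoother standing in for a cubic smoothing spline. This is a sketch on invented data, not the paper's estimation procedure:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 50)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
X = np.vander(x, 8)                      # polynomial basis as a stand-in

dfs, gaics = [], []
for lam in (1e-6, 1e-2, 1.0):
    # Shrinkage smoother: fitted values are S @ y.
    S = X @ np.linalg.solve(X.T @ X + lam * np.eye(8), X.T)
    df = float(np.trace(S))              # effective degrees of freedom tr(S)
    rss = float(np.sum((y - S @ y) ** 2))
    dfs.append(df)
    # Gaussian-error GAIC with penalty # = 2 (i.e. AIC), up to a constant.
    gaics.append(x.size * np.log(rss / x.size) + 2.0 * df)
```

Increasing λ shrinks the fit harder, so tr(S) falls smoothly from the number of basis functions toward zero; minimizing GAIC over λ trades that complexity against the residual sum of squares, exactly the hyperparameter-selection step described above.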

Journal ArticleDOI
TL;DR: In this paper, the problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion.
Abstract: The problem of selecting one of a number of models of different dimensions is treated by finding its Bayes solution, and evaluating the leading terms of its asymptotic expansion. These terms are a valid large-sample criterion beyond the Bayesian context, since they do not depend on the a priori distribution.

38,681 citations
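Schwarz's criterion in use: SBC = −2·log(maximum likelihood) + log(n)·k. Because log(n) exceeds 2 once n > 7, SBC penalizes extra parameters more heavily than AIC, which is the conservatism the GAMLSS authors remark on. The log-likelihood numbers below are invented for illustration:

```python
import math

def aic(loglik, k):
    return -2.0 * loglik + 2.0 * k

def sbc(loglik, k, n):
    return -2.0 * loglik + math.log(n) * k

n = 500
ll_a, k_a = -1042.0, 3      # model A: 3 parameters
ll_b, k_b = -1034.0, 8      # model B: 5 extra parameters, modest likelihood gain

aic_choice = "A" if aic(ll_a, k_a) < aic(ll_b, k_b) else "B"
sbc_choice = "A" if sbc(ll_a, k_a, n) < sbc(ll_b, k_b, n) else "B"
```

Here the likelihood gain of 8 units buys the larger model under AIC but not under SBC's log(500) ≈ 6.2 per-parameter penalty, illustrating how the two criteria can disagree.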



"Generalized additive models for location, scale and shape" refers methods in this paper

  • ...In our opinion the Schwarz Bayesian criterion (SBC) is too conservative (i.e. restrictive) in its model selection, leading to bias in the selected functions for µ, σ, ν and τ (particularly at turning-points), whereas the AIC is too liberal, leading to rough (or erratic) selected functions....


  • ...The conclusion, by looking at the SBC values, is that the Box–Cox t-distribution family fits best, indicating that the distribution of the river flow is both skew and leptokurtic....


  • ...There is strong support for including a smoothing term for loglos as indicated by the reduction in AIC and SBC for model III compared with model II....


  • ...This model resulted in an SBC of 3325....


  • ...Column I of Table 3 shows the global deviance GD and SBC for the resulting model I given by {µ = poly(r1, 2) + r2 + r3 + (tw + p + p1 ∗ t90) ∗ tp − (tw + p + p1 ∗ t90)}, for a variety of distribution families, where any scale and shape parameters (e.g. σ, ν and τ) were modelled as constants....


Book
01 Jan 1983
TL;DR: In this paper, a generalization of the analysis of variance is given for these models using log-likelihoods, illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables), and gamma (variance components).
Abstract: The technique of iterative weighted linear regression can be used to obtain maximum likelihood estimates of the parameters with observations distributed according to some exponential family and systematic effects that can be made linear by a suitable transformation. A generalization of the analysis of variance is given for these models using log-likelihoods. These generalized linear models are illustrated by examples relating to four distributions; the Normal, Binomial (probit analysis, etc.), Poisson (contingency tables) and gamma (variance components).
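The iterative weighted linear regression at the heart of generalized linear models can be sketched for Poisson regression with a log link: each step regresses a working response on X with weights updated from the current fit. A generic textbook implementation on simulated data, not code from the paper:

```python
import numpy as np

# Simulated Poisson-regression data with true coefficients (0.5, 1.2).
rng = np.random.default_rng(3)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = rng.poisson(np.exp(X @ np.array([0.5, 1.2])))

beta = np.zeros(2)
for _ in range(25):
    eta = X @ beta
    mu = np.exp(eta)
    z = eta + (y - mu) / mu        # working response for the log link
    w = mu                         # iterative weights: Var(y) = mu for Poisson
    XtW = X.T * w
    beta_new = np.linalg.solve(XtW @ X, XtW @ z)   # weighted normal equations
    if np.max(np.abs(beta_new - beta)) < 1e-10:
        beta = beta_new
        break
    beta = beta_new
```

Each pass solves XᵀWXβ = XᵀWz, so the whole fit reduces to repeated weighted least squares — the observation that lets one algorithm cover the Normal, binomial, Poisson and gamma cases alike.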

23,215 citations

Book
28 Jul 2013
TL;DR: In this paper, the authors describe the important ideas in these areas in a common conceptual framework, and the emphasis is on concepts rather than mathematics, with a liberal use of color graphics.
Abstract: During the past decade there has been an explosion in computation and information technology. With it have come vast amounts of data in a variety of fields such as medicine, biology, finance, and marketing. The challenge of understanding these data has led to the development of new tools in the field of statistics, and spawned new areas such as data mining, machine learning, and bioinformatics. Many of these tools have common underpinnings but are often expressed with different terminology. This book describes the important ideas in these areas in a common conceptual framework. While the approach is statistical, the emphasis is on concepts rather than mathematics. Many examples are given, with a liberal use of color graphics. It is a valuable resource for statisticians and anyone interested in data mining in science or industry. The book's coverage is broad, from supervised learning (prediction) to unsupervised learning. The many topics include neural networks, support vector machines, classification trees and boosting---the first comprehensive treatment of this topic in any book. This major new edition features many topics not covered in the original, including graphical models, random forests, ensemble methods, least angle regression and path algorithms for the lasso, non-negative matrix factorization, and spectral clustering. There is also a chapter on methods for ``wide'' data (p bigger than n), including multiple testing and false discovery rates. Trevor Hastie, Robert Tibshirani, and Jerome Friedman are professors of statistics at Stanford University. They are prominent researchers in this area: Hastie and Tibshirani developed generalized additive models and wrote a popular book of that title. Hastie co-developed much of the statistical modeling software and environment in R/S-PLUS and invented principal curves and surfaces. Tibshirani proposed the lasso and is co-author of the very successful An Introduction to the Bootstrap. Friedman is the co-inventor of many data-mining tools including CART, MARS, projection pursuit and gradient boosting.

19,261 citations