
Showing papers on "Mixed model published in 2016"


Journal ArticleDOI
TL;DR: A new method named Fixed and random model Circulating Probability Unification (FarmCPU) improves statistical power compared to current methods and avoids the model over-fitting problem of FEM.
Abstract: False positives in a Genome-Wide Association Study (GWAS) can be effectively controlled by a fixed effect and random effect Mixed Linear Model (MLM) that incorporates population structure and kinship among individuals to adjust association tests on markers; however, the adjustment also compromises true positives. The modified MLM method, Multiple Loci Linear Mixed Model (MLMM), incorporates multiple markers simultaneously as covariates in a stepwise MLM to partially remove the confounding between testing markers and kinship. To completely eliminate the confounding, we divided MLMM into two parts, a Fixed Effect Model (FEM) and a Random Effect Model (REM), and use them iteratively. FEM contains testing markers, one at a time, and multiple associated markers as covariates to control false positives. To avoid the model over-fitting problem in FEM, the associated markers are estimated in REM by using them to define kinship. The P values of testing markers and the associated markers are unified at each iteration. We named the new method Fixed and random model Circulating Probability Unification (FarmCPU). Both real and simulated data analyses demonstrated that FarmCPU improves statistical power compared to current methods. Additional benefits include computing time that is linear in both the number of individuals and the number of markers; a dataset with half a million individuals and half a million markers can now be analyzed within three days.

803 citations
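As a concrete illustration of the FEM step described in the abstract above, the sketch below tests one marker at a time with a set of previously associated markers ("pseudo-QTNs") included as fixed covariates. This is a minimal conceptual sketch in base R, not the FarmCPU software itself; the pheno, geno and pseudo_qtn objects are hypothetical.

```r
# Conceptual sketch of FarmCPU's fixed-effect step (FEM): each candidate
# marker is tested with previously associated markers ("pseudo-QTNs")
# included as fixed covariates. Hypothetical inputs: `pheno` is a numeric
# phenotype vector, `geno` an individuals-by-markers numeric matrix whose
# column names are marker IDs, `pseudo_qtn` a character vector of marker IDs.
fem_scan <- function(pheno, geno, pseudo_qtn) {
  covars <- geno[, pseudo_qtn, drop = FALSE]
  test_markers <- setdiff(colnames(geno), pseudo_qtn)
  sapply(test_markers, function(m) {
    fit <- lm(pheno ~ geno[, m] + covars)
    summary(fit)$coefficients[2, 4]   # p-value of the tested marker
  })
}
# p_values <- fem_scan(pheno, geno, pseudo_qtn = c("snp_101", "snp_2045"))
```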


Journal ArticleDOI
06 Jun 2016-PLOS ONE
TL;DR: sommer, as discussed by the authors, is an open-source R package that facilitates the use of mixed models for genomic selection and hybrid prediction using more than one variance component and allowing specification of covariance structures.
Abstract: Most traits of agronomic importance are quantitative in nature, and genetic markers have been used for decades to dissect such traits. Recently, genomic selection has earned attention as next generation sequencing technologies became feasible for major and minor crops. Mixed models have become a key tool for fitting genomic selection models, but most current genomic selection software can only include a single variance component other than the error, making hybrid prediction using additive, dominance and epistatic effects unfeasible for species displaying heterotic effects. Moreover, likelihood-based software for fitting mixed models with multiple random effects that allows the user to specify the variance-covariance structure of random effects has not been fully exploited. A new open-source R package called sommer is presented to facilitate the use of mixed models for genomic selection and hybrid prediction purposes using more than one variance component and allowing specification of covariance structures. The use of sommer for genomic prediction is demonstrated through several examples using maize and wheat genotypic and phenotypic data. At its core, the program contains three algorithms for estimating variance components: Average Information (AI), Expectation-Maximization (EM) and Efficient Mixed Model Association (EMMA). Kernels for calculating the additive, dominance and epistatic relationship matrices are included, along with other useful functions for genomic analysis. Results from sommer were comparable to other software, but the analysis was faster than Bayesian counterparts by a magnitude of hours to days. In addition, the ability to deal with missing data, combined with greater flexibility and speed than other REML-based software, was achieved by putting together some of the most efficient algorithms to fit models in a gentle environment such as R.

413 citations
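A minimal sketch of the kind of multi-variance-component model described above, written against sommer's formula interface. The exact argument names have varied across sommer versions, so treat this interface as an assumption; the data frame dat (phenotype y, line identifier id) and the marker matrix M are hypothetical.

```r
library(sommer)
# Hypothetical inputs: `dat` with phenotype y and line identifier id,
# and a marker matrix M (lines x markers) used to build the kernels.
A <- A.mat(M)            # additive relationship matrix
D <- D.mat(M)            # dominance relationship matrix
dat$id.dom <- dat$id     # duplicate the identifier so the two terms are distinct
fit <- mmer(fixed  = y ~ 1,
            random = ~ vs(id, Gu = A) + vs(id.dom, Gu = D),
            rcov   = ~ units,
            data   = dat)
summary(fit)$varcomp     # REML estimates of the two genetic variance components
```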


Journal ArticleDOI
TL;DR: An R package, robustlmm, is introduced, designed to robustly fit linear mixed-effects models, to provide estimates where contamination has only little influence and to detect and flag contamination.
Abstract: As any real-life data, data modeled by linear mixed-effects models often contain outliers or other contamination. Even little contamination can drive the classic estimates far away from what they would be without the contamination. At the same time, datasets that require mixed-effects modeling are often complex and large. This makes it difficult to spot contamination. Robust estimation methods aim to solve both problems: to provide estimates where contamination has only little influence and to detect and flag contamination. We introduce an R package, robustlmm, to robustly fit linear mixed-effects models. The package's functions and methods are designed to closely equal those offered by lme4, the R package that implements classic linear mixed-effects model estimation in R. The robust estimation method in robustlmm is based on the random effects contamination model and the central contamination model. Contamination can be detected at all levels of the data. The estimation method does not make any assumption on the data's grouping structure except that the model parameters are estimable. robustlmm supports hierarchical and non-hierarchical (e.g., crossed) grouping structures. The robustness of the estimates and their asymptotic efficiency is fully controlled through the function interface. Individual parts (e.g., fixed effects and variance components) can be tuned independently. In this tutorial, we show how to fit robust linear mixed-effects models using robustlmm, how to assess the model fit, how to detect outliers, and how to compare different fits.

340 citations
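Because robustlmm mirrors the lme4 interface, a classic and a robust fit can be compared directly. The sketch below uses the sleepstudy data shipped with lme4, so it is self-contained; it only illustrates the basic calls, not the tuning options described in the tutorial.

```r
library(lme4)
library(robustlmm)
# Classic and robust fits of the same linear mixed-effects model; rlmer()
# deliberately mirrors lmer()'s formula interface.
fit_classic <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
fit_robust  <- rlmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)
summary(fit_robust)   # robust estimates of fixed effects and variance components
plot(fit_robust)      # diagnostic plots; observations with low robustness
                      # weights are the flagged (potential) contamination
```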


Journal ArticleDOI
TL;DR: Neuroscientists have been slower than others in changing their statistical habits and are now urged to act, because mixed models clearly provide a better framework.

262 citations


Journal ArticleDOI
TL;DR: The software Selegen-REML/BLUP uses mixed models and was developed to optimize the routine of plant breeding programs; it is friendly, easy to use and interpret, and deals efficiently with most situations in plant breeding.
Abstract: The software Selegen-REML/BLUP uses mixed models and was developed to optimize the routine of plant breeding programs. It addresses the following plant categories: allogamous, autogamous, of mixed mating system, and of clonal propagation. It considers several experimental designs, mating designs, genotype x environment interaction, experiments repeated over sites, repeated measures, and progenies belonging to several populations, among other factors. The software adjusts effects and estimates variance components, additive genetic, dominance and genotypic values of individuals, genetic gain with selection, effective population size, and other parameters of interest to plant breeding. It allows testing the significance of effects by means of the likelihood ratio test (LRT) and analysis of deviance. It addresses continuous variables (linear models) and categorical variables (generalized linear models). Selegen-REML/BLUP is friendly, easy to use and interpret, and deals efficiently with most situations in plant breeding. It is free and available at http://www.det.ufv.br/ppestbio/corpo_docente.php under the author's name.

258 citations


Journal ArticleDOI
26 Jan 2016-JAMA
TL;DR: This Guide to Statistics and Methods discusses analyzing repeated measurements using mixed models and the importance of knowing which models to use for each measurement.
Abstract: This Guide to Statistics and Methods discusses analyzing repeated measurements using mixed models.

226 citations


Journal ArticleDOI
TL;DR: Compared to current standard REML software based on the mixed model equation, the method is substantially faster and particularly useful for multivariate analysis, including multi-trait models and random regression models for studying reaction norms.
Abstract: We have developed an algorithm for genetic analysis of complex traits using genome-wide SNPs in a linear mixed model framework. Compared to current standard REML software based on the mixed model equation, our method is substantially faster. The advantage is largest when there is only a single genetic covariance structure. The method is particularly useful for multivariate analysis, including multi-trait models and random regression models for studying reaction norms. We applied our proposed method to publicly available mice and human data and discuss the advantages and limitations.

162 citations


Journal ArticleDOI
01 Nov 2016-Genetics
TL;DR: The accuracy of the resulting method for evolutionary prediction is demonstrated by simulation, and known formulas for quantities such as the heritability of traits with binomial and Poisson distributions are shown to be special cases of these expressions.
Abstract: Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioral traits, have inherently nonnormal distributions. The generalized linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for nonnormal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each such quantity, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function over distributions of latent values. In general cases, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGglmm. We show that known formulas for quantities such as heritability of traits with binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMMs can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation and apply our approach to data from a wild pedigreed vertebrate population.

148 citations
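A minimal sketch of the conversion described above, using QGglmm's QGparams() to map latent-scale GLMM estimates onto the observed data scale. The numerical values are made up for illustration, and the exact call is given as an assumption about the package interface.

```r
library(QGglmm)
# Hypothetical latent-scale estimates from a Poisson GLMM with log link:
# latent intercept 0.5, additive genetic variance 0.2, total latent
# (phenotypic) variance 0.6.
QGparams(mu = 0.5, var.a = 0.2, var.p = 0.6, model = "Poisson.log")
# Returns the population mean, phenotypic variance, additive genetic
# variance and heritability expressed on the observed (count) scale.
```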



Journal ArticleDOI
TL;DR: A more general LMM with two random effects (one based on genomic variants and one based on easily measured spatial location as a proxy for environmental effects) is used to address the issue of "missing heritability," in the sense that much of the heritability previously thought to be missing was fictional.
Abstract: The linear mixed model (LMM) is now routinely used to estimate heritability. Unfortunately, as we demonstrate, LMM estimates of heritability can be inflated when using a standard model. To help reduce this inflation, we used a more general LMM with two random effects—one based on genomic variants and one based on easily measured spatial location as a proxy for environmental effects. We investigated this approach with simulated data and with data from a Uganda cohort of 4,778 individuals for 34 phenotypes including anthropometric indices, blood factors, glycemic control, blood pressure, lipid tests, and liver function tests. For the genomic random effect, we used identity-by-descent estimates from accurately phased genome-wide data. For the environmental random effect, we constructed a covariance matrix based on a Gaussian radial basis function. Across the simulated and Ugandan data, narrow-sense heritability estimates were lower using the more general model. Thus, our approach addresses, in part, the issue of “missing heritability” in the sense that much of the heritability previously thought to be missing was fictional. Software is available at https://github.com/MicrosoftGenomics/FaST-LMM.
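The environmental random effect above is defined by a Gaussian radial basis function of spatial distance. The sketch below shows only that construction in base R; the coords matrix and the bandwidth ell are hypothetical, and FaST-LMM itself chooses the bandwidth and fits the model differently.

```r
# Covariance matrix for the environmental random effect: a Gaussian radial
# basis function of pairwise spatial distances. Hypothetical inputs:
# `coords` is an n x 2 matrix of locations, `ell` a bandwidth chosen by
# the analyst.
rbf_covariance <- function(coords, ell) {
  d <- as.matrix(dist(coords))       # pairwise Euclidean distances
  exp(-d^2 / (2 * ell^2))            # Gaussian radial basis function kernel
}
# K_env <- rbf_covariance(coords, ell = 10)
# The LMM then carries two random effects: one with the genomic relationship
# matrix and one with K_env, plus residual error.
```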

Posted Content
TL;DR: A problem that tends to be ignored in the statistical analysis of experimental data in the language sciences is that responses often constitute time series, which raises the problem of autocorrelated errors.
Abstract: A problem that tends to be ignored in the statistical analysis of experimental data in the language sciences is that responses often constitute time series, which raises the problem of autocorrelated errors. If the errors indeed show autocorrelational structure, evaluation of the significance of predictors in the model becomes problematic due to potential anti-conservatism of p-values. This paper illustrates two tools offered by Generalized Additive Mixed Models (GAMMs) (Lin and Zhang, 1999; Wood, 2006, 2011, 2013) for dealing with autocorrelated errors, as implemented in the current version of the fourth author's mgcv package (1.8.9): the possibility of specifying an AR(1) error model for Gaussian models, and the possibility of using factor smooths for random-effect factors such as subject and item. These factor smooths are set up to have the same smoothing parameters, and are penalized to yield the non-linear equivalent of random intercepts and random slopes in the classical linear framework. Three case studies illustrate these issues.
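A minimal sketch of the two mgcv tools mentioned: the AR(1) residual model for Gaussian models (the rho and AR.start arguments of bam()) and factor smooths (bs = "fs") as the nonlinear counterpart of by-subject and by-item random intercepts and slopes. The data frame dat and its columns (RT, trial, subject, item, start_event) are hypothetical.

```r
library(mgcv)
# Hypothetical trial-level data: RT (response), trial (time within a series),
# subject and item (factors), start_event (TRUE at the first observation of
# each time series, used to delimit the AR(1) blocks).
fit <- bam(RT ~ s(trial) +                          # population-level smooth
             s(trial, subject, bs = "fs", m = 1) +  # by-subject factor smooths
             s(trial, item,    bs = "fs", m = 1),   # by-item factor smooths
           data = dat,
           rho = 0.6, AR.start = dat$start_event)   # AR(1) model for the residuals
# Comparing acf(resid(fit)) for fits with and without rho shows whether the
# AR(1) parameter has absorbed the residual autocorrelation.
```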

Journal ArticleDOI
TL;DR: The epsilon-method has been implemented by us as a generic option in the open-source Template Model Builder software, and could be adapted within other mixed-effects modeling tools such as Automatic Differentiation Model Builder for random effects.

Journal ArticleDOI
TL;DR: A multilevel mixed effects model is proposed that takes advantage of all available dose-response data; it yields a significant reduction of extreme IC50 estimates, an increase in precision, and runs orders of magnitude faster.
Abstract: Aim: Experimental variation in dose–response data of drugs tested on cell lines results in inaccuracies in the estimate of a key drug sensitivity characteristic: the IC50. We aim to improve the precision of the half-limiting dose (IC50) estimates by simultaneously employing all dose–responses across all cell lines and drugs, rather than using a single drug–cell line response. Materials & methods: We propose a multilevel mixed effects model that takes advantage of all available dose–response data. Results: The new estimates are highly concordant with the currently used Bayesian model when the data are well behaved. Otherwise, the multilevel model is clearly superior. Conclusion: The multilevel model yields a significant reduction of extreme IC50 estimates, an increase in precision, and it runs orders of magnitude faster.
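A hedged sketch of the borrowing-strength idea, not the authors' exact model: a two-parameter log-logistic dose-response curve in which the log-IC50 varies randomly by drug and by cell line within drug, fitted with nlme. The doseresp data frame, its columns and the starting values are hypothetical.

```r
library(nlme)
# Hypothetical data: viability (response), conc (dose), drug, cell_line.
loglogistic <- function(lconc, lIC50, slope) 1 / (1 + exp(slope * (lconc - lIC50)))
fit <- nlme(viability ~ loglogistic(log(conc), lIC50, slope),
            fixed  = lIC50 + slope ~ 1,
            random = lIC50 ~ 1 | drug/cell_line,   # log-IC50 shifts by drug, and by cell line within drug
            start  = c(lIC50 = 0, slope = 1),
            data   = doseresp)
# ranef(fit) gives the drug- and cell-line-level shifts of log-IC50, so a
# poorly behaved single curve is shrunk toward information from the rest.
```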

Journal ArticleDOI
TL;DR: This work presents a method to compare estimates of variance components across different relationship models, and shows that heritabilities from identity-by-state and kernel-based relationships are overestimated.

Journal ArticleDOI
TL;DR: In this article, the authors study the behavior of the restricted maximum likelihood (REML) estimator under a misspecified linear mixed model (LMM) that has received much attention in recent genomewide association studies.
Abstract: We study the behavior of the restricted maximum likelihood (REML) estimator under a misspecified linear mixed model (LMM) that has received much attention in recent genome-wide association studies. The asymptotic analysis establishes consistency of the REML estimator of the variance of the errors in the LMM, and convergence in probability of the REML estimator of the variance of the random effects in the LMM to a certain limit, which is equal to the true variance of the random effects multiplied by the limiting proportion of the nonzero random effects present in the LMM. The asymptotic results also establish the convergence rate (in probability) of the REML estimators, as well as a result regarding convergence of the asymptotic conditional variance of the REML estimator. The asymptotic results are fully supported by the results of empirical studies, which include extensive simulation studies comparing the performance of the REML estimator (under the misspecified LMM) with other existing methods, and real data applications (only one example is presented) that have important genetic implications.

Journal ArticleDOI
TL;DR: In this paper, a generalized additive mixed model (GAMM) is used to decompose population change into a long-term, smooth, trend component and a component for short-term fluctuations.
Abstract: Estimating trends of populations distributed across wide areas is important for conservation and management of animals. Surveys in the form of annually repeated counts across a number of sites are used in many monitoring programmes, and from these, nonlinear trends may be estimated using generalized additive models (GAM). I use generalized additive mixed models (GAMM) to decompose population change into a long-term, smooth, trend component and a component for short-term fluctuations. The long-term population trend is modelled as a smooth function of time and short-term fluctuations as temporal random effects. The methods are applied to analyse trends in goldcrest and greenfinch populations in Sweden using data from the Swedish Breeding Bird Survey. I use simulations to investigate statistical properties of the model. The model separates short-term fluctuations from longer term population change. Depending on the amount of noise in the population fluctuations, estimated long-term trends can differ markedly from estimates based on standard GAMs. For the goldcrest, with wide among-year fluctuations, trends estimated with GAMs suggest that the population has in recent years recovered from a decline. When short-term fluctuations are filtered out, analyses suggest that the population has been in steady decline since the beginning of the survey. Simulations suggest that trend estimation using the GAMM model reduces spurious detection of long-term population change found with estimates from a GAM model, but gives similar mean square errors. The simulations therefore suggest that the GAMM model, which decomposes population change, estimates uncertainty of long-term trends more accurately at little cost in detecting them. Policy implications: Filtering out short-term fluctuations in the estimation of long-term smooth trends using temporal random effects in a generalized additive mixed model provides more robust inference about the long-term trends compared to when such random effects are not used. This can have profound effects on management decisions, as illustrated in an example for the goldcrest in the Swedish Breeding Bird Survey. In the example, if temporal random effects were not used, red listing would be highly influenced by the specific year in which it was done. When temporal random effects are used, red listing is stable over time. The methods are available in an R package, poptrend.
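A generic sketch of the decomposition described above, written with mgcv rather than the poptrend package itself (whose interface is not shown in the abstract): a smooth long-term trend in year plus yearly and site-level random effects. The counts data frame and its columns are hypothetical.

```r
library(mgcv)
# Hypothetical survey data: count (annual count), year (numeric), site (factor).
counts$fyear <- factor(counts$year)   # year as a factor for the temporal random effect
fit <- gam(count ~ s(year) +          # smooth long-term trend
             s(fyear, bs = "re") +    # short-term yearly fluctuations as random effects
             s(site,  bs = "re"),     # site-level random intercepts
           family = poisson, data = counts)
plot(fit, select = 1)                 # the estimated long-term trend component
```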

Journal ArticleDOI
TL;DR: Three extensions of the two-part random effects model are considered, allowing the positive values to follow a generalized gamma distribution, a log-skew-normal distribution, or a normal distribution after the Box-Cox transformation; all three models provide a significantly better fit than the log-normal model, and there is strong evidence for heteroscedasticity.
Abstract: Two-part random effects models (Olsen and Schafer,(1) Tooze et al.(2)) have been applied to repeated measures of semi-continuous data, characterized by a mixture of a substantial proportion of zero values and a skewed distribution of positive values. In the original formulation of this model, the natural logarithm of the positive values is assumed to follow a normal distribution with a constant variance parameter. In this article, we review and consider three extensions of this model, allowing the positive values to follow (a) a generalized gamma distribution, (b) a log-skew-normal distribution, and (c) a normal distribution after the Box-Cox transformation. We allow for the possibility of heteroscedasticity. Maximum likelihood estimation is shown to be conveniently implemented in SAS Proc NLMIXED. The performance of the methods is compared through applications to daily drinking records in a secondary data analysis from a randomized controlled trial of topiramate for alcohol dependence treatment. We find that all three models provide a significantly better fit than the log-normal model, and there exists strong evidence for heteroscedasticity. We also compare the three models by the likelihood ratio tests for non-nested hypotheses (Vuong(3)). The results suggest that the generalized gamma distribution provides the best fit, though no statistically significant differences are found in pairwise model comparisons.

Journal ArticleDOI
TL;DR: Investigators interested in single time point comparisons should use a mixed model for repeated measures (MMRM) with a contrast, rather than a complete-case two-sample t-test, to gain power and unbiased estimation of treatment effects.
Abstract: The primary analysis in a longitudinal randomized controlled trial is sometimes a comparison of arms at a single time point. While a two-sample t-test is often used, missing data are common in longitudinal studies and decrease power by reducing sample size. Mixed models for repeated measures (MMRM) can test treatment effects at specific time points, have been shown to give unbiased estimates in certain missing data contexts, and may be more powerful than a two-sample t-test. We conducted a simulation study to compare the performance of a complete-case t-test to an MMRM in terms of power and bias under different missing data mechanisms. The impacts of within- and between-person variance, dropout mechanism, and variance-covariance structure were all considered. While both the complete-case t-test and the MMRM provided unbiased estimation of treatment differences when data were missing completely at random, the MMRM yielded an absolute power gain of up to 12 %. The MMRM provided up to 25 % absolute increased power over the t-test when data were missing at random, as well as unbiased estimation. Investigators interested in single time point comparisons should use an MMRM with a contrast to gain power and unbiased estimation of treatment effects instead of a complete-case two-sample t-test.
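A minimal sketch of an MMRM with an unstructured covariance over visits, fitted by generalized least squares in nlme; this is one common way to fit such a model, not necessarily the one used in the paper. The trial data frame and its columns (y, arm, visit, id) are hypothetical.

```r
library(nlme)
# Hypothetical trial data: y (outcome), arm (treatment factor), visit
# (factor, ordered consistently within each subject), id (subject).
fit <- gls(y ~ arm * visit,
           correlation = corSymm(form = ~ 1 | id),      # unstructured correlation across visits
           weights     = varIdent(form = ~ 1 | visit),  # visit-specific residual variances
           na.action   = na.omit,                       # observed cases contribute at each visit
           data        = trial)
# The treatment contrast at the final visit can then be read off the
# arm-by-visit terms, e.g. with emmeans::emmeans(fit, ~ arm | visit).
```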

Journal ArticleDOI
TL;DR: In this article, an example-based approach is outlined, built upon a number of datasets covering the main types of LTEs with increasing levels of complexity, to build statistical models for data analysis that are useful for all LTEs characterised by the simultaneous presence of all rotation phases in all years, together with within-year replication.

Journal ArticleDOI
TL;DR: In this paper, a comprehensive framework is proposed for additive regression models for non-Gaussian functional responses, allowing for multiple nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data, as well as linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response.
Abstract: We propose a comprehensive framework for additive regression models for non-Gaussian functional responses, allowing for multiple (partially) nested or crossed functional random effects with flexible correlation structures for, e.g., spatial, temporal, or longitudinal functional data as well as linear and nonlinear effects of functional and scalar covariates that may vary smoothly over the index of the functional response. Our implementation handles functional responses from any exponential family distribution as well as many others like Beta- or scaled and shifted t-distributions. Development is motivated by and evaluated on an application to large-scale longitudinal feeding records of pigs. Results in extensive simulation studies as well as replications of two previously published simulation studies for generalized functional mixed models demonstrate the good performance of our proposal. The approach is implemented in well-documented open source software in the pffr function in R-package refund.
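A minimal sketch of a call to refund's pffr(), the implementation named in the abstract. The data here (a count-valued functional response Y observed on grid tgrid, a scalar covariate treatment, and a grouping factor pen loosely echoing the pig-feeding application) are hypothetical, and this set of terms is only one of many that pffr supports.

```r
library(refund)
# Hypothetical data: Y is an n x T matrix of count-valued functional
# responses observed on the common grid `tgrid`; dat holds the scalar
# covariate `treatment` and the grouping factor `pen`.
fit <- pffr(Y ~ treatment +            # scalar covariate with an index-varying effect
              s(pen, bs = "re"),       # functional random intercepts per pen
            yind   = tgrid,
            family = poisson(link = "log"),   # non-Gaussian functional response
            data   = dat)
plot(fit, pages = 1)                   # estimated coefficient functions
```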

Journal ArticleDOI
TL;DR: In this paper, the empirical best predictor (EBP) of weighted sums of probabilities is calculated and compared with plug-in estimators, and an approximation to the mean-squared error (MSE) of the EBP is derived and a bias-corrected MSE estimator is given.
Abstract: The article applies unit-level logit mixed models to estimating small-area weighted sums of probabilities. The model parameters are estimated by the method of simulated moments (MSM). The empirical best predictor (EBP) of weighted sums of probabilities is calculated and compared with plug-in estimators. An approximation to the mean-squared error (MSE) of the EBP is derived and a bias-corrected MSE estimator is given and compared with parametric bootstrap alternatives. Some simulation experiments are carried out to study the empirical behavior of the model parameter MSM estimators, the EBP and plug-in estimators and the MSE estimators. An application to the estimation of poverty proportions in the counties of the region of Valencia, Spain, is given.

Journal ArticleDOI
TL;DR: It is shown how not accounting for the drivers of spatial heterogeneity in statistical models can cause contradictory findings regarding relationship direction across spatial scales, and how mixed effects models can remedy this multiscaling issue.
Abstract: Not accounting for spatial heterogeneity in ecological analyses can cause modeled relationships to vary across spatial scales, specifically different levels of spatial resolution. These varying results hinder both the utility of data collected at one spatial scale for analyses at others and the determination of underlying processes. Our objectives are to briefly review existing methods for analyzing data collected at multiple scales, to highlight the effects of spatial heterogeneity on the utility of these methods, and to illustrate a practical statistical method for accounting for the sources of spatial heterogeneity when they are unknown. Using simulated examples, we show how not accounting for the drivers of spatial heterogeneity in statistical models can cause contradictory findings regarding relationship direction across spatial scales. We then show how mixed effects models can remedy this multiscaling issue. Ignoring sources of spatial heterogeneity in statistical models with coarse spatial scales produced results contradictory to the true underlying relationship. Treating drivers of spatial heterogeneity as random effects in a mixed effects model, however, allowed us to uncover this true relationship. The mixed effects model approach is advantageous because it is not always necessary to know the influential explanatory variables that cause spatial heterogeneity and no additional data are required. Furthermore, this approach is well documented, can be applied to data having various distribution types, and is easily executable using multiple statistical packages.
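A minimal sketch of the remedy described above, using lme4: the same relationship fitted once while pooling all observations and once with the driver of spatial heterogeneity, here an aggregation unit called region, entered as a random intercept and slope. The eco data frame and its columns are hypothetical.

```r
library(lme4)
# Hypothetical data: response, predictor (both numeric) and region (factor,
# the unit over which spatial heterogeneity operates).
fit_pooled <- lm(response ~ predictor, data = eco)       # ignores spatial heterogeneity
fit_mixed  <- lmer(response ~ predictor +
                     (predictor | region),               # random intercept and slope by region
                   data = eco)
fixef(fit_mixed)["predictor"]   # relationship after accounting for heterogeneity;
                                # can differ in sign from coef(fit_pooled)["predictor"]
```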

Journal ArticleDOI
TL;DR: A novel statistical model is developed that generalizes standard mixed models for longitudinal data that include flexible mean functions as well as combined compound symmetry (CS) and autoregressive (AR) covariance structures.
Abstract: Practical Bayesian nonparametric methods have been developed across a wide variety of contexts. Here, we develop a novel statistical model that generalizes standard mixed models for longitudinal data that include flexible mean functions as well as combined compound symmetry (CS) and autoregressive (AR) covariance structures. AR structure is often specified through the use of a Gaussian process (GP) with covariance functions that allow longitudinal data to be more correlated if they are observed closer in time than if they are observed farther apart. We allow for AR structure by considering a broader class of models that incorporates a Dirichlet process mixture (DPM) over the covariance parameters of the GP. We are able to take advantage of modern Bayesian statistical methods in making full predictive inferences about characteristics of longitudinal profiles and their differences across covariate combinations. We also take advantage of the generality of our model, which provides for estimation ...

Journal ArticleDOI
TL;DR: The authors explore a number of pharmacokinetic models that are known to be non-identifiable at an individual level but can become identifiable at the population level if a number of specific assumptions on the probabilistic model hold.
Abstract: We discuss the question of model identifiability within the context of nonlinear mixed effects models. Although there has been extensive research in the area of fixed effects models, much less attention has been paid to random effects models. In this context we distinguish between theoretical identifiability, in which different parameter values lead to non-identical probability distributions, structural identifiability which concerns the algebraic properties of the structural model, and practical identifiability, whereby the model may be theoretically identifiable but the design of the experiment may make parameter estimation difficult and imprecise. We explore a number of pharmacokinetic models which are known to be non-identifiable at an individual level but can become identifiable at the population level if a number of specific assumptions on the probabilistic model hold. Essentially if the probabilistic models are different, even though the structural models are non-identifiable, then they will lead to different likelihoods. The findings are supported through simulations.

Journal ArticleDOI
TL;DR: The article illustrates the use of linear and nonlinear mixed models for dose-response relationships accounting for heterogeneous residual variances, discusses important diagnostics and their implications for inference, and provides practical recommendations for computational troubleshooting.
Abstract: Advanced methods for dose-response assessments are used to estimate the minimum concentrations of a nutrient that maximizes a given outcome of interest, thereby determining nutritional requirements for optimal performance. Contrary to standard modeling assumptions, experimental data often present a design structure that includes correlations between observations (i.e., blocking, nesting, etc.) as well as heterogeneity of error variances; either can mislead inference if disregarded. Our objective is to demonstrate practical implementation of linear and nonlinear mixed models for dose-response relationships accounting for correlated data structure and heterogeneous error variances. To illustrate, we modeled data from a randomized complete block design study to evaluate the standardized ileal digestible (SID) Trp:Lys ratio dose-response on G:F of nursery pigs. A base linear mixed model was fitted to explore the functional form of G:F relative to Trp:Lys ratios and assess model assumptions. Next, we fitted 3 competing dose-response mixed models to G:F, namely a quadratic polynomial (QP) model, a broken-line linear (BLL) ascending model, and a broken-line quadratic (BLQ) ascending model, all of which included heteroskedastic specifications, as dictated by the base model. The GLIMMIX procedure of SAS (version 9.4) was used to fit the base and QP models and the NLMIXED procedure was used to fit the BLL and BLQ models. We further illustrated the use of a grid search of initial parameter values to facilitate convergence and parameter estimation in nonlinear mixed models. Fit between competing dose-response models was compared using a maximum likelihood-based Bayesian information criterion (BIC). The QP, BLL, and BLQ models fitted on G:F of nursery pigs yielded BIC values of 353.7, 343.4, and 345.2, respectively, thus indicating a better fit of the BLL model. The BLL breakpoint estimate of the SID Trp:Lys ratio was 16.5% (95% confidence interval [16.1, 17.0]). Problems with the estimation process rendered results from the BLQ model questionable. Importantly, accounting for heterogeneous variance enhanced inferential precision as the breadth of the confidence interval for the mean breakpoint decreased by approximately 44%. In summary, the article illustrates the use of linear and nonlinear mixed models for dose-response relationships accounting for heterogeneous residual variances, discusses important diagnostics and their implications for inference, and provides practical recommendations for computational troubleshooting.
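The paper fits the competing dose-response models in SAS (GLIMMIX and NLMIXED); for readers working in R, a comparable broken-line linear (BLL) ascending mixed model with heterogeneous residual variances can be sketched with nlme, as below. The pigs data frame, its columns (gf for G:F, trp_lys for the SID Trp:Lys ratio, block) and the starting values are hypothetical.

```r
library(nlme)
# Broken-line linear ascending model: G:F rises linearly up to the breakpoint
# and plateaus afterwards; the plateau gets a random block effect and residual
# variances are allowed to differ across Trp:Lys levels.
bll <- function(x, plateau, slope, brk) ifelse(x < brk, plateau - slope * (brk - x), plateau)
fit <- nlme(gf ~ bll(trp_lys, plateau, slope, brk),
            fixed   = plateau + slope + brk ~ 1,
            random  = plateau ~ 1 | block,
            weights = varIdent(form = ~ 1 | trp_lys),   # heterogeneous residual variances
            start   = c(plateau = 0.65, slope = 0.02, brk = 16.5),
            data    = pigs)
intervals(fit, which = "fixed")   # confidence interval for the breakpoint, among others
```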

Journal ArticleDOI
TL;DR: A multivariate mixed model from quantitative genetics is expanded in order to estimate the magnitude of climate effects in a global sample of recent human crania, and the multivariate model incorporating a climate predictor is preferred in model comparison.
Abstract: Objectives: We expand upon a multivariate mixed model from quantitative genetics in order to estimate the magnitude of climate effects in a global sample of recent human crania. In humans, genetic distances are correlated with distances based on cranial form, suggesting that population structure influences both genetic and quantitative trait variation. Studies controlling for this structure have demonstrated significant underlying associations of cranial distances with ecological distances derived from climate variables. However, to assess the biological importance of an ecological predictor, estimates of effect size and uncertainty in the original units of measurement are clearly preferable to significance claims based on units of distance. Unfortunately, the magnitudes of ecological effects are difficult to obtain with distance-based methods, while models that produce estimates of effect size generally do not scale to high-dimensional data like cranial shape and form. Methods: Using recent innovations that extend quantitative genetics mixed models to highly multivariate observations, we estimate morphological effects associated with a climate predictor for a subset of the Howells craniometric dataset. Results: Several measurements, particularly those associated with cranial vault breadth, show a substantial linear association with climate, and the multivariate model incorporating a climate predictor is preferred in model comparison. Conclusions: Previous studies demonstrated the existence of a relationship between climate and cranial form. The mixed model quantifies this relationship concretely. Evolutionary questions that require population structure and phylogeny to be disentangled from potential drivers of selection may be particularly well addressed by mixed models.

Journal ArticleDOI
TL;DR: In this article, the authors derived optimal designs for the prediction of individual response curves within the framework of hierarchical linear mixed models and showed that the so-obtained optimal designs may differ substantially from those propagated in the literature so far and that the latter may become useless.
Abstract: Characterizations of optimal designs are derived for the prediction of individual response curves within the framework of hierarchical linear mixed models. It is shown that the so-obtained optimal designs may differ substantially from those propagated in the literature so far and that the latter may become useless in terms of their performance.

Journal ArticleDOI
01 Jun 2016-Catena
TL;DR: In this paper, 3-D models of the natural logarithm of soil organic carbon (SOC) were compared with 2-D depth-interval-specific models, and five environmental covariates were used as predictors in modeling the lateral trend.
Abstract: For mapping soil properties in three dimensions the simplest option is to choose a series of depth intervals, and to calibrate a two-dimensional (2-D) model for each interval. The alternative is to calibrate a full three dimensional (3-D) model that describes the variation in lateral and vertical direction. In 3-D modeling we must anticipate possible changes with depth of the effects of environmental covariates on the soil property of interest. This can be achieved by including interactions between the environmental covariates and depth. Also we must anticipate possible non-stationarity of the residual variance with depth. This can be achieved by fitting a 3-D correlation function, and multiplying the correlation between two points by the residual standard deviations at these two points that are a function of depth. In this paper various 3-D models of the natural logarithms of SOC are compared with 2-D depth-interval specific models. Five environmental covariates are used as predictors in modeling the lateral trend. In the 3-D models also depth was used as a predictor, either categorical, with categories equal to the depth intervals (3-Dcat), or continuous (3-Dcon). The covariance of the residuals in 3-D is modeled by a sum-metric covariance function. Both stationary and non-stationary variance models are fitted. In the non-stationary variance models the residual standard deviations are modeled either as a stepwise function or as a linear function of depth. In the 2-D models the regression coefficients differed largely between the depth intervals. In the 3-Dcat model extreme values for the regression coefficients were leveled out, and in the 3-Dcon model only the coefficients of NDVI and aspect changed with depth. The 3-Dcon model with a residual standard deviation that is a stepwise function of depth had the largest residual log-likelihood and smallest AIC among all 3-D models. Based on the cross-validation root mean squared error (RMSE) there was no single best model. Based on the mean and median of the standardized squared error (MSSE, MedSSE) the 2-D models outperformed all 3-D models. Overestimation of the prediction error variance by the kriging variance was less strong with the non-stationary variance models compared to the stationary variance models. 3-D modeling is required for realistic geostatistical simulation in spatial uncertainty analyses.

Journal ArticleDOI
TL;DR: In this article, an estimation approach to analyse correlated functional data, which are observed on unequal grids or even sparsely, is proposed, based on dimension reduction via functional principal component analysis and on mixed model methodology, allowing the decomposition of the variability in the data as well as the estimation of mean effects of interest.
Abstract: We propose an estimation approach to analyse correlated functional data, which are observed on unequal grids or even sparsely. The model we use is a functional linear mixed model, a functional analogue of the linear mixed model. Estimation is based on dimension reduction via functional principal component analysis and on mixed model methodology. Our procedure allows the decomposition of the variability in the data as well as the estimation of mean effects of interest, and borrows strength across curves. Confidence bands for mean effects can be constructed conditionally on estimated principal components. We provide R-code implementing our approach in an online appendix. The method is motivated by and applied to data from speech production research.