Journal ArticleDOI

So Many Variables: Joint Modeling in Community Ecology.

TL;DR: This work demonstrates the potential of a new class of multivariate models for ecology to specify a statistical model for abundances jointly across many taxa and to simultaneously explore interactions across taxa and the response of abundance to environmental variables, and it discusses recent computational tools and future directions.
Abstract: Technological advances have enabled a new class of multivariate models for ecology, with the potential now to specify a statistical model for abundances jointly across many taxa, to simultaneously explore interactions across taxa and the response of abundance to environmental variables. Joint models can be used for several purposes of interest to ecologists, including estimating patterns of residual correlation across taxa, ordination, multivariate inference about environmental effects and environment-by-trait interactions, accounting for missing predictors, and improving predictions in situations where one can leverage knowledge of some species to predict others. We demonstrate this by example and discuss recent computation tools and future directions.
Citations
Book
01 Sep 2017
TL;DR: In this article, the authors introduce the key stages of niche-based habitat suitability model building, evaluation and prediction required for understanding and predicting future patterns of species and biodiversity, including the main theory behind ecological niches and species distributions.
Abstract: This book introduces the key stages of niche-based habitat suitability model building, evaluation and prediction required for understanding and predicting future patterns of species and biodiversity. Beginning with the main theory behind ecological niches and species distributions, the book proceeds through all major steps of model building, from conceptualization and model training to model evaluation and spatio-temporal predictions. Extensive examples using R support graduate students and researchers in quantifying ecological niches and predicting species distributions with their own data, and help to address key environmental and conservation problems. Reflecting this highly active field of research, the book incorporates the latest developments from informatics and statistics, as well as using data from remote sources such as satellite imagery. A website at www.unil.ch/hsdm contains the code and supporting material required to run the examples and teach courses. All three authors are recognized specialists in, and have contributed substantially to, the development of spatial prediction methods for species’ habitat suitability and distribution modeling. They have published a large number of papers, together accumulating tens of thousands of citations, and are ISI Highly Cited Researchers.

632 citations

Journal ArticleDOI
TL;DR: The HMSC framework is operationalised as a hierarchical Bayesian joint species distribution model and implemented as R and Matlab packages that enable computationally efficient analyses of large data sets.
Abstract: Community ecology aims to understand what factors determine the assembly and dynamics of species assemblages at different spatiotemporal scales. To facilitate the integration between conceptual and statistical approaches in community ecology, we propose Hierarchical Modelling of Species Communities (HMSC) as a general, flexible framework for modern analysis of community data. While non-manipulative data allow for only correlative and not causal inference, this framework facilitates the formulation of data-driven hypotheses regarding the processes that structure communities. We model environmental filtering by variation and covariation in the responses of individual species to the characteristics of their environment, with potential contingencies on species traits and phylogenetic relationships. We capture biotic assembly rules by species-to-species association matrices, which may be estimated at multiple spatial or temporal scales. We operationalise the HMSC framework as a hierarchical Bayesian joint species distribution model, and implement it as R- and Matlab-packages which enable computationally efficient analyses of large data sets. Armed with this tool, community ecologists can make sense of many types of data, including spatially explicit data and time-series data. We illustrate the use of this framework through a series of diverse ecological examples.
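The parameter-sparse latent-factor device that HMSC relies on can be sketched in a few lines (illustrative Python, not the HMSC packages themselves; S, d and all numbers are invented): an S x S association matrix is built as Sigma = Lambda Lambda^T + diag(psi), so the parameter count grows linearly rather than quadratically in the number of species.

```python
import random

random.seed(0)

# Parameter counting for the latent-factor representation of a species
# association matrix: d latent factors (d << S) instead of a full covariance.
S, d = 100, 5
full_cov_params = S * (S + 1) // 2   # free parameters of an unconstrained S x S covariance
factor_params = S * d + S            # factor loadings Lambda plus diagonal psi

# Build Sigma = Lambda Lambda^T + diag(psi) for a toy Lambda.
Lambda = [[random.gauss(0, 1) for _ in range(d)] for _ in range(S)]
psi = [1.0] * S
Sigma = [[sum(Lambda[i][k] * Lambda[j][k] for k in range(d))
          + (psi[i] if i == j else 0.0)
          for j in range(S)] for i in range(S)]
```

With 100 species, an unconstrained covariance needs 5050 parameters while five latent factors need only 600, and Sigma is symmetric with a strictly positive diagonal by construction.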

588 citations


Cites background or methods from "So Many Variables: Joint Modeling i..."

  • ...If we exclude the environmental covariates X from the analysis, then the latent variables behind an association matrix can be viewed as a model-based ordination (Warton et al. 2015b)....


  • ...To overcome these limitations, community ecologists are showing increasing interest in model-based approaches (Warton et al. 2015a,b)....


  • ...To facilitate the estimation of such matrices, we use a latent variable approach, which allows a parameter-sparse representation of the matrix X through latent factors and their loadings (for mathematical details see Warton et al. 2015b; Ovaskainen et al. 2016a,b)....


  • ...If environmental covariates are included in the analysis, then an association matrix corresponds to a residual ordination, which describes those co-occurrences that cannot be explained by shared responses to environmental covariates (Hui et al. 2015; Warton et al. 2015b)....


  • ...Another way is the use of joint species distribution models, which explicitly acknowledge the multivariate nature of species assemblages, allowing one to gather more mechanistic and predictive insights into assembly processes (Warton et al. 2015b)....


Journal ArticleDOI
TL;DR: A series of arguments based on probability, sampling, food web and coexistence theories is presented, supporting the view that significant spatial associations between species (or their absence) are a poor proxy for ecological interactions.
Abstract: There is a rich amount of information in co-occurrence (presence-absence) data that could be used to understand community assembly. This proposition, first envisioned by Forbes (1907) and then Diamond (1975), prompted the development of numerous modelling approaches (e.g. null model analysis, co-occurrence networks and, more recently, joint species distribution models). Both theory and experimental evidence support the idea that ecological interactions may affect co-occurrence, but it remains unclear to what extent the signal of interaction can be captured in observational data. It is now time to step back from the statistical developments and critically assess whether co-occurrence data are really a proxy for ecological interactions. In this paper, we present a series of arguments based on probability, sampling, food web and coexistence theories supporting the view that significant spatial associations between species (or their absence) are a poor proxy for ecological interactions. We discuss appropriate interpretations of co-occurrence, along with potential avenues to extract as much information as possible from such data.
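The probability argument can be made concrete with a short simulation (a hedged Python sketch; the gradient and occupancy rules are invented for illustration): two species whose occupancy depends only on a shared environmental gradient, with no interaction whatsoever, still co-occur noticeably more often than the independence expectation p1 x p2.

```python
import random

random.seed(42)

# Two species respond to the same environmental gradient but never interact:
# given the environment, their occupancies are independent Bernoulli draws.
n = 2000
env = [random.uniform(0, 1) for _ in range(n)]
sp1 = [1 if random.random() < e else 0 for e in env]
sp2 = [1 if random.random() < e else 0 for e in env]

p1 = sum(sp1) / n
p2 = sum(sp2) / n
p_both = sum(a & b for a, b in zip(sp1, sp2)) / n
excess = p_both - p1 * p2   # positive "association" created by the gradient alone
```

The positive excess co-occurrence here is entirely an artefact of environmental filtering, which is exactly why a significant association cannot by itself be read as evidence of interaction.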

332 citations


Cites background or methods from "So Many Variables: Joint Modeling i..."

  • ...Although this argument may suggest that using models that account for environmental filtering is appropriate (e.g. JSDMs, Ovaskainen et al., 2010; Warton et al., 2015; D’Amen et al., 2018), it should not be interpreted this way....


  • ...A potentially interesting way to approach this problem is to use latent variable models (e.g. Warton et al., 2015; Ovaskainen et al., 2017) because latent variables may be able to capture some unmeasured environmental variables....


  • ..., 2014) were developed and predict the distribution of a set of species that are potentially interdependent based on abiotic factors using the entire incidence matrix (Özesmi & Özesmi 1999; Latimer et al., 2009; Ovaskainen et al., 2010, 2016, 2017; Clark et al., 2014; Kaldhusdal et al., 2015; Warton et al., 2015; Hui, 2016; Clark et al., 2017; Staniczenko et al., 2017)....



Journal ArticleDOI
TL;DR: This paper adopts Random Forest to select important features for classification and compares results on datasets with and without feature selection by the RF methods varImp(), Boruta, and Recursive Feature Elimination, to obtain the best percentage accuracy and kappa.
Abstract: Feature selection becomes prominent, especially in data sets with many variables and features. It eliminates unimportant variables and improves the accuracy as well as the performance of classification. Random Forest has emerged as a quite useful algorithm that can handle feature selection even with a higher number of variables. In this paper, we use three popular datasets with a higher number of variables (Bank Marketing, Car Evaluation Database, Human Activity Recognition Using Smartphones) to conduct the experiment. There are four main reasons why feature selection is essential: first, to simplify the model by reducing the number of parameters; second, to decrease training time; third, to reduce overfitting by enhancing generalization; and fourth, to avoid the curse of dimensionality. Besides, we evaluate and compare the accuracy and performance of classification models such as Random Forest (RF), Support Vector Machines (SVM), K-Nearest Neighbors (KNN), and Linear Discriminant Analysis (LDA). The model with the highest accuracy is the best classifier. Practically, this paper adopts Random Forest to select the important features for classification. Our experiments clearly show a comparative study of the RF algorithm from different perspectives. Furthermore, we compare the results on each dataset with and without essential feature selection by the RF methods varImp(), Boruta, and Recursive Feature Elimination (RFE) to get the best percentage accuracy and kappa. Experimental results demonstrate that Random Forest achieves better performance in all experiment groups.
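The feature-selection logic can be illustrated with a deliberately minimal permutation-importance sketch (the idea behind varImp()-style scores). Note this uses a hand-written threshold rule instead of an actual Random Forest so the mechanics stay visible; all data and thresholds are invented.

```python
import random

random.seed(7)

# Permutation importance: permute one feature at a time and measure the
# drop in accuracy. The "model" is a fixed threshold rule that only looks
# at feature 1, standing in for a trained classifier.
n = 1000
x1 = [random.random() for _ in range(n)]   # informative feature
x2 = [random.random() for _ in range(n)]   # pure noise
y = [1 if a > 0.5 else 0 for a in x1]      # labels depend on feature 1 only

def accuracy(f1, f2):
    preds = [1 if a > 0.5 else 0 for a in f1]   # f2 is ignored by this model
    return sum(p == t for p, t in zip(preds, y)) / n

base = accuracy(x1, x2)
x1_perm = x1[:]; random.shuffle(x1_perm)
x2_perm = x2[:]; random.shuffle(x2_perm)
drop1 = base - accuracy(x1_perm, x2)   # large drop: feature 1 is important
drop2 = base - accuracy(x1, x2_perm)   # zero drop: feature 2 is irrelevant
```

Ranking features by accuracy drop and discarding those near zero is the core of the selection procedures the paper compares.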

271 citations


Cites background from "So Many Variables: Joint Modeling i..."

  • ...Climate, Ecology, and Environmental: the analysis of noisy ecological data [25], variables in ecology modelling [26], number of counts of termites [27], community ecology and integrating species, traits, environment and space [28], parameters in rainfall forecasting [29, 30], global climate zone [31], local climate zone [32], environmental noise pollution [33], urban pollution [34, 35], rainfall spatial temporal [36], flash flood hazard [37, 38], landslide [39], earthquake damage detection using curvilinear features [40], earthquake classifiers using stochastic reconstruction [41] and tsunami [42]...


References
Journal ArticleDOI
TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms, and the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.
Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.

50,607 citations


"So Many Variables: Joint Modeling i..." refers methods in this paper

  • ...This is not straightforward to implement (for lme4, using a modular approach as in [89])....


Journal ArticleDOI
TL;DR: In this article, an extension of generalized linear models to the analysis of longitudinal data is proposed, which gives consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence.
Abstract: SUMMARY This paper proposes an extension of generalized linear models to the analysis of longitudinal data. We introduce a class of estimating equations that give consistent estimates of the regression parameters and of their variance under mild assumptions about the time dependence. The estimating equations are derived without specifying the joint distribution of a subject's observations, yet they reduce to the score equations for multivariate Gaussian outcomes. Asymptotic theory is presented for the general class of estimators. Specific cases in which we assume independence, m-dependence and exchangeable correlation structures for each subject are discussed. Efficiency of the proposed estimators in two simple situations is considered. The approach is closely related to quasi-likelihood. Some key words: Estimating equation; Generalized linear model; Longitudinal data; Quasi-likelihood; Repeated measures.
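The correlation structures named in the abstract are easy to write down explicitly. The sketch below (illustrative Python with an invented alpha, not a GEE fitting routine) constructs the working correlation matrix R(alpha) for the independence, exchangeable, and m-dependent cases for a cluster of t repeated measures.

```python
# GEE plugs a "working" correlation matrix R(alpha) into its estimating
# equations; the regression estimates stay consistent even if R is
# misspecified, which is why these simple parametric forms suffice.
def working_correlation(t, structure, alpha=0.3, m=1):
    R = [[0.0] * t for _ in range(t)]
    for i in range(t):
        for j in range(t):
            if i == j:
                R[i][j] = 1.0                       # unit diagonal always
            elif structure == "independence":
                R[i][j] = 0.0                       # no within-subject correlation
            elif structure == "exchangeable":
                R[i][j] = alpha                     # constant correlation
            elif structure == "m-dependent":
                R[i][j] = alpha if abs(i - j) <= m else 0.0  # banded
    return R
```

In a real GEE fit, alpha is itself estimated from residuals, and the sandwich variance estimator keeps inference valid even when the chosen structure is wrong.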

17,111 citations


"So Many Variables: Joint Modeling i..." refers background in this paper

  • ...The generalized estimating equations (GEE) approach [40] can and has been used for a similar purpose [41], although it is best suited to situations where the correlation is treated as a nuisance rather than being of interest in itself [42]....


Book
01 Jan 1995
TL;DR: Detailed notes are provided on Bayesian computation, the basics of Markov chain simulation, regression models, and asymptotic theorems.
Abstract: FUNDAMENTALS OF BAYESIAN INFERENCE: Probability and Inference; Single-Parameter Models; Introduction to Multiparameter Models; Asymptotics and Connections to Non-Bayesian Approaches; Hierarchical Models. FUNDAMENTALS OF BAYESIAN DATA ANALYSIS: Model Checking; Evaluating, Comparing, and Expanding Models; Modeling Accounting for Data Collection; Decision Analysis. ADVANCED COMPUTATION: Introduction to Bayesian Computation; Basics of Markov Chain Simulation; Computationally Efficient Markov Chain Simulation; Modal and Distributional Approximations. REGRESSION MODELS: Introduction to Regression Models; Hierarchical Linear Models; Generalized Linear Models; Models for Robust Inference; Models for Missing Data. NONLINEAR AND NONPARAMETRIC MODELS: Parametric Nonlinear Models; Basis Function Models; Gaussian Process Models; Finite Mixture Models; Dirichlet Process Models. APPENDICES: A: Standard Probability Distributions; B: Outline of Proofs of Asymptotic Theorems; C: Computation in R and Stan. Bibliographic Notes and Exercises appear at the end of each chapter.
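The "Basics of Markov Chain Simulation" material can be illustrated with a minimal random-walk Metropolis sampler (a generic textbook sketch in Python, not code from the book; the target, proposal scale, and burn-in length are invented for illustration).

```python
import math
import random
import statistics

random.seed(3)

# Random-walk Metropolis targeting a standard normal, using only an
# unnormalised log-density.
def log_target(x):
    return -0.5 * x * x          # log N(0, 1) up to an additive constant

x, samples = 0.0, []
for _ in range(20000):
    prop = x + random.gauss(0, 1.0)                          # symmetric proposal
    if math.log(random.random()) < log_target(prop) - log_target(x):
        x = prop                                             # accept the move
    samples.append(x)                                        # else keep current x

post = samples[5000:]            # discard burn-in
mean = statistics.fmean(post)
sd = statistics.stdev(post)
```

After burn-in, the empirical mean and standard deviation of the chain approximate the target's 0 and 1, which is the basic convergence behaviour the book's computation chapters formalise.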

16,079 citations


"So Many Variables: Joint Modeling i..." refers background in this paper

  • ...The most common approaches (as usual) are maximum likelihood [78] and Bayesian estimation [79]....


Posted Content
TL;DR: In this article, a model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms, and the formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters.
Abstract: Maximum likelihood or restricted maximum likelihood (REML) estimates of the parameters in linear mixed-effects models can be determined using the lmer function in the lme4 package for R. As for most model-fitting functions in R, the model is described in an lmer call by a formula, in this case including both fixed- and random-effects terms. The formula and data together determine a numerical representation of the model from which the profiled deviance or the profiled REML criterion can be evaluated as a function of some of the model parameters. The appropriate criterion is optimized, using one of the constrained optimization functions in R, to provide the parameter estimates. We describe the structure of the model, the steps in evaluating the profiled deviance or REML criterion, and the structure of classes or types that represents such a model. Sufficient detail is included to allow specialization of these structures by users who wish to write functions to fit specialized linear mixed models, such as models incorporating pedigrees or smoothing splines, that are not easily expressible in the formula language used by lmer.

14,433 citations

Journal ArticleDOI
TL;DR: In this article, a new non-parametric method for multivariate analysis of variance, based on sums of squared distances and permutation tests, is proposed, because the traditional multivariate analogues of ANOVA are too stringent in their assumptions for most ecological multivariate data sets.
Abstract: Hypothesis-testing methods for multivariate data are needed to make rigorous probability statements about the effects of factors and their interactions in experiments. Analysis of variance is particularly powerful for the analysis of univariate data. The traditional multivariate analogues, however, are too stringent in their assumptions for most ecological multivariate data sets. Non-parametric methods, based on permutation tests, are preferable. This paper describes a new non-parametric method for multivariate analysis of variance, after McArdle and Anderson (in press). It is given here, with several applications in ecology, to provide an alternative and perhaps more intuitive formulation for ANOVA (based on sums of squared distances) to complement the description provided by McArdle and Anderson (in press) for the analysis of any linear model. It is an improvement on previous non-parametric methods because it allows a direct additive partitioning of variation for complex models. It does this while maintaining the flexibility and lack of formal assumptions of other non-parametric methods. The test statistic is a multivariate analogue to Fisher's F-ratio and is calculated directly from any symmetric distance or dissimilarity matrix. P-values are then obtained using permutations. Some examples of the method are given for tests involving several factors, including factorial and hierarchical (nested) designs and tests of interactions.
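The method described above is concrete enough to sketch directly: compute a pseudo-F from sums of squared distances and obtain a P-value by permuting group labels. This is an illustrative Python implementation of the one-way case under stated assumptions (Euclidean distances on invented univariate data), not the authors' code.

```python
import random

random.seed(11)

def pseudo_F(D, groups):
    """One-way pseudo-F from a symmetric distance matrix D and group labels."""
    N = len(D)
    labels = sorted(set(groups))
    a = len(labels)
    # total sum of squares from all pairwise squared distances
    ss_total = sum(D[i][j] ** 2 for i in range(N) for j in range(i + 1, N)) / N
    # within-group sum of squares, one term per group
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(N) if groups[i] == g]
        ss_within += sum(D[i][j] ** 2 for i in idx for j in idx if i < j) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (a - 1)) / (ss_within / (N - a))

# Toy data: two well-separated univariate groups, Euclidean distances.
pts = [random.gauss(0, 1) for _ in range(15)] + [random.gauss(4, 1) for _ in range(15)]
groups = [0] * 15 + [1] * 15
D = [[abs(p - q) for q in pts] for p in pts]
F_obs = pseudo_F(D, groups)

# Permutation test: shuffle labels, count statistics at least as extreme.
perms, count = 499, 0
for _ in range(perms):
    g = groups[:]
    random.shuffle(g)
    if pseudo_F(D, g) >= F_obs:
        count += 1
p_value = (count + 1) / (perms + 1)
```

Because the statistic is computed from the distance matrix alone, the same code works unchanged for any symmetric dissimilarity measure, which is the method's main appeal for ecological data.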

12,328 citations