
Showing papers in "Methods in Ecology and Evolution in 2013"


Journal ArticleDOI
TL;DR: In this article, the authors make a case for the importance of reporting variance explained (R2) as a relevant summarizing statistic of mixed-effects models, which is rare, even though R2 is routinely reported for linear models and also generalized linear models (GLM).
Abstract: Summary The use of both linear and generalized linear mixed-effects models (LMMs and GLMMs) has become popular not only in social and medical sciences, but also in biological sciences, especially in the field of ecology and evolution. Information criteria, such as Akaike Information Criterion (AIC), are usually presented as model comparison tools for mixed-effects models. The presentation of ‘variance explained’ (R2) as a relevant summarizing statistic of mixed-effects models, however, is rare, even though R2 is routinely reported for linear models (LMs) and also generalized linear models (GLMs). R2 has the extremely useful property of providing an absolute value for the goodness-of-fit of a model, which cannot be given by the information criteria. As a summary statistic that describes the amount of variance explained, R2 can also be a quantity of biological interest. One reason for the under-appreciation of R2 for mixed-effects models lies in the fact that R2 can be defined in a number of ways. Furthermore, most definitions of R2 for mixed-effects models have theoretical problems (e.g. decreased or negative R2 values in larger models) and/or their use is hindered by practical difficulties (e.g. implementation). Here, we make a case for the importance of reporting R2 for mixed-effects models. We first provide the common definitions of R2 for LMs and GLMs and discuss the key problems associated with calculating R2 for mixed-effects models. We then recommend a general and simple method for calculating two types of R2 (marginal and conditional R2) for both LMMs and GLMMs, which are less susceptible to common problems. This method is illustrated by examples and can be widely employed by researchers in any field of research, regardless of the software packages used for fitting mixed-effects models. The proposed method has the potential to facilitate the presentation of R2 for a wide range of circumstances.
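The marginal and conditional R2 the abstract recommends reduce to simple ratios of variance components: fixed-effect variance over total variance, and fixed-plus-random variance over total variance. A minimal Python sketch (the method is normally applied via R packages; the function and argument names here are illustrative):

```python
def r2_mixed(var_fixed, var_random, var_resid):
    """Marginal and conditional R2 for a mixed-effects model, following the
    variance-component ratios described in the abstract.

    var_fixed  -- variance of the fixed-effect predictions
    var_random -- list of random-effect variance components
    var_resid  -- residual variance
    """
    total = var_fixed + sum(var_random) + var_resid
    r2_marginal = var_fixed / total                         # fixed effects only
    r2_conditional = (var_fixed + sum(var_random)) / total  # fixed + random
    return r2_marginal, r2_conditional
```

For example, with fixed-effect variance 2.0, one random-effect variance of 1.0 and residual variance 1.0, the marginal R2 is 0.5 and the conditional R2 is 0.75.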

7,749 citations


Journal ArticleDOI
TL;DR: Geomorph as discussed by the authors is a software package for performing geometric morphometric shape analysis in the R statistical computing environment, where a set of shape variables are obtained from landmark coordinates following a Procrustes superimposition.
Abstract: Summary 1. Many ecological and evolutionary studies seek to explain patterns of shape variation and its covariation with other variables. Geometric morphometrics is often used for this purpose, where a set of shape variables are obtained from landmark coordinates following a Procrustes superimposition. 2. We introduce geomorph: a software package for performing geometric morphometric shape analysis in the R statistical computing environment. 3. Geomorph provides routines for all stages of landmark-based geometric morphometric analyses in two and three dimensions. It is an open source package to read, manipulate, and digitize landmark data, generate shape variables via Procrustes analysis for points, curves and surfaces, perform statistical analyses of shape variation and covariation, and to provide graphical depictions of shapes and patterns of shape variation. An important contribution of geomorph is the ability to perform Procrustes superimposition on landmark points, as well as semilandmarks from curves and surfaces. 4. A wide range of statistical methods germane to testing ecological and evolutionary hypotheses of shape variation are provided. These include standard multivariate methods such as principal components analysis, and approaches for multivariate regression and group comparison. Methods for more specialized analyses, such as for assessing shape allometry, comparing shape trajectories, examining morphological integration, and for assessing phylogenetic signal, are also included. 5. Several functions are provided to graphically visualize results, including routines for examining variation in shape space, visualizing allometric trajectories, comparing specific shapes to one another and for plotting phylogenetic changes in morphospace. 6. Finally, geomorph helps make advanced geometric morphometric analyses available through the R statistical computing platform.
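geomorph itself is an R package; as a language-neutral illustration of the Procrustes superimposition step it builds on, here is a pure-Python sketch for 2-D landmarks (translate to the centroid, scale to unit centroid size, rotate by the closed-form optimal angle; function names are mine):

```python
import math

def procrustes_2d(ref, target):
    """Ordinary Procrustes superimposition of one 2-D landmark
    configuration onto a reference (illustrative sketch of the core
    operation behind geometric morphometrics packages such as geomorph)."""
    def normalize(pts):
        n = len(pts)
        cx = sum(x for x, y in pts) / n
        cy = sum(y for x, y in pts) / n
        centred = [(x - cx, y - cy) for x, y in pts]           # remove translation
        size = math.sqrt(sum(x * x + y * y for x, y in centred))
        return [(x / size, y / size) for x, y in centred]      # unit centroid size
    a, b = normalize(ref), normalize(target)
    # Closed-form optimal rotation angle for 2-D configurations
    num = sum(ya * xb - xa * yb for (xa, ya), (xb, yb) in zip(a, b))
    den = sum(xa * xb + ya * yb for (xa, ya), (xb, yb) in zip(a, b))
    th = math.atan2(num, den)
    rotated = [(x * math.cos(th) - y * math.sin(th),
                x * math.sin(th) + y * math.cos(th)) for x, y in b]
    return a, rotated
```

After alignment, differences between configurations reflect shape alone, which is what the downstream shape variables quantify.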

1,561 citations


Journal ArticleDOI
TL;DR: A new R package, diveRsity, for the calculation of various diversity statistics, including common diversity partitioning statistics (θ, GST) and population differentiation statistics (DJost, G′ST, χ2 test for population heterogeneity), among others.
Abstract: Summary We present a new R package, diveRsity, for the calculation of various diversity statistics, including common diversity partitioning statistics (θ, GST) and population differentiation statistics (DJost, G′ST, χ2 test for population heterogeneity), among others. The package calculates these estimators along with their respective bootstrapped confidence intervals at locus, sample, population-pairwise and global levels. Various plotting tools are also provided for a visual evaluation of estimated values, allowing users to critically assess the validity and significance of statistical tests from a biological perspective. diveRsity has a set of unique features, which facilitate the use of an informed framework for assessing the validity of the use of traditional F-statistics for the inference of demography, with reference to specific marker types, particularly focusing on highly polymorphic microsatellite loci. However, the package can be readily used for other co-dominant marker types (e.g. allozymes, SNPs). Detailed examples of usage and descriptions of package capabilities are provided. The examples demonstrate useful strategies for the exploration of data and interpretation of results generated by diveRsity. Additional online resources for the package are also described, including a GUI web app version intended for those with more limited experience using R for statistical analysis.

998 citations


Journal ArticleDOI
TL;DR: ITSx is introduced, a Perl‐based software tool to extract ITS1, 5.8S and ITS2 – as well as full‐length ITS sequences – from both Sanger and high‐throughput sequencing data sets, and is rich in features and written to be easily incorporated into automated sequence analysis pipelines.
Abstract: Summary 1. The nuclear ribosomal internal transcribed spacer (ITS) region is the primary choice for molecular identification of fungi. Its two highly variable spacers (ITS1 and ITS2) are usually species specific, whereas the intercalary 5.8S gene is highly conserved. For sequence clustering and BLAST searches, it is often advantageous to rely on either one of the variable spacers but not the conserved 5.8S gene. To identify and extract ITS1 and ITS2 from large taxonomic and environmental data sets is, however, often difficult, and many ITS sequences are incorrectly delimited in the public sequence databases. 2. We introduce ITSx, a Perl-based software tool to extract ITS1, 5.8S and ITS2 – as well as full-length ITS sequences – from both Sanger and high-throughput sequencing data sets. ITSx uses hidden Markov models computed from large alignments of a total of 20 groups of eukaryotes, including fungi, metazoans and plants, and the sequence extraction is based on the predicted positions of the ribosomal genes in the sequences. 3. ITSx has a very high proportion of true-positive extractions and a low proportion of false-positive extractions. Additionally, process parallelization permits expedient analyses of very large data sets, such as a one million sequence amplicon pyrosequencing data set. ITSx is rich in features and written to be easily incorporated into automated sequence analysis pipelines. 4. ITSx paves the way for more sensitive BLAST searches and sequence clustering operations for the ITS region in eukaryotes. The software also permits elimination of non-ITS sequences from any data set. This is particularly useful for amplicon-based next-generation sequencing data sets, where insidious non-target sequences are often found among the target sequences. Such non-target sequences are difficult to find by other means and would contribute noise to diversity estimates if left in the data set.

901 citations


Journal ArticleDOI
TL;DR: There are many misconceptions concerning the use of presence-only models, including the misunderstanding that MAXENT, and other presence-only methods, relieve users from the constraints of survey design; the authors recommend that researchers analyse data in a presence–absence framework whenever possible, because fewer assumptions are required and inferences are made about clearly defined parameters such as occurrence probability.
Abstract: Summary Recently, interest in species distribution modelling has increased following the development of new methods for the analysis of presence-only data and the deployment of these methods in user-friendly and powerful computer programs. However, reliable inference from these powerful tools requires that several assumptions be met, including the assumptions that observed presences are the consequence of random or representative sampling and that detectability during sampling does not vary with the covariates that determine occurrence probability. Based on our interactions with researchers using these tools, we hypothesized that many presence-only studies were ignoring important assumptions of presence-only modelling. We tested this hypothesis by reviewing 108 articles published between 2008 and 2012 that used the MAXENT algorithm to analyse empirical (i.e. not simulated) data. We chose to focus on these articles because MAXENT has been the most popular algorithm in recent years for analysing presence-only data. Many articles (87%) were based on data that were likely to suffer from sample selection bias; however, methods to control for sample selection bias were rarely used. In addition, many analyses (36%) discarded absence information by analysing presence–absence data in a presence-only framework, and few articles (14%) mentioned detection probability. We conclude that there are many misconceptions concerning the use of presence-only models, including the misunderstanding that MAXENT, and other presence-only methods, relieve users from the constraints of survey design. In the process of our literature review, we became aware of other factors that raised concerns about the validity of study conclusions. In particular, we observed that 83% of articles focused exclusively on model output (i.e. maps) without providing readers with any means to critically examine modelled relationships, and that MAXENT's logistic output was frequently (54% of articles) and incorrectly interpreted as occurrence probability. We conclude with a series of recommendations, foremost that researchers analyse data in a presence–absence framework whenever possible, because fewer assumptions are required and inferences can be made about clearly defined parameters such as occurrence probability.

590 citations


Journal ArticleDOI
TL;DR: Oligotyping is described, a novel supervised computational method that allows researchers to investigate the diversity of closely related but distinct bacterial organisms in final operational taxonomic units identified in environmental data sets through 16S ribosomal RNA gene data by the canonical approaches.
Abstract: Summary 1. Bacteria comprise the most diverse domain of life on Earth, where they occupy nearly every possible ecological niche and play key roles in biological and chemical processes. Studying the composition and ecology of bacterial ecosystems and understanding their function are of prime importance. High-throughput sequencing technologies enable nearly comprehensive descriptions of bacterial diversity through 16S ribosomal RNA gene amplicons. Analyses of these communities generally rely upon taxonomic assignments through reference data bases or clustering approaches using de facto sequence similarity thresholds to identify operational taxonomic units. However, these methods often fail to resolve ecologically meaningful differences between closely related organisms in complex microbial data sets. 2. In this paper, we describe oligotyping, a novel supervised computational method that allows researchers to investigate the diversity of closely related but distinct bacterial organisms in final operational taxonomic units identified in environmental data sets through 16S ribosomal RNA gene data by the canonical approaches. 3. Our analysis of two data sets from two different environments demonstrates the capacity of oligotyping at discriminating distinct microbial populations of ecological importance. 4. Oligotyping can resolve the distribution of closely related organisms across environments and unveil previously overlooked ecological patterns for microbial communities. The URL http://oligotyping.org offers an open-source software pipeline for oligotyping.
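The core idea of oligotyping — concentrate on the most information-rich alignment positions instead of overall sequence similarity — can be sketched with Shannon entropy. This is an illustrative toy in Python, not the actual pipeline at http://oligotyping.org:

```python
from collections import Counter
import math

def entropy_by_position(seqs):
    """Shannon entropy at each column of an alignment of equal-length
    reads; oligotyping focuses on the highest-entropy positions."""
    out = []
    for col in zip(*seqs):
        counts = Counter(col)
        n = len(col)
        out.append(-sum((c / n) * math.log2(c / n) for c in counts.values()))
    return out

def oligotypes(seqs, n_positions=1):
    """Group reads by their bases at the n most variable positions
    (a toy version of splitting one OTU into oligotypes)."""
    ent = entropy_by_position(seqs)
    top = sorted(range(len(ent)), key=lambda i: ent[i], reverse=True)[:n_positions]
    groups = {}
    for s in seqs:
        groups.setdefault(''.join(s[i] for i in sorted(top)), []).append(s)
    return groups
```

Reads that a similarity threshold would lump into one OTU are separated whenever they differ consistently at a high-entropy position.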

558 citations


Journal ArticleDOI
TL;DR: pavo, an R package that facilitates the organization, visualization and analysis of spectral data in a cohesive framework, is introduced and an exact solution for the calculation of colour volume overlap in colourspace is presented, thus expanding previously published methodologies.
Abstract: Summary Recent technical and methodological advances have led to a dramatic increase in the use of spectrometry to quantify reflectance properties of biological materials, as well as models to determine how these colours are perceived by animals, providing important insights into ecological and evolutionary aspects of animal visual communication. Despite this growing interest, a unified cross-platform framework for analysing and visualizing spectral data has not been available. We introduce pavo, an R package that facilitates the organization, visualization and analysis of spectral data in a cohesive framework. pavo is highly flexible, allowing users to (a) organize and manipulate data from a variety of sources, (b) visualize data using R's state-of-the-art graphics capabilities and (c) analyse data using spectral curve shape properties and visual system modelling for a broad range of taxa. In this paper, we present a summary of the functions implemented in pavo and how they integrate in a workflow to explore and analyse spectral data. We also present an exact solution for the calculation of colour volume overlap in colourspace, thus expanding previously published methodologies. As an example of pavo's capabilities, we compare the colour patterns of three African glossy starling species, two of which have diverged very recently. We demonstrate how both colour vision models and direct spectral measurement analysis can be used to describe colour attributes and differences between these species. Different approaches to visual models and several plotting capabilities exemplify the package's versatility and streamlined workflow. pavo provides a cohesive environment for handling spectral data and addressing complex sensory ecology questions, while integrating with R's modular core for a broader and comprehensive analytical framework, automated management of spectral data and reproducible workflows for colour analysis.

535 citations


Journal ArticleDOI
TL;DR: In this paper, the authors show that the simple and partial Mantel tests are not valid in this case, and their bias remains close to that of the simple Mantel test, and that strong biases are expected under a sampling design and spatial correlation parameter drawn from an actual study.
Abstract: Summary 1. The simple and partial Mantel tests are routinely used in many areas of evolutionary biology to assess the significance of the association between two or more matrices of distances relative to the same pairs of individuals or demes. Partial Mantel tests rather than simple Mantel tests are widely used to assess the relationship between two variables displaying some form of structure. 2. We show that contrary to a widely shared belief, partial Mantel tests are not valid in this case, and their bias remains close to that of the simple Mantel test. 3. We confirm that strong biases are expected under a sampling design and spatial correlation parameter drawn from an actual study. 4. Mantel tests should not be used when autocorrelation is suspected in both variables being compared under the null hypothesis. We outline alternative strategies. The R code used for our computer simulations is distributed as supporting material.
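For reference, the simple Mantel test under discussion is a permutation test on the correlation between the off-diagonal entries of two distance matrices. A pure-Python sketch (the paper's simulations were run in R; note its warning that this test is biased when both variables are spatially autocorrelated):

```python
import random

def mantel(dist_x, dist_y, n_perm=999, seed=1):
    """Simple Mantel test: correlate upper triangles of two symmetric
    distance matrices, then permute the rows/columns of one matrix to
    build the null distribution. Returns (observed r, permutation p)."""
    n = len(dist_x)
    idx = [(i, j) for i in range(n) for j in range(i + 1, n)]

    def corr(order):
        xs = [dist_x[i][j] for i, j in idx]
        ys = [dist_y[order[i]][order[j]] for i, j in idx]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den = (sum((x - mx) ** 2 for x in xs)
               * sum((y - my) ** 2 for y in ys)) ** 0.5
        return num / den

    rng = random.Random(seed)
    observed = corr(list(range(n)))
    hits = sum(1 for _ in range(n_perm)
               if corr(rng.sample(range(n), n)) >= observed)
    return observed, (hits + 1) / (n_perm + 1)
```

Permuting whole rows and columns together preserves the distance structure within each matrix, which is exactly why the test breaks down when that structure is autocorrelated in both matrices.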

428 citations


Journal ArticleDOI
TL;DR: In this article, changes in the balance between soil carbon storage and release can significantly amplify or attenuate global warming, and a lot of progress has been made in determining potential drivers of global warming.
Abstract: 1. Changes in the balance between soil carbon storage and release can significantly amplify or attenuate global warming. Although a lot of progress has been made in determining potential drivers of ...

382 citations


Journal ArticleDOI
TL;DR: This work presents a method, ‘SURFACE’, that uses the Ornstein‐Uhlenbeck stabilizing selection model to identify cases of convergent evolution using only continuous phenotypic characters and a phylogenetic tree, and demonstrates the method with an application to Hawaiian Tetragnatha spiders.
Abstract: Summary 1. We present a method, ‘SURFACE’, that uses the Ornstein-Uhlenbeck stabilizing selection model to identify cases of convergent evolution using only continuous phenotypic characters and a phylogenetic tree. 2. SURFACE uses stepwise Akaike Information Criterion first to locate regime shifts on a tree, then to identify whether shifts are towards convergent regimes. Simulations can be used to test the hypothesis that a clade contains more convergence than expected by chance. 3. We demonstrate the method with an application to Hawaiian Tetragnatha spiders, and present numerical simulations showing that the method has desirable statistical properties given data for multiple traits. 4. The R package surface is available as open source software from the Comprehensive R Archive Network.
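The Ornstein-Uhlenbeck model underlying SURFACE pulls a trait toward a regime's optimum θ at rate α while Brownian noise of intensity σ perturbs it. A discrete-time Euler simulation makes those dynamics concrete (illustrative sketch, not part of the surface package):

```python
import math
import random

def simulate_ou(theta, alpha, sigma, x0=0.0, dt=0.01, steps=5000, seed=42):
    """Euler simulation of an Ornstein-Uhlenbeck process: the trait is
    pulled toward the optimum theta at rate alpha, with Brownian noise
    of intensity sigma. Returns the trait value after `steps` steps."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x += alpha * (theta - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0, 1)
    return x
```

With strong attraction the trait equilibrates near θ with stationary standard deviation σ/√(2α), which is why lineages evolving under the same regime converge on similar trait values.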

309 citations


Journal ArticleDOI
TL;DR: It is demonstrated that single-visit deposition of pollen on virgin stigmas is a practical measure of pollinator effectiveness, using 13 temperate and tropical plant species and the most effective pollinator measured was as predicted from its pollination syndrome based on traditional advertisement and reward traits.
Abstract: Summary The relative importance of specialized and generalized plant-pollinator relationships is contentious, yet analyses usually avoid direct measures of pollinator quality (effectiveness), citing difficulties in collecting such data in the field and so relying on visitation data alone. We demonstrate that single-visit deposition (SVD) of pollen on virgin stigmas is a practical measure of pollinator effectiveness, using 13 temperate and tropical plant species. For each flower the most effective pollinator measured from SVD was as predicted from its pollination syndrome based on traditional advertisement and reward traits. Overall, c. 40% of visitors were not effective pollinators (range 0–78% for different flowers); thus, flower–pollinator relationships are substantially more specialized than visitation alone can reveal. Analyses at species level are crucial, as significant variation in SVD occurred within both higher-level taxonomic groups (genus, family) and within functional groups. Other measures sometimes used to distinguish visitors from pollinators (visit duration, frequency, or feeding behaviour in flowers) did not prove to be suitable proxies. Distinguishing between ‘pollinators’ and ‘visitors’ is therefore crucial, and true ‘pollination networks’ should include SVD to reveal pollinator effectiveness (PE). Generating such networks, now underway, could avoid potential misinterpretations of the conservation values of flower visitors, and of possible extinction threats as modelled in existing networks.

Journal ArticleDOI
TL;DR: A novel approach to estimating rates of re-association over time between frequently sampled individuals is included, which bridges a gap in the tools that are available to biologists wishing to analyse animal social networks in R.
Abstract: Summary The sampling of animals for the purpose of measuring associations and interactions between individuals has led to the development of several statistical methods to deal with biases inherent in these data. However, these methods are typically computationally intensive and complex to implement. Here, I provide a software package that supports a range of these analyses in the R statistical computing environment. This package includes a novel approach to estimating rates of re-association over time between frequently sampled individuals. I include an extended demonstration of the syntax and examples of how this software interfaces with existing network analysis packages in R. This bridges a gap in the tools that are available to biologists wishing to analyse animal social networks in R.
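As background, a standard building block of such analyses is an association index computed from repeated group sightings; the widely used simple ratio index can be sketched as follows (illustrative Python; the package described above is in R and its estimators additionally correct for the sampling biases the abstract mentions):

```python
from itertools import combinations

def simple_ratio_index(sightings):
    """Simple ratio association index for each pair of individuals:
    number of sampling periods seen together divided by the number of
    periods in which either was seen. `sightings` is a list of sets of
    individual IDs, one set per sampling period."""
    ids = sorted(set().union(*sightings))
    index = {}
    for a, b in combinations(ids, 2):
        together = sum(1 for s in sightings if a in s and b in s)
        either = sum(1 for s in sightings if a in s or b in s)
        index[(a, b)] = together / either if either else 0.0
    return index
```

The resulting pairwise weights form the edge list of a social network, which is what downstream network analysis packages consume.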

Journal ArticleDOI
TL;DR: The novel partition separating two components of abundance-based dissimilarity may be useful to assess biodiversity patterns and to explore their causes, as substitution and loss of individuals are patterns that can derive from completely different processes.
Abstract: Summary Dissimilarity measures can be formulated using matching components that can be defined as the intersection in terms of species composition of both sets (a) and the relative complements of each set (b and c respectively). Previous work has extended these matching components to abundance-based measures of dissimilarity. Using these matching components in terms of species abundances I provide a novel partition separating two components of abundance-based dissimilarity: (i) balanced variation in abundance, whereby the individuals of some species in one site are substituted by the same number of individuals of different species in another site; and (ii) abundance gradients, whereby some individuals are lost from one site to the other. New indices deriving from the additive partition of Bray-Curtis dissimilarity are presented, each one accounting separately for these two antithetic components of assemblage variation. An example comparing the patterns of increase of assemblage dissimilarity with spatial distance in two tropical forests is provided to illustrate the usefulness of the novel partition to discern the different sources of assemblage variation. The widely used Bray-Curtis index of dissimilarity is the result of summing these two sources of dissimilarity, and therefore might consider equivalent patterns that are markedly different. Therefore, the novel partition may be useful to assess biodiversity patterns and to explore their causes, as substitution and loss of individuals are patterns that can derive from completely different processes.
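The additive partition can be written directly from the abundance-based matching components A (shared abundance), B and C (the abundance surpluses at each site). A Python sketch of the partition the abstract describes (function name is mine):

```python
def bray_curtis_partition(x, y):
    """Partition Bray-Curtis dissimilarity between two sites into
    balanced-variation and abundance-gradient components, following the
    additive scheme described in the abstract. x and y are per-species
    abundance vectors for the two sites."""
    A = sum(min(a, b) for a, b in zip(x, y))       # shared abundance
    B = sum(a - min(a, b) for a, b in zip(x, y))   # surplus at site 1
    C = sum(b - min(a, b) for a, b in zip(x, y))   # surplus at site 2
    d_bc = (B + C) / (2 * A + B + C)               # classic Bray-Curtis
    d_bal = min(B, C) / (A + min(B, C))            # balanced variation
    d_gra = d_bc - d_bal                           # abundance gradient
    return d_bc, d_bal, d_gra
```

Pure substitution (e.g. sites [10, 0] vs [0, 10]) yields only the balanced component, while a uniform loss of individuals (e.g. [10, 10] vs [5, 5]) yields only the gradient component, illustrating why summing them into a single Bray-Curtis value can treat markedly different patterns as equivalent.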

Journal ArticleDOI
TL;DR: The results demonstrate that this method is remarkably stable under a wide array of circumstances, including most phylogenetic reconstruction methods, high singleton presence (up to 95%), taxon richness (above five species) and the presence of gaps in intraspecific sampling coverage (removal of intermediate haplotypes).
Abstract: Summary 1. The generalized mixed Yule-coalescent (GMYC) model has become one of the most popular approaches for species delimitation based on single-locus data, and it is widely used in biodiversity assessments and phylogenetic community ecology. We here examine an array of factors affecting GMYC resolution (tree reconstruction method, taxon sampling coverage/taxon richness and geographic sampling intensity/geographic scale). 2. We test GMYC performance based on empirical data (DNA barcoding of the Romanian butterflies) on a solid taxonomic framework (i.e. all species are thought to be described and can be determined with independent sources of evidence). The data set is comprehensive (176 species), and intensely and homogeneously sampled (1303 samples representing the main populations of butterflies in this country). Taxonomy was assessed based on morphology, including linear and geometric morphometry when needed. 3. The number of GMYC entities obtained constantly exceeds the total number of morphospecies in the data set. We show that c. 80% of the species studied are recognized as entities by GMYC. Interestingly, we show that this percentage is practically the maximum that a single-threshold method can provide for this data set. Thus, the c. 20% of failures are attributable to intrinsic properties of the COI polymorphism: overlap in inter- and intraspecific divergences and non-monophyly of the species, likely because of introgression or incomplete lineage sorting. 4. Our results demonstrate that this method is remarkably stable under a wide array of circumstances, including most phylogenetic reconstruction methods, high singleton presence (up to 95%), taxon richness (above five species) and the presence of gaps in intraspecific sampling coverage (removal of intermediate haplotypes).
Hence, the method is useful to designate an optimal divergence threshold in an objective manner and to pinpoint potential cryptic species that are worth being studied in detail. However, the existence of a substantial percentage of species wrongly delimited indicates that GMYC cannot be used as sufficient evidence for evaluating the specific status of particular cases without additional data. 5. Finally, we provide a set of guidelines to maximize efficiency in GMYC analyses and discuss the range of studies that can take advantage of the method.

Journal ArticleDOI
TL;DR: This work considers spatial modelling techniques that may be advantageous to applied ecologists such as quantification of uncertainty in a two-stage model and smoothing in areas with complex boundaries and considers a popular approach based on generalized additive models.
Abstract: Summary 1. Our understanding of a biological population can be greatly enhanced by modelling their distribution in space and as a function of environmental covariates. Such models can be used to investigate the relationships between distribution and environmental covariates as well as reliably estimate abundances and create maps of animal/ plant distribution. 2. Density surface models consist of a spatial model of the abundance of a biological population which has been corrected for uncertain detection via distance sampling methods. 3. We review recent developments in the field and consider the likely directions of future research before focussing on a popular approach based on generalized additive models. In particular, we consider spatial modelling techniques that may be advantageous to applied ecologists such as quantification of uncertainty in a two-stage model and smoothing in areas with complex boundaries. 4. The methods discussed are available in an R package developed by the authors (dsm) and are largely implemented in the popular Windows software Distance.

Journal ArticleDOI
TL;DR: Two different graphical methods for visualizing phenotypic evolution on the tree using a type of projection of the tree into morphospace called a ‘traitgram’ should prove useful in summarizing complex comparative inferences about ancestral character reconstruction.
Abstract: Summary Modern phylogenetic comparative biology uses data from the relationships between species (phylogeny) combined with comparative information for phenotypic traits to draw model-based statistical inferences about the evolutionary past. Recent years have seen phylogeny methods for evolutionary inference become central in the study of organic evolution. Here, I present two different graphical methods for visualizing phenotypic evolution on the tree. Method 1 is a new approach for plotting the posterior density of stochastically mapped character histories for a binary (two-state) phenotypic trait on a phylogeny. Method 2 is a closely related technique that uses ancestral character estimation to visualize historical character states for a continuous trait along the branches of a tree. One shortcoming of Method 2 is that by mapping the point estimates of ancestral states along the branches of the tree, we have effectively ignored the uncertainty associated with ancestral character estimation of continuous traits. To alleviate this issue, I propose a new method for visualizing ancestral state uncertainty using a type of projection of the tree into morphospace called a ‘traitgram.’ All of these approaches should prove useful in summarizing complex comparative inferences about ancestral character reconstruction. They are implemented in the freely available and open-source R phylogenetics package ‘phytools.’

Journal ArticleDOI
TL;DR: In this article, the authors present mixed models as a particularly useful tool for analysing nested designs, and highlight the value of the estimated random variance as a quantity of biological interest, which can be used to facilitate the transition from classical ANOVAs to mixed models in dealing with categorical data.
Abstract: 1. Nested data structures are ubiquitous in the study of ecology and evolution, and such structures need to be modelled appropriately. Mixed-effects models offer a powerful framework to do so. Nested effects can usually be fitted using the syntax for crossed effects in mixed models, provided that the coding reflects implicit nesting. But the experimental design (either nested or crossed) affects the interpretation of the results. 2. The key difference between nested and crossed effects in mixed models is the estimation and interpretation of the interaction variance. With nested data structures, the interaction variance is pooled with the main effect variance of the nested factor. Crossed designs are required to separate the two components. This difference between nested and crossed data is determined by the experimental design (thus by the nature of data sets) and not by the coding of the statistical model. 3. Data can be nested by design in the sense that it would have been technically feasible and biologically relevant to collect the data in a crossed design. In such cases, the pooling of the variances needs to be clearly acknowledged. In other situations, it might be impractical or even irrelevant to apply a crossed design. We call such situations naturally nested, a case in which the pooling of the interaction variance will be less of an issue. 4. The interpretation of results should reflect the fact that the interaction variance inflates the main effect variance when dealing with nested data structures. Whether or not this distinction is critical depends on the research question and the system under study. 5. We present mixed models as a particularly useful tool for analysing nested designs, and we highlight the value of the estimated random variance as a quantity of biological interest. Important insights can be gained if random-effect variances are appropriately interpreted.
We hope that our paper facilitates the transition from classical ANOVAs to mixed models in dealing with categorical data.

Journal ArticleDOI
TL;DR: The analyses demonstrate the benefits of site occupancy models as a simple and powerful tool to estimate detection and site occupancy (species prevalence) probabilities despite imperfect detection.
Abstract: Summary 1. The use of environmental DNA (eDNA) to detect species in aquatic environments such as ponds and streams is a powerful new technique with many benefits. However, species detection in eDNA-based surveys is likely to be imperfect, which can lead to underestimation of the distribution of a species. 2. Site occupancy models account for imperfect detection and can be used to estimate the proportion of sites where a species occurs from presence/absence survey data, making them ideal for the analysis of eDNA-based surveys. Imperfect detection can result from failure to detect the species during field work (e.g. by water samples) or during laboratory analysis (e.g. by PCR). 3. To demonstrate the utility of site occupancy models for eDNA surveys, we reanalysed a data set estimating the occurrence of the amphibian chytrid fungus Batrachochytrium dendrobatidis using eDNA. Our reanalysis showed that the previous estimation of species occurrence was low by 5–10%. Detection probability was best explained by an index of the number of hosts (frogs) in ponds. 4. Per-visit availability probability in water samples was estimated at 0.45 (95% CRI 0.32, 0.58) and per-PCR detection probability at 0.85 (95% CRI 0.74, 0.94), and six water samples from a pond were necessary for a cumulative detection probability >95%. A simulation study showed that when using site occupancy analysis, researchers need many fewer samples to reliably estimate presence and absence of species than without use of site occupancy modelling. 5. Our analyses demonstrate the benefits of site occupancy models as a simple and powerful tool to estimate detection and site occupancy (species prevalence) probabilities despite imperfect detection. As species detection from eDNA becomes more common, adoption of appropriate statistical methods, such as site occupancy models, will become crucial to ensure that reliable inferences are made from eDNA-based surveys.
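The cumulative detection probability behind point 4 follows from compounding per-sample availability with per-PCR detection. A simplified Python sketch using the abstract's point estimates (the actual site occupancy model estimates these probabilities jointly with occupancy, so this is only the forward calculation):

```python
def cumulative_detection(p_avail, p_pcr, n_samples, n_pcr=1):
    """Probability of detecting a species that is present at a site,
    given per-sample availability p_avail, per-PCR detection p_pcr,
    n_samples water samples and n_pcr PCR replicates per sample.
    Simplified sketch of the detection hierarchy in the occupancy model."""
    # Detect in one water sample: DNA must be available AND at least one PCR succeed
    p_sample = p_avail * (1 - (1 - p_pcr) ** n_pcr)
    # Detect at the site: at least one of the n_samples succeeds
    return 1 - (1 - p_sample) ** n_samples
```

With p_avail = 0.45 and p_pcr = 0.85, six single-PCR samples give roughly 94% cumulative detection under this sketch, and replicate PCRs per sample push it past the 95% threshold the abstract cites.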

Journal ArticleDOI
TL;DR: The extreme sensitivity of pyrosequencing using rare species spiked into plankton samples is demonstrated and it is proposed that the method is a powerful tool for detection of rare native and/or alien species.
Abstract: Concerns regarding the rapid loss of endemic biodiversity, and introduction and spread of non-indigenous species, have focused attention on the need and ability to detect species present in communities at low abundance. However, detection of rare species poses immense technical challenges, especially for morphologically cryptic species, microscopic taxa and those beneath the water surface in aquatic ecosystems. Next-generation sequencing technology provides a robust tool to assess biodiversity, especially for detection of rare species. Here, we assess the sensitivity of 454 pyrosequencing for detection of rare species using known indicator species spiked into existing complex plankton samples. In addition, we develop universal small subunit ribosomal DNA primers for amplification of a wide range of taxa for detailed description of biodiversity in complex communities. A universality test of newly designed primers for the hypervariable V4 region of the nuclear small subunit ribosomal DNA (V4-nSSU) using a plankton sample collected from Hamilton Harbor showed that 454 pyrosequencing based on this universal primer pair can recover a wide range of taxa, including animals, plants (algae), fungi, blue-green algae and protists. A sensitivity test showed that 454 pyrosequencing based on newly designed universal V4-nSSU primers was extremely sensitive for detection of very rare species. Pyrosequencing was able to recover spiked indicator species with biomass percentage as low as approximately 2.3 × 10⁻⁵% when 24 artificially assembled samples were tagged and sequenced in one PicoTiter plate (i.e. sequencing depth of an equivalent of 1/24 PicoTiter plate). In addition, spiked rare species were sometimes recovered as singletons (i.e. Operational Taxonomic Units represented by a single sequence), suggesting that at least some singletons are informative for recovering unique lineages in 'rare biospheres'.
The method established here allows biologists to better investigate the composition of aquatic communities, especially for detection of rare taxa. Despite a small-scale pyrosequencing effort, we demonstrate the extreme sensitivity of pyrosequencing using rare species spiked into plankton samples. We propose that the method is a powerful tool for detection of rare native and/or alien species.

Journal ArticleDOI
TL;DR: In this paper, a Monte Carlo simulation of mixing polygons is used to evaluate the point-in-polygon assumption of stable isotope analysis in mixing models incorporating uncertainty, for both two and three-isotope systems.
Abstract: Summary Stable isotope analysis is often used to identify the relative contributions of various food resources to a consumer's diet. Some Bayesian isotopic mixing models now incorporate uncertainty in the isotopic signatures of consumers, sources and trophic enrichment factors (e.g. SIAR, MixSIR). This has made model outputs more comprehensive, but at the expense of simple model evaluation, and there is no quantitative method for determining whether a proposed mixing model is likely to explain the isotopic signatures of all consumers, before the model is run. Earlier linear mixing models (e.g. IsoSource) are easier to evaluate, such that if a consumer's isotopic signature is outside the mixing polygon bounding the proposed dietary sources, then mass balance cannot be established and there is no logical solution. This can be used to identify consumers for exclusion or to reject a model outright. This point-in-polygon assumption is not inherent in the Bayesian mixing models, because the source data are distributions not average values, and these models will quantify source contributions even when the solution is very unlikely. We use a Monte Carlo simulation of mixing polygons to apply the point-in-polygon assumption to these models. Convex hulls (‘mixing polygons’) are iterated using the distributions of the proposed dietary sources and trophic enrichment factors, and the proportion of polygons that have a solution (i.e. that satisfy point-in-polygon) is calculated. This proportion can be interpreted as the frequentist probability that the proposed mixing model can calculate source contributions to explain a consumer's isotopic signature. The mixing polygon simulation is visualised with a mixing region, which is calculated by testing a grid of values for point-in-polygon. The simulation method enables users to quantitatively explore assumptions of stable isotope analysis in mixing models incorporating uncertainty, for both two- and three-isotope systems.
It provides a quantitative basis for model rejection, for consumer exclusion (those outside the 95% mixing region) and for the correction of trophic enrichment factors. The simulation is demonstrated using a two-isotope study (15N, 13C) of an Australian freshwater food web.
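The point-in-polygon simulation described above can be sketched in a few dozen dependency-free lines (a hedged sketch: Gaussian source and trophic-enrichment-factor distributions are assumed, and source and TEF variances are combined in quadrature, which may differ from the authors' implementation):

```python
import random

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in CCW order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def in_hull(pt, hull):
    """Point-in-convex-polygon: pt must lie left of (or on) every CCW edge."""
    if len(hull) < 3:
        return False
    for i in range(len(hull)):
        o, a = hull[i], hull[(i + 1) % len(hull)]
        if (a[0]-o[0])*(pt[1]-o[1]) - (a[1]-o[1])*(pt[0]-o[0]) < 0:
            return False
    return True

def polygon_probability(consumer, sources, tef, n_iter=5000, seed=1):
    """Proportion of simulated mixing polygons containing the consumer.
    sources: list of ((mu_x, mu_y), (sd_x, sd_y)) per dietary source;
    tef: same shape, a single trophic enrichment factor applied to all
    sources (an illustrative simplification)."""
    rng = random.Random(seed)
    (tmx, tmy), (tsx, tsy) = tef
    hits = 0
    for _ in range(n_iter):
        verts = [(rng.gauss(mx + tmx, (sx**2 + tsx**2) ** 0.5),
                  rng.gauss(my + tmy, (sy**2 + tsy**2) ** 0.5))
                 for (mx, my), (sx, sy) in sources]
        if in_hull(consumer, convex_hull(verts)):
            hits += 1
    return hits / n_iter
```

A consumer near the centre of the source polygon yields a proportion near 1; one far outside yields near 0, flagging it for exclusion or model rejection.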

Journal ArticleDOI
TL;DR: This study uses a time‐calibrated phylogeny of living and fossil Mammaliaformes as a framework to test novel models of body size evolution derived from palaeontological theory, and finds that a model comprising an Ornstein–Uhlenbeck process until the K‐Pg event and a Brownian motion process from the Cenozoic onwards was the best supported model.
Abstract: Summary Phylogenetic comparative methods provide a powerful way of addressing classic questions about tempo and mode of phenotypic evolution in the fossil record, such as whether mammals increased in body size diversity after the Cretaceous-Palaeogene (K-Pg) extinction. Most often, these kinds of questions are addressed in the context of variation in evolutionary rates. Shifts in the mode of phenotypic evolution provide an alternative and, in some cases, more realistic explanation for patterns of trait diversity in the fossil record, but these kinds of processes are rarely tested for. In this study, I use a time-calibrated phylogeny of living and fossil Mammaliaformes as a framework to test novel models of body size evolution derived from palaeontological theory. Specifically, I ask whether the K-Pg extinction resulted in a change in rates of body size evolution or release from a constrained adaptive zone. I found that a model comprising an Ornstein–Uhlenbeck process until the K-Pg event and a Brownian motion process from the Cenozoic onwards was the best supported model for these data. Surprisingly, results indicate a lower absolute rate of body size evolution during the Cenozoic than during the Mesozoic. This is explained by release from a stationary OU process that constrained realized disparity. Despite a lower absolute rate, body size disparity has in fact been increasing since the K-Pg event. The use of time-calibrated phylogenies of living and extinct taxa and realistic, process-based models provides unparalleled power in testing evolutionary hypotheses. However, researchers should take care to ensure that the models they use are appropriate to the question being tested and that the parameters estimated are interpreted in the context of the best fitting model.
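The best-supported shift in mode can be illustrated with a toy forward simulation of ln(body size) along independent lineages (illustrative parameter values only; this is not the likelihood-based fitting on a time-calibrated phylogeny that the paper performs). Note how, as in the abstract, a lower absolute rate after the shift still produces growing disparity once the constraining OU pull is released:

```python
import random

def simulate_trait(t_total=165.0, t_shift=100.0, dt=0.5, x0=0.0,
                   alpha=0.3, theta=0.0, sigma_ou=0.3, sigma_bm=0.15, seed=0):
    """Euler simulation of ln(body size): an OU process (pull alpha toward
    optimum theta) until t_shift (standing in for the K-Pg boundary),
    then pure Brownian motion with a *lower* rate (sigma_bm < sigma_ou)."""
    rng = random.Random(seed)
    x, t, path = x0, 0.0, [x0]
    while t < t_total:
        if t < t_shift:
            x += alpha * (theta - x) * dt + sigma_ou * (dt ** 0.5) * rng.gauss(0, 1)
        else:
            x += sigma_bm * (dt ** 0.5) * rng.gauss(0, 1)
        t += dt
        path.append(x)
    return path

def disparity_at_end(n_lineages=200, **kw):
    """Variance of final trait values across independent replicate lineages."""
    finals = [simulate_trait(seed=s, **kw)[-1] for s in range(n_lineages)]
    m = sum(finals) / len(finals)
    return sum((f - m) ** 2 for f in finals) / len(finals)
```

Running the OU process for the whole interval keeps disparity near its stationary value (about sigma_ou²/(2·alpha)); releasing lineages into BM at t_shift lets disparity increase despite the lower rate.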

Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of the sine method and the tangent method for tree height estimation in a Neotropical moist forest in Panama, using laser rangefinders.
Abstract: Summary Tree height is a key variable for estimating tree biomass and investigating tree life history, but it is difficult to measure in forests with tall, dense canopies and wide crowns. The traditional method, which we refer to as the ‘tangent method’, involves measuring horizontal distance to the tree and angles from horizontal to the top and base of the tree, while standing at a distance of perhaps one tree height or greater. Laser rangefinders enable an alternative method, which we refer to as the ‘sine method’; it involves measuring the distances to the top and base of the tree, and the angles from horizontal to these, and can be carried out from under the tree or from some distance away. We quantified systematic and random errors of these two methods as applied by five technicians to a size-stratified sample of 74 trees between 5.7 and 39.2 m tall in a Neotropical moist forest in Panama. We measured actual heights using towers adjacent to these trees. The tangent method produced unbiased height estimates, but random error was high, and in 6 of the 370 measurements, heights were overestimated by more than 100%. The sine method was faster to learn, displayed less variation in heights among technicians, and had lower random error, but resulted in systematic underestimation by 20% on average. We recommend the sine method for most applications in tropical forests. However, its underestimation, which is likely to vary with forest and instrument type, must be corrected if actual heights are needed.
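The two methods reduce to simple trigonometry, which can be sketched as follows (angles are measured from horizontal, with angles below horizontal passed as negative; the field protocols and error corrections in the paper are not reproduced):

```python
import math

def height_tangent(horiz_dist, angle_top_deg, angle_base_deg):
    """Tangent method: horizontal distance to the trunk plus angles (from
    horizontal) to the treetop and to the tree base."""
    return horiz_dist * (math.tan(math.radians(angle_top_deg)) -
                         math.tan(math.radians(angle_base_deg)))

def height_sine(dist_top, angle_top_deg, dist_base, angle_base_deg):
    """Sine method: direct (line-of-sight) rangefinder distances to the top
    and to the base, plus the corresponding angles from horizontal."""
    return (dist_top * math.sin(math.radians(angle_top_deg)) -
            dist_base * math.sin(math.radians(angle_base_deg)))
```

For error-free measurements of the same tree the two formulas agree exactly; the paper's point is how differently they respond to the measurement errors that occur in practice (e.g. misjudging the true treetop inflates the tangent estimate, while the laser returning from foliage short of the top deflates the sine estimate).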

Journal ArticleDOI
TL;DR: The method uses the statistical relationship between predator and prey body size to infer the matrix of potential interactions among a pool of species, and gives robust predictions of the structure of food webs and its efficiency is increased when the strength of the body-size relationship between predators and preys increases.
Abstract: 1. Current global changes make it important to be able to predict which interactions will occur in the emerging ecosystems. Most of the current methods to infer the existence of interactions between two species require a good knowledge of their behaviour or a direct observation of interactions. In this paper, we overcome these limitations by developing a method, inspired from the niche model of food web structure, using the statistical relationship between predator and prey body size to infer the matrix of potential interactions among a pool of species. 2. The novelty of our approach is to infer, for any species of a given species pool, the three species-specific parameters of the niche model. The method applies to both local and metaweb scales. It allows one to evaluate the feeding interactions of a new species entering the community. 3. We find that this method gives robust predictions of the structure of food webs and that its efficiency is increased when the strength of the body-size relationship between predators and prey increases. 4. We finally illustrate the potential of the method to infer the metaweb structure of pelagic fishes of the Mediterranean Sea under different global change scenarios.
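The flavour of the approach, predicting a feeding link whenever a prey's log body size falls inside a diet range positioned below the predator's own niche position, can be sketched as follows (the slope, intercept and range width below are illustrative placeholders, not the species-specific parameters the paper infers):

```python
def predict_interactions(log_masses, slope=0.8, intercept=-1.2, width=1.0):
    """Niche-model-style interaction matrix from log10 body masses.
    For predator i, the diet centroid is a linear function of its own log
    mass (mimicking a predator-prey body-size regression) and the diet
    range has a fixed width in log10 units. a[i][j] = 1 means species i
    is predicted to eat species j."""
    n = len(log_masses)
    a = [[0] * n for _ in range(n)]
    for i, mi in enumerate(log_masses):
        centre = intercept + slope * mi
        lo, hi = centre - width / 2.0, centre + width / 2.0
        for j, mj in enumerate(log_masses):
            if lo <= mj <= hi:
                a[i][j] = 1
    return a
```

Because the matrix is generated from body size alone, evaluating a new species entering the pool only requires its mass, which is what makes the approach usable under global change scenarios.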

Journal ArticleDOI
TL;DR: The sensitivity, robustness and high accuracy of automated acoustic methods demonstrate that they offer a suitable and extremely efficient alternative to field observer point counts for species monitoring.
Abstract: Free to read 1. Autonomous acoustic recorders are widely available and can provide a highly efficient method of species monitoring, especially when coupled with software to automate data processing. However, the adoption of these techniques is restricted by a lack of direct comparisons with existing manual field surveys. 2. We assessed the performance of autonomous methods by comparing manual and automated examination of acoustic recordings with a field-listening survey, using commercially available autonomous recorders and custom call detection and classification software. We compared the detection capability, time requirements, areal coverage and weather condition bias of these three methods using an established call monitoring programme for a nocturnal bird, the little spotted kiwi (Apteryx owenii). 3. The autonomous recorder methods had very high precision (>98%) and required <3% of the time needed for the field survey. They were less sensitive, with visual spectrogram inspection recovering 80% of the total calls detected and automated call detection 40%, although this recall increased with signal strength. The areal coverages of the spectrogram inspection and automatic detection methods were 85% and 42% of the field survey, respectively. The methods using autonomous recorders were more adversely affected by wind and did not show a positive association between ground moisture and call rates that was apparent from the field counts. However, all methods produced the same results for the most important conservation information from the survey: the annual change in calling activity. 4. Autonomous monitoring techniques incur different biases to manual surveys and so can yield different ecological conclusions if sampling is not adjusted accordingly. Nevertheless, the sensitivity, robustness and high accuracy of automated acoustic methods demonstrate that they offer a suitable and extremely efficient alternative to field observer point counts for species monitoring.

Journal ArticleDOI
TL;DR: The package R Individual Specialization (RInSp) for the free open-source statistical software r.InSp provides a comprehensive set of classical and recently proposed indices for quantifying the degree of individual specialization using both categorical and continuous resource use data.
Abstract: Summary In the last decade, an increasing number of papers testifies to a renewed interest in the topic of individual specialization in resource use and its implication at higher levels of ecological organization. We present the package R Individual Specialization (RInSp) for the free open-source statistical software R. RInSp provides a comprehensive set of classical and recently proposed indices for quantifying the degree of individual specialization using both categorical and continuous resource use data. The package also includes tools for ad hoc Monte Carlo and jackknife resampling procedures for significance testing, plotting and input/output data manipulation. The use of RInSp is demonstrated by two examples. In addition, the potential of the package to be implemented beyond its original scope for multi-level quantitative analyses of individual trait variance in natural communities is illustrated.
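One of the classical indices in this family, the proportional similarity PS_i between an individual's diet and the population diet (usually attributed to Bolnick and colleagues) and its mean IS, is simple enough to sketch outside the package (a hedged re-implementation for illustration; RInSp itself should be used for real analyses and significance testing):

```python
def ps_index(diet_counts):
    """Proportional similarity per individual: PS_i = 1 - 0.5 * sum_j |p_ij - q_j|,
    where p_ij are individual i's diet proportions and q_j the pooled
    population proportions. diet_counts: per-individual lists of resource-use
    counts. Returns (list of PS_i, their mean IS); IS near 1 means no
    individual specialization, lower values mean stronger specialization."""
    pop = [sum(col) for col in zip(*diet_counts)]
    pop_tot = sum(pop)
    q = [c / pop_tot for c in pop]
    ps = []
    for row in diet_counts:
        tot = sum(row)
        p = [c / tot for c in row]
        ps.append(1.0 - 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q)))
    return ps, sum(ps) / len(ps)
```

Two individuals each specializing completely on a different one of two resources give IS = 0.5; identical generalists give IS = 1.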

Journal ArticleDOI
TL;DR: A systematic comparison between parent–offspring regression and animal model estimates is advocated to detect potentially missing non‐transgenerational environmental effects.
Abstract: Summary Estimating heritability of traits in wild populations is a major prerequisite to understand their evolution. Until recently, most heritability estimates had been obtained using parent-offspring regressions. However, the popularity of animal models, that is, (generalized) linear mixed models assessing the genetic variance component based on population pedigree information, has markedly increased in the past few years. Animal models are claimed to perform better than parent–offspring regressions mainly because they use full between-individual relatedness information and they allow explicit modelling of the environmental effects shared by individuals. However, the differences between heritability estimates obtained using both approaches are not straightforward, and the factors influencing these differences remain unclear. We performed a simulation study to evaluate and compare the accuracy and precision of estimates obtained from parent–offspring regressions and animal models using both Frequentist (REML, PQL) and Bayesian (MCMC) estimation methods. We explored the influence of (i) the presence and type of shared environmental effects (non-transgenerational or transgenerational), (ii) the distribution of the phenotypic trait considered (Gaussian or binary trait) and (iii) data quantity and quality (sample size, pedigree connectivity) on heritability estimates obtained from the two approaches for different levels of true heritability. In the absence of shared environmental effects, the animal model using the REML method performed best for a Gaussian trait, while the animal model using MCMC was more appropriate for a binary trait. For low quantity and quality data, and a binary trait, the parent–offspring regression yielded very imprecise estimates. 
Estimates from the parent–offspring regression were not influenced by a non-transgenerational shared environmental effect, whereas estimates from animal models in which environmental effects are ignored were affected by both non-transgenerational and transgenerational effects. We discuss the relevance of each approach and estimation method for estimating heritability in wild populations. Importantly, because most effects fitted in animal models are, in fact, non-transgenerational (including environmental maternal effects), we advocate a systematic comparison between parent–offspring regression and animal model estimates to detect potentially missing non-transgenerational environmental effects.
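The parent-offspring regression baseline is easy to simulate for a Gaussian trait (a sketch under a purely additive model with no shared environment, so that the slope of offspring phenotype on midparent phenotype estimates h²; the paper's animal-model comparisons are not reproduced here):

```python
import random

def simulate_po(n_families=2000, h2=0.4, seed=42):
    """Simulate midparent and offspring phenotypes for a Gaussian trait with
    total phenotypic variance 1: breeding values ~ N(0, h2), environmental
    deviations ~ N(0, 1 - h2), and the offspring breeding value equals the
    midparent breeding value plus Mendelian segregation noise of variance h2/2."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n_families):
        bv_m, bv_f = rng.gauss(0, h2 ** 0.5), rng.gauss(0, h2 ** 0.5)
        pm = bv_m + rng.gauss(0, (1 - h2) ** 0.5)
        pf = bv_f + rng.gauss(0, (1 - h2) ** 0.5)
        bv_o = 0.5 * (bv_m + bv_f) + rng.gauss(0, (h2 / 2) ** 0.5)
        po = bv_o + rng.gauss(0, (1 - h2) ** 0.5)
        xs.append(0.5 * (pm + pf))   # midparent phenotype
        ys.append(po)                # offspring phenotype
    return xs, ys

def slope(xs, ys):
    """Ordinary least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx
```

Under these assumptions cov(midparent, offspring) = VA/2 and var(midparent) = VP/2, so the slope recovers h² = VA/VP; adding a shared (transgenerational) environmental effect to both generations would inflate this slope, which is the kind of bias the paper dissects.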

Journal ArticleDOI
TL;DR: In this paper, the authors used 14 different approaches to quantify beta diversity, among them dataset-wide multiplicative partitioning and pairwise site x site dissimilarities, and calculated correlations of the dissimilarity measures of undersampled data with complete data of sites.
Abstract: Beta diversity is a conceptual link between diversity at local and regional scales. Various methodologies for quantifying this and related phenomena have been applied. Among them, measures of pairwise (dis)similarity of sites are particularly popular. Undersampling, i.e. not recording all taxa present at a site, is a common situation in ecological data. Bias in many metrics related to beta diversity must be expected, but only few studies have explicitly investigated the properties of various measures under undersampling conditions. On the basis of an empirical data set, representing near-complete local inventories of the Lepidoptera from an isolated Pacific island, as well as simulated communities with varying properties, we mimicked different levels of undersampling. We used 14 different approaches to quantify beta diversity, among them dataset-wide multiplicative partitioning (i.e. ‘true beta diversity’) and pairwise site × site dissimilarities. We compared their values from incomplete samples to true results from the full data. We used these comparisons to quantify undersampling bias and we calculated correlations of the dissimilarity measures of undersampled data with complete data of sites. Almost all tested metrics showed bias and low correlations under moderate to severe undersampling conditions (as well as deteriorating precision, i.e. large chance effects on results). Measures that used only species incidence were very sensitive to undersampling, while abundance-based metrics with high dependency on the distribution of the most common taxa were particularly robust. Simulated data showed sensitivity of results to the abundance distribution, confirming that data sets of high evenness and/or the application of metrics that are strongly affected by rare species are particularly sensitive to undersampling. 
The class of beta measure to be used should depend on the research question being asked as different metrics can lead to quite different conclusions even without undersampling effects. For each class of metric, there is a trade-off between robustness to undersampling and sensitivity to rare species. In consequence, using incidence-based metrics carries a particular risk of false conclusions when undersampled data are involved. Developing bias corrections for such metrics would be desirable.
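The core quantities involved, multiplicative ("true") beta diversity, an incidence-based pairwise dissimilarity, and binomial undersampling, can be mimicked as follows (a minimal sketch; the paper's 14 metrics and its empirical Lepidoptera data are not reproduced):

```python
import random

def whittaker_beta(comms):
    """Multiplicative beta = gamma / mean(alpha), from per-site species-count
    lists (index = species identity, value = abundance)."""
    present = [set(i for i, n in enumerate(c) if n > 0) for c in comms]
    gamma = len(set().union(*present))
    mean_alpha = sum(len(s) for s in present) / len(present)
    return gamma / mean_alpha

def jaccard_dissimilarity(c1, c2):
    """Incidence-based pairwise dissimilarity: 1 - |shared| / |union|."""
    s1 = set(i for i, n in enumerate(c1) if n > 0)
    s2 = set(i for i, n in enumerate(c2) if n > 0)
    return 1.0 - len(s1 & s2) / len(s1 | s2)

def subsample(comm, frac, rng):
    """Binomial undersampling: each individual is recorded with prob. frac."""
    return [sum(1 for _ in range(n) if rng.random() < frac) for n in comm]
```

Undersampling two identical sites that hold many singletons typically inflates incidence-based beta above its true value of 1, because each site randomly misses a different subset of the rare species; this is the bias mechanism the abstract describes.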

Journal ArticleDOI
TL;DR: This study provides a methodological framework that provides the accurate confidence intervals associated with forest AGB estimates made from inventory data and believes that in the light of the Reducing Emissions from Deforestation and Degradation debate, it is a crucial step in monitoring carbon stocks and their spatio-temporal evolution.
Abstract: Reliable above-ground biomass (AGB) estimates are required for studies of carbon fluxes and stocks. However, there is a huge lack of knowledge concerning the precision of AGB estimates and the sources of this uncertainty. At the tree level, the tree height is predicted using the tree diameter at breast height (DBH) and a height sub-model. The wood-specific gravity (WSG) is predicted with taxonomic information and a WSG sub-model. The tree mass is predicted using the predicted height, the predicted WSG and the biomass sub-model. Our models were inferred with Bayesian methods and the uncertainty propagated with a Monte Carlo scheme. The uncertainties in the predictions of tree height, tree WSG and tree mass were neglected sequentially to quantify their contributions to the uncertainty in AGB. The study was conducted in French Guiana where long-term research on forest ecosystems provided an outstanding data collection on tree height, tree dynamics, tree mass and species WSG. We found that the uncertainty in the AGB estimates derived primarily from the biomass sub-model. The models used to predict the tree heights and WSG contributed negligible uncertainty to the final estimate. Considering our results, a poor knowledge of WSG and the height-diameter relationship does not increase the uncertainty in AGB estimates. However, it could lead to bias. Therefore, models and databases should be used with care. This study provides a methodological framework that can be broadly used by foresters and plant ecologists. It provides the accurate confidence intervals associated with forest AGB estimates made from inventory data. When estimating region-scale AGB values (through spatial interpolation, spatial modelling or satellite signal treatment), the uncertainty of the forest AGB value in the reference forest plots has to be taken into account. 
We believe that in the light of the Reducing Emissions from Deforestation and Degradation debate, our method is a crucial step in monitoring carbon stocks and their spatio-temporal evolution.
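The error-propagation scheme, sequentially switching each sub-model's residual error on or off inside a Monte Carlo loop to isolate its contribution, can be sketched generically (all coefficients below are illustrative placeholders, not the Bayesian sub-models fitted in French Guiana):

```python
import math, random

# Illustrative sub-model parameters (placeholders, not the paper's values):
H_A, H_B, H_SD = 3.5, 0.6, 0.10       # height model and its ln-scale residual SD
WSG_MU, WSG_SD = 0.6, 0.08            # wood-specific gravity prediction
B_C0, B_C1, B_SD = -2.9, 0.98, 0.35   # ln-biomass model and its residual SD

def simulate_plot_agb(dbh_list, rng, use_h_err=True, use_w_err=True, use_b_err=True):
    """One Monte Carlo draw of plot AGB: per tree, predict height and WSG,
    then biomass, with each sub-model's residual error switchable so its
    contribution to the total uncertainty can be isolated."""
    total = 0.0
    for dbh in dbh_list:
        h = H_A * dbh ** H_B * math.exp(rng.gauss(0, H_SD) if use_h_err else 0.0)
        w = max(0.05, rng.gauss(WSG_MU, WSG_SD)) if use_w_err else WSG_MU
        ln_agb = (B_C0 + B_C1 * math.log(w * dbh ** 2 * h)
                  + (rng.gauss(0, B_SD) if use_b_err else 0.0))
        total += math.exp(ln_agb)
    return total

def agb_ci(dbh_list, n_draws=1000, seed=0, **flags):
    """Empirical 95% interval of plot AGB over Monte Carlo draws."""
    rng = random.Random(seed)
    draws = sorted(simulate_plot_agb(dbh_list, rng, **flags) for _ in range(n_draws))
    return draws[int(0.025 * n_draws)], draws[int(0.975 * n_draws)]
```

With residual SDs of roughly the relative magnitudes above, the interval driven by the biomass sub-model alone is much wider than that driven by the height sub-model alone, mirroring the paper's conclusion about where the uncertainty comes from.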

Journal ArticleDOI
TL;DR: It is shown that there is a close relationship between Shannon entropy and the species accumulation curve, which depicts the cumulative number of observed species as a function of sample size, and the resulting entropy estimator is nearly unbiased.
Abstract: Summary 1. Estimating Shannon entropy and its exponential from incomplete samples is a central objective of many research fields. However, empirical estimates of Shannon entropy and its exponential depend strongly on sample size and typically exhibit substantial bias. This work uses a novel method to obtain an accurate, low-bias analytic estimator of entropy, based on species frequency counts. Our estimator does not require prior knowledge of the number of species. 2. We show that there is a close relationship between Shannon entropy and the species accumulation curve, which depicts the cumulative number of observed species as a function of sample size. We reformulate entropy in terms of the expected discovery rates of new species with respect to sample size, that is, the successive slopes of the species accumulation curve. Our estimator is obtained by applying slope estimators derived from an improved Good-Turing frequency formula. Our method is also applied to estimate mutual information. 3. Extensive simulations from theoretical models and real surveys show that if sample size is not unreasonably small, the resulting entropy estimator is nearly unbiased. Our estimator generally outperforms previous methods in terms of bias and accuracy (low mean squared error) especially when species richness is large and there is a large fraction of undetected species in samples. 4. We discuss the extension of our approach to estimate Shannon entropy for multiple incidence data. The use of our estimator in constructing an integrated rarefaction and extrapolation curve of entropy (or mutual information) as a function of sample size or sample coverage (an aspect of sample completeness) is also discussed.
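The sample-size dependence and negative bias of the naive plug-in entropy estimator, which is what motivates the proposed estimator, can be demonstrated by subsampling (a sketch of the problem only; the paper's improved Good-Turing slope estimator is not reproduced here):

```python
import math, random

def shannon_plugin(counts):
    """Naive (plug-in / maximum-likelihood) Shannon entropy from counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)

def mean_subsample_entropy(counts, m, reps=200, seed=0):
    """Average plug-in entropy over random subsamples of m individuals
    drawn without replacement from the pooled sample."""
    rng = random.Random(seed)
    pool = [sp for sp, c in enumerate(counts) for _ in range(c)]
    est = []
    for _ in range(reps):
        tallies = {}
        for sp in rng.sample(pool, m):
            tallies[sp] = tallies.get(sp, 0) + 1
        est.append(shannon_plugin(list(tallies.values())))
    return sum(est) / reps
```

For a community of 20 equally common species (true entropy ln 20), the plug-in estimate shrinks systematically as the subsample gets smaller, because undetected species and downward-biased proportions both pull the estimate below the truth.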

Journal ArticleDOI
TL;DR: Knowledge of physiological ecology is used to identify major issues confronting the modeller and to make recommendations about how energy budgets for use in ABMs should be constructed.
Abstract: Summary 1. Agent-based models (ABMs) are widely used to predict how populations respond to changing environments. As the availability of food varies in space and time, individuals should have their own energy budgets, but there is no consensus as to how these should be modelled. Here, we use knowledge of physiological ecology to identify major issues confronting the modeller and to make recommendations about how energy budgets for use in ABMs should be constructed. 2. Our proposal is that modelled animals forage as necessary to supply their energy needs for maintenance, growth and reproduction. If there is sufficient energy intake, an animal allocates the energy obtained in the order: maintenance, growth, reproduction, energy storage, until its energy stores reach an optimal level. If there is a shortfall, the priorities for maintenance and growth/reproduction remain the same until reserves fall to a critical threshold below which all are allocated to maintenance. Rates of ingestion and allocation depend on body mass and temperature. We make suggestions for how each of these processes should be modelled mathematically. 3. Mortality rates vary with body mass and temperature according to known relationships, and these can be used to obtain estimates of background mortality rate. 4. If parameter values cannot be obtained directly, then values may provisionally be obtained by parameter borrowing, pattern-oriented modelling, artificial evolution or from allometric equations. 5. The development of ABMs incorporating individual energy budgets is essential for realistic modelling of populations affected by food availability. Such ABMs are already being used to guide conservation planning of nature reserves and shell fisheries, to assess environmental impacts of building proposals including wind farms and highways and to assess the effects on nontarget organisms of chemicals for the control of agricultural pests. 
Keywords: bioenergetics; energy budget; individual-based models; population dynamics.
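The proposed allocation hierarchy (maintenance, then growth, then reproduction, then storage up to an optimal reserve level, with a critical reserve threshold below which everything goes to maintenance) can be sketched as a single bookkeeping step (a hedged sketch; the names and the handling of surplus energy are illustrative, and a real ABM would make rates depend on body mass and temperature as the paper recommends):

```python
def allocate_energy(intake, reserves, maintenance, growth_demand, repro_demand,
                    optimal_reserves, critical_reserves):
    """One time-step of the allocation hierarchy. Energy available this step
    is intake plus current reserves; demands are met in priority order, but
    when reserves start at or below the critical threshold, growth and
    reproduction are skipped and everything goes to maintenance.
    Returns (allocations dict, new reserve level, unallocated surplus)."""
    budget = intake + reserves
    out = {"maintenance": 0.0, "growth": 0.0, "reproduction": 0.0}
    out["maintenance"] = min(budget, maintenance)
    budget -= out["maintenance"]
    if reserves > critical_reserves:  # starvation guard
        out["growth"] = min(budget, growth_demand)
        budget -= out["growth"]
        out["reproduction"] = min(budget, repro_demand)
        budget -= out["reproduction"]
    new_reserves = min(budget, optimal_reserves)
    surplus = budget - new_reserves   # energy beyond the optimal storage level
    return out, new_reserves, surplus
```

A maintenance shortfall (intake plus reserves below the maintenance demand) would correspond to starvation mortality in the full model; the mass- and temperature-dependent rate functions the paper recommends would set the demand arguments each step.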