Showing papers by "Anne Chao" published in 2016


Journal ArticleDOI
TL;DR: In this article, the authors present an R package iNEXT (iNterpolation/EXTrapolation) which provides simple functions to compute and plot the seamless rarefaction and extrapolation sampling curves for the three most widely used members of the Hill number family.
Abstract: Hill numbers (or the effective number of species) have been increasingly used to quantify the species/taxonomic diversity of an assemblage. The sample-size- and coverage-based integrations of rarefaction (interpolation) and extrapolation (prediction) of Hill numbers represent a unified standardization method for quantifying and comparing species diversity across multiple assemblages. We briefly review the conceptual background of Hill numbers along with two approaches to standardization. We present an R package iNEXT (iNterpolation/EXTrapolation) which provides simple functions to compute and plot the seamless rarefaction and extrapolation sampling curves for the three most widely used members of the Hill number family (species richness, Shannon diversity and Simpson diversity). Two types of biodiversity data are allowed: individual-based abundance data and sampling-unit-based incidence data. Several applications of the iNEXT packages are reviewed: (i) Non-asymptotic analysis: comparison of diversity estimates for equally large or equally complete samples. (ii) Asymptotic analysis: comparison of estimated asymptotic or true diversities. (iii) Assessment of sample completeness (sample coverage) across multiple samples. (iv) Comparison of estimated point diversities for a specified sample size or a specified level of sample coverage. Two examples are demonstrated, using the data (one for abundance data and the other for incidence data) included in the package, to illustrate all R functions and graphical displays.
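The two standardization ideas in this abstract (equally large samples and equally complete samples) can be illustrated for species richness with abundance data. The Python sketch below is not the iNEXT implementation. It assumes the classical individual-based rarefaction formula for the interpolation step and the sample coverage estimator of Chao and Jost (2012) for completeness; the abstract itself gives neither formula, and the function names are hypothetical.

```python
from math import comb


def rarefied_richness(abundances, m):
    """Expected number of species in a random subsample of m individuals
    drawn without replacement from the observed sample (interpolation only)."""
    counts = [x for x in abundances if x > 0]
    n = sum(counts)
    if not 0 <= m <= n:
        raise ValueError("m must lie between 0 and the sample size n")
    total = comb(n, m)
    # A species with x individuals is missed with probability C(n-x, m)/C(n, m).
    return sum(1 - comb(n - x, m) / total for x in counts)


def sample_coverage(abundances):
    """Estimated sample coverage from singletons (f1) and doubletons (f2).
    Assumed form: 1 - (f1/n) * (n-1)f1 / ((n-1)f1 + 2 f2)."""
    counts = [x for x in abundances if x > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("empty sample")
    f1 = counts.count(1)
    f2 = counts.count(2)
    if f1 == 0:
        return 1.0  # no singletons: the sample is estimated to be complete
    denom = (n - 1) * f1 + 2 * f2
    if denom == 0:
        return 0.0  # a single individual carries no coverage information
    return 1 - (f1 / n) * ((n - 1) * f1 / denom)


if __name__ == "__main__":
    sample = [4, 2, 2, 1, 1]  # made-up abundance counts for five species
    print(rarefied_richness(sample, 5))
    print(sample_coverage(sample))
```

Comparing assemblages at a common `m` corresponds to the "equally large" standardization; comparing them at a common coverage value corresponds to the "equally complete" one.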

2,170 citations


OtherDOI
05 Aug 2016
TL;DR: In this article, the authors present two approaches to infer species richness and make fair comparisons among multiple assemblages based on possibly unequal-sampling effort and incomplete samples that miss many rare species.
Abstract: On the basis of the sampling data from an assemblage, estimation of species richness (observed plus undetected) is statistically difficult especially for highly-diverse assemblages with many rare species. Simple counts of species richness in samples typically underestimate the true richness and strongly depend on sampling effort and sample completeness. There are two approaches to infer species richness and make fair comparisons among multiple assemblages based on possibly unequal-sampling effort and incomplete samples that miss many species. (1) An asymptotic approach: this approach compares the estimated asymptotes of species accumulation curves. It is based on statistical sampling-theory methods of estimating species richness. Both parametric and nonparametric methods are reviewed. We focus on the nonparametric estimators which are universally valid for all species abundance distributions. (2) A non-asymptotic approach: this approach compares the estimated species richnesses of standardized samples with a common finite sample size or sample completeness. It is based on the seamless sample-size- and coverage-based rarefaction and extrapolation sampling curves. This approach aims to compare species richness estimates for equally-large or equally-complete samples. These two approaches allow researchers to efficiently use all data to make robust and detailed inferences about species richness. Two R packages (SpadeR and iNEXT) are applied to rainforest tree data for illustration. Species richness (i.e., the number of species) is the simplest, most intuitive and most frequently used measure for characterizing the diversity of an assemblage (see Diversity measures). Species richness possesses intuitive mathematical properties, and features prominently in foundational models of community ecology. In biogeographic studies, species range maps and local and regional floras and faunas generally provide only species presence-absence information for each locality. 
For these studies, species richness thus becomes the only measure that can be used to quantify diversity. Even when species abundances are available,

218 citations


Reference EntryDOI
16 May 2016
TL;DR: This work focuses on the nonparametric estimators that are universally valid for all species abundance distributions and thus are more robust than parametric estimators that are based on specified parametric abundance models.
Abstract: Species richness (the number of species) in an assemblage is a key metric in many research fields of ecology. Simple counts of species in samples typically underestimate the true species richness and strongly depend on sampling effort and sample completeness. Based on possibly unequal-sampling effort and incomplete samples that miss many species, there are two approaches to infer species richness and make fair comparisons among multiple assemblages: (1) An asymptotic approach via species richness estimation. This approach aims to compare species richness estimates across assemblages. We focus on the nonparametric estimators that are universally valid for all species abundance distributions. (2) A non-asymptotic approach via the sample-size- and coverage-based rarefaction and extrapolation on the basis of standardised sample size or sample completeness (as measured by sample coverage). This approach aims to compare species richness estimates for equally large or equally complete samples. Two R packages (SpadeR and iNEXT) are applied to beetle data for illustration. Key Concepts: Due to sampling limitation, there are undetected species in almost every biodiversity survey. Empirical species counts underestimate species richness and depend heavily on sampling effort and sample completeness. Based on incomplete samples, species richness (observed plus undetected) is statistically difficult to estimate accurately especially for highly diverse assemblages with many rare species. Abundant species (which are certain to be detected in samples) contain almost no information about the undetected species richness. Rare species (which are likely to be either undetected or infrequently detected) contain nearly all the information about the undetected species richness. Most nonparametric estimators of the number of undetected species are based on the frequency counts of the detected rare species, e.g. singletons and doubletons for abundance data. 
Nonparametric estimators of species richness are universally valid for all species abundance distributions and thus are more robust than parametric estimators that are based on specified parametric abundance models. Rarefaction and extrapolation methods allow for fair and meaningful comparison of species richness estimates for standardised samples on the basis of sample size or sample completeness. Sample-size-based rarefaction and extrapolation methods aim to compare species richness estimates for equally large samples determined by samplers. Coverage-based rarefaction and extrapolation methods aim to compare species richness estimates for equally complete samples or equal fractions of population individuals reliably estimated from data. Keywords: abundance data; diversity; extrapolation; incidence data; interpolation; prediction; rarefaction; sample coverage; species richness; standardisation
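As a concrete instance of an estimator built from singleton and doubleton counts, the sketch below computes the widely used Chao1 lower bound for species richness. The entry above does not state a formula, so the choice of Chao1, its bias-corrected form when there are no doubletons, and the function name are assumptions made for illustration.

```python
def chao1(abundances):
    """Chao1 lower-bound estimate of species richness from abundance counts.

    Uses only the number of observed species, singletons (f1) and
    doubletons (f2); abundant species do not enter the correction term.
    """
    counts = [x for x in abundances if x > 0]
    n = sum(counts)
    s_obs = len(counts)
    if n == 0:
        return 0.0
    f1 = counts.count(1)
    f2 = counts.count(2)
    k = (n - 1) / n
    if f2 > 0:
        undetected = k * f1 ** 2 / (2 * f2)
    else:
        # Bias-corrected form, used when no doubletons were observed.
        undetected = k * f1 * (f1 - 1) / 2
    return s_obs + undetected


if __name__ == "__main__":
    # Made-up counts: two singletons and two doubletons among five species.
    print(chao1([4, 2, 2, 1, 1]))
```

Note how a sample with no singletons returns the observed richness unchanged, which matches the statement that rare species carry nearly all the information about undetected richness.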

114 citations


Journal ArticleDOI
01 Feb 2016-PeerJ
TL;DR: In the two approaches, replacing the spurious singleton count by the estimated count can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in the simulation results and in applying the method to analyze sequencing data from viral metagenomes.
Abstract: Estimating and comparing microbial diversity are statistically challenging due to limited sampling and possible sequencing errors for low-frequency counts, producing spurious singletons. The inflated singleton count seriously affects statistical analysis and inferences about microbial diversity. Previous statistical approaches to tackle the sequencing errors generally require different parametric assumptions about the sampling model or about the functional form of frequency counts. Different parametric assumptions may lead to drastically different diversity estimates. We focus on nonparametric methods which are universally valid for all parametric assumptions and can be used to compare diversity across communities. We develop here a nonparametric estimator of the true singleton count to replace the spurious singleton count in all methods/approaches. Our estimator of the true singleton count is in terms of the frequency counts of doubletons, tripletons and quadrupletons, provided these three frequency counts are reliable. To quantify microbial alpha diversity for an individual community, we adopt the measure of Hill numbers (effective number of taxa) under a nonparametric framework. Hill numbers, parameterized by an order q that determines the measures' emphasis on rare or common species, include taxa richness (q = 0), Shannon diversity (q = 1, the exponential of Shannon entropy), and Simpson diversity (q = 2, the inverse of Simpson index). A diversity profile which depicts the Hill number as a function of order q conveys all information contained in a taxa abundance distribution. Based on the estimated singleton count and the original non-singleton frequency counts, two statistical approaches (non-asymptotic and asymptotic) are developed to compare microbial diversity for multiple communities. (1) A non-asymptotic approach refers to the comparison of estimated diversities of standardized samples with a common finite sample size or sample completeness. 
This approach aims to compare diversity estimates for equally-large or equally-complete samples; it is based on the seamless rarefaction and extrapolation sampling curves of Hill numbers, specifically for q = 0, 1 and 2. (2) An asymptotic approach refers to the comparison of the estimated asymptotic diversity profiles. That is, this approach compares the estimated profiles for complete samples or samples whose size tends to be sufficiently large. It is based on statistical estimation of the true Hill number of any order q ≥ 0. In the two approaches, replacing the spurious singleton count by our estimated count, we can greatly remove the positive biases associated with diversity estimates due to spurious singletons and also make fair comparisons across microbial communities, as illustrated in our simulation results and in applying our method to analyze sequencing data from viral metagenomes.
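The three special cases named in this abstract (q = 0, 1, 2) follow from one formula, which the short Python sketch below evaluates directly on observed relative abundances. It is the empirical (plug-in) Hill number only; it does not include the singleton correction or the asymptotic estimators the abstract describes, and the function name is hypothetical.

```python
import math


def hill_number(abundances, q):
    """Empirical Hill number (effective number of taxa) of order q."""
    counts = [x for x in abundances if x > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("empty sample")
    p = [x / n for x in counts]
    if q == 1:
        # Limit as q -> 1: the exponential of Shannon entropy.
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    # q = 0 gives richness; q = 2 gives the inverse Simpson index.
    return sum(pi ** q for pi in p) ** (1 / (1 - q))


if __name__ == "__main__":
    sample = [2, 1, 1]  # made-up counts for three taxa
    # A diversity profile: the Hill number as a function of the order q.
    for q in (0, 0.5, 1, 2, 3):
        print(q, hill_number(sample, q))
```

For a perfectly even community every order returns the number of taxa, and the profile declines with q as abundances become more uneven.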

75 citations


Journal ArticleDOI
TL;DR: A bridge is established by extending and modifying each approach so that both lead to the same classes of similarity/differentiation measures, which range in the interval [0, 1] and which can be compared across multiple sets of communities.
Abstract: There are many concepts and measures of beta diversity and related similarity/differentiation indices. The variance framework (derived from the total variance of a community species abundance matrix) and diversity decomposition (based on partitioning gamma diversity into alpha and beta components) are two major approaches. There have been no bridges/links between the two approaches. Here, we establish a bridge by extending and modifying each approach so that both lead to the same classes of similarity/differentiation measures, which range in the interval [0, 1] and which can be compared across multiple sets of communities. Our extension/modification in each approach is based on the following major differences between the two approaches. (i) In the decomposition approach, a diversity order q that controls sensitivity to species abundances is used, whereas there is no such order involved in the variance approach. (ii) Transformations of raw abundances are typically used in the variance approach, whereas abundances are not transformed in diversity decomposition. (iii) The variance-based beta for non-transformed data is implicitly related to (and constrained by) alpha, gamma and total abundance. Namely, the attained maximum value of this beta when communities are completely distinct (no shared species) is not a fixed constant; the maximum varies with alpha, gamma and total abundance. By contrast, the beta component obtained from the multiplicative decomposition is not constrained by alpha, gamma and total abundance. To construct the bridge, we extend the variance of community data to a class of divergence measures (parameterized by an order q) and use normalization to remove these measures' constraints by alpha, gamma and total abundance. The resulting normalized divergence measures are legitimate differentiation measures. 
In the decomposition approach, we adopt a modified multiplicative decomposition scheme; the resulting beta component can be transformed to quantify compositional similarity/differentiation among communities. Then, the similarity/differentiation measures obtained from the extended variance framework turn out to be identical to those from the modified diversity decomposition, establishing the bridge. Other types of similarity/differentiation measures (e.g. N-community Bray–Curtis type) and extension to phylogenetic and functional versions are discussed. A real example using corals is given for illustration.
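To make the idea of a beta component rescaled to [0, 1] concrete, here is a richness-only (q = 0) Python sketch based on the classical multiplicative decomposition, beta = gamma / alpha. It is not the modified decomposition or the extended variance framework of the paper; the two normalizations used (Sørensen-type and Jaccard-type differentiation for N communities) are standard forms assumed here for illustration, and the function name is hypothetical.

```python
def richness_beta(communities):
    """Multiplicative beta for presence/absence data, with two normalized
    differentiation measures that range from 0 (identical) to 1 (no shared species)."""
    sets = [set(c) for c in communities]
    n_comm = len(sets)
    if n_comm < 2 or any(len(s) == 0 for s in sets):
        raise ValueError("need at least two non-empty communities")
    gamma = len(set().union(*sets))            # species in the pooled assemblage
    alpha = sum(len(s) for s in sets) / n_comm  # mean species per community
    beta = gamma / alpha                        # ranges from 1 to n_comm
    sorensen_type = (beta - 1) / (n_comm - 1)
    jaccard_type = (1 - 1 / beta) / (1 - 1 / n_comm)
    return beta, sorensen_type, jaccard_type


if __name__ == "__main__":
    # Two made-up communities sharing two of four species.
    print(richness_beta([{"a", "b", "c"}, {"b", "c", "d"}]))
```

For two communities these reduce to the familiar Sørensen and Jaccard dissimilarities, which is one way to check the normalization.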

68 citations


Book ChapterDOI
01 Jan 2016
TL;DR: The framework is applied to a real dataset to illustrate how to use phylogenetic diversity profiles to completely convey species abundances and phylogenetic information among species in an assemblage; and how to use phylogenetic similarity (or differentiation) profiles to assess phylogenetic resemblance or difference among multiple assemblages.
Abstract: Conservation biologists need robust, intuitive mathematical tools to quantify and assess patterns and changes in biodiversity. Here we review some commonly used abundance-based species diversity measures and their phylogenetic generalizations. Most of the previous abundance-sensitive measures and their phylogenetic generalizations lack an essential property, the replication principle or doubling property. This often leads to inconsistent or counter-intuitive interpretations, especially in conservation applications. Hill numbers or the “effective number of species” obey the replication principle and thus resolve many of the interpretational problems. Hill numbers were recently extended to incorporate phylogeny; the resulting measures take into account phylogenetic differences between species while still satisfying the replication principle. We review the framework of phylogenetic diversity measures based on Hill numbers and their decomposition into independent alpha and beta components. Both additive and multiplicative decompositions lead to the same classes of normalized phylogenetic similarity or differentiation measures. These classes include multiple-assemblage phylogenetic generalizations of the Jaccard, Sorensen, Horn and Morisita-Horn measures. For two assemblages, these classes also include the commonly used UniFrac and PhyloSor indices as special cases. Our approach provides a mathematically rigorous, self-consistent, ecologically meaningful set of tools for conservationists who must assess the phylogenetic diversity and complementarity of potential protected areas. Our framework is applied to a real dataset to illustrate (i) how to use phylogenetic diversity profiles to completely convey species abundances and phylogenetic information among species in an assemblage; and (ii) how to use phylogenetic similarity (or differentiation) profiles to assess phylogenetic resemblance or difference among multiple assemblages.

43 citations


Journal ArticleDOI
TL;DR: A unified approach to assessing and comparing species/taxonomic diversity and phylogenetic diversity can be established by developing both theoretical formulae and analytic estimators for seamless rarefaction and extrapolation for this class of abundance‐sensitive phylogenetic measures, which includes simple transformations of phylogenetic entropy and of quadratic entropy.
Abstract: Measures of phylogenetic diversity are basic tools in many studies of systematic biology. Faith’s PD (sum of branch lengths of a phylogenetic tree connecting all focal species) is the most widely used phylogenetic measure. Like species richness, Faith’s PD based on sampling data is highly dependent on sample size and sample completeness. The sample-size- and sample-coverage-based integration of rarefaction and extrapolation of Faith’s PD was recently developed to make fair comparison across multiple assemblages. However, species abundances are not considered in Faith’s PD. Based on the framework of Hill numbers, Faith’s PD was generalized to a class of phylogenetic diversity measures that incorporates species abundances. In this article, we develop both theoretical formulae and analytic estimators for seamless rarefaction and extrapolation for this class of abundance-sensitive phylogenetic measures, which includes simple transformations of phylogenetic entropy and of quadratic entropy. This work generalizes the previous rarefaction/extrapolation model of Faith’s PD to incorporate species abundance, and also extends the previous rarefaction/extrapolation model of Hill numbers to include phylogenetic differences among species. Thus a unified approach to assessing and comparing species/taxonomic diversity and phylogenetic diversity can be established. A bootstrap method is suggested for constructing confidence intervals around the phylogenetic diversity, facilitating the comparison of multiple assemblages. Our formulation and estimators can be extended to incidence data collected from multiple sampling units. We also illustrate the formulae and estimators using bacterial sequence data from the human distal esophagus and phyllostomid bat data from three habitats.
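Faith's PD, as defined in this abstract, can be computed with very little machinery once each branch of the tree is stored with its length and the set of tips descending from it. The Python sketch below uses that representation, which is an assumption chosen for brevity, along with a made-up three-species tree. It counts branches back to the root and ignores abundances, so it does not cover the abundance-sensitive generalization or the rarefaction/extrapolation estimators developed in the paper.

```python
def faith_pd(branches, present):
    """Faith's PD: total length of all branches that lead to at least one
    of the focal species (branches back to the root are included).

    branches: iterable of (length, tips) pairs, where tips is the set of
              tip labels descending from that branch.
    present:  iterable of focal species labels.
    """
    focal = set(present)
    return sum(length for length, tips in branches if focal & set(tips))


# Toy tree ((A:1, B:1):2, C:3), written as one entry per branch.
TOY_TREE = [
    (1.0, {"A"}),
    (1.0, {"B"}),
    (2.0, {"A", "B"}),  # internal branch shared by A and B
    (3.0, {"C"}),
]

if __name__ == "__main__":
    print(faith_pd(TOY_TREE, ["A", "B", "C"]))
    print(faith_pd(TOY_TREE, ["A", "B"]))
```

Because the value only grows as more species are detected, it depends on sample size and completeness in the same way species richness does, which is the motivation the abstract gives for standardization.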

28 citations


Journal ArticleDOI
TL;DR: In this paper, the authors tested Mason's hypothesis for the first time, using a sample of 1,056 Paleoindian points from eastern North America and employing paradigmatic classification and rigorous statistical tools used in the quantification of ecological biodiversity.
Abstract: Ronald Mason’s hypothesis from the 1960s that the southeastern United States possesses greater Paleoindian projectile-point diversity than other regions is regularly cited, and often assumed to be true, but in fact has never been quantitatively tested. Even if valid, however, the evolutionary meaning of this diversity is contested. Point diversity is often linked to Clovis “origins,” but point diversity could also arise from group fissioning and drift, admixture, adaptation, or multiple founding events, among other possibilities. Before archaeologists can even begin to discuss these scenarios, it is paramount to ensure that what we think we know is representative of reality. To this end, we tested Mason’s hypothesis for the first time, using a sample of 1,056 Paleoindian points from eastern North America and employing paradigmatic classification and rigorous statistical tools used in the quantification of ecological biodiversity. Our first set of analyses, which compared the Southeast to the Northeast, showed that the Southeast did indeed possess significantly greater point-class richness. Although this result was consistent with Mason’s hypothesis, our second set of analyses, which compared the Upper Southeast to the Lower Southeast and the Northeast, showed that in terms of point-class richness the Upper Southeast > Lower Southeast > Northeast. Given current chronometric evidence, we suggest that this latter result is consistent with the suggestion that the area of the Ohio, Cumberland, and Tennessee River valleys, as well as the mid-Atlantic coastal plain, were possible initial and secondary “staging areas” for colonizing Paleoindian foragers moving from western to eastern North America.

27 citations


Journal ArticleDOI
TL;DR: This article applied sample size and coverage-based rarefaction to analyse the elevational richness pattern in New Caledonian tree communities and suggested pooling small plot data to effectively assess/detect the diversity pattern.
Abstract: Ibanez et al. (Journal of Vegetation Science, this issue) applied sample size- and coverage-based rarefaction to analyse the elevational richness pattern in New Caledonian tree communities. We comment on the statistical assumptions behind rarefaction/extrapolation and suggest pooling small plot data to effectively assess/detect the diversity pattern. Broadening the analysis to include abundance-sensitive diversity measures and phylogenetic information can provide important additional insights.

27 citations


Journal ArticleDOI
TL;DR: Next-generation sequencing was used to characterize the CDR-H3 sequences in paired siblings of 4 families in which only one member of each pair had chronic HBV infection and revealed a huge network of sequence-related CDR-H3 clones found almost exclusively among carriers.
Abstract: The repertoire of IgG antibody responses to infection and vaccination varies depending on the characteristics of the immunogen and the ability of the host to mount a protective immune response. Chronic hepatitis B virus (HBV) infections are marked by persistent infection and immune tolerance to vaccination. This disease offers a unique opportunity to discover key repertoire signatures during infection and in response to vaccination. Complementarity determining region 3 of an antibody heavy chain (CDR-H3) has a major impact on the antigenic specificity of an antibody. We used next-generation sequencing to characterize the CDR-H3 sequences in paired siblings of 4 families in which only one member of each pair had chronic HBV infection. Blood samples were obtained before and 2 weeks after HBV vaccination. The analysis revealed a huge network of sequence-related CDR-H3 clones found almost exclusively among carriers. In contrast, vaccination induced significant increases of CDR-H3 cluster diversities among siblings without hepatitis B. Several vaccination-associated clone clusters were identified. Similar findings of vaccination-associated clone networks were observed in healthy adults receiving HBV boosters. These strategies can be used to identify signatures of other infectious diseases and accelerate discoveries of antibody sequences with important biomedical implications.

16 citations



Journal ArticleDOI
TL;DR: The paper “Evaluation of tracheal intubation: A retrospective study of skill acquisition by medical students in the operating theater” which was published recently in the Journal of the Formosan Medical Association raised three issues.