Proceedings Article (Open Access)

Convergence of Chao Unseen Species Estimator

Abstract
Support size estimation and the related problem of unseen species estimation have wide applications in ecology and database analysis. Perhaps the most widely used support size estimator is the Chao estimator. Despite its widespread use, little is known about its theoretical properties. We analyze the Chao estimator and show that its worst-case mean squared error (MSE) is smaller than the MSE of the plug-in estimator by a factor of $\mathcal{O}\left((k/n)^2\right)$, where $k$ is the maximum support size and $n$ is the number of samples. Our main technical contribution is a new method to analyze rational estimators for discrete distribution properties, which may be of independent interest.
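To make the comparison concrete, here is a minimal sketch of the two estimators and a Monte Carlo MSE check. It assumes the classic Chao (1984) form $S_{\text{obs}} + f_1^2/(2 f_2)$, where $f_1$ and $f_2$ count the symbols seen exactly once and exactly twice; the fallback used when $f_2 = 0$ is one common convention and may not match the variant analyzed in the paper. All function names (`prevalences`, `plug_in`, `chao1`, `empirical_mse`) are hypothetical, and the uniform test distribution is only one illustrative case, not the worst case bounded in the paper.

```python
import random
from collections import Counter


def prevalences(samples):
    """Return (S_obs, f1, f2): distinct symbols seen, singletons, doubletons."""
    counts = Counter(samples)
    f1 = sum(1 for c in counts.values() if c == 1)
    f2 = sum(1 for c in counts.values() if c == 2)
    return len(counts), f1, f2


def plug_in(samples):
    """Plug-in estimate: the number of distinct symbols actually observed."""
    return len(set(samples))


def chao1(samples):
    """Classic Chao form S_obs + f1^2 / (2 f2), a rational function of f1, f2.

    Assumption: when f2 == 0 we fall back to the bias-corrected term
    f1 (f1 - 1) / 2, one common convention; the paper's variant may differ.
    """
    s_obs, f1, f2 = prevalences(samples)
    if f2 > 0:
        return s_obs + f1 * f1 / (2 * f2)
    return s_obs + f1 * (f1 - 1) / 2


def empirical_mse(estimator, k, n, trials=2000, seed=0):
    """Monte Carlo MSE of `estimator` on n i.i.d. draws from uniform over k symbols."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        samples = [rng.randrange(k) for _ in range(n)]
        total += (estimator(samples) - k) ** 2
    return total / trials
```

For example, compare `empirical_mse(chao1, 1000, 500)` with `empirical_mse(plug_in, 1000, 500)`. With $k = 1000$ and $n = 500$ the plug-in estimator sees only about $1000\,(1 - e^{-1/2}) \approx 394$ distinct symbols in expectation, so its error is dominated by this downward bias, which the $f_1^2/(2 f_2)$ correction term is designed to offset.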

