Open Access Proceedings Article

Convergence of Chao Unseen Species Estimator

Abstract
Support size estimation and the related problem of unseen species estimation have wide applications in ecology and database analysis. Perhaps the most widely used support size estimator is the Chao estimator. Despite its widespread use, little is known about its theoretical properties. We analyze the Chao estimator and show that its worst-case mean squared error (MSE) is smaller than the MSE of the plug-in estimator by a factor of $\mathcal{O}\left((k/n)^2\right)$. Our main technical contribution is a new method to analyze rational estimators for discrete distribution properties, which may be of independent interest.
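For readers unfamiliar with the two estimators being compared, the following is a minimal sketch and not the authors' code. It assumes the classical form of the Chao estimator, S_obs + f1^2 / (2 f2), where S_obs is the number of distinct symbols observed, f1 is the number of symbols seen exactly once, and f2 is the number seen exactly twice. The plug-in estimator simply returns S_obs. The function names and the fallback used when f2 = 0 (the common bias-corrected form f1 (f1 - 1) / 2) are assumptions made for illustration; the paper may analyze a slightly different variant.

```python
from collections import Counter


def plug_in_estimate(sample):
    """Plug-in support size estimate: the number of distinct symbols observed."""
    return len(set(sample))


def chao_estimate(sample):
    """Classical Chao estimate: S_obs + f1^2 / (2 * f2).

    f1 = number of symbols seen exactly once (singletons),
    f2 = number of symbols seen exactly twice (doubletons).
    Assumption: when f2 == 0 the ratio is undefined, so this sketch falls
    back to the bias-corrected form S_obs + f1 * (f1 - 1) / 2.
    """
    counts = Counter(sample)               # symbol -> number of occurrences
    prevalences = Counter(counts.values())  # occurrence count -> number of symbols
    s_obs = len(counts)
    f1 = prevalences[1]
    f2 = prevalences[2]
    if f2 == 0:
        return s_obs + f1 * (f1 - 1) / 2
    return s_obs + f1 * f1 / (2 * f2)


# Toy sample: a and d appear twice, b three times, c and e once.
sample = list("aabbbcdde")
print(plug_in_estimate(sample))  # 5
print(chao_estimate(sample))     # 6.0
```

The correction term is a ratio of sample statistics, which is why the abstract refers to the Chao estimator as a rational estimator.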

