Convergence of Chao Unseen Species Estimator
Nived Rajaraman, Prafulla Chandra, Andrew Thangaraj, Ananda Theertha Suresh, and 3 more
- pp 46-50
TLDR
In this article, the authors analyze the Chao estimator and show that its worst-case mean squared error (MSE) is smaller than the MSE of the plug-in estimator by a factor of $\mathcal{O}((k/n)^2)$.
Abstract:
Support size estimation and the related problem of unseen species estimation have wide applications in ecology and database analysis. Perhaps the most widely used support size estimator is the Chao estimator. Despite its widespread use, little is known about its theoretical properties. We analyze the Chao estimator and show that its worst-case mean squared error (MSE) is smaller than the MSE of the plug-in estimator by a factor of $\mathcal{O}((k/n)^2)$. Our main technical contribution is a new method to analyze rational estimators for discrete distribution properties, which may be of independent interest.
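To make the two estimators named in the abstract concrete, here is a minimal Python sketch. It assumes the classical Chao1 form, in which the observed number of distinct symbols is increased by $f_1^2/(2 f_2)$, where $f_1$ and $f_2$ count the symbols seen exactly once and exactly twice. The paper may analyze a slightly different variant. The function names and the fallback used when $f_2 = 0$ (the common bias-corrected form) are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def plug_in_support_size(samples):
    """Plug-in estimate: the number of distinct symbols observed."""
    return len(set(samples))


def chao1_support_size(samples):
    """Classical Chao1 estimate of support size (illustrative sketch).

    f1 = number of symbols seen exactly once,
    f2 = number of symbols seen exactly twice.
    """
    counts = Counter(samples)                 # symbol -> multiplicity
    freq_of_freq = Counter(counts.values())   # multiplicity -> number of symbols
    s_obs = len(counts)
    f1 = freq_of_freq.get(1, 0)
    f2 = freq_of_freq.get(2, 0)
    if f2 > 0:
        unseen = f1 * f1 / (2 * f2)
    else:
        # Assumption: fall back to the bias-corrected form
        # f1 * (f1 - 1) / (2 * (f2 + 1)) to avoid division by zero.
        unseen = f1 * (f1 - 1) / 2
    return s_obs + unseen


if __name__ == "__main__":
    sample = list("aabbcde")  # a, b seen twice; c, d, e seen once
    print(plug_in_support_size(sample))
    print(chao1_support_size(sample))
```

The estimate is a ratio of functions of the sample, which is why the abstract calls such estimators rational. The plug-in estimate never exceeds the true support size, whereas the Chao correction adds an estimate of the unseen mass based on the rare symbols.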
Citations
Book Chapter
Estimating the Number of Unseen Species: How Many Words did Shakespeare Know?
TL;DR: Efron and Thisted as discussed by the authors studied the frequency distribution of words in the Shakespearean canon and found that the expected number of words that occur x ≥ 1 times in a large sample of n words is