Book Chapter

Estimating the Number of Unseen Species: How Many Words did Shakespeare Know?

Peter McCullagh
- pp 104-118
TLDR
This chapter discusses the first of two papers by Efron and Thisted on the frequency distribution of words in the Shakespearean canon, including the expected number of words that occur x ≥ 1 times in a large sample of n words.
Abstract
This paper is the first of two written by Brad Efron and Ron Thisted studying the frequency distribution of words in the Shakespearean canon. The key idea, due to Fisher in the context of species sampling, is simple and elegant. When applied to Shakespeare, the idea appears preposterous: an author has a personal vocabulary of word species represented by a distribution G, and text is generated by sampling from this distribution. Most results do not require successive words to be sampled independently, which leaves room for individual style and context, but stationarity is needed for prediction and inference. The expected number of words that occur x ≥ 1 times in a large sample of n words is

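The sampling model described in the abstract can be illustrated with a small simulation. The sketch below assumes a hypothetical Zipf-like vocabulary distribution standing in for G and draws words independently; the abstract notes that independence is not required for most results, so it is assumed here only for simplicity. The function names and parameter values are illustrative and are not taken from the chapter. Under independent sampling, the expected number of species seen exactly x times is the sum over species of the binomial probability C(n, x) p^x (1 - p)^(n - x), which the code compares with the simulated counts.

```python
import random
from collections import Counter
from math import comb


def zipf_vocabulary(size, exponent=1.0):
    """Hypothetical stand-in for G: p_i proportional to 1 / i**exponent."""
    weights = [1.0 / (i ** exponent) for i in range(1, size + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def frequency_of_frequencies(sample):
    """Map x -> number of distinct word species seen exactly x times."""
    per_word = Counter(sample)
    return Counter(per_word.values())


def expected_count(probs, n, x):
    """Exact expected number of species seen x times in n i.i.d. draws."""
    return sum(comb(n, x) * p ** x * (1.0 - p) ** (n - x) for p in probs)


# Assumed illustrative settings: 2000 word species, a text of 5000 words.
rng = random.Random(0)
probs = zipf_vocabulary(size=2000)
n = 5000
sample = rng.choices(range(len(probs)), weights=probs, k=n)
observed = frequency_of_frequencies(sample)

# Compare simulated and expected counts for the rarest frequency classes.
for x in (1, 2, 3):
    print(x, observed[x], round(expected_count(probs, n, x), 1))
```

The count for x = 0, `expected_count(probs, n, 0)`, is the expected number of species in the vocabulary that never appear in the sample, which is the "unseen species" quantity in the chapter title.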

Citations
Journal Article

Comprehensive assessment of T-cell receptor β-chain diversity in αβ T cells

TL;DR: A novel experimental and computational approach is developed to measure TCR CDR3 diversity based on single-molecule DNA sequencing, and it is found that total TCRβ receptor diversity is at least 4-fold higher than previous estimates, and the diversity in the subset of CD45RO(+) antigen-experienced αβ T cells is at least 10-fold higher than previous estimates.
Proceedings Article

A large-scale study of web password habits

TL;DR: The study involved half a million users over a three-month period and yielded extremely detailed data on password strength, the types and lengths of passwords chosen, and how they vary by site.
Book

Handbook of Capture-Recapture Analysis

TL;DR: This book aims to bridge the gap between field-based biologists and statisticians as new methods are developed to deal with more complex data by helping biologists understand state-of-the-art statistical methods for capture–recapture analysis.
Journal Article

Age-Related Decrease in TCR Repertoire Diversity Measured with Deep and Normalized Sequence Profiling

TL;DR: It is demonstrated that TCR β diversity per 10^6 T cells decreases roughly linearly with age, with significant reduction already apparent by age 40, and the percentage of naive T cells showed a strong correlation with measured TCR diversity and decreased linearly up to age 70.
References
Journal Article

The Relation Between the Number of Species and the Number of Individuals in a Random Sample of an Animal Population

TL;DR: It is shown that in a large collection of Lepidoptera captured in Malaya the frequency of the number of species represented by different numbers of individuals fitted somewhat closely to a hyperbola type of curve, so long as only the rarer species were considered.
Journal Article

The sampling theory of selectively neutral alleles.

TL;DR: This paper considers deductive and subsequently inductive questions relating to a sample of genes from a selectively neutral locus, and the test of the hypothesis that the alleles being sampled are indeed selectively neutral will be considered.
Book

Combinatorial Stochastic Processes

Jim Pitman
TL;DR: In this paper, the Brownian forest and the additive coalescent were constructed for random walks and random forests, respectively, and the Bessel process was used for random mappings.
Journal Article

Comprehensive assessment of T-cell receptor β-chain diversity in αβ T cells

TL;DR: A novel experimental and computational approach is developed to measure TCR CDR3 diversity based on single-molecule DNA sequencing, and it is found that total TCRβ receptor diversity is at least 4-fold higher than previous estimates, and the diversity in the subset of CD45RO(+) antigen-experienced αβ T cells is at least 10-fold higher than previous estimates.
Proceedings Article

A large-scale study of web password habits

TL;DR: The study involved half a million users over a three-month period and yielded extremely detailed data on password strength, the types and lengths of passwords chosen, and how they vary by site.