Journal Article•

# Univariate Discrete Distributions

01 Jan 2006-Journal of the American Statistical Association (American Statistical Association)-Vol. 101, Iss: 475, pp 1319

TL;DR: Alho and Spencer as discussed by the authors published a book on statistical and mathematical demography, focusing on mature population models, the particular focus of the new author (see, e.g., Caswell 2000).

Abstract: Here are two books on a topic new to Technometrics: statistical and mathematical demography. The first author of Applied Mathematical Demography wrote the first two editions of this book alone. The second edition was published in 1985. Professor Keyfritz noted in the Preface (p. vii) that at age 90 he had no interest in doing another edition; however, the publisher encouraged him to find a coauthor. The result is an additional focus for the book in the world of biology that makes it much more relevant for the sciences. The book is now part of the publisher’s series on Statistics for Biology and Health. Much of it, of course, focuses on the many aspects of human populations. The new material focuses on mature population models, the particular focus of the new author (see, e.g., Caswell 2000). As one might expect from a book that was originally written in the 1970s, it does not include a lot of information on statistical computing. The new book by Alho and Spencer is focused on putting a better emphasis on statistics in the discipline of demography (Preface, p. vii). It is part of the publisher’s Series in Statistics. The authors are both statisticians, so the focus is on statistics as used for demographic problems. The authors are targeting human applications, so their perspective on science does not extend any further than epidemiology. The book actually strikes a good balance between statistical tools and demographic applications. The authors use the first two chapters to teach statisticians about the concepts of demography. The next four chapters are very similar to the statistics content found in introductory books on survival analysis, such as the recent book by Kleinbaum and Klein (2005), reported by Ziegel (2006). The next three chapters are focused on various aspects of forecasting demographic rates. The book concludes with chapters focusing on three areas of applications: errors in census numbers, financial applications, and small-area estimates.

##### Citations

More filters

••

TL;DR: GAMLSS as discussed by the authors is a general framework for fitting regression type models where the distribution of the response variable does not have to belong to the exponential family and includes highly skew and kurtotic continuous and discrete distribution.

Abstract: GAMLSS is a general framework for fitting regression type models where the distribution of the response variable does not have to belong to the exponential family and includes highly skew and kurtotic continuous and discrete distribution. GAMLSS allows all the parameters of the distribution of the response variable to be modelled as linear/non-linear or smooth functions of the explanatory variables. This paper starts by defining the statistical framework of GAMLSS, then describes the current implementation of GAMLSS in R and finally gives four different data examples to demonstrate how GAMLSS can be used for statistical modelling.

1,196 citations

### Cites background from "Univariate Discrete Distributions"

...Johnson, Kotz, and Balakrishnan (1994, 1995); Johnson, Kotz, and Kemp (2005) are the classical reference books for most of the distributions in Tables 1 and 2....

[...]

••

TL;DR: Since its first release in 2001 as mainly a software package for phylogenetic analysis, data analysis for molecular biology and evolution (DAMBE) has gained many new functions that may be classified into six categories.

Abstract: Since its first release in 2001 as mainly a software package for phylogenetic analysis, data analysis for molecular biology and evolution (DAMBE) has gained many new functions that may be classified into six categories: 1) sequence retrieval, editing, manipulation, and conversion among more than 20 standard sequence formats including MEGA, NEXUS, PHYLIP, GenBank, and the new NeXML format for interoperability, 2) motif characterization and discovery functions such as position weight matrix and Gibbs sampler, 3) descriptive genomic analysis tools with improved versions of codon adaptation index, effective number of codons, protein isoelectric point profiling, RNA and protein secondary structure prediction and calculation of minimum folding energy, and genomic skew plots with optimized window size, 4) molecular phylogenetics including sequence alignment, testing substitution saturation, distance-based, maximum parsimony, and maximum-likelihood methods for tree reconstructions, testing the molecular clock hypothesis with either a phylogeny or with relative-rate tests, dating gene duplication and speciation events, choosing the best-fit substitution models, and estimating rate heterogeneity over sites, 5) phylogeny-based comparative methods for continuous and discrete variables, and 6) graphic functions including secondary structure display, optimized skew plot, hydrophobicity plot, and many other plots of amino acid properties along a protein sequence, tree display and drawing by dragging nodes to each other, and visual searching of the maximum parsimony tree. DAMBE features a graphic, user-friendly, and intuitive interface and is freely available from http://dambe.bio.uottawa.ca (last accessed April 16, 2013).

989 citations

••

TL;DR: In this article, the authors show that the standard bootstrap is not valid for matching estimators, even in the simple case with a single continuous covariate where the estimator is root-N consistent and asymptotically normally distributed with zero as-ymptotic bias.

Abstract: Matching estimators are widely used in empirical economics for the evaluation of programs or treatments. Researchers using matching methods often apply the bootstrap to calculate the standard errors. However, no formal justification has been provided for the use of the bootstrap in this setting. In this article, we show that the standard bootstrap is, in general, not valid for matching estimators, even in the simple case with a single continuous covariate where the estimator is root-N consistent and asymptotically normally distributed with zero asymptotic bias. Valid inferential methods in this setting are the analytic asymptotic variance estimator of Abadie and Imbens (2006a) as well as certain modifications of the standard bootstrap, like the subsampling methods in Politis and Romano (1994).

964 citations

••

TL;DR: The MinHash Alignment Process (MHAP) is introduced for overlapping noisy, long reads using probabilistic, locality-sensitive hashing and can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.

Abstract: Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.

886 citations

••

TL;DR: In this article, the authors used a simple experiment to show that fitting to a power law distribution by using graphical methods based on linear fit on the log-log scale is biased and inaccurate.

Abstract: This short communication uses a simple experiment to show that fitting to a power law distribution by using graphical methods based on linear fit on the log-log scale is biased and inaccurate. It shows that using maximum likelihood estimation (MLE) is far more robust. Finally, it presents a new table for performing the Kolmogorov-Smirnov test for goodness-of-fit tailored to power-law distributions in which the power-law exponent is estimated using MLE. The techniques presented here will advance the application of complex network theory by allowing reliable estimation of power-law models from data and further allowing quantitative assessment of goodness-of-fit of proposed power-law models to empirical data.

659 citations

##### References

More filters

••

TL;DR: GAMLSS as discussed by the authors is a general framework for fitting regression type models where the distribution of the response variable does not have to belong to the exponential family and includes highly skew and kurtotic continuous and discrete distribution.

Abstract: GAMLSS is a general framework for fitting regression type models where the distribution of the response variable does not have to belong to the exponential family and includes highly skew and kurtotic continuous and discrete distribution. GAMLSS allows all the parameters of the distribution of the response variable to be modelled as linear/non-linear or smooth functions of the explanatory variables. This paper starts by defining the statistical framework of GAMLSS, then describes the current implementation of GAMLSS in R and finally gives four different data examples to demonstrate how GAMLSS can be used for statistical modelling.

1,196 citations

••

TL;DR: Since its first release in 2001 as mainly a software package for phylogenetic analysis, data analysis for molecular biology and evolution (DAMBE) has gained many new functions that may be classified into six categories.

Abstract: Since its first release in 2001 as mainly a software package for phylogenetic analysis, data analysis for molecular biology and evolution (DAMBE) has gained many new functions that may be classified into six categories: 1) sequence retrieval, editing, manipulation, and conversion among more than 20 standard sequence formats including MEGA, NEXUS, PHYLIP, GenBank, and the new NeXML format for interoperability, 2) motif characterization and discovery functions such as position weight matrix and Gibbs sampler, 3) descriptive genomic analysis tools with improved versions of codon adaptation index, effective number of codons, protein isoelectric point profiling, RNA and protein secondary structure prediction and calculation of minimum folding energy, and genomic skew plots with optimized window size, 4) molecular phylogenetics including sequence alignment, testing substitution saturation, distance-based, maximum parsimony, and maximum-likelihood methods for tree reconstructions, testing the molecular clock hypothesis with either a phylogeny or with relative-rate tests, dating gene duplication and speciation events, choosing the best-fit substitution models, and estimating rate heterogeneity over sites, 5) phylogeny-based comparative methods for continuous and discrete variables, and 6) graphic functions including secondary structure display, optimized skew plot, hydrophobicity plot, and many other plots of amino acid properties along a protein sequence, tree display and drawing by dragging nodes to each other, and visual searching of the maximum parsimony tree. DAMBE features a graphic, user-friendly, and intuitive interface and is freely available from http://dambe.bio.uottawa.ca (last accessed April 16, 2013).

989 citations

••

TL;DR: In this article, the authors show that the standard bootstrap is not valid for matching estimators, even in the simple case with a single continuous covariate where the estimator is root-N consistent and asymptotically normally distributed with zero as-ymptotic bias.

Abstract: Matching estimators are widely used in empirical economics for the evaluation of programs or treatments. Researchers using matching methods often apply the bootstrap to calculate the standard errors. However, no formal justification has been provided for the use of the bootstrap in this setting. In this article, we show that the standard bootstrap is, in general, not valid for matching estimators, even in the simple case with a single continuous covariate where the estimator is root-N consistent and asymptotically normally distributed with zero asymptotic bias. Valid inferential methods in this setting are the analytic asymptotic variance estimator of Abadie and Imbens (2006a) as well as certain modifications of the standard bootstrap, like the subsampling methods in Politis and Romano (1994).

964 citations

••

TL;DR: The MinHash Alignment Process (MHAP) is introduced for overlapping noisy, long reads using probabilistic, locality-sensitive hashing and can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.

Abstract: Long-read, single-molecule real-time (SMRT) sequencing is routinely used to finish microbial genomes, but available assembly methods have not scaled well to larger genomes. We introduce the MinHash Alignment Process (MHAP) for overlapping noisy, long reads using probabilistic, locality-sensitive hashing. Integrating MHAP with the Celera Assembler enabled reference-grade de novo assemblies of Saccharomyces cerevisiae, Arabidopsis thaliana, Drosophila melanogaster and a human hydatidiform mole cell line (CHM1) from SMRT sequencing. The resulting assemblies are highly continuous, include fully resolved chromosome arms and close persistent gaps in these reference genomes. Our assembly of D. melanogaster revealed previously unknown heterochromatic and telomeric transition sequences, and we assembled low-complexity sequences from CHM1 that fill gaps in the human GRCh38 reference. Using MHAP and the Celera Assembler, single-molecule sequencing can produce de novo near-complete eukaryotic assemblies that are 99.99% accurate when compared with available reference genomes.

886 citations

••

TL;DR: In this article, the authors used a simple experiment to show that fitting to a power law distribution by using graphical methods based on linear fit on the log-log scale is biased and inaccurate.

Abstract: This short communication uses a simple experiment to show that fitting to a power law distribution by using graphical methods based on linear fit on the log-log scale is biased and inaccurate. It shows that using maximum likelihood estimation (MLE) is far more robust. Finally, it presents a new table for performing the Kolmogorov-Smirnov test for goodness-of-fit tailored to power-law distributions in which the power-law exponent is estimated using MLE. The techniques presented here will advance the application of complex network theory by allowing reliable estimation of power-law models from data and further allowing quantitative assessment of goodness-of-fit of proposed power-law models to empirical data.

659 citations