Book

Design and inference in finite population sampling

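TL;DR: This book covers design and inference in finite population sampling, including the Horvitz-Thompson estimator, standard sampling designs (simple random, cluster, systematic, stratified), uses of auxiliary size measures, the superpopulation approach, randomized response as a data-gathering tool for sensitive characteristics, and special topics such as small area estimation.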
Abstract: A Unified Setup for Probability Sampling. Inference in Finite Population Sampling. The Horvitz-Thompson Estimator. Simple Random and Allied Sampling Designs. Uses of Auxiliary Size Measures in Survey Sampling: Strategies Based on Probability Proportional to Size Schemes of Sampling. Uses of Auxiliary Size Measures in Survey Sampling: Ratio and Regression Methods of Estimation. Cluster Sampling Designs. Systematic Sampling Designs. Stratified Sampling Designs. Superpopulation Approach to Inference in Finite Population Sampling. Randomized Response: A Data-Gathering Tool for Sensitive Characteristics. Special Topics: Small Area Estimation, Nonresponse Problems, and Resampling Techniques. Author Index. Subject Index.
Citations
Journal Article
TL;DR: This paper describes a data augmentation approach to the analysis of multinomial models with unknown index that provides for a generic and efficient Bayesian implementation. It gives three examples: estimating the size of an animal population, estimating the number of diabetes cases in a population using the Rasch model, and the motivating example of estimating the number of species in an animal community with latent probabilities of species occurrence and detection.
Abstract: Multinomial models with unknown index (“sample size”) arise in many practical settings. In practice, Bayesian analysis of such models has proved difficult because the dimension of the parameter space is not fixed, being in some cases a function of the unknown index. We describe a data augmentation approach to the analysis of this class of models that provides for a generic and efficient Bayesian implementation. Under this approach, the data are augmented with all-zero detection histories. The resulting augmented dataset is modeled as a zero-inflated version of the complete-data model where an estimable zero-inflation parameter takes the place of the unknown multinomial index. Interestingly, data augmentation can be justified as being equivalent to imposing a discrete uniform prior on the multinomial index. We provide three examples involving estimating the size of an animal population, estimating the number of diabetes cases in a population using the Rasch model, and the motivating example of estimating t...

262 citations

Journal Article
TL;DR: An area of common ground between statistical and nonstatistical approaches emerges in the use of statistical likelihood as a measure of support for phylogenetic hypotheses, which requires the abandonment of classical notions of confidence limits by statistically oriented systematists and the acceptance of probabilistic models and likelihood by opponents of statistical methods.
Abstract: Despite widespread use, the bootstrap remains a controversial method for assessing confidence limits in phylogenies. Opposition to its use has centered on a small set of basic philosophical and statistical objections that have largely gone unanswered by advocates of statistical approaches to phylogeny reconstruction. The level of generality of these objections varies greatly, however. Some of the objections are merely technical, involving problems that are found in almost all statistical tests, such as bias in small data sets. Other objections are really associated not so much with a rejection of the bootstrap but with the rejection of statistical methods in phylogeny reconstruction, which resurrects an old debate. The most relevant aspects of this debate revolve around the issue of whether or not an unknown parameter, such as a tree, can have probabilities (confidence limits) associated with it. The relevant statistical aspects are reviewed, but because this issue remains controversial within statistical theory, it is unreasonable to expect it to be anything else in phylogenetic systematics. An area of common ground between statistical and nonstatistical approaches emerges in the use of statistical likelihood as a measure of support for phylogenetic hypotheses. This common ground requires the abandonment of classical notions of confidence limits by statistically oriented systematists and the acceptance of probabilistic models and likelihood by opponents of statistical methods. There remains a small set of objections directly germane to bootstrapping phylogenies per se. These objections involve issues of random sampling and whether or not character data are independent and identically distributed (IID). Nonrandom-sample bootstrapping is discussed, as are sample designs that impose the IID assumption on characters regardless of evolutionary nonindependence and nonidentical distribution of those data. Systematists wishing to use the bootstrap have an alternative to making explicit and rather strong evolutionary assumptions; they can consider the issue of character sampling designs much more carefully. (Phylogeny; bootstrap; statistical inference; confidence; cladistics.)

210 citations

Journal Article
01 Apr 2008-Metrika
TL;DR: This article proposes two new models (the triangular and crosswise models) for survey sampling with sensitive characteristics, and derives the maximum likelihood estimates (MLEs) and large-sample confidence intervals for the proportion of persons with a sensitive characteristic.
Abstract: Sensitive topics or highly personal questions are often being asked in medical, psychological and sociological surveys. This paper proposes two new models (namely, the triangular and crosswise models) for survey sampling with the sensitive characteristics. We derive the maximum likelihood estimates (MLEs) and large-sample confidence intervals for the proportion of persons with sensitive characteristic. The modified MLEs and their asymptotic properties are developed. Under certain optimality criteria, the designs for the cooperative parameter are provided and the sample size formulas are given. We compare the efficiency of the two models based on the variance criterion. The proposed models have four advantages: neither model requires randomizing device, the models are easy to be implemented for both interviewer and interviewee, the interviewee does not face any sensitive questions, and both models can be applied to both face-to-face personal interviews and mail questionnaires.
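The abstract does not give the estimator itself, so the following is only a minimal sketch of the crosswise model in its commonly stated form, not the paper's own derivation. It assumes each respondent reports whether their answers to the sensitive question and to an unrelated nonsensitive question (with known "yes" probability `p`) are the same, so that P(same) = πp + (1 − π)(1 − p). The function name `crosswise_estimate`, its parameters, and the fixed 95% Wald interval are illustrative choices.

```python
import math


def crosswise_estimate(n_same, n, p):
    """Estimate the sensitive proportion pi under the assumed crosswise model.

    n_same: respondents reporting that both answers are the same
    n:      total respondents
    p:      known probability of "yes" on the nonsensitive question (p != 0.5)
    Returns (pi_hat, (lower, upper)) with a large-sample 95% Wald interval.
    """
    if n <= 0 or not 0 <= n_same <= n:
        raise ValueError("need n > 0 and 0 <= n_same <= n")
    if not 0 < p < 1 or p == 0.5:
        raise ValueError("p must lie in (0, 1) and differ from 0.5")

    lam_hat = n_same / n  # observed share of "same" responses
    # Invert P(same) = pi*p + (1 - pi)*(1 - p) for pi.
    pi_hat = (lam_hat + p - 1) / (2 * p - 1)
    # Binomial variance of lam_hat, scaled by the inversion factor.
    var_hat = lam_hat * (1 - lam_hat) / (n * (2 * p - 1) ** 2)
    half_width = 1.96 * math.sqrt(var_hat)
    return pi_hat, (pi_hat - half_width, pi_hat + half_width)


if __name__ == "__main__":
    # Hypothetical data: 60 of 100 respondents report "same", with p = 0.25.
    estimate, interval = crosswise_estimate(60, 100, 0.25)
    print(estimate, interval)
```

The raw estimate can fall outside [0, 1] in small samples, and the variance grows quickly as `p` approaches 0.5, which is why the choice of `p` matters for the design.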

208 citations

Proceedings Article
20 Oct 2008
TL;DR: This paper is the first to take a census of edge hosts in the visible Internet since 1982, to evaluate the accuracy of active probing for address census and survey, and to quantify these aspects of the Internet.
Abstract: Prior measurement studies of the Internet have explored traffic and topology, but have largely ignored edge hosts. While the number of Internet hosts is very large, and many are hidden behind firewalls or in private address space, there is much to be learned from examining the population of visible hosts, those with public unicast addresses that respond to messages. In this paper we introduce two new approaches to explore the visible Internet. Applying statistical population sampling, we use censuses to walk the entire Internet address space, and surveys to probe frequently a fraction of that space. We then use these tools to evaluate address usage, where we find that only 3.6% of allocated addresses are actually occupied by visible hosts, and that occupancy is unevenly distributed, with a quarter of responsive /24 address blocks (subnets) less than 5% full, and only 9% of blocks more than half full. We show about 34 million addresses are very stable and visible to our probes (about 16% of responsive addresses), and we project from this up to 60 million stable Internet-accessible computers. The remainder of allocated addresses are used intermittently, with a median occupancy of 81 minutes. Finally, we show that many firewalls are visible, measuring significant diversity in the distribution of firewalled block size. To our knowledge, we are the first to take a census of edge hosts in the visible Internet since 1982, to evaluate the accuracy of active probing for address census and survey, and to quantify these aspects of the Internet.

185 citations

Journal Article
TL;DR: This work measured natural selection caused by differential viability of hybrid larvae in wild populations where native California Tiger Salamanders and introduced Barred Tiger Salamanders have been hybridizing for 50–60 years, and found strong evidence of hybrid vigor: mixed-ancestry genotypes had higher survival rates than genotypes containing mostly native or mostly introduced alleles.
Abstract: Hybridization between differentiated lineages can have many different consequences depending on fitness variation among hybrid offspring. When introduced organisms hybridize with natives, the ensuing evolutionary dynamics may substantially complicate conservation decisions. Understanding the fitness consequences of hybridization is an important first step in predicting its evolutionary outcome and conservation impact. Here, we measured natural selection caused by differential viability of hybrid larvae in wild populations where native California Tiger Salamanders (Ambystoma californiense) and introduced Barred Tiger Salamanders (Ambystoma tigrinum mavortium) have been hybridizing for 50–60 years. We found strong evidence of hybrid vigor; mixed-ancestry genotypes had higher survival rates than genotypes containing mostly native or mostly introduced alleles. Hybrid vigor may be caused by heterozygote advantage (overdominance) or recombinant hybrid vigor (due to epistasis or complementation). These genetic mechanisms are not mutually exclusive, and we find statistical support for both overdominant and recombinant contributions to hybrid vigor in larval tiger salamanders. Because recombinant homozygous genotypes can breed true, a single highly fit genotype with a mosaic of native and introduced alleles may eventually replace the historically pure California Tiger Salamander (listed as Threatened under the U.S. Endangered Species Act). The management implications of this outcome are complex: Genetically pure populations may not persist into the future, but average fitness and population viability of admixed California Tiger Salamanders may be enhanced. The ecological consequences for other native species are unknown. (Ambystoma; fitness; hybridization; invasive species; genetics.)

139 citations