Topic

Population proportion

About: Population proportion is a research topic. Over its lifetime, 247 publications have been published within this topic, receiving 4,099 citations.


Papers
Book Chapter
01 Jan 2005
TL;DR: This chapter addresses the statistical questions involved in choosing an appropriate sample size, which depend on whether the goal of the study is to distinguish between hypotheses about the value of a parameter or function of parameters, or to provide a confidence interval estimate of a parameter such as the odds ratio or relative risk.
Abstract: When planning a research project, an epidemiologist must consider how many subjects should be studied. While factors such as available budget certainly present constraints on the maximum number of subjects that might actually be included in a study, statistical considerations are extremely important. To address the statistical questions about appropriate sample size, the researcher must first specify the study design, the nature of the outcome variable, the aims of the study, the planned analysis method, and the expected results of the study. Is the goal of the study to distinguish between hypotheses about the value of a parameter or function of parameters, or is the goal to provide a confidence interval estimate of a parameter such as the odds ratio or relative risk?

17 citations
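
For the confidence-interval goal mentioned in the abstract above, a minimal sketch of one common textbook calculation may help: the sample size needed to estimate a single population proportion to within a chosen margin of error, using the normal approximation. This is not taken from the chapter itself; the function name `sample_size_for_proportion` and its parameters are hypothetical, and the formula n = z^2 p(1 - p) / d^2 assumes simple random sampling from a large population.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(p_expected, margin, confidence=0.95):
    """Approximate n so that a normal-approximation confidence interval
    for a proportion has half-width `margin`.

    Assumptions (not from the source): simple random sampling, large
    population, and n = z^2 * p * (1 - p) / margin^2.
    """
    if not 0 < p_expected < 1:
        raise ValueError("p_expected must lie strictly between 0 and 1")
    if margin <= 0:
        raise ValueError("margin must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must lie strictly between 0 and 1")

    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Round up, since a fractional subject cannot be sampled.
    return math.ceil(z ** 2 * p_expected * (1 - p_expected) / margin ** 2)


if __name__ == "__main__":
    # p_expected = 0.5 is the most conservative planning value.
    print(sample_size_for_proportion(0.5, 0.05))
```

Using `p_expected = 0.5` maximizes p(1 - p) and so gives the largest required sample; a hypothesis-testing goal would call for a power calculation instead, which this sketch does not cover.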

Journal Article
TL;DR: In this article, a Bayesian nonignorable selection model is proposed to estimate a finite population proportion using data from a possibly biased sample, assuming that the binary responses are independent and identically distributed Bernoulli random variables.

16 citations

Journal Article
TL;DR: The author shows that linear regression provides a consistent estimator of the population average treatment effect on the treated times the population proportion of the nontreated individuals plus the population average treatment effect on the nontreated times the population proportion of the treated individuals.
Abstract: In this paper I provide new evidence on the implications of treatment effect heterogeneity for least squares estimation when the effects are inappropriately assumed to be homogeneous. I prove that under a set of benchmark assumptions linear regression provides a consistent estimator of the population average treatment effect on the treated times the population proportion of the nontreated individuals plus the population average treatment effect on the nontreated times the population proportion of the treated individuals. Consequently, in many empirical applications the linear regression estimates might not be close to any of the standard average treatment effects of interest.

16 citations
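
The result stated in the abstract above can be written compactly as follows. The notation is an assumption introduced here for illustration, not the paper's own: D is the treatment indicator, rho = P(D = 1) is the population proportion of treated individuals, and tau_ATT and tau_ATN are the average treatment effects on the treated and on the nontreated.

```latex
% Assumed notation (not from the source):
%   \rho = P(D = 1), the population proportion of treated individuals
%   \tau_{ATT}, \tau_{ATN}: average treatment effects on the treated / nontreated
\[
  \hat{\tau}_{\mathrm{OLS}}
  \;\xrightarrow{\;p\;}\;
  (1 - \rho)\,\tau_{\mathrm{ATT}} \;+\; \rho\,\tau_{\mathrm{ATN}},
  \qquad \rho = P(D = 1).
\]
```

For example, if 20% of the population is treated (rho = 0.2), the limit places weight 0.8 on the effect on the treated and only 0.2 on the effect on the nontreated, so the smaller group's effect receives the larger weight.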

Journal Article
TL;DR: This article presents a novel sampling model, called feature preserved sampling (FPS), that sequentially generates a high-quality sample over sliding windows. FPS can be applied to infinite streams and finite datasets equally, and the generated samples can be used for various applications.
Abstract: In this article, we explore a novel sampling model, called feature preserved sampling (FPS) that sequentially generates a high-quality sample over sliding windows. The sampling quality we consider refers to the degree of consistency between the sample proportion and the population proportion of each attribute value in a window. Due to the time-variant nature of real-world datasets, users are more likely to be interested in the most recent data. However, previous works have not been able to generate a high-quality sample over sliding windows that precisely preserves up-to-date population characteristics. Motivated by this shortcoming, we have developed the FPS algorithm, which has several advantages: (1) it sequentially generates a sample from a time-variant data source over sliding windows; (2) the execution time of FPS is linear with respect to the database size; (3) the relative proportional differences between the sample proportions and population proportions of most distinct attribute values are guaranteed to be below a specified error threshold, e, while the relative proportional differences of the remaining attribute values are as close to e as possible, which ensures that the generated sample is of high quality; (4) the sample rate is close to the user-specified rate so that a high-quality sampling result can be obtained without increasing the sample size; (5) by a thorough analytical and empirical study, we prove that FPS has acceptable space overheads, especially when the attribute values have Zipfian distributions, and FPS can also excellently preserve the population proportion of multivariate features in the sample; and (6) FPS can be applied to infinite streams and finite datasets equally, and the generated samples can be used for various applications. Our experiments on both real and synthetic data validate that FPS can effectively obtain a high-quality sample of the desired size. In addition, while using the sample generated by FPS in various mining applications, a significant improvement in efficiency can be achieved without compromising the model's precision.

15 citations
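
The quality criterion described in the FPS abstract above compares, for each attribute value, the sample proportion with the population proportion. The sketch below only illustrates that comparison for a single window; it is not the FPS algorithm. The function name is hypothetical, and defining the relative proportional difference as |sample proportion - population proportion| / population proportion is an assumption, since the abstract does not give the formula.

```python
from collections import Counter


def relative_proportion_differences(population, sample):
    """Per-value relative difference between sample and population proportions.

    Assumed definition (not confirmed by the source):
        |sample_prop - pop_prop| / pop_prop
    """
    if not population or not sample:
        raise ValueError("population and sample must be non-empty")

    pop_counts = Counter(population)
    sample_counts = Counter(sample)
    n_pop, n_sample = len(population), len(sample)

    diffs = {}
    for value, count in pop_counts.items():
        pop_prop = count / n_pop
        # Values absent from the sample have a sample proportion of 0.
        sample_prop = sample_counts.get(value, 0) / n_sample
        diffs[value] = abs(sample_prop - pop_prop) / pop_prop
    return diffs


if __name__ == "__main__":
    window = ["a"] * 6 + ["b"] * 3 + ["c"] * 1   # population in one window
    drawn = ["a"] * 3 + ["b"] * 1 + ["c"] * 1    # a candidate sample
    e = 0.5                                      # hypothetical error threshold
    for value, diff in relative_proportion_differences(window, drawn).items():
        print(value, round(diff, 3), "ok" if diff <= e else "exceeds e")
```

A check like this could be used to judge whether a candidate sample meets an error threshold e for each attribute value; how FPS actually selects records to satisfy the threshold is described in the article, not here.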

Journal Article
TL;DR: Simulations show that some estimators, including the commonly-used plug-in maximum likelihood estimator, can have substantial bias for small or moderate sample sizes, and an adjustment is proposed that ensures estimates are always credible.

15 citations


Network Information
Related Topics (5)
Sample size determination: 21.3K papers, 961.4K citations, 73% related
Nonparametric statistics: 19.9K papers, 844.1K citations, 71% related
Multivariate statistics: 18.4K papers, 1M citations, 69% related
Missing data: 21.3K papers, 784.9K citations, 69% related
Regression analysis: 31K papers, 1.7M citations, 68% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2021    12
2020    17
2019    14
2018    13
2017    13
2016    13