
Showing papers on "Population proportion" published in 2009


Book
01 Jan 2009
TL;DR: An introductory biostatistics textbook covering descriptive statistics, probability and sampling distributions, estimation (including confidence intervals for a population proportion and for the difference between two population proportions), hypothesis testing, analysis of variance, regression, chi-square and nonparametric methods, and vital statistics.
Abstract: Preface.
1. Introduction To Biostatistics. 1.1 Introduction. 1.2 Some Basic Concepts. 1.3 Measurement and Measurement Scales. 1.4 Sampling and Statistical Inference. 1.5 The Scientific Method and the Design of Experiments. 1.6 Computers and Biostatistical Analysis. 1.7 Summary. Review Questions and Exercises. References.
2. Descriptive Statistics. 2.1 Introduction. 2.2 The Ordered Array. 2.3 Grouped Data: The Frequency Distribution. 2.4 Descriptive Statistics: Measures of Central Tendency. 2.5 Descriptive Statistics: Measures of Dispersion. 2.6 Summary. Review Questions and Exercises. References.
3. Some Basic Probability Concepts. 3.1 Introduction. 3.2 Two Views of Probability: Objective and Subjective. 3.3 Elementary Properties of Probability. 3.4 Calculating the Probability of an Event. 3.5 Bayes' Theorem, Screening Tests, Sensitivity, Specificity, and Predictive Value Positive and Negative. Summary. Review Questions and Exercises. References.
4. Probability Distributions. 4.1 Introduction. 4.2 Probability Distributions of Discrete Variables. 4.3 The Binomial Distribution. 4.4 The Poisson Distribution. 4.5 Continuous Probability Distributions. 4.6 The Normal Distribution. 4.7 Normal Distribution Applications. 4.8 Summary. Review Questions and Exercises. References.
5. Some Important Sampling Distributions. 5.1 Introduction. 5.2 Sampling Distributions. 5.3 Distribution of the Sample Mean. 5.4 Distribution of the Difference Between Two Sample Means. 5.5 Distribution of the Sample Proportion. 5.6 Distribution of the Difference Between Two Sample Proportions. 5.7 Summary. Review Questions and Exercises. References.
6. Estimation. 6.1 Introduction. 6.2 Confidence Interval for a Population Mean. 6.3 The t Distribution. 6.4 Confidence Interval for the Difference Between Two Population Means. 6.5 Confidence Interval for a Population Proportion. 6.6 Confidence Interval for the Difference Between Two Population Proportions. 6.7 Determination of Sample Size for Estimating Means. 6.8 Determination of Sample Size for Estimating Proportions. 6.9 Confidence Interval for the Variance of a Normally Distributed Population. 6.10 Confidence Interval for the Ratio of the Variances of Two Normally Distributed Populations. 6.11 Summary. Review Questions and Exercises. References.
7. Hypothesis Testing. 7.1 Introduction. 7.2 Hypothesis Testing: A Single Population Mean. 7.3 Hypothesis Testing: The Difference Between Two Population Means. 7.4 Paired Comparisons. 7.5 Hypothesis Testing: A Single Population Proportion. 7.6 Hypothesis Testing: The Difference Between Two Population Proportions. 7.7 Hypothesis Testing: A Single Population Variance. 7.8 Hypothesis Testing: The Ratio of Two Population Variances. 7.9 The Type II Error and the Power of a Test. 7.10 Determining Sample Size to Control Type II Errors. 7.11 Summary. Review Questions and Exercises. References.
8. Analysis Of Variance. 8.1 Introduction. 8.2 The Completely Randomized Design. 8.3 The Randomized Complete Block Design. 8.4 The Repeated Measures Design. 8.5 The Factorial Experiment. 8.6 Summary. Review Questions and Exercises. References.
9. Simple Linear Regression And Correlation. 9.1 Introduction. 9.2 The Regression Model. 9.3 The Sample Regression Equation. 9.4 Evaluating the Regression Equation. 9.5 Using the Regression Equation. 9.6 The Correlation Model. 9.7 The Correlation Coefficient. 9.8 Some Precautions. 9.9 Summary. Review Questions and Exercises. References.
10. Multiple Regression And Correlation. 10.1 Introduction. 10.2 The Multiple Linear Regression Model. 10.3 Obtaining the Multiple Regression Equation. 10.4 Evaluating the Multiple Regression Equation. 10.5 Using the Multiple Regression Equation. 10.6 The Multiple Correlation Model. 10.7 Summary. Review Questions and Exercises. References.
11. Regression Analysis: Some Additional Techniques. 11.1 Introduction. 11.2 Qualitative Independent Variables. 11.3 Variable Selection Procedures. 11.4 Logistic Regression. 11.5 Summary. Review Questions and Exercises. References.
12. The Chi-Square Distribution And The Analysis Of Frequencies. 12.1 Introduction. 12.2 The Mathematical Properties of the Chi-Square Distribution. 12.3 Tests of Goodness-of-Fit. 12.4 Tests of Independence. 12.5 Tests of Homogeneity. 12.6 The Fisher Exact Test. 12.7 Relative Risk, Odds Ratio, and the Mantel-Haenszel Statistic. 12.8 Survival Analysis. 12.9 Summary. Review Questions and Exercises. References.
13. Nonparametric And Distribution-Free Statistics. 13.1 Introduction. 13.2 Measurement Scales. 13.3 The Sign Test. 13.4 The Wilcoxon Signed-Rank Test for Location. 13.5 The Median Test. 13.6 The Mann-Whitney Test. 13.7 The Kolmogorov-Smirnov Goodness-of-Fit Test. 13.8 The Kruskal-Wallis One-Way Analysis of Variance by Ranks. 13.9 The Friedman Two-Way Analysis of Variance by Ranks. 13.10 The Spearman Rank Correlation Coefficient. 13.11 Nonparametric Regression Analysis. 13.12 Summary. Review Questions and Exercises. References.
14. Vital Statistics. 14.1 Introduction. 14.2 Death Rates and Ratios. 14.3 Measures of Fertility. 14.4 Measures of Morbidity. 14.5 Summary. Review Questions and Exercises. References.
Appendix. Statistical Tables. Answers To Odd-Numbered Exercises. Index.

116 citations


Journal ArticleDOI
TL;DR: This article investigates the bias of the maximum likelihood estimator when testing groups of different sizes under fixed and sequential procedures, and finds that the possibility of obtaining all-positive groups contributes substantially to the bias.
Abstract: In the assessment of disease, estimation of the proportion of infected units in a population can sometimes be facilitated by pooling units into groups for testing. Such group testing was used in a study of virus infection levels in carnation plants grown in glasshouses. In group testing problems, the maximum likelihood estimator is a biased estimator of the population proportion. We investigate the bias of the maximum likelihood estimator when testing groups of different sizes, using fixed and sequential procedures. The possibility of obtaining all-positive groups contributes substantially to the bias. Analytical methods are shown to correct the bias for fixed procedures satisfactorily. For sequential procedures, with their uneven bias patterns, we propose a numerical method of correction which produces an almost unbiased estimator.

58 citations
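
To make the bias described in this abstract concrete, the following is a minimal sketch under a simplifying assumption that is not the paper's setting: all groups have the same size, tests are error-free, and the number of groups is fixed in advance. The function names and parameters are hypothetical. With `n_groups` groups of `group_size` units each, a group tests positive with probability 1 - (1 - p)^group_size, and the maximum likelihood estimator inverts that relation. The sketch computes the exact bias by enumerating every possible number of positive groups and also reports how much of the expectation comes from the all-positive outcome, where the estimate equals 1.

```python
from math import comb


def group_testing_mle(positives: int, n_groups: int, group_size: int) -> float:
    """MLE of the unit-level proportion from equal-sized, error-free group tests."""
    return 1.0 - (1.0 - positives / n_groups) ** (1.0 / group_size)


def mle_bias(p: float, n_groups: int, group_size: int) -> tuple[float, float]:
    """Return (exact bias of the MLE, contribution of the all-positive outcome).

    Assumes equal group sizes and a fixed number of groups (an illustration,
    not the unequal-size or sequential procedures studied in the paper).
    """
    theta = 1.0 - (1.0 - p) ** group_size  # probability that a group is positive
    expected = 0.0
    all_positive_part = 0.0
    for x in range(n_groups + 1):
        prob = comb(n_groups, x) * theta**x * (1.0 - theta) ** (n_groups - x)
        estimate = group_testing_mle(x, n_groups, group_size)
        expected += prob * estimate
        if x == n_groups:
            # Every group positive: the estimate jumps to 1.
            all_positive_part = prob * estimate
    return expected - p, all_positive_part


if __name__ == "__main__":
    bias, all_pos = mle_bias(p=0.1, n_groups=10, group_size=5)
    print(f"bias = {bias:.5f}, all-positive contribution = {all_pos:.5f}")
```

Because the estimator is a convex function of the observed fraction of positive groups, the bias in this simplified setting is never negative, and it grows as the all-positive outcome becomes more likely (larger groups or higher prevalence).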


Journal ArticleDOI
TL;DR: A novel sampling model, called feature preserved sampling (FPS), sequentially generates a high-quality sample over sliding windows; it can be applied to infinite streams and finite datasets equally, and the generated samples can be used for various applications.
Abstract: In this article, we explore a novel sampling model, called feature preserved sampling (FPS), that sequentially generates a high-quality sample over sliding windows. The sampling quality we consider refers to the degree of consistency between the sample proportion and the population proportion of each attribute value in a window. Due to the time-variant nature of real-world datasets, users are more likely to be interested in the most recent data. However, previous works have not been able to generate a high-quality sample over sliding windows that precisely preserves up-to-date population characteristics. Motivated by this shortcoming, we have developed the FPS algorithm, which has several advantages: (1) it sequentially generates a sample from a time-variant data source over sliding windows; (2) the execution time of FPS is linear with respect to the database size; (3) the relative proportional differences between the sample proportions and population proportions of most distinct attribute values are guaranteed to be below a specified error threshold, e, while the relative proportional differences of the remaining attribute values are as close to e as possible, which ensures that the generated sample is of high quality; (4) the sample rate is close to the user-specified rate, so that a high-quality sampling result can be obtained without increasing the sample size; (5) by a thorough analytical and empirical study, we prove that FPS has acceptable space overheads, especially when the attribute values have Zipfian distributions, and that FPS can also preserve the population proportion of multivariate features in the sample very well; and (6) FPS can be applied to infinite streams and finite datasets equally, and the generated samples can be used for various applications. Our experiments on both real and synthetic data validate that FPS can effectively obtain a high-quality sample of the desired size. In addition, while using the sample generated by FPS in various mining applications, a significant improvement in efficiency can be achieved without compromising the model's precision.

15 citations
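
The quality criterion in this abstract compares, for each attribute value, the sample proportion with the population proportion in a window. The sketch below is not the FPS algorithm; it only illustrates one plausible reading of that criterion, assuming the relative proportional difference is |sample proportion - population proportion| / population proportion. The function names and the threshold argument are hypothetical.

```python
from collections import Counter
from typing import Hashable, Sequence


def relative_proportion_differences(
    population: Sequence[Hashable], sample: Sequence[Hashable]
) -> dict:
    """Relative difference between sample and population proportions per value.

    Assumed definition: |sample_prop - pop_prop| / pop_prop for every value
    that occurs in the population window.
    """
    pop_counts = Counter(population)
    sample_counts = Counter(sample)
    n_pop, n_sample = len(population), len(sample)
    diffs = {}
    for value, count in pop_counts.items():
        pop_prop = count / n_pop
        sample_prop = sample_counts.get(value, 0) / n_sample
        diffs[value] = abs(sample_prop - pop_prop) / pop_prop
    return diffs


def values_within_threshold(diffs: dict, threshold: float) -> list:
    """Attribute values whose relative difference is at most the threshold."""
    return [value for value, diff in diffs.items() if diff <= threshold]


if __name__ == "__main__":
    window = ["a"] * 60 + ["b"] * 30 + ["c"] * 10
    sample = ["a"] * 7 + ["b"] * 3
    diffs = relative_proportion_differences(window, sample)
    print(diffs, values_within_threshold(diffs, threshold=0.2))
```

A check of this kind could be used to evaluate any sample drawn from a window, whichever sampler produced it.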


Journal ArticleDOI
TL;DR: This paper investigated the effect of imperfection in rankings on unbalanced ranked set sampling for binary variables and provided methods to obtain estimates for the probabilities of success for the judgment order statistics using training samples so that Neyman allocation can be implemented.
Abstract: The application of unbalanced ranked set sampling (RSS) to estimation of a population proportion has been studied for the perfect ranking situation. When the rankings are not perfect, the probabilities of success for the judgment order statistics incorporate information on ranking errors as well as ranks. The objective of this article is to investigate the effect of imperfection in rankings on unbalanced RSS for binary variables and provide methods to obtain estimates for the probabilities of success for the judgment order statistics using training samples so that Neyman allocation can be implemented. We also use a substantial data set, the NHANES III data, to demonstrate the feasibility and benefits of Neyman allocation in RSS for binary variables in the case of imperfect rankings.

12 citations


01 Jan 2009
TL;DR: In this paper, the authors developed the Bayes estimator of the population proportion of a sensitive characteristic when data are obtained through the randomized response technique (RRT) proposed by Hussain and Shabbir (2007).
Abstract: In this study, we have developed the Bayes estimator of the population proportion of a sensitive characteristic when data are obtained through the randomized response technique (RRT) proposed by Hussain and Shabbir (2007). Using simple Beta prior information, superiority of the Bayes estimators is established for a wide range of values of the population proportion. We observed that the Bayes estimators are better than the maximum likelihood estimator (MLE) and the Kim et al. (2006) estimator. For small as well as moderate samples, the Bayes estimators outperform the MLE and the Kim et al. (2006) estimator when the RRT of Hussain and Shabbir (2007) is used.

9 citations


Journal Article
TL;DR: In this article, the authors explore scientific sampling methods and corresponding formulas for surveys of multi-class sensitive questions under stratified random sampling, using sampling theory, total probability formulas, and properties of variance.
Abstract: Objective To explore scientific sampling methods and corresponding formulas for multi-class sensitive question surveys under stratified random sampling. Methods The randomized response technique (RRT) for multi-class sensitive questions and stratified random sampling were used in this paper; in addition, sampling theory, total probability formulas and properties of variance were used. Results Formulas for the estimator of the population proportion and its variance under the randomized response model for multi-class sensitive questions in stratified random sampling were deduced. The survey methods and formulas for the RRT model were successfully applied to a multi-class sensitive question survey of exam cheating among students of Soochow University. The severity of cheating in exams was classified into three types (never, 1-2 times, more than 2 times), and the proportions were 68.55%, 19.95% and 11.50%, respectively. Conclusion The methods and corresponding formulas of stratified random sampling with RRT for multi-class sensitive question surveys are feasible.

3 citations


Journal ArticleDOI
31 Dec 2009
TL;DR: The properties of the Wald interval are investigated in terms of the coverage probability and the expected width, and an alternative confidence interval based on the Agresti-Coull approach is recommended.
Abstract: The double sampling scheme is effective in reducing the sampling cost. However, doubly sampled data are contaminated by two types of error, namely false-positive and false-negative errors. These make the statistical analysis more difficult and require more sophisticated analysis tools. For instance, the Wald method for the interval estimation of a proportion does not work well. In fact, it is well known that the Wald confidence interval behaves very poorly in many sampling schemes. In this note, the properties of the Wald interval are investigated in terms of the coverage probability and the expected width. An alternative confidence interval based on the Agresti-Coull approach is recommended.

2 citations
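
The contrast between the two intervals can be illustrated in the simplest setting, a single binomial sample with no misclassification, which is an assumption of this sketch and not the double sampling scheme studied in the note. The function names are hypothetical. The Wald interval is p̂ ± z·sqrt(p̂(1 - p̂)/n); the Agresti-Coull interval applies the same form to the adjusted values ñ = n + z² and p̃ = (x + z²/2)/ñ. Coverage probability is computed exactly by summing binomial probabilities over the outcomes whose interval contains the true proportion.

```python
from math import comb, sqrt
from statistics import NormalDist


def wald_interval(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Wald interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    p_hat = x / n
    half = z * sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half, p_hat + half


def agresti_coull_interval(x: int, n: int, level: float = 0.95) -> tuple[float, float]:
    """Agresti-Coull interval: Wald form applied to adjusted counts."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    n_adj = n + z * z
    p_adj = (x + z * z / 2.0) / n_adj
    half = z * sqrt(p_adj * (1.0 - p_adj) / n_adj)
    return max(0.0, p_adj - half), min(1.0, p_adj + half)


def coverage(interval, p: float, n: int, level: float = 0.95) -> float:
    """Exact coverage probability of an interval method at true proportion p."""
    total = 0.0
    for x in range(n + 1):
        low, high = interval(x, n, level)
        if low <= p <= high:
            total += comb(n, x) * p**x * (1.0 - p) ** (n - x)
    return total


if __name__ == "__main__":
    for p in (0.05, 0.2, 0.5):
        print(p, coverage(wald_interval, p, 20), coverage(agresti_coull_interval, p, 20))
```

Note that when x = 0 the Wald interval collapses to a single point at zero, which is one reason its coverage falls well below the nominal level for small proportions.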


Journal Article
TL;DR: In this paper, the authors explored scientific sampling methods and corresponding formulas for sensitive question surveys with the sample selected by stratified two-stage cluster sampling; the estimated population proportion was 14.71% for undergraduates, 27.25% for graduate students and 18.37% overall.
Abstract: Objective To explore scientific sampling methods and corresponding formulas for sensitive question surveys with the sample selected by stratified two-stage cluster sampling. Methods Stratified two-stage cluster sampling, the Simmons model, total probability formulas and properties of variance were used to deduce the corresponding formulas; in addition, agreement validity was applied to evaluate the statistical methods. Results Formulas for the estimation of the population proportion and its variance for dichotomous sensitive questions under the Simmons model in stratified two-stage cluster sampling were deduced, with high validity (the concordance rate of "excellence" and "goodness" was 100%). The survey methods and formulas were employed successfully in a survey of premarital sexual behavior among students at Soochow University, and the estimated population proportion was 14.71% for undergraduates, 27.25% for graduate students and 18.37% overall. Conclusion The methods and corresponding formulas for the Simmons model with the sample selected by stratified two-stage cluster sampling are feasible.

2 citations
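
As background for the Simmons model named in this abstract, here is a minimal sketch of the unrelated-question estimator under simple random sampling. This is an assumption made for brevity; the paper derives formulas for stratified two-stage cluster sampling, which this sketch does not reproduce. Each respondent answers the sensitive question with known probability `p_sensitive` and otherwise an unrelated question whose "yes" proportion `pi_unrelated` is known, so the overall "yes" probability is λ = P·π_A + (1 - P)·π_Y. All names are hypothetical.

```python
def simmons_estimate(
    yes_count: int, n: int, p_sensitive: float, pi_unrelated: float
) -> tuple[float, float]:
    """Estimate the sensitive proportion and its variance (unrelated-question design).

    Assumes simple random sampling with a known proportion for the unrelated
    question; the stratified two-stage cluster formulas are not shown here.
    """
    lam_hat = yes_count / n  # observed proportion of "yes" answers
    pi_hat = (lam_hat - (1.0 - p_sensitive) * pi_unrelated) / p_sensitive
    # Unbiased estimate of Var(lam_hat), rescaled by 1 / P^2.
    var_hat = lam_hat * (1.0 - lam_hat) / ((n - 1) * p_sensitive**2)
    return pi_hat, var_hat


if __name__ == "__main__":
    pi_hat, var_hat = simmons_estimate(
        yes_count=290, n=1000, p_sensitive=0.7, pi_unrelated=0.5
    )
    print(f"estimate = {pi_hat:.4f}, standard error = {var_hat ** 0.5:.4f}")
```

The division by the square of `p_sensitive` shows the privacy trade-off: the less often the sensitive question is asked, the larger the variance of the estimate.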


01 Jan 2009
TL;DR: The authors compare the mean square error for linearization and replication variance estimators of proportions when the uncertainty in the control totals is either addressed or ignored, and illustrate the effects of different levels of variability in the estimated controls on the overall variance estimates.
Abstract: Calibration estimators, such as a poststratified estimate of a population proportion, use auxiliary data to improve the efficiency of survey estimates. Traditionally, the control totals used in the poststratification are assumed to be population values with no sampling variance. Often, however, estimates from other surveys are used because the population controls either do not exist or are not readily accessible. In this situation, many researchers apply traditional variance estimators to cases where the control totals are estimated, thus assuming that any additional sampling variance associated with these controls is negligible. We compare the mean square error for linearization and replication variance estimators of proportions when the uncertainty in the control totals is either addressed or ignored. Illustrations are given of the effects of different levels of variability in the estimated controls on the overall variance estimates. Comparisons are also made to previous work conducted in this area by the authors on estimated population totals.

1 citation


Posted Content
TL;DR: In this article, some ratio estimators for estimating the population mean of the variable under study, which make use of information regarding the population proportion possessing certain attribute, are proposed under simple random sampling without replacement (SRSWOR) scheme.
Abstract: Some ratio estimators for estimating the population mean of the variable under study, which make use of information regarding the population proportion possessing certain attribute, are proposed. Under simple random sampling without replacement (SRSWOR) scheme, the expressions of bias and mean-squared error (MSE) up to the first order of approximation are derived. The results obtained have been illustrated numerically by taking some empirical population considered in the literature.

1 citation