
Showing papers on "Population proportion published in 2018"


Proceedings Article
23 May 2018
TL;DR: In this article, the authors derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance.
Abstract: We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests can be written in terms of linear constraints, and for exchangeable data can always be expressed as a function of the empirical distribution. Using this structure, we prove a ‘Neyman-Pearson lemma’ for binomial data under DP, where the DP-UMP only depends on the sample sum. Our tests can also be stated as a post-processing of a random variable, whose distribution we coin “Truncated-Uniform-Laplace” (Tulap), a generalization of the Staircase and discrete Laplace distributions. Furthermore, we obtain exact p-values, which are easily computed in terms of the Tulap random variable. We show that our results also apply to distribution-free hypothesis tests for continuous data. Our simulation results demonstrate that our tests have exact type I error, and are more powerful than current techniques.

27 citations


Journal ArticleDOI
TL;DR: In this paper, an unbiased estimator for the population proportion in pair ranked set sampling design is proposed, and its theoretical properties are studied, showing that the estimator is more (less) efficient than its counterpart in simple random sampling (ranked set sampling).
Abstract: In this paper, we consider the problem of estimating the population proportion in pair ranked set sampling design. An unbiased estimator for the population proportion is proposed, and its theoretical properties are studied. It is shown that the estimator is more (less) efficient than its counterpart in simple random sampling (ranked set sampling). Asymptotic normality of the estimator is also established. Application of the suggested procedure is illustrated using a data set from an environmental study.

26 citations


Posted Content
TL;DR: This work derives uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance and obtaining exact p-values, which are easily computed in terms of the Tulap random variable.
Abstract: We derive uniformly most powerful (UMP) tests for simple and one-sided hypotheses for a population proportion within the framework of Differential Privacy (DP), optimizing finite sample performance. We show that in general, DP hypothesis tests for exchangeable data can always be expressed as a function of the empirical distribution. Using this structure, we prove a 'Neyman-Pearson lemma' for binomial data under DP, where the DP-UMP only depends on the sample sum. Our tests can also be stated as a post-processing of a random variable, whose distribution we coin "Truncated-Uniform-Laplace" (Tulap), a generalization of the Staircase and discrete Laplace distributions. Furthermore, we obtain exact p-values, which are easily computed in terms of the Tulap random variable. We show that our results also apply to distribution-free hypothesis tests for continuous data. Our simulation results demonstrate that our tests have exact type I error, and are more powerful than current techniques.

18 citations


Journal ArticleDOI
TL;DR: The combination of the randomized response technique (RRT) and NSUM obtained a high response rate and produced a reliable estimate of the size of a high-risk population.

14 citations


Posted Content
TL;DR: Binary data, available for a relatively large number of families (or households), which are within small areas, from a population-based survey is analyzed, using a hierarchical Bayesian logistic regression model with each family having its own random effect.
Abstract: We analyze binary data, available for a relatively large number (big data) of families (or households), which are within small areas, from a population-based survey. Inference is required for the finite population proportion of individuals with a specific character for each area. To accommodate the binary data and important features of all sampled individuals, we use a hierarchical Bayesian logistic regression model with each family (not area) having its own random effect. This modeling helps to correct for overshrinkage so common in small area estimation. Because there are numerous families, the computational time on the joint posterior density using standard Markov chain Monte Carlo (MCMC) methods is prohibitive. Therefore, the joint posterior density of the hyper-parameters is approximated using an integrated nested normal approximation (INNA) via the multiplication rule. This approach provides a sampling-based method that permits fast computation, thereby avoiding very time-consuming MCMC methods. Then, the random effects are obtained from the exact conditional posterior density using parallel computing. The unknown nonsample features and household sizes are obtained using a nested Bayesian bootstrap that can be done using parallel computing as well. For relatively small data sets (e.g., 5000 families), we compare our method with a MCMC method to show that our approach is reasonable. We discuss an example on health severity using the Nepal Living Standards Survey (NLSS).

9 citations


Journal ArticleDOI
TL;DR: In this article, a decision-theoretic approach is followed to obtain Bayes estimates of the two parameters, along with their corresponding minimal Bayes posterior expected losses (BPEL), using a beta prior and the squared error loss function (SELF).
Abstract: Sihm et al. (2016) proposed an unrelated-question binary optional randomized response technique (RRT) model for estimating the proportion of the population that possesses a sensitive characteristic and the sensitivity level of the question. In our work, a decision-theoretic approach is followed to obtain Bayes estimates of the two parameters, along with their corresponding minimal Bayes posterior expected losses (BPEL), using a beta prior and the squared error loss function (SELF). Relative losses are also examined to compare the performance of the Bayes estimates with that of the classical estimates obtained by Sihm et al. (2016). The results are illustrated with real survey data using a noninformative prior.

6 citations
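
As a minimal sketch of the decision-theoretic idea in the entry above, and not of the optional RRT model itself, consider the simplified case of a directly observed binomial count with a Beta(a, b) prior. Under squared error loss the Bayes estimate is the posterior mean, and the minimal posterior expected loss is the posterior variance. The function name and the example numbers are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction

def beta_binomial_bayes(x, n, a=1, b=1):
    """Bayes estimate and minimal posterior expected loss under squared error loss.

    Assumes x successes in n direct (non-randomized) Bernoulli trials and a
    Beta(a, b) prior; a = b = 1 is the uniform noninformative prior.
    """
    post_a = Fraction(a) + x            # posterior is Beta(a + x, b + n - x)
    post_b = Fraction(b) + (n - x)
    total = post_a + post_b
    estimate = post_a / total           # posterior mean = Bayes estimate under SELF
    loss = post_a * post_b / (total ** 2 * (total + 1))  # posterior variance
    return estimate, loss

est, loss = beta_binomial_bayes(x=3, n=10)
print(est, loss)  # 1/3 2/117
```

Exact fractions are used so that the estimate and its expected loss can be checked by hand.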


Journal ArticleDOI
04 Jun 2018
TL;DR: In this paper, the authors analyse secondary school students' intuitive understanding of the relationship between the population proportion and the expected value of a sample proportion, as well as its variability in relation to the sample size.
Abstract: We analyse secondary school students’ intuitive understanding of the relationships between the population proportion and the expected value of a sample proportion, as well as its variability in relation to the sample size. We gave 302 students four items, each asking for four probable values for the number of outcomes of a given event, with the population proportion and the sample size varied across items. The statistical analysis of the values provided by the students suggests a good understanding of the relationships between the population and sample proportions. The variability of the sample proportion is overestimated in big samples and depends on the problem context in small samples. We also observed the equiprobability bias and the positive and negative recency biases.

3 citations
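
The relationships the students were asked about can be sketched numerically. Assuming n independent Bernoulli(p) observations (an assumption of this sketch, not a detail given in the abstract), the sample proportion has expected value p and standard deviation sqrt(p(1 - p)/n), so its variability shrinks as the sample grows. The function name is hypothetical.

```python
import math

def sample_proportion_moments(p, n):
    """Return (expected value, standard deviation) of the sample proportion.

    Assumes n independent Bernoulli(p) observations.
    """
    return p, math.sqrt(p * (1 - p) / n)

# The expected value stays at p while the spread falls with the sample size.
for n in (10, 100, 1000):
    mean, sd = sample_proportion_moments(0.5, n)
    print(n, mean, round(sd, 4))
```

Quadrupling the sample size halves the standard deviation, which is the kind of relationship the study reports students misjudging in large samples.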


Book ChapterDOI
01 Jan 2018
TL;DR: This chapter introduces several fundamental concepts related to power and sample size calculations, and presents some basic sample size formulas for when one plans to collect a sample of either continuous or binary data and then wishes to construct a confidence interval for the population mean or the population proportion with a certain degree of precision.
Abstract: This chapter introduces several fundamental concepts related to power and sample size calculations. We first review some key questions that should be considered when developing research studies. We then introduce the concept of statistical power and explain why having adequate power is essential for designing successful studies. Next, we present some basic sample size formulas for when we plan to collect a sample of either continuous or binary data and then wish to construct a confidence interval for the population mean or the population proportion with a certain degree of precision. Subsequently, we present some basic sample size formulas for when we plan to collect one sample, a paired sample, or two independent samples of either continuous or binary data and then wish to test hypotheses about specific characteristics of the populations from which the data came. Finally, we discuss several advanced topics related to sample size calculation and the collaborative process of study design.

2 citations
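
One of the basic formulas this kind of chapter covers can be sketched as follows. The version below is the standard normal-approximation (Wald) formula n = z² p(1 - p) / E² for a confidence interval of half-width E; whether the chapter uses exactly this form is an assumption, and the function and parameter names are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p_guess=0.5):
    """Smallest n so that a Wald interval for a proportion has half-width <= margin.

    p_guess is a planning value for the unknown proportion; 0.5 is the most
    conservative choice because it maximizes p * (1 - p).
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil(z ** 2 * p_guess * (1 - p_guess) / margin ** 2)

print(sample_size_for_proportion(0.05))  # 385
print(sample_size_for_proportion(0.03))  # 1068
```

The result is rounded up because a fractional observation cannot be collected and rounding down would miss the target precision.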


Journal ArticleDOI
01 Dec 2018
TL;DR: The authors derive a bound for the variance of an unbiased estimator of the finite population proportion under inverse sampling without replacement.
Abstract: In this paper we derive a bound for the variance of an unbiased estimator of the finite population proportion under inverse sampling without replacement.

1 citation
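
To make the setting concrete: under inverse sampling without replacement, units are drawn until k successes are seen, and the number of draws n is random. The abstract does not say which estimator is studied; the sketch below assumes the standard choice (k - 1)/(n - 1) and checks its unbiasedness exactly by enumerating the distribution of n. It does not reproduce the paper's variance bound, and the function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def draws_pmf(N, M, k):
    """Exact distribution of the number of draws n needed to reach k successes
    when sampling without replacement from N units, M of which are successes."""
    return {n: Fraction(comb(n - 1, k - 1) * comb(N - n, M - k), comb(N, M))
            for n in range(k, N - M + k + 1)}

def expected_estimate(N, M, k):
    """Exact expectation of the estimator (k - 1) / (n - 1); requires 2 <= k <= M."""
    return sum(prob * Fraction(k - 1, n - 1)
               for n, prob in draws_pmf(N, M, k).items())

# Population of 10 units with 4 successes, sampling until the 2nd success.
print(expected_estimate(N=10, M=4, k=2))  # 2/5, the true proportion M/N
```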


Dissertation
01 Jan 2018
TL;DR: Some methodological advances are suggested, such as IST estimation under a generic sampling design, the use of auxiliary information to improve the efficiency of the estimates, and the extension of the calibration approach to estimation for domains.
Abstract: A survey is a research method that is based on questioning a sample of individuals. Interest in sample survey studies often focuses on sensitive or confidential aspects of the interviewees. Because of this, the typical problem that arises is social desirability, which is defined as the tendency of respondents to answer based on what is socially acceptable. For this reason, many respondents refuse to participate in the survey or provide false or conditioned answers, altering the accuracy and reliability of the estimates in a major way. The randomized response (RR) technique (RRT) introduced by Warner is a possible solution for protecting the anonymity of the respondent and is used to reduce the risk of evasion or nonresponse to sensitive questions. Warner's study generated a rapidly expanding body of research literature on alternative techniques for eliciting suitable RR schemes in order to estimate a population proportion. Standard RR methods are used primarily in surveys which require a binary response to a sensitive question, and seek to estimate the proportion of people presenting a given sensitive characteristic. On the other hand, some studies have addressed situations in which the response to a sensitive question results in a quantitative variable. The methodology of RR has advanced considerably in recent years, but most research in this area concerns only simple random sampling, whereas real studies are based on complex surveys. Data from complex survey designs require special consideration with regard to the estimation of parameters and the corresponding variance estimation. Recently some authors have developed R packages for estimation with RR surveys under the assumption of simple random sampling. No existing software covers the estimation of parameters for sensitive characteristics from complex surveys. This gap is now filled by the RRTCS package.
The package includes estimators for means and totals with several RR techniques and also provides confidence interval estimation. Most research into RRT deals exclusively with the variable of interest and does not make explicit use of auxiliary variables in the construction of estimators. We introduce auxiliary variables for a general class of estimators to improve the sampling design and to achieve higher precision in population parameter estimates. Warner's work gave rise to a large literature and has been used in many areas, but these techniques have difficulties and limitations. Because of this, other indirect techniques emerged as alternatives to RRT, among them the item count technique (ICT). This technique was conceived for surveys which require the study of a qualitative variable, but many practical situations deal with sensitive variables which are quantitative in nature. So, the item sum technique (IST) was proposed as a generalization of ICT. To contribute to the development of the IST in real-world studies, we suggest some methodological advances, such as IST estimation under a generic sampling design and the use of auxiliary information to improve the efficiency of the estimates, and we extend this calibration approach to estimation for domains. We also investigate the impact on the estimates of including an increased number of innocuous questions in the list of items. Traditionally, indirect questioning techniques (IQTs) deal with one sensitive variable. However, in real surveys, the researcher may be interested in investigating more than one sensitive variable. We discuss some estimation methods for multiple sensitive questions under different approaches. A key design decision in an IST survey is how to split the total sample into the long list sample and the short list sample. A simple solution is to allocate the same number of units to each sample irrespective of the variability of the items in the two lists.
Clearly, this intuitive and basic solution is not efficient because responses in the long list sample tend to be affected by high variability due to the presence of innocuous items. We achieve the optimal sample size allocation by minimizing the variance of the IST estimates under a budget constraint. Optimal allocation results are then extended to the multiple sensitive estimation setting. Finally, we use the IQTs to investigate some sensitive variables (drug addiction, sexual behaviour, and support for female genital cutting) in real studies, and we compare these results with those obtained by direct questioning, obtaining in all cases higher estimates of the sensitive characteristics when we use IQTs. Note: This thesis is presented as a compendium of seven publications related to the contents of the thesis. The full versions of the papers are included in Appendices A1 - A7.
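
For orientation, the basic randomized response idea mentioned in the abstract can be sketched for Warner's original design under simple random sampling with replacement. This is a simplification: the thesis concerns complex survey designs, and the code below is not the RRTCS package's API. With probability `p_design` a respondent answers the statement "I have the characteristic" and otherwise its negation, so the probability of a "yes" is λ = πP + (1 - π)(1 - P), which can be solved for π. All names and numbers are illustrative assumptions.

```python
def warner_estimate(yes_count, n, p_design):
    """Estimate a sensitive proportion from Warner-type randomized responses.

    Assumes simple random sampling with replacement and truthful answers.
    p_design is the known probability of being given the sensitive statement
    and must differ from 0.5, otherwise the proportion is not identifiable.
    """
    if p_design == 0.5:
        raise ValueError("p_design must not equal 0.5")
    lam = yes_count / n                                   # observed share of "yes"
    pi_hat = (lam + p_design - 1) / (2 * p_design - 1)    # solve lambda for pi
    var_hat = lam * (1 - lam) / ((n - 1) * (2 * p_design - 1) ** 2)
    return pi_hat, var_hat

pi_hat, var_hat = warner_estimate(yes_count=40, n=100, p_design=0.7)
print(round(pi_hat, 4), round(var_hat, 5))
```

The variance grows quickly as `p_design` approaches 0.5, which is the price paid for stronger respondent protection.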

01 Jul 2018
TL;DR: In this article, an efficient procedure for estimating the current population proportion in successive sampling over two occasions is developed; the suggested estimators are studied and the optimum allocation of the second sample for each is discussed.
Abstract: An efficient procedure for estimating the current population proportion in successive sampling over two occasions has been developed. The suggested estimators have been studied, and the optimum allocation of the second sample for each is discussed. The behavior of the optimum subsampling proportions and the gain in precision were tested empirically using different values of the design parameters, namely the correlation and the subgroup weights.

Journal ArticleDOI
TL;DR: This work considers a real life scenario in which the data gathering process is compounded by possible interventions by investigators/supervisors who may provide thoughtful judgments on the quality/category of the response given by a respondent, and addresses the problem of estimating the unknown population proportion P.
Abstract: In large scale surveys, it is customary to accept unaltered the responses provided by the respondents, leaving no provision for investigators to pass on any circumstantial judgment on the responses. This restrictive practice sometimes vitiates the estimation of a parameter. Here we consider a real life scenario in which the data gathering process is compounded by possible interventions by investigators/supervisors who may provide thoughtful judgments on the quality/category of the response given by a respondent. In this modified scenario, we address the problem of estimating the unknown population proportion P. In the context of an illustrative example, we develop a possible randomization theory and computational formulas to estimate P under intervention effects.

Book ChapterDOI
01 Jan 2018
TL;DR: In this article, the knowledge of sampling in 157 prospective primary school teachers in Spain was analyzed using two different tasks, and taking into account common and horizon content knowledge (described in the model proposed by Ball et al. in J Teacher Educ 59:389-407, 2008), assessing the teachers' understanding of the following concepts: population and sample, frequency, proportion, estimation, variability of estimates and the effect of sample size on this variability.
Abstract: In this paper, we analyse the knowledge of sampling in 157 prospective primary school teachers in Spain. Using two different tasks, and taking into account common and horizon content knowledge (described in the model proposed by Ball et al. in J Teacher Educ 59:389–407, 2008), we assess the teachers’ understanding of the following concepts: population and sample, frequency, proportion, estimation, variability of estimates, and the effect of sample size on this variability. Our results suggest that these prospective teachers have correct intuitions when estimating the sample proportion when the population proportion is known. However, they tend to confuse samples and populations, sometimes fail to apply proportional reasoning, misinterpret unpredictability, and show the representativeness heuristic and the equiprobability bias.