Showing papers on "Population proportion published in 2016"


Proceedings ArticleDOI
31 Oct 2016
TL;DR: This paper proposes a solution which guarantees that in presence of a uniform probabilistic scheduler every agent outputs the population proportion with any precision ε ∈ (0, 1) with any high probability after having interacted O(log n) times.
Abstract: The computational model of population protocols is a formalism that allows the analysis of properties emerging from simple and pairwise interactions among a very large number of anonymous finite-state agents. Significant work has been done so far to determine which problems are solvable in this model and at which cost in terms of states used by the agents and time needed to converge. The problem tackled in this paper is the population proportion problem: each agent starts independently from each other in one of two states, say A or B, and the objective is for each agent to determine the proportion of agents that initially started in state A, assuming that each agent only uses a finite set of states, and does not know the number n of agents. We propose a solution which guarantees that in presence of a uniform probabilistic scheduler every agent outputs the population proportion with any precision ε ∈ (0, 1) with any high probability after having interacted O(log n) times. The number of states maintained by every agent is optimal and is equal to 2⌈3/(4ε)⌉+1. Finally, we show that our solution is optimal in time and space to solve the counting problem, a generalization of the proportion problem. Finally, simulation results illustrate our theoretical analysis.
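The state count quoted in the abstract, 2⌈3/(4ε)⌉+1, can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the abstract does not spell out the interaction rule, so the averaging rule used here (two agents replace their integer states by the floor and ceiling of their mean), the output map (m + state)/(2m), and the names `num_states`, `simulate` and `estimate` are all hypothetical choices, not the authors' confirmed protocol.

```python
import math
import random


def num_states(eps):
    """Number of states per agent quoted in the abstract: 2*ceil(3/(4*eps)) + 1."""
    return 2 * math.ceil(3 / (4 * eps)) + 1


def simulate(n_a, n_b, eps, interactions, seed=0):
    """Run an ASSUMED averaging rule on integer states in [-m, m].

    Agents starting in A hold +m, agents starting in B hold -m.
    A uniform random pair is picked at each step (uniform scheduler).
    """
    m = math.ceil(3 / (4 * eps))
    states = [m] * n_a + [-m] * n_b
    rng = random.Random(seed)
    for _ in range(interactions):
        i, j = rng.sample(range(len(states)), 2)
        total = states[i] + states[j]
        # floor and ceiling of the pair's mean; the pair's sum is preserved
        states[i], states[j] = total // 2, total - total // 2
    return m, states


def estimate(state, m):
    """Map a state in [-m, m] to an estimated proportion in [0, 1] (assumed output map)."""
    return (m + state) / (2 * m)


if __name__ == "__main__":
    m, states = simulate(n_a=30, n_b=70, eps=0.25, interactions=5000)
    print(num_states(0.25), sorted(set(states)))
    print([round(estimate(s, m), 3) for s in sorted(set(states))])
```

Because every interaction preserves the sum of the two states, the average of the agents' estimates always equals the true proportion of A; the simulation only shows how the individual states spread around that value.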

22 citations


Journal ArticleDOI
TL;DR: Simulations show that some estimators, including the commonly-used plug-in maximum likelihood estimator, can have substantial bias for small or moderate sample sizes, and an adjustment is proposed that ensures estimates are always credible.

15 citations


Posted Content
01 Sep 2016-viXra
TL;DR: In this paper, the authors proposed some estimators for the population variance of the variable under study, which make use of information regarding the population proportion possessing a certain attribute, under simple random sampling without replacement (SRSWOR) scheme, up to the first order of approximation.
Abstract: This chapter proposes some estimators for the population variance of the variable under study, which make use of information regarding the population proportion possessing certain attribute. Under simple random sampling without replacement (SRSWOR) scheme, the mean squared error (MSE) up to the first order of approximation is derived. The results have been illustrated numerically by taking some empirical population considered in the literature.

8 citations


Journal ArticleDOI
TL;DR: A hierarchical Bayesian model is presented in which the first-stage binary responses have independent Bernoulli distributions, and each subsequent stage is modeled using a beta distribution, which is parameterized by its mean and a correlation coefficient to infer the finite population proportion of each area.
Abstract: We extend the twofold small-area model of Stukel and Rao (1997; 1999) to accommodate binary data. An example is the Third International Mathematics and Science Study (TIMSS), in which pass-fail data for mathematics of students from US schools (clusters) are available at the third grade by regions and communities (small areas). We compare the finite population proportions of these small areas. We present a hierarchical Bayesian model in which the first-stage binary responses have independent Bernoulli distributions, and each subsequent stage is modeled using a beta distribution, which is parameterized by its mean and a correlation coefficient. This twofold small-area model has an intracluster correlation at the first stage and an intercluster correlation at the second stage. The final-stage mean and all correlations are assumed to be noninformative independent random variables. We show how to infer the finite population proportion of each area. We have applied our models to synthetic TIMSS data to show that the twofold model is preferred over a onefold small-area model that ignores the clustering within areas. We further compare these models using a simulation study, which shows that the intracluster correlation is particularly important.
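A beta distribution "parameterized by its mean and a correlation coefficient" can be illustrated as follows. This is a sketch under an assumption the abstract does not state: it uses the common convention Beta(μ(1−ρ)/ρ, (1−μ)(1−ρ)/ρ), under which two Bernoulli responses sharing the same beta-distributed success probability have correlation ρ. The function names are hypothetical.

```python
import random


def beta_shapes(mean, rho):
    """Convert (mean, correlation) to the usual beta shape parameters (alpha, beta).

    ASSUMED convention: alpha + beta = (1 - rho) / rho, so that
    mean = alpha / (alpha + beta) and rho = 1 / (alpha + beta + 1).
    """
    if not (0.0 < mean < 1.0 and 0.0 < rho < 1.0):
        raise ValueError("mean and rho must lie strictly between 0 and 1")
    scale = (1.0 - rho) / rho
    return mean * scale, (1.0 - mean) * scale


def sample_cluster(mean, rho, size, rng):
    """Draw one cluster: a beta success probability, then Bernoulli responses."""
    alpha, beta = beta_shapes(mean, rho)
    p = rng.betavariate(alpha, beta)
    return [1 if rng.random() < p else 0 for _ in range(size)]


if __name__ == "__main__":
    rng = random.Random(1)
    print(beta_shapes(0.3, 0.2))
    print(sample_cluster(0.3, 0.2, 10, rng))
```

A larger ρ makes the cluster-level probabilities more dispersed, so responses within a cluster look more alike; this is the intracluster effect the abstract describes as particularly important.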

7 citations


Book ChapterDOI
TL;DR: In this paper, the authors examine the effectiveness of generating randomized responses by a negative hypergeometric distribution relative to generating them by a direct hypergeometric distribution, and provide theoretical derivations of the unbiased estimator, its variance and variance estimators.
Abstract: We consider estimating the proportion of people bearing sensitive attributes such as habitual drunkenness, drug addiction, reckless car driving, evading tax liabilities, etc., in a given community. Following the pioneering work of Singh and Sedory (2013), we examine the effectiveness of generating randomized responses by a negative hypergeometric distribution relative to generating randomized responses by a direct hypergeometric distribution. We consider sampling of respondents by general sampling schemes having positive inclusion probabilities for single and paired population units. Essential theoretical derivations for the unbiased estimator, variance and variance estimators are presented here. We perform a numerical illustration for comparison purposes, which supports the usefulness of Singh and Sedory's (2013) negative hypergeometric approach.

3 citations


Book ChapterDOI
P. Shaw
TL;DR: The Item Count Technique (ICT) was revised by Chaudhuri and Christofides (2007) to improve the protection of respondents' privacy under a general sampling design, whereas the original method was restricted to simple random sampling; this chapter avoids the technique's need for two independent samples and for a known population proportion of an innocuous characteristic.
Abstract: Randomized Response Techniques (RRTs) initiated by Warner (1965) have some disadvantages. Some participants either do not understand the RRT procedure or suspect revelation of privacy. Sometimes they end up regarding the randomization procedure as a foul trick. To overcome these problems as well as to shape the whole procedure in the form of canvassing a survey questionnaire, Raghavarao and Federer (1979), Miller (1984), and Miller et al. (1986) introduced the Item Count Technique, also known as the List Experiment, the Block Total Response or the Unmatched Count Technique, which is user friendly. This method was revised by Chaudhuri and Christofides (2007); the revision was an improvement over the original Item Count Technique in terms of protection of privacy of the respondents drawn from the population using a general sampling design, whereas the original method was restricted to simple random sampling. But a serious disadvantage is that it requires the selection of two independent samples, costing more time and money. It also needs knowledge of the population proportion of an innocuous characteristic unrelated to the sensitive characteristic whose proportion is to be estimated. This chapter avoids both problems.
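For orientation, the classic two-sample Item Count Technique that the abstract describes as the starting point can be sketched as below. This is the textbook list-experiment estimator under simple random sampling, not the chapter's own method (which, per the abstract, avoids the two-sample requirement); the function name and the toy data are hypothetical.

```python
from statistics import fmean


def item_count_estimate(treatment_counts, control_counts):
    """Classic list-experiment estimate of a sensitive proportion.

    Control respondents report how many of G innocuous items apply to them.
    Treatment respondents see the same G items plus the sensitive item.
    The difference in mean counts estimates the sensitive proportion.
    """
    if not treatment_counts or not control_counts:
        raise ValueError("both samples must be non-empty")
    return fmean(treatment_counts) - fmean(control_counts)


if __name__ == "__main__":
    treatment = [2, 3, 1, 2]  # counts out of G + 1 items (toy data)
    control = [1, 2, 2, 1]    # counts out of G items (toy data)
    print(item_count_estimate(treatment, control))
```

Respondents never reveal which items apply to them, only how many, which is why the technique is considered easier to explain than a randomization device.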

3 citations


Journal ArticleDOI
01 May 2016
TL;DR: In this paper, the problem of unbiased estimation of a finite population mean (or proportion) related to a sensitive character under a randomized response model was considered, and the comparisons of some with and without replacement sampling strategies based on equal and unequal probability sampling designs paralleling those for a direct survey were made.
Abstract: We consider the problem of unbiased estimation of a finite population mean (or proportion) related to a sensitive character under a randomized response model and present results on the comparisons of some with and without replacement sampling strategies based on equal and unequal probability sampling designs paralleling those for a direct survey.

2 citations


Proceedings ArticleDOI
08 Jan 2016
TL;DR: Wang et al. calculate the Gini coefficients of urban, rural and national residents in China from 2002 to 2011 and find that the Gini coefficient of resident income breaks through the warning line and remains highly stable.
Abstract: Based on the grouped data of resident income in the Statistical Yearbook, the paper calculates China’s Gini coefficients of urban, rural and national residents over the ten years from 2002 to 2011, according to the characteristics of the income data of urban and rural residents. It then analyzes the calculated coefficients and concludes that the Gini coefficient of resident income breaks through the warning line and remains highly stable, which reflects that the wealth gap of China’s residents at the current stage is still developing from a rational gap to an excessive gap, but there is no polarization.
Introduction
China has gradually entered the stage of a wealthy and strong country since the reform and opening up, but there is no doubt that the gap in resident income has also become larger. The Gini coefficient gives a quantitative line reflecting the economic difference among residents, so that the wealth gap among residents can be reflected intuitively and objectively, providing early warning against wealth polarization; it is an internationally accepted authoritative index of a country’s wealth gap and income distribution gap. The paper studies the dynamic trajectory of the Gini coefficient by calculating the Gini coefficients of China over the most recent ten years, so as to grasp the distribution and trend of China’s resident income.
The Empirical Analysis on Gini Coefficient
The Computing Method of Gini Coefficient
As a common concept for measuring the income gap, the Gini coefficient describes the relative deviation of the average income gap caused by population distribution from the expected value of total income.
Its computing method is based on data that divide the population evenly into N groups, namely the proportion of each group’s population to the total population is the same, and the mean value μ of the evenly divided groups can be obtained, so the computing formula of the Gini coefficient is
$G = \frac{1}{2N^{2}\mu}\sum_{i=1}^{N}\sum_{j=1}^{N}\lvert y_i - y_j\rvert$.
Here $\lvert y_i - y_j\rvert$ represents the absolute value of the difference in mean income between any two evenly divided groups, and μ represents the expected value of the evenly divided groups’ total income. According to the formula, the average deviation of total income is divided by the expected value of total income μ to obtain the relative degree of average income deviation, which is the Gini coefficient G.
International Conference on Humanities and Social Science (HSS 2016). © 2016 The authors. Published by Atlantis Press.
The Calculation of Urban Gini Coefficient
The mean values of urban resident income in different groups are determined according to the distribution of population, but because of the unequal grouping proportions of population and the differing number of members per household in the corresponding groups, the income mean values of the groups are not based on groups of equal population proportion. Therefore, the problem to be solved is to regroup based on the existing information and work out the income mean values of evenly divided population groups. The computation can be carried out in the following two steps.
Firstly, according to the characteristics of the 7 groups of values given in the Statistical Yearbook, they can be preliminarily divided into 5 approximately even groups; then, according to the proportions of household groups and the corresponding population data of each household, the population proportions of those 7 groups can be calculated.
Secondly, the population is redistributed evenly among the groups whose population proportion is close to 20%, so that the population proportions of all five groups equal 20%, and the corresponding income mean value of each group is worked out. Based on the first assumption, the values at the points 10, 30, 50, 70, 90 and other points can be calculated and regarded as the income mean values of the five evenly divided groups; see Table 1.
Table 1: The Mean of the Five Evenly Divided Groups’ Corresponding Income
Group                                     One      Two      Three    Four      Five
Evenly Divided Population Distribution    20       20       20       20        20
Corresponding Income Mean Value           3510.9   5736.55  7868.7   10532.08  19225.77
According to the corresponding income mean values of those evenly divided groups, we can work out that $\sum_{i=1}^{N}\sum_{j=1}^{N}\lvert y_i - y_j\rvert = 72443.86$, and the overall mean value of income in those five groups is μ = 9373.86. With N = 5, substituting these values into the formula gives the urban resident Gini coefficient $G_{2004} = 0.309131$.
The Calculation of Rural Gini Coefficient
The calculation method of the rural Gini coefficient is exactly the same as that of the urban Gini coefficient; the difference between them is that the data of urban residents are evenly divided into 5 groups (see the following Table 3). In a similar way, according to the data, we can work out that $\sum_{i=1}^{N}\sum_{j=1}^{N}\lvert y_i - y_j\rvert = 25424.60$, and the mean value μ = 2996.7. With N = 5, substituting these values into the formula gives the rural Gini coefficient $G_{2004} = 0.339368$.
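The urban calculation can be checked with a few lines of code. The sketch below applies the standard equal-group Gini formula (mean absolute difference over all ordered pairs, divided by 2N²μ) to the five income means in Table 1; the function name is hypothetical, and μ is recomputed from the table values rather than taken from the text.

```python
def gini_equal_groups(group_means):
    """Gini coefficient for N equally sized groups with the given mean incomes.

    Uses G = sum_i sum_j |y_i - y_j| / (2 * N^2 * mu), with the double sum
    taken over all ordered pairs (i, j) and mu the mean of the group means.
    """
    n = len(group_means)
    if n == 0:
        raise ValueError("need at least one group")
    mu = sum(group_means) / n
    pair_sum = sum(abs(a - b) for a in group_means for b in group_means)
    return pair_sum / (2 * n * n * mu)


if __name__ == "__main__":
    urban = [3510.9, 5736.55, 7868.7, 10532.08, 19225.77]  # Table 1
    print(round(gini_equal_groups(urban), 4))
```

On the Table 1 values this gives a coefficient of about 0.309, in line with the reported figure. The sum over ordered pairs comes out at roughly twice the 72443.86 quoted in the text, which suggests that the quoted sum counts each pair of groups once.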

2 citations


Journal Article
TL;DR: The authors developed a collection of dynamic modules in Excel intended to enhance student understanding of the fundamental concepts related to confidence intervals, as well as virtually every other topic in introductory statistics.
Abstract: One of the standard topics in any introductory statistics course is confidence intervals for estimating the value of some population parameter. In particular, consider the notion of estimating the unknown mean μ for a population based on the data obtained from a random sample drawn from that population. The confidence interval so constructed is centered at the sample mean x̄ and extends sufficiently far in each direction so as to have a pre-determined probability of containing μ. To construct a 95% confidence interval for the mean, the interval should have a 95% chance of containing μ. Equivalently, in 95% of the confidence intervals so constructed, the resulting interval should contain the true mean μ.
For most students in introductory statistics, the above statements represent little more than acts of faith. They do not fully appreciate the fact that the confidence interval constructed will correctly contain μ with probability 0.95. There is simply no effective way to construct, in class, a large variety of different confidence intervals based on different sample data to see whether or not the theoretical considerations actually make sense. Instead, the students just perform the appropriate manipulations to calculate the correct answer to such problems in a purely mechanical fashion or have the calculations done for them with either a calculator routine or some statistical software package.
Unfortunately, this is a topic that all too often reduces to rote memorization of formulas and procedures, in large measure because there are so many variations considered. For instance, there are:
* confidence intervals for the population mean μ when you have a large sample, and when you have a small sample;
* confidence intervals for the population proportion π when you have a large sample, and when you have a small sample;
* confidence intervals for the difference in population means when you have two samples drawn from similar populations, and
* confidence intervals for the difference in population proportions.
It is not in the least surprising that many students find their heads reeling and so come out of the course with little understanding of what confidence intervals are all about.
Technology has much to offer to reduce the tendency for the topic to be treated as a variety of exercises in rote memorization. Most graphing calculators contain a full menu of statistical functions that include most variations on constructing confidence intervals, as do statistical software packages and spreadsheets such as Excel. Some older calculator models only operated on a set of data entered in one or more lists to calculate the summary statistical measures and the corresponding confidence interval. Newer calculators give the option to work either with the raw data or the statistical measures, which more closely mirrors the typical kind of problems found in most textbooks in which students are asked to construct the confidence interval based on a sample of size n = 36 where the sample mean is 24 and the sample standard deviation is 9, say.
Even when students utilize technology to construct confidence intervals, the majority still tend to come away with very little in the way of basic understanding of the underlying fundamental concepts. In particular, they don't understand the significance of:
* the variation in the results due to the variations between different samples;
* the effects of sample size on the results;
* the effects of changes in the sample data or the sample statistics on the results;
* the effects of the choice of confidence level on the length of the confidence interval.
Gaining a solid understanding of all of these ideas requires the use of dynamic software to bring the concepts to life and so make a far stronger impact on the students.
DYNAMIC PROGRAMS IN EXCEL
The authors have developed a collection of dynamic modules in Excel that are intended to enhance student understanding of the fundamental concepts related to confidence intervals (as well as virtually every other topic in introductory statistics). …
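The coverage idea that the abstract says students take on faith can be tried out directly. The sketch below is not the authors' Excel modules: it is a small Python illustration, with hypothetical function names, that builds the large-sample interval for the textbook case mentioned above (n = 36, sample mean 24, sample standard deviation 9) and then counts how often such intervals contain the true mean across repeated simulated samples, assuming normally distributed data.

```python
import random
from statistics import NormalDist, fmean, stdev


def z_interval(sample_mean, sample_sd, n, level=0.95):
    """Large-sample confidence interval: mean +/- z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sample_sd / n ** 0.5
    return sample_mean - margin, sample_mean + margin


def coverage(mu, sigma, n, trials, level=0.95, seed=0):
    """Fraction of simulated intervals that contain the true mean mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(fmean(sample), stdev(sample), n, level)
        hits += lo <= mu <= hi
    return hits / trials


if __name__ == "__main__":
    print(z_interval(24, 9, 36))                # roughly (21.06, 26.94)
    print(coverage(mu=24, sigma=9, n=36, trials=2000))
```

Changing `n`, `level` or `sigma` and rerunning shows the effects listed above: larger samples shorten the interval, higher confidence levels lengthen it, and the fraction of intervals that capture the true mean stays near the nominal level.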

1 citations


Journal ArticleDOI
TL;DR: In this paper, the problem of estimating a finite population mean related to a sensitive character under a randomized response model is considered when independent responses are obtained from each sampled individual as many times as he/she is selected in the sample, and the admissibility of a sampling strategy in a class of comparable linear unbiased strategies is proved.
Abstract: We consider the problem of estimation of a finite population mean (or proportion) related to a sensitive character under a randomized response model when independent responses are obtained from each sampled individual as many times as he/she is selected in the sample and prove the admissibility of a sampling strategy in a class of comparable linear unbiased strategies. We prove that the admissible strategy is also optimal in this class under a super-population model.

1 citations


Journal ArticleDOI
TL;DR: In this article, the problem of estimating the population variance under a randomized response plan was considered and the optimality of a sampling strategy in a class of comparable design unbiased strategies under a super-population model was shown.
Abstract: Let P be the proportion of individuals in a finite population possessing a sensitive attribute. We consider the problem of estimation of the population variance P(1 – P) under Warner’s randomized response plan and prove the optimality of a sampling strategy in a class of comparable design unbiased strategies under a super-population model.
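As background to Warner's randomized response plan named in the abstract, the sketch below shows the textbook estimator of P under simple random sampling with replacement. It is an illustration only, not the paper's optimal strategy for P(1 – P): each respondent is assumed to answer the sensitive statement with probability p and its complement with probability 1 − p, so the chance of a "yes" is pP + (1 − p)(1 − P). Function names are hypothetical.

```python
def warner_estimate(yes_count, n, p):
    """Textbook Warner estimate of the sensitive proportion P.

    p is the probability that the randomizing device selects the sensitive
    statement (p must differ from 0.5). Solves
    lambda = p*P + (1 - p)*(1 - P) for P, with lambda = yes_count / n.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes P unidentifiable")
    lam = yes_count / n
    return (lam - (1 - p)) / (2 * p - 1)


def warner_variance(P, n, p):
    """Variance of the Warner estimator under sampling with replacement."""
    return P * (1 - P) / n + p * (1 - p) / (n * (2 * p - 1) ** 2)


if __name__ == "__main__":
    est = warner_estimate(yes_count=40, n=100, p=0.7)
    print(est, warner_variance(est, 100, 0.7))
```

The second term of the variance is the price paid for privacy: it grows as p approaches 0.5, where the answers carry no information about P.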

Posted Content
01 Sep 2016-viXra
TL;DR: This book collects papers proposing improved estimators that use auxiliary and attribute information, including an estimator using two auxiliary variables in stratified random sampling for estimating the population mean and an almost unbiased estimator using known values of some population parameter(s) together with a known population proportion of an auxiliary variable.
Abstract: The main aim of the present book is to suggest some improved estimators using auxiliary and attribute information in the case of simple random sampling and stratified random sampling, and some inventory models related to capacity constraints. This volume is a collection of five papers, written by six co-authors (listed in the order of the papers): Dr. Rajesh Singh, Dr. Sachin Malik, Dr. Florentin Smarandache, Dr. Neeraj Kumar, Mr. Sanjey Kumar & Pallavi Agarwal. In the first chapter the authors suggest an estimator using two auxiliary variables in stratified random sampling for estimating the population mean. In the second chapter they propose a family of estimators for estimating population means using known values of some population parameters. In the third chapter an almost unbiased estimator using known values of some population parameter(s) with a known population proportion of an auxiliary variable is used. In the fourth chapter the authors investigate a fuzzy economic order quantity model for a two-storage facility. The demand, holding cost, ordering cost and storage capacity of the own-warehouse are taken as trapezoidal fuzzy numbers. In the fifth chapter a two-warehouse inventory model deals with deteriorating items, with a stock-dependent demand rate and affected by inflation under the pattern of time value of money over a finite planning horizon. Shortages are allowed and partially backordered depending on the waiting time for the next replenishment. The purpose of this model is to minimize the total inventory cost by using the genetic algorithm. This book will be helpful for researchers and students who are working in the field of sampling techniques and inventory control.

Book ChapterDOI
TL;DR: The authors consider the problem of unbiased estimation of finite population mean (or proportion) related to a sensitive character under some randomized response models covering different randomized response plans and present a comprehensive review of various nonexistence, admissibility, and optimality results on the problem paralleling those for direct surveys.
Abstract: We consider the problem of unbiased estimation of finite population mean (or proportion) related to a sensitive character under some randomized response models covering different randomized response plans and present a comprehensive review of various nonexistence, admissibility, and optimality results on the problem paralleling those for direct surveys.