
Showing papers on "Population proportion" published in 2007


Book
16 Apr 2007
TL;DR: In this book, the authors present an introductory statistics text organized by data type (interval, ordinal and nominal scale), covering descriptive statistics, confidence intervals, the two-sample and paired t-tests, analysis of variance, correlation and regression, tests for proportions, non-parametric methods, multiple testing and questionnaire design.
Abstract: Preface. Statistical packages.
PART 1. DATA TYPES. 1. Data types. 1.1 Does it really matter? 1.2 Interval scale data. 1.3 Ordinal scale data. 1.4 Nominal scale data. 1.5 Structure of this book. 1.6 Chapter summary.
PART 2. INTERVAL-SCALE DATA. 2. Descriptive statistics. 2.1 Summarizing data sets. 2.2 Indicators of central tendency: mean, median and mode. 2.3 Describing variability: standard deviation and coefficient of variation. 2.4 Quartiles: another way to describe data. 2.5 Using computer packages to generate descriptive statistics. 2.6 Chapter summary. 3. The normal distribution. 3.1 What is a normal distribution? 3.2 Identifying data that are not normally distributed. 3.3 Proportions of individuals within one or two standard deviations of the mean. 3.4 Chapter summary. 4. Sampling from populations: the SEM. 4.1 Samples and populations. 4.2 From sample to population. 4.3 Types of sampling error. 4.4 What factors control the extent of random sampling error? 4.5 Estimating likely sampling error: the SEM. 4.6 Offsetting sample size against standard deviation. 4.7 Chapter summary. 5. Ninety-five per cent confidence interval for the mean. 5.1 What is a confidence interval? 5.2 How wide should the interval be? 5.3 What do we mean by '95 per cent' confidence? 5.4 Calculating the interval width. 5.5 A long series of samples and 95 per cent confidence intervals. 5.6 How sensitive is the width of the confidence interval to changes in the SD, the sample size or the required level of confidence? 5.7 Two statements. 5.8 One-sided 95 per cent confidence intervals. 5.9 The 95 per cent confidence interval for the difference between two treatments. 5.10 The need for data to follow a normal distribution and data transformation. 5.11 Chapter summary. 6. The two-sample t-test (1): introducing hypothesis tests. 6.1 The two-sample t-test: an example of a hypothesis test. 6.2 'Significance'. 6.3 The risk of a false positive finding. 6.4 What factors will influence whether or not we obtain a significant outcome? 6.5 Requirements for applying a two-sample t-test. 6.6 Chapter summary. 7. The two-sample t-test (2): the dreaded P value. 7.1 Measuring how significant a result is. 7.2 P values. 7.3 Two ways to define significance? 7.4 Obtaining the P value. 7.5 P values or 95 per cent confidence intervals? 7.6 Chapter summary. 8. The two-sample t-test (3): false negatives, power and necessary sample sizes. 8.1 What else could possibly go wrong? 8.2 Power. 8.3 Calculating necessary sample size. 8.4 Chapter summary. 9. The two-sample t-test (4): statistical significance, practical significance and equivalence. 9.1 Practical significance: is the difference big enough to matter? 9.2 Equivalence testing. 9.3 Non-inferiority testing. 9.4 P values are less informative and can be positively misleading. 9.5 Setting equivalence limits prior to experimentation. 9.6 Chapter summary. 10. The two-sample t-test (5): one-sided testing. 10.1 Looking for a change in a specified direction. 10.2 Protection against false positives. 10.3 Temptation! 10.4 Using a computer package to carry out a one-sided test. 10.5 Should one-sided tests be used more commonly? 10.6 Chapter summary. 11. What does a statistically significant result really tell us? 11.1 Interpreting statistical significance. 11.2 Starting from extreme scepticism. 11.3 Chapter summary. 12. The paired t-test: comparing two related sets of measurements. 12.1 Paired data. 12.2 We could analyse the data using a two-sample t-test. 12.3 Using a paired t-test instead. 12.4 Performing a paired t-test. 12.5 What determines whether a paired t-test will be significant? 12.6 Greater power of a paired t-test. 12.7 The paired t-test is only applicable to naturally paired data. 12.8 Choice of experimental design. 12.9 Requirements for applying a paired t-test. 12.10 Sample sizes, practical significance and one-sided tests. 12.11 Summarizing the differences between the paired and two-sample t-tests. 12.12 Chapter summary. 13. Analyses of variance: going beyond t-tests. 13.1 Extending the complexity of experimental designs. 13.2 One-way analysis of variance. 13.3 Two-way analysis of variance. 13.4 Multi-factorial experiments. 13.5 Keep it simple. Keep it powerful. 13.6 Chapter summary. 14. Correlation and regression: relationships between measured values. 14.1 Correlation analysis. 14.2 Regression analysis. 14.3 Multiple regression. 14.4 Chapter summary.
PART 3. NOMINAL-SCALE DATA. 15. Describing categorized data. 15.1 Descriptive statistics. 15.2 Testing whether the population proportion might credibly be some pre-determined figure. 15.3 Chapter summary. 16. Comparing observed proportions: the contingency chi-square test. 16.1 Using the contingency chi-square test to compare observed proportions. 16.2 Obtaining a 95 per cent CI for the change in the proportion of expulsions: is the difference large enough to be of practical significance? 16.3 Larger tables: attendance at diabetic clinics. 16.4 Planning experimental size. 16.5 Chapter summary.
PART 4. ORDINAL-SCALE DATA. 17. Ordinal and non-normally distributed data: transformations and non-parametric tests. 17.1 Transforming data to a normal distribution. 17.2 The Mann-Whitney test: a non-parametric method. 17.3 Dealing with ordinal data. 17.4 Other non-parametric methods. 17.5 Chapter summary. Appendix to Chapter 17.
PART 5. SOME CHALLENGES FROM THE REAL WORLD. 18. Multiple testing. 18.1 What is it and why is it a problem? 18.2 Where does multiple testing arise? 18.3 Methods to avoid false positives. 18.4 The role of scientific journals. 18.5 Chapter summary. 19. Questionnaires. 19.1 Is there anything special about questionnaires? 19.2 Types of questions. 19.3 Designing a questionnaire. 19.4 Sample sizes and return rates. 19.5 Analysing the results. 19.6 Confounded epidemiological data. 19.7 Multiple testing with questionnaire data. 19.8 Chapter summary.
PART 6. CONCLUSIONS. 20. Conclusions. 20.1 Be clear about the purpose of the experiment. 20.2 Keep the experimental design simple and therefore clear and powerful. 20.3 Draw up a statistical analysis plan as part of the experimental design: it is not a last-minute add-on. 20.4 Explore your data visually before launching into statistical testing. 20.5 Beware of multiple analyses. 20.6 Interpret both significance and non-significance with care. Index.
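A small worked example may help connect the book's nominal-scale material to the theme of this page. The Python sketch below illustrates the kind of calculation covered by section 15.2 (testing whether a population proportion might credibly be some pre-determined figure) using the usual normal approximation; the counts and the hypothesised proportion are invented for illustration and are not taken from the book.

```python
# Illustrative sketch (not from the book): a 95 per cent confidence interval
# for an observed proportion and a one-sample z-test of a pre-specified value.
import math
from scipy.stats import norm

successes, n = 34, 120        # hypothetical observed data
p0 = 0.20                     # pre-determined proportion to test against
p_hat = successes / n

# Normal-approximation (Wald) 95 per cent confidence interval for the proportion
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 1.96 * se, p_hat + 1.96 * se)

# One-sample z-test of H0: p = p0 (two-sided), using the null standard error
se0 = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se0
p_value = 2 * norm.sf(abs(z))

print(f"p_hat = {p_hat:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, two-sided P = {p_value:.4f}")
```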

58 citations


Journal ArticleDOI
TL;DR: The results indicate that precision in estimation of a population proportion is improved through the use of logistic regression to carry out the RSS ranking and, hence, the sample size required to achieve a desired precision is reduced.
Abstract: Ranked set sampling (RSS) is a sampling procedure that can be considerably more efficient than simple random sampling (SRS). When the variable of interest is binary, ranking of the sample observations can be implemented using the estimated probabilities of success obtained from a logistic regression model developed for the binary variable. The main objective of this study is to use substantial data sets to investigate the application of RSS to estimation of a proportion for a population that is different from the one that provides the logistic regression. Our results indicate that precision in estimation of a population proportion is improved through the use of logistic regression to carry out the RSS ranking and, hence, the sample size required to achieve a desired precision is reduced. Further, the choice and the distribution of covariates in the logistic regression model are not overly crucial for the performance of a balanced RSS procedure. (© 2007 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim)
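The ranking mechanism described above is easy to mimic in simulation. The following Python sketch is an illustration of the general idea only, not the authors' code or data: it fits a logistic regression on a separate simulated data set, uses the fitted scores to rank units within sets for a balanced RSS design, and compares the Monte Carlo variability of the resulting proportion estimate against simple random sampling with the same number of measured units. The model, covariate and sample sizes are assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(n):
    """Draw n units: one covariate x and a binary outcome y whose
    success probability depends on x (invented model)."""
    x = rng.normal(size=n)
    p = 1.0 / (1.0 + np.exp(-(-0.5 + 1.2 * x)))
    return x, rng.binomial(1, p)

def fit_logistic(x, y, iters=25):
    """Fit logistic regression of y on x by Newton's method."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta += np.linalg.solve(hess, grad)
    return beta

# "Training" data standing in for the population that supplies the ranking model.
beta = fit_logistic(*simulate(5000))

def rss_estimate(set_size=3, cycles=40):
    """Balanced RSS: each cycle draws `set_size` sets of `set_size` units,
    ranks each set by its logistic score, and keeps one unit per rank."""
    kept = []
    for _ in range(cycles):
        for r in range(set_size):
            x_set, y_set = simulate(set_size)
            scores = beta[0] + beta[1] * x_set  # monotone in predicted probability
            kept.append(y_set[np.argsort(scores)[r]])
    return np.mean(kept)

def srs_estimate(n):
    """Simple random sample of n measured units."""
    return np.mean(simulate(n)[1])

# Compare Monte Carlo standard deviations at the same number of measured units (120).
reps = 1000
rss_sd = np.std([rss_estimate(3, 40) for _ in range(reps)])
srs_sd = np.std([srs_estimate(120) for _ in range(reps)])
print(f"SD of proportion estimate, RSS: {rss_sd:.4f}  SRS: {srs_sd:.4f}")
```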

23 citations


Journal ArticleDOI
TL;DR: A novel sampling algorithm, referred to as PAS (proportion approximation sampling), is explored; PAS can stably provide high-quality samples at a corresponding computational cost, whereas the companion algorithm EQAS can flexibly generate samples with the desired balance between sampling quality and sampling efficiency.
Abstract: We explore in this paper a novel sampling algorithm, referred to as algorithm PAS (standing for proportion approximation sampling), to generate a high-quality online sample with the desired sample rate. The sampling quality refers to the consistency between the population proportion and the sample proportion of each categorical value in the database. Note that the state-of-the-art sampling algorithm to preserve the sampling quality has to examine the population proportion of each categorical value in a pilot sample a priori and is thus not applicable to incremental mining applications. To remedy this, algorithm PAS adaptively determines the inclusion probability of each incoming tuple in such a way that the sampling quality is sequentially preserved while the sample rate is also guaranteed to stay close to the user-specified one. Importantly, PAS not only guarantees the proportion consistency of each categorical value but also excellently preserves the proportion consistency of multivariate statistics, which will be significantly beneficial to various data mining applications. For better execution efficiency, we further devise an algorithm, called algorithm EQAS (standing for efficient quality-aware sampling), which integrates PAS and random sampling to provide the flexibility of striking a compromise between the sampling quality and the sampling efficiency. As validated in experimental results on real and synthetic data, algorithm PAS can stably provide high-quality samples with corresponding computational overhead, whereas algorithm EQAS can flexibly generate samples with the desired balance between sampling quality and sampling efficiency.
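The paper defines PAS by a specific inclusion-probability rule that is not reproduced here. As a rough illustration of the general idea only (not the published algorithm), the Python sketch below shows a streaming sampler that raises the inclusion probability of an incoming tuple when its categorical value is under-represented in the current sample relative to the proportions seen so far, and lowers it when the value is over-represented, while keeping the overall sample rate near a user-specified target. The adjustment rule and all names are assumptions for illustration.

```python
import random
from collections import Counter

def proportion_aware_sample(stream, target_rate=0.1, seed=0):
    """Streaming sampler that nudges per-category inclusion probabilities so
    sample proportions track the stream proportions observed so far."""
    rnd = random.Random(seed)
    seen, sampled = Counter(), Counter()
    n_seen, sample = 0, []
    for value in stream:
        n_seen += 1
        seen[value] += 1
        stream_prop = seen[value] / n_seen
        sample_prop = sampled[value] / len(sample) if sample else 0.0
        # Under-represented categories get a higher inclusion probability,
        # over-represented ones a lower one; clamp the result to [0, 1].
        adjust = (stream_prop - sample_prop) / stream_prop
        p_include = min(1.0, max(0.0, target_rate * (1.0 + adjust)))
        if rnd.random() < p_include:
            sample.append(value)
            sampled[value] += 1
    return sample

# Hypothetical categorical stream: roughly 70% 'A', 20% 'B', 10% 'C'.
rnd = random.Random(42)
stream = rnd.choices(["A", "B", "C"], weights=[7, 2, 1], k=100_000)
s = proportion_aware_sample(stream, target_rate=0.05)
print("sample rate:", len(s) / len(stream))
print("sample proportions:", {k: round(v / len(s), 3) for k, v in Counter(s).items()})
```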

7 citations


PatentDOI
TL;DR: In this article, an unbiased estimator of the squared correlation between population and sample proportions is used to determine point and interval estimates of population proportions in a regression context involving simple random sampling with replacement.
Abstract: A method is presented for estimating population proportions from sample proportions that yields narrower margins of error for any given sample size, or requires a smaller sample size for any given margin of error, than previously existing methods applied to the same data. The method applies an unbiased estimator (developed in this invention) of the squared correlation between population and sample proportions to determine point and interval estimates of population proportions in a regression context involving simple random sampling with replacement. In virtually all reasonable applications, assuming a Dirichlet prior distribution, the margin of error produced by this method for a population proportion is shown to be 1.96 times the posterior standard deviation of the proportion.
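The closing claim lends itself to a small worked example. Under a Dirichlet prior, the marginal posterior of a single proportion is a Beta distribution, so the posterior standard deviation has a closed form, and the margin of error described in the abstract is 1.96 times that value. The Python sketch below computes it for an invented two-category sample and a uniform Dirichlet prior; the prior choice and the counts are assumptions for illustration, not values from the patent.

```python
# Margin of error as 1.96 x posterior standard deviation of a proportion,
# assuming a Dirichlet(1, ..., 1) (uniform) prior; the prior and the counts
# below are illustrative assumptions, not values from the patent.
import math

counts = {"yes": 42, "no": 58}                       # hypothetical sample of n = 100
prior = {k: 1.0 for k in counts}                     # uniform Dirichlet prior

alpha = {k: prior[k] + counts[k] for k in counts}    # posterior Dirichlet parameters
a0 = sum(alpha.values())

for k, a in alpha.items():
    mean = a / a0                                        # posterior mean of the proportion
    sd = math.sqrt(a * (a0 - a) / (a0 ** 2 * (a0 + 1)))  # SD of the Beta(a, a0 - a) marginal
    print(f"{k}: point estimate {mean:.3f}, margin of error {1.96 * sd:.3f}")
```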

4 citations