Determination of sample size using power analysis and optimum bin size of histogram features

doi:10.1504/IJDATS.2011.038804

Citations

Journal Article

A confidence-prioritisation approach for learning noisy data

Nathaniel Gustafson¹, Christophe Giraud-Carrier¹

Brigham Young University¹

01 Dec 2014-International Journal of Data Analysis Techniques and Strategies

TL;DR: This work proposes a methodological framework for assigning confidence to individual data records and augmenting training with that information, and results indicate that applying and utilising confidence in training improves performance.

Abstract: In a number of real-world applications, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may be affected by a variety of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they contain. We submit that in situations where some variability exists in the clarity or confidence associated with individual data points, an approach that takes this confidence into account during the training phase is beneficial. We propose a methodological framework for assigning confidence to individual data records and augmenting training with that information. We test the methodology on two separate datasets, a simulated dataset and a streamflow diel signals dataset. Results indicate that applying and utilising confidence in training improves performance.

1 citation

Cites background or methods from "Determination of sample size using ..."

...Instance reduction techniques, for example, have been used to prune training sets in such a way that training, and storage in the case of instance-based learning, can be effected on fewer instances with minor impact on the overall accuracy of the induced models (Wilson and Martinez, 2000; Czarnowski and Jedrzejowicz, 2005, 2006; Son and Kim, 2006; Indira et al., 2011)....

References

Book

Statistical Power Analysis for the Behavioral Sciences

Jacob Cohen¹

University of North Carolina at Chapel Hill¹

01 Dec 1969

TL;DR: This book discusses the concepts of power analysis and treats power for tests including the t-test for means, the sign test, and chi-square tests for goodness of fit and contingency tables.

Abstract: Contents: Prefaces. The Concepts of Power Analysis. The t-Test for Means. The Significance of a Product Moment r_s. Differences Between Correlation Coefficients. The Test That a Proportion is .50 and the Sign Test. Differences Between Proportions. Chi-Square Tests for Goodness of Fit and Contingency Tables. The Analysis of Variance and Covariance. Multiple Regression and Correlation Analysis. Set Correlation and Multivariate Methods. Some Issues in Power Analysis. Computational Procedures.

115,069 citations

"Determination of sample size using ..." refers to methods in this paper

...The Pillai’s V trace is given by $V = \operatorname{trace}(BT^{-1}) = \sum_{i=1}^{h} \frac{\lambda_i}{1 + \lambda_i}$ (1) where $\lambda_i$ is the ith Eigen value of $W^{-1}B$ in which W is the within-group variance and h is the number of factors being considered in MANOVA, defined by h = c – 1 and c is the number of classes....

...In case of multi-class problem (number of classes greater than two), instead of t-statistic, the F-statistic measure derived from Pillai’s V formula (Olson, 1974; Cohen, 1969) is used for the estimation of sample size....

...It is a statistical measure often used in multivariate analysis of variance (MANOVA) (Cohen, 1969, 1988)....

...The test was priori sample size computation of multivariate analysis of variance (MANOVA) with repeated measures and within-between interactions....
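
The Pillai's V statistic quoted in the excerpts above can be checked numerically. The following is a minimal sketch, not the citing paper's implementation: it assumes that B is the between-group scatter matrix, W the within-group scatter matrix, and T = B + W the total scatter matrix (the excerpt does not define B or T), and the function name `pillai_trace` is hypothetical. It computes V both from the eigenvalues of W⁻¹B and from trace(BT⁻¹) so the two forms in Equation (1) can be compared.

```python
import numpy as np


def pillai_trace(X, y):
    """Return Pillai's V computed two ways: (eigenvalue sum, trace(B T^-1)).

    Assumes B = between-group scatter, W = within-group scatter, T = B + W.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    p = X.shape[1]
    grand_mean = X.mean(axis=0)
    W = np.zeros((p, p))  # within-group scatter
    B = np.zeros((p, p))  # between-group scatter
    for label in np.unique(y):
        group = X[y == label]
        group_mean = group.mean(axis=0)
        centered = group - group_mean
        W += centered.T @ centered
        diff = (group_mean - grand_mean)[:, None]
        B += len(group) * (diff @ diff.T)
    T = B + W
    # Eigenvalues of W^-1 B; at most c - 1 of them are non-zero,
    # and zero eigenvalues contribute nothing to the sum.
    eigvals = np.linalg.eigvals(np.linalg.solve(W, B)).real
    v_from_eigenvalues = float(np.sum(eigvals / (1.0 + eigvals)))
    v_from_trace = float(np.trace(B @ np.linalg.inv(T)))
    return v_from_eigenvalues, v_from_trace


# Usage with synthetic data: three classes (c = 3, so h = 2), four features.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(loc=m, size=(30, 4)) for m in (0.0, 0.5, 1.0)])
y_demo = np.repeat([0, 1, 2], 30)
print(pillai_trace(X_demo, y_demo))
```

Both returned values should agree up to floating-point error, and V lies between 0 and h because each term λ/(1 + λ) is below 1.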

Journal Article

Statistical Power Analysis for the Behavioral Sciences (2nd ed.)

Peter A. Lachenbruch¹

University of York¹

01 Dec 1989-Journal of the American Statistical Association

49,129 citations

"Determination of sample size using ..." refers to methods in this paper

...The Pillai’s V trace is given by $V = \operatorname{trace}(BT^{-1}) = \sum_{i=1}^{h} \frac{\lambda_i}{1 + \lambda_i}$ (1) where $\lambda_i$ is the ith Eigen value of $W^{-1}B$ in which W is the within-group variance and h is the number of factors being considered in MANOVA, defined by h = c – 1 and c is the number of classes....

...The experimental setup, fault simulation and experimental procedure are explained in detail in Sugumaran et al. (2008) and Cohen (1988)....

...It is a statistical measure often used in multivariate analysis of variance (MANOVA) (Cohen, 1969, 1988)....

...The test was priori sample size computation of multivariate analysis of variance (MANOVA) with repeated measures and within-between interactions....

Journal Article

G*Power 3: A flexible statistical power analysis program for the social, behavioral, and biomedical sciences

Franz Faul¹, Edgar Erdfelder², Albert Georg Lang³, Axel Buchner³

University of Kiel¹, University of Mannheim², University of Düsseldorf³

01 May 2007-Behavior Research Methods

TL;DR: G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested.

Abstract: G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Windows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and χ² test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free.
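
To make the idea of an a priori power analysis concrete, here is a minimal sketch of a sample size calculation for a two-sided, two-sample comparison of means. It is not G*Power's algorithm: it uses the textbook normal approximation n = 2((z₁₋α/₂ + z₁₋β)/d)² per group, where d is a standardized effect size, so results may differ slightly from a program that works with the exact test distribution. The function name `n_per_group` and its defaults (α = 0.05, power = 0.80) are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-sample test of means.

    Normal approximation only; d is the standardized mean difference.
    """
    if d <= 0:
        raise ValueError("effect size d must be positive")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


# Smaller effects require larger samples at the same alpha and power.
for effect in (0.2, 0.5, 0.8):
    print(effect, n_per_group(effect))
```

The required sample size grows with the inverse square of the effect size, which is why halving d roughly quadruples n.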

40,195 citations

Additional excerpts

..., 1988; Whitehead, 1993; Roebruck and Kuhn, 1995; Lubin and Gail, 1990; Lakatos and Lan, 1992), for time-to-event (survival) data (Schoenfeld and Richter, 1982; Hanley and McNeil, 1982), for receiver operating curve (ROC) analysis (Obuchowski, 1994; Obuchowski and McClish, 1997; Whittemore, 1981), for logistic and Poisson regression (Hsieh, 1989; Flack and Eudey, 1993; Bull, 1993; Signorini, 1991; Lui and Cumberland, 1992), repeated measurements (Lipsitz and Fitzmaurice, 1994; Greenland, 1988), precision (Samuels and Lu, 1992; Buderer, 1996; Satten and Kupper, 1990; Streiner, 1994; Beal, 1989; Dupont, 1988), paired samples (Parker and Bregman, 1986; Nam, 1992; Lu and Bean, 1995; Lachenbruch, 1992; Lachin, 1992; Royston, 1993; Nam, 1997; Donner and Eliasziw, 1992), measurement of agreement (Birkett and Day, 1994), and power (Faul et al., 2007)....

Journal Article

The meaning and use of the area under a receiver operating characteristic (ROC) curve.

James A. Hanley, Barbara J. McNeil

01 Apr 1982-Radiology

TL;DR: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented, and it is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.

Abstract: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect difference...
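
The probabilistic reading of the area described in this abstract can be sketched directly: count, over all diseased/non-diseased pairs, how often the diseased subject receives the higher score, with ties counted as one half. The function names below are hypothetical. The standard-error function uses the closed-form approximation commonly attributed to this paper; the abstract itself does not state the formula, so treat its exact form here as an assumption.

```python
import math


def auc_pairwise(diseased_scores, healthy_scores):
    """Fraction of (diseased, healthy) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                wins += 1.0
            elif d == h:
                wins += 0.5
    return wins / (len(diseased_scores) * len(healthy_scores))


def auc_standard_error(auc, n_diseased, n_healthy):
    """Approximate standard error of an AUC estimate (assumed closed form)."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc ** 2 / (1.0 + auc)
    variance = (
        auc * (1.0 - auc)
        + (n_diseased - 1) * (q1 - auc ** 2)
        + (n_healthy - 1) * (q2 - auc ** 2)
    ) / (n_diseased * n_healthy)
    return math.sqrt(variance)


# Usage with made-up ratings on a 1-5 suspicion scale.
diseased = [3, 4, 5, 5]
healthy = [1, 2, 3, 4]
area = auc_pairwise(diseased, healthy)
print(area, auc_standard_error(area, len(diseased), len(healthy)))
```

Because the standard error shrinks as the two group sizes grow, the same expression can be evaluated over candidate sample sizes to find how many subjects give a sufficiently precise estimate of the area.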

19,398 citations

Additional excerpts

..., 1988; Whitehead, 1993; Roebruck and Kuhn, 1995; Lubin and Gail, 1990; Lakatos and Lan, 1992), for time-to-event (survival) data (Schoenfeld and Richter, 1982; Hanley and McNeil, 1982), for receiver operating curve (ROC) analysis (Obuchowski, 1994; Obuchowski and McClish, 1997; Whittemore, 1981), for logistic and Poisson regression (Hsieh, 1989; Flack and Eudey, 1993; Bull, 1993; Signorini, 1991; Lui and Cumberland, 1992), repeated measurements (Lipsitz and Fitzmaurice, 1994; Greenland, 1988), precision (Samuels and Lu, 1992; Buderer, 1996; Satten and Kupper, 1990; Streiner, 1994; Beal, 1989; Dupont, 1988), paired samples (Parker and Bregman, 1986; Nam, 1992; Lu and Bean, 1995; Lachenbruch, 1992; Lachin, 1992; Royston, 1993; Nam, 1997; Donner and Eliasziw, 1992), measurement of agreement (Birkett and Day, 1994), and power (Faul et al....

Book

Statistical methods for rates and proportions

Joseph L. Fleiss¹

New York State Department of Mental Hygiene¹

01 Jan 1981

TL;DR: This book covers statistical methods for rates and proportions, including inference for a single proportion, sample sizes needed to detect a difference between two proportions, logistic and Poisson regression, and the measurement of interrater agreement.

Abstract: Preface. Preface to the Second Edition. Preface to the First Edition. 1. An Introduction to Applied Probability. 2. Statistical Inference for a Single Proportion. 3. Assessing Significance in a Fourfold Table. 4. Determining Sample Sizes Needed to Detect a Difference Between Two Proportions. 5. How to Randomize. 6. Comparative Studies: Cross-Sectional, Naturalistic, or Multinomial Sampling. 7. Comparative Studies: Prospective and Retrospective Sampling. 8. Randomized Controlled Trials. 9. The Comparison of Proportions from Several Independent Samples. 10. Combining Evidence from Fourfold Tables. 11. Logistic Regression. 12. Poisson Regression. 13. Analysis of Data from Matched Samples. 14. Regression Models for Matched Samples. 15. Analysis of Correlated Binary Data. 16. Missing Data. 17. Misclassification Errors: Effects, Control, and Adjustment. 18. The Measurement of Interrater Agreement. 19. The Standardization of Rates. Appendix A. Numerical Tables. Appendix B. The Basic Theory of Maximum Likelihood Estimation. Appendix C. Answers to Selected Problems. Author Index. Subject Index.

16,435 citations
