Journal Article

Determination of sample size using power analysis and optimum bin size of histogram features

TL;DR: This paper presents a mathematical study for choosing the histogram bin size and the minimum sample size needed to train a classifier, using power analysis with statistical stability, and compares the results with those of an entropy-based algorithm (J48).
Abstract: Vibration signals are used as a source of information in the fault diagnosis of rotary machines. Much work has been reported on identifying faults in roller bearings using many techniques. Of late, the application of machine learning to fault diagnosis has been gaining momentum. The machine learning approach consists of a chain of activities: data acquisition, feature extraction, feature selection, and feature classification. When histogram features are used, a few questions remain to be answered, such as how many histogram bins should be used to extract features and how many samples should be used to train the classifier. This paper provides a mathematical study for choosing the bin size and the minimum sample size to train the classifier using power analysis with statistical stability. A typical bearing fault diagnosis problem is taken as a case for illustration, and the results are compared with those of an entropy-based algorithm (J48) for determining the minimum sample size and bin size.
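As a rough illustration of how power analysis turns a target power into a minimum sample size, the sketch below uses the textbook normal approximation for a two-class comparison of means. It is not the paper's procedure: the function name `min_samples_per_class`, the defaults (alpha = 0.05, power = 0.80), and the effect-size values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist


def min_samples_per_class(effect_size, alpha=0.05, power=0.80):
    """Approximate per-class sample size for a two-sided, two-class comparison of means.

    Normal approximation: n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2,
    where d is the standardized mean difference (an assumed input here).
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the target power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return math.ceil(n)  # round up to a whole number of samples


if __name__ == "__main__":
    # Hypothetical effect sizes: smaller effects need more training samples.
    for d in (0.5, 0.8, 1.0):
        print(d, min_samples_per_class(d))
```

The approximation slightly underestimates the exact t-test requirement for small samples, so it is best read as a lower bound on the per-class sample count.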
Citations
Journal Article
TL;DR: This work proposes a methodological framework for assigning confidence to individual data records and augmenting training with that information, and results indicate that applying and utilising confidence in training improves performance.
Abstract: In a number of real-world applications, there is a range of noise associated with individual data points. Some points are extracted under relatively clear and defined conditions, while others may be affected by a variety of known or unknown confounding factors, which may decrease those points' validity. These points may or may not remain useful for training, depending on how much uncertainty they contain. We submit that in situations where some variability exists in the clarity or confidence associated with individual data points, an approach that takes this confidence into account during the training phase is beneficial. We propose a methodological framework for assigning confidence to individual data records and augmenting training with that information. We test the methodology on two separate datasets, a simulated dataset and a streamflow diel signals dataset. Results indicate that applying and utilising confidence in training improves performance.

1 citation


Cites background or methods from "Determination of sample size using ..."

  • ...Instance reduction techniques, for example, have been used to prune training sets in such a way that training, and storage in the case of instance-based learning, can be effected on fewer instances with minor impact on the overall accuracy of the induced models (Wilson and Martinez, 2000; Czarnowski and Jedrzejowicz, 2005, 2006; Son and Kim, 2006; Indira et al., 2011)....

    [...]

References
Book
01 Dec 1969
TL;DR: This book introduces the concepts of power analysis and covers power for the t-test for means, the sign test, and chi-square tests for goodness of fit and contingency tables, among other procedures.
Abstract: Contents: Prefaces. The Concepts of Power Analysis. The t-Test for Means. The Significance of a Product Moment r_s. Differences Between Correlation Coefficients. The Test That a Proportion is .50 and the Sign Test. Differences Between Proportions. Chi-Square Tests for Goodness of Fit and Contingency Tables. The Analysis of Variance and Covariance. Multiple Regression and Correlation Analysis. Set Correlation and Multivariate Methods. Some Issues in Power Analysis. Computational Procedures.

115,069 citations


"Determination of sample size using ..." refers to methods in this paper

  • ...The Pillai’s V trace is given by $V = \operatorname{trace}(BT^{-1}) = \sum_{i=1}^{h} \frac{\lambda_i}{1 + \lambda_i}$ (1), where $\lambda_i$ is the $i$th eigenvalue of $W^{-1}B$, in which $W$ is the within-group variance, and $h$ is the number of factors being considered in MANOVA, defined by $h = c - 1$, where $c$ is the number of classes....

    [...]

  • ...In case of multi-class problem (number of classes greater than two), instead of t-statistic, the F-statistic measure derived from Pillai’s V formula (Olson, 1974; Cohen, 1969) is used for the estimation of sample size....

    [...]

  • ...It is a statistical measure often used in multivariate analysis of variance (MANOVA) (Cohen, 1969, 1988)....

    [...]

  • ...The test was an a priori sample size computation for multivariate analysis of variance (MANOVA) with repeated measures and within-between interactions....

    [...]
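The Pillai's V expression quoted in the excerpts above can be evaluated directly once the eigenvalues of W^{-1}B are known. The following is a minimal sketch under that assumption: the function name `pillai_trace` and the example eigenvalues are hypothetical, and the eigenvalues are taken as given rather than computed from data.

```python
def pillai_trace(eigenvalues):
    """Pillai's V as the sum of lambda_i / (1 + lambda_i).

    `eigenvalues` are assumed to be the h eigenvalues of W^{-1}B,
    which are non-negative for valid within/between-group matrices.
    """
    values = list(eigenvalues)
    if any(lam < 0 for lam in values):
        raise ValueError("eigenvalues of W^{-1}B are expected to be non-negative")
    return sum(lam / (1.0 + lam) for lam in values)


if __name__ == "__main__":
    # Hypothetical three-class problem: h = c - 1 = 2 eigenvalues.
    print(pillai_trace([1.0, 0.25]))  # 1/2 + 0.25/1.25
```

Each term lies between 0 and 1, so V is bounded above by the number of eigenvalues h; values near h indicate well-separated classes.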

Journal Article

49,129 citations


"Determination of sample size using ..." refers to methods in this paper

  • ...The Pillai’s V trace is given by $V = \operatorname{trace}(BT^{-1}) = \sum_{i=1}^{h} \frac{\lambda_i}{1 + \lambda_i}$ (1), where $\lambda_i$ is the $i$th eigenvalue of $W^{-1}B$, in which $W$ is the within-group variance, and $h$ is the number of factors being considered in MANOVA, defined by $h = c - 1$, where $c$ is the number of classes....

    [...]

  • ...The experimental setup, fault simulation and experimental procedure are explained in detail in Sugumaran et al. (2008) and Cohen (1988)....

    [...]

  • ...It is a statistical measure often used in multivariate analysis of variance (MANOVA) (Cohen, 1969, 1988)....

    [...]

  • ...The test was an a priori sample size computation for multivariate analysis of variance (MANOVA) with repeated measures and within-between interactions....

    [...]

Journal Article
TL;DR: G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested.
Abstract: G*Power (Erdfelder, Faul, & Buchner, 1996) was designed as a general stand-alone power analysis program for statistical tests commonly used in social and behavioral research. G*Power 3 is a major extension of, and improvement over, the previous versions. It runs on widely used computer platforms (i.e., Windows XP, Windows Vista, and Mac OS X 10.4) and covers many different statistical tests of the t, F, and χ² test families. In addition, it includes power analyses for z tests and some exact tests. G*Power 3 provides improved effect size calculators and graphic options, supports both distribution-based and design-based input modes, and offers all types of power analyses in which users might be interested. Like its predecessors, G*Power 3 is free.

40,195 citations


Additional excerpts

  • ..., 1988; Whitehead, 1993; Roebruck and Kuhn, 1995; Lubin and Gail, 1990; Lakatos and Lan, 1992), for time-to-event (survival) data (Schoenfeld and Richter, 1982; Hanley and McNeil, 1982), for receiver operating curve (ROC) analysis (Obuchowski, 1994; Obuchowski and McClish, 1997; Whittemore, 1981), for logistic and Poisson regression (Hsieh, 1989; Flack and Eudey, 1993; Bull, 1993; Signorini, 1991; Lui and Cumberland, 1992), repeated measurements (Lipsitz and Fitzmaurice, 1994; Greenland, 1988), precision (Samuels and Lu, 1992; Buderer, 1996; Satten and Kupper, 1990; Streiner, 1994; Beal, 1989; Dupont, 1988), paired samples (Parker and Bregman, 1986; Nam, 1992; Lu and Bean, 1995; Lachenbruch, 1992; Lachin, 1992; Royston, 1993; Nam, 1997; Donner and Eliasziw, 1992), measurement of agreement (Birkett and Day, 1994), and power (Faul et al., 2007)....

    [...]

Journal Article
TL;DR: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented, and it is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject.
Abstract: A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the "rating" method, or by mathematical predictions based on patient characteristics, is presented. It is shown that in such a setting the area represents the probability that a randomly chosen diseased subject is (correctly) rated or ranked with greater suspicion than a randomly chosen non-diseased subject. Moreover, this probability of a correct ranking is the same quantity that is estimated by the already well-studied nonparametric Wilcoxon statistic. These two relationships are exploited to (a) provide rapid closed-form expressions for the approximate magnitude of the sampling variability, i.e., standard error that one uses to accompany the area under a smoothed ROC curve, (b) guide in determining the size of the sample required to provide a sufficiently reliable estimate of this area, and (c) determine how large sample sizes should be to ensure that one can statistically detect difference...
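The probabilistic reading of the ROC area described in this abstract can be sketched by comparing every diseased/non-diseased pair directly. This is an illustrative sketch only: the function name `auc_by_pairwise_ranking`, the half-credit convention for ties, and the example ratings are assumptions, not taken from the paper.

```python
def auc_by_pairwise_ranking(diseased_scores, healthy_scores):
    """Estimate the ROC area as the fraction of (diseased, non-diseased)
    pairs in which the diseased subject receives the higher rating.

    Tied pairs count as half a correct ranking (an assumed convention).
    """
    if not diseased_scores or not healthy_scores:
        raise ValueError("both groups must contain at least one score")
    correct = 0.0
    for d in diseased_scores:
        for h in healthy_scores:
            if d > h:
                correct += 1.0   # diseased subject ranked with greater suspicion
            elif d == h:
                correct += 0.5   # tie: half credit
    return correct / (len(diseased_scores) * len(healthy_scores))


if __name__ == "__main__":
    # Hypothetical suspicion ratings on a 1-5 scale.
    print(auc_by_pairwise_ranking([3, 4, 5], [1, 2, 3]))
```

The double loop is quadratic in the number of subjects, which is acceptable for a small illustration; a rank-based computation would be preferable for large samples.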

19,398 citations


Additional excerpts

  • ..., 1988; Whitehead, 1993; Roebruck and Kuhn, 1995; Lubin and Gail, 1990; Lakatos and Lan, 1992), for time-to-event (survival) data (Schoenfeld and Richter, 1982; Hanley and McNeil, 1982), for receiver operating curve (ROC) analysis (Obuchowski, 1994; Obuchowski and McClish, 1997; Whittemore, 1981), for logistic and Poisson regression (Hsieh, 1989; Flack and Eudey, 1993; Bull, 1993; Signorini, 1991; Lui and Cumberland, 1992), repeated measurements (Lipsitz and Fitzmaurice, 1994; Greenland, 1988), precision (Samuels and Lu, 1992; Buderer, 1996; Satten and Kupper, 1990; Streiner, 1994; Beal, 1989; Dupont, 1988), paired samples (Parker and Bregman, 1986; Nam, 1992; Lu and Bean, 1995; Lachenbruch, 1992; Lachin, 1992; Royston, 1993; Nam, 1997; Donner and Eliasziw, 1992), measurement of agreement (Birkett and Day, 1994), and power (Faul et al....

    [...]

Book
01 Jan 1981
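TL;DR: This book covers statistical inference for proportions, including determining the sample sizes needed to detect a difference between two proportions, logistic and Poisson regression, and the analysis of matched samples, with an appendix on the basic theory of maximum likelihood estimation.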
Abstract: Preface. Preface to the Second Edition. Preface to the First Edition. 1. An Introduction to Applied Probability. 2. Statistical Inference for a Single Proportion. 3. Assessing Significance in a Fourfold Table. 4. Determining Sample Sizes Needed to Detect a Difference Between Two Proportions. 5. How to Randomize. 6. Comparative Studies: Cross-Sectional, Naturalistic, or Multinomial Sampling. 7. Comparative Studies: Prospective and Retrospective Sampling. 8. Randomized Controlled Trials. 9. The Comparison of Proportions from Several Independent Samples. 10. Combining Evidence from Fourfold Tables. 11. Logistic Regression. 12. Poisson Regression. 13. Analysis of Data from Matched Samples. 14. Regression Models for Matched Samples. 15. Analysis of Correlated Binary Data. 16. Missing Data. 17. Misclassification Errors: Effects, Control, and Adjustment. 18. The Measurement of Interrater Agreement. 19. The Standardization of Rates. Appendix A. Numerical Tables. Appendix B. The Basic Theory of Maximum Likelihood Estimation. Appendix C. Answers to Selected Problems. Author Index. Subject Index.

16,435 citations