Journal Article

Determination of sample size using power analysis and optimum bin size of histogram features

TLDR
This paper presents a mathematical study for choosing the histogram bin size and the minimum sample size needed to train a classifier, using power analysis with statistical stability; the results are compared with those of an entropy-based algorithm (J48) for determining minimum sample size and bin size.
Abstract
Vibration signals are used as a source of information in the fault diagnosis of rotary machines. Much work has been reported on identifying faults in roller bearings using a variety of techniques. Of late, the application of machine learning to fault diagnosis has been gaining momentum. The machine learning approach consists of a chain of activities: data acquisition, feature extraction, feature selection, and feature classification. When histogram features are used, a few questions remain to be answered, such as how many histogram bins should be used to extract features and how many samples should be used to train the classifier. This paper provides a mathematical study for choosing the bin size and the minimum sample size to train the classifier, using power analysis with statistical stability. A typical bearing fault diagnosis problem is taken as a case for illustration, and the results are compared with those of an entropy-based algorithm (J48) for determining minimum sample size and bin size.
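The abstract does not reproduce the paper's formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes a textbook two-sided, two-sample z-test (normal approximation) for the power analysis and equal-width bins over a fixed range for the histogram features. The function names `min_sample_size` and `histogram_features`, and the default values for `alpha` and `power`, are hypothetical and are not taken from the paper.

```python
import math
from statistics import NormalDist


def min_sample_size(effect_size, alpha=0.05, power=0.80):
    """Per-group sample size for a two-sided, two-sample z-test.

    Assumption: normal approximation with a standardized effect size
    (difference in means divided by the common standard deviation).
    This is a generic textbook formula, not the paper's derivation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


def histogram_features(signal, n_bins, lo, hi):
    """Count the samples falling in n_bins equal-width bins over [lo, hi].

    Samples outside [lo, hi] are ignored; a sample equal to hi is
    placed in the last bin. The counts form the feature vector.
    """
    if n_bins <= 0 or hi <= lo:
        raise ValueError("need n_bins > 0 and hi > lo")
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in signal:
        if lo <= x <= hi:
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts
```

Under these assumptions, `min_sample_size(0.5)` asks how many samples per class are needed to detect a medium standardized effect at 5% significance with 80% power, and `histogram_features(signal, n_bins, lo, hi)` turns one vibration record into an `n_bins`-long feature vector. Repeating the sample-size calculation for each candidate bin count is one plausible way to study how the two choices interact.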


Citations
Journal Article

A confidence-prioritisation approach for learning noisy data

TL;DR: This work proposes a methodological framework for assigning confidence to individual data records and augmenting training with that information, and results indicate that applying and utilising confidence in training improves performance.
References
Journal Article

The Myth of Continuity-Corrected Sample Size Formulae

Ian Gordon, +1 more
01 Mar 1996
TL;DR: In this paper, it was shown that applying the correction for continuity when a normal distribution is used to approximate a discrete distribution does not give the Casagrande, Pike, and Smith approximation.
Journal Article

Sample size: profound implications of mundane calculations

TL;DR: It may be considered not just good science but an ethical requirement of good research that studies be of sufficient size to give statistically significant results.