Determination of sample size using power analysis and optimum bin size of histogram features

doi:10.1504/IJDATS.2011.038804

Journal Article

V. Indira, +3 more

- 01 Mar 2011 -

International Journal of Data Analysis T...

- Vol. 3, Iss: 1, pp 21-41

TLDR

This paper provides a mathematical study for choosing the histogram bin size and the minimum sample size needed to train the classifier, using power analysis with statistical stability. The results are compared with those of an entropy-based algorithm (J48) for determining the minimum sample size and bin size.

Abstract:

Vibration signals are used as a source of information in fault diagnosis of rotary machines. Much work has been reported on identifying faults in roller bearings using a variety of techniques. Recently, the application of machine learning to fault diagnosis has been gaining momentum. A machine learning approach consists of a chain of activities: data acquisition, feature extraction, feature selection and feature classification. When histogram features are used, a few questions remain to be answered, such as how many histogram bins should be used to extract features and how many samples should be used to train the classifier. This paper provides a mathematical study for choosing the bin size and the minimum sample size needed to train the classifier, using power analysis with statistical stability. A typical bearing fault diagnosis problem is taken as an illustrative case, and the results are compared with those of an entropy-based algorithm (J48) for determining the minimum sample size and bin size.
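The two quantities the abstract asks about can be illustrated with a minimal sketch. It is not the paper's procedure, which the abstract does not spell out. The function names, the fixed amplitude range, the synthetic Gaussian signal and the chosen effect size are all assumptions made for illustration, and the sample-size formula is the standard normal-approximation result for a two-sided, two-sample comparison of means.

```python
import math
import random
from statistics import NormalDist


def histogram_features(signal, n_bins, lo, hi):
    """Count samples in each of n_bins equal-width bins on [lo, hi] (hypothetical helper)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for x in signal:
        if lo <= x <= hi:
            # Clamp so that x == hi falls into the last bin.
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    return counts


def min_samples_per_class(effect_size, alpha=0.05, power=0.8):
    """Normal-approximation sample size per class for a two-sided, two-sample test.

    n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2, where d is the standardized
    difference between class means (an assumed input, not a value from the paper).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)


# Synthetic stand-in for a vibration signal (assumption: not real bearing data).
random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(1000)]

features = histogram_features(signal, n_bins=10, lo=-4.0, hi=4.0)
n_required = min_samples_per_class(effect_size=0.5)
```

Here each bin count serves as one feature, so changing `n_bins` changes the dimensionality of the feature vector, while `n_required` gives a rough lower bound on the training samples per class needed to detect the assumed effect size at the chosen significance level and power.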
