Permutation Tests for Studying Classifier Performance
References
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing
Efron, B. and Tibshirani, R. (1993). An Introduction to the Bootstrap
Holm, S. (1979). A Simple Sequentially Rejective Multiple Test Procedure
Witten, I. H. and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques
Frequently Asked Questions (14)
Q2. What are the future works mentioned in the paper "Permutation tests for studying classifier performance" ?
If the classifier is not significant with Test 2, that is, the authors obtain a high p-value, there are three different possibilities: (1) there are no dependencies between the features in the data; (2) there are some dependencies between the features in the data, but they do not increase the class separation; or (3) there are useful dependencies between the features that increase the class separation, but the chosen classifier is not able to exploit them. In general, when a high p-value is obtained with Test 2, the authors cannot know which of these applies to the data and to the chosen classifier. As future work, the authors propose exploring the use of Test 2 for selecting the best discriminant features for classifiers, in a similar fashion as Test 1 has been used for decision trees and other biological applications (Frank, 2000; Frank and Witten, 1998; Maglietta et al., 2007), and extending the setting to unsupervised learning, such as clustering.
Q3. How is the evaluation of the different models in this local search strategy done?
The evaluation of the different models in this local search strategy is done via permutation tests, using the framework of multiple hypothesis testing (Benjamini and Hochberg, 1995; Holm, 1979).
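The Benjamini-Hochberg step-up procedure cited here can be sketched in a few lines. This is a minimal illustration, not the paper's own code; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: find the largest rank r such
    that p_(r) <= alpha * r / m, then reject the r smallest p-values.
    Returns the set of indices of the rejected hypotheses."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    r = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            r = rank
    return {order[j] for j in range(r)}
```

For instance, with p-values 0.01, 0.04, 0.03, 0.5 at FDR level 0.05, only the first hypothesis is rejected: 0.03 at rank 2 exceeds its threshold 0.025, so the step-up stops the rejections at rank 1.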
Q4. What is the important factor in describing the blood pressure?
Both weight and height convey information about the blood pressure, but the dependency between them is the most important factor in describing the blood pressure.
Q5. What is the power of Test 1 calculated by Equation (4)?
Note that when the null hypothesis is true, that is, t = 1/2, the power of Test 1 calculated by Equation (4) equals the significance level α, as it should.
Q6. What is the average classification error for the randomized samples of data set D1?
On the randomized samples of data set D1, the authors obtain an average classification error of 0.53, a standard deviation of 0.14, and a minimum classification error of 0.13.
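These summary statistics, together with the empirical p-value over the k randomized samples, can be computed as in the following sketch; the function name is hypothetical and the numbers in the usage note are made up for illustration.

```python
import statistics

def summarize_permutation_errors(original_error, permuted_errors):
    """Summary statistics of the classifier errors on the randomized
    samples, plus the empirical p-value
    (#{permuted error <= original error} + 1) / (k + 1)."""
    k = len(permuted_errors)
    better = sum(1 for e in permuted_errors if e <= original_error)
    return {
        "mean": statistics.mean(permuted_errors),
        "std": statistics.stdev(permuted_errors),
        "min": min(permuted_errors),
        "p_value": (better + 1) / (k + 1),
    }
```

For example, an original error of 0.3 against permuted errors 0.4, 0.5, 0.6 yields mean 0.5, minimum 0.4, and p-value 1/4, since none of the three randomized samples reaches the original error.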
Q7. How many random rows are used for training the data set?
For large data sets, the authors divide the data into a training set of 10 000 random rows and a test set containing the rest of the rows.
Q8. What is the traditional approach to the problem of classification accuracy?
The most traditional approach to this problem is to estimate the error of the classifier by means of cross-validation or leave-one-out cross-validation, among others.
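A minimal k-fold cross-validation error estimate can be sketched as follows; the function name and the plug-in classifier interface are hypothetical, not the paper's code.

```python
import random

def cross_validation_error(data, labels, train_and_classify, n_folds=10, seed=0):
    """Average misclassification rate over n_folds folds.
    `train_and_classify(train_X, train_y, test_X)` must return predicted
    labels for test_X after training on (train_X, train_y)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]  # round-robin split
    errors = []
    for fold in folds:
        held_out = set(fold)
        train_X = [data[i] for i in idx if i not in held_out]
        train_y = [labels[i] for i in idx if i not in held_out]
        test_X = [data[i] for i in fold]
        test_y = [labels[i] for i in fold]
        preds = train_and_classify(train_X, train_y, test_X)
        errors.append(sum(p != t for p, t in zip(preds, test_y)) / len(fold))
    return sum(errors) / len(errors)
```

Leave-one-out cross-validation is the special case `n_folds = len(data)`, where every fold holds out a single row.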
Q9. What are the default approaches in Weka of the classifiers?
Missing values and the combination of nominal and numerical values are given as such as input to the classifiers; Weka's default approaches for handling these cases are used.
Q10. what is the probability of incorrectly classifying a sample by Equation (2)?
Bin(n, 1/2 − (1/π) arcsin ρ) ≈ N(n/2 − (n/π) arcsin ρ, n/4 − (n/π²) arcsin² ρ), where 1/2 − (1/π) arcsin ρ is the probability of incorrectly classifying a sample by Equation (2).
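The approximation above can be evaluated numerically. The sketch below (hypothetical function names) computes the misclassification probability p = 1/2 − (1/π) arcsin ρ and the parameters of the approximating normal; note that the stated variance n/4 − (n/π²) arcsin² ρ is exactly the binomial variance n·p·(1 − p).

```python
import math

def misclassification_prob(rho):
    """p = 1/2 - (1/pi) * arcsin(rho)."""
    return 0.5 - math.asin(rho) / math.pi

def normal_approximation(n, rho):
    """Mean and variance of the approximating normal distribution,
    N(n*p, n*p*(1-p)), which expands to
    N(n/2 - (n/pi) arcsin rho, n/4 - (n/pi^2) arcsin^2 rho)."""
    p = misclassification_prob(rho)
    return n * p, n * p * (1 - p)
```

Sanity checks: uncorrelated features (ρ = 0) give p = 1/2, i.e. chance-level error with mean n/2 and variance n/4, while ρ = 1/2 gives p = 1/2 − 1/6 = 1/3.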
Q11. What is the way to use the feature dependency to improve the classification performance?
That is, the authors try more complex classifiers that could use the possible existing feature dependency, as well as simpler classifiers that could perform better if no feature dependency exists.
Q12. How many times will the classifier calculate the error of the original data?
Note that in total the authors compute the error of the classifier r + k times: r times on the original data and once for each of the k randomized data sets.
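The overall r + k evaluation scheme can be sketched as a single loop; this is a hedged, Test 1-style illustration (shuffled labels), and the function names and the plug-in error function are hypothetical.

```python
import random

def permutation_test(data, labels, classifier_error, k=100, r=10, seed=0):
    """Evaluate `classifier_error(data, labels)` r times on the original
    data (averaging the results) and once on each of k label-permuted
    data sets: r + k evaluations in total. Returns the observed error
    and the empirical p-value (#{permuted <= observed} + 1) / (k + 1)."""
    rng = random.Random(seed)
    observed = sum(classifier_error(data, labels) for _ in range(r)) / r
    permuted_errors = []
    for _ in range(k):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # Test 1: permute the class labels
        permuted_errors.append(classifier_error(data, shuffled))
    better = sum(1 for e in permuted_errors if e <= observed)
    return observed, (better + 1) / (k + 1)
```

With k = 100 permutations, the smallest attainable p-value is 1/101 ≈ 0.0099, so k bounds the resolution of the test.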
Q13. What is the reason why the traditional permutation tests regard the results as significant?
A more important reason is that the traditional permutation tests easily regard the results as significant even if only a slight class structure is present, because the corresponding permuted data sets contain no class structure at all; this effect is especially pronounced when the original data set is large.
Q14. What is the relationship between the permutation method and the class labels?
The authors find that permuting the data columns is the randomization method producing the most diverse samples, while permuting the labels (Test 1) and permuting the data within each class (Test 2) produce different randomized samples.
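The three randomization methods can be sketched side by side; these are minimal illustrations of the schemes described above (function names hypothetical), each taking a row-major data matrix, a label list, and a `random.Random` instance.

```python
import random

def permute_labels(data, labels, rng):
    """Test 1: shuffle the class labels, keeping the rows intact."""
    shuffled = labels[:]
    rng.shuffle(shuffled)
    return data, shuffled

def permute_within_class(data, labels, rng):
    """Test 2: permute each column independently inside each class,
    breaking dependencies between features while keeping each class's
    per-column value distribution."""
    n_cols = len(data[0])
    new_data = [row[:] for row in data]
    for cls in set(labels):
        rows = [i for i, y in enumerate(labels) if y == cls]
        for c in range(n_cols):
            values = [data[i][c] for i in rows]
            rng.shuffle(values)
            for i, v in zip(rows, values):
                new_data[i][c] = v
    return new_data, labels

def permute_columns(data, labels, rng):
    """Permute every column over all rows, ignoring the classes; the most
    disruptive randomization, producing the most diverse samples."""
    n_cols = len(data[0])
    new_data = [row[:] for row in data]
    for c in range(n_cols):
        values = [row[c] for row in data]
        rng.shuffle(values)
        for row, v in zip(new_data, values):
            row[c] = v
    return new_data, labels
```

The key invariants differ: Test 1 keeps rows intact, Test 2 keeps per-class column distributions, and column permutation keeps only the global column distributions.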