Proceedings ArticleDOI

Stepwise Feature Selection by Cross Validation for EEG-based Brain Computer Interface

TL;DR: A novel method to construct a classifier with improved generalization performance by applying a feature selection method to features calculated from the EEG signals so that unnecessary or redundant features can be removed and only effective features are left for the classification task.
Abstract: The potential of brain-computer interfaces (BCI) in serving a useful purpose, e.g., supporting communication in paralyzed patients, hinges on the quality of the classification of the brain waves. This paper proposes a novel method to construct a classifier with improved generalization performance. A feature selection method is applied to features calculated from the EEG signals so that unnecessary or redundant features can be removed and only effective features are left for the classification task. Kernel support vector machines (kernel SVM) were used as a classifier and the best combinations of features were searched by backward stepwise selection, i.e., by eliminating unnecessary features one by one, and by evaluating the resulting generalization performance through cross validation. Experiments showed that the generalization performance of the classifier constructed from the best set of features was higher than that of the classifier using all features.
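The procedure the abstract describes (backward elimination scored by cross-validated accuracy of a kernel SVM) can be sketched as follows. This is not the authors' code: it is a minimal illustration on synthetic data using scikit-learn, and the greedy acceptance rule (drop a feature whenever mean CV accuracy does not decrease) is one reasonable reading of "eliminating unnecessary features one by one".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def backward_stepwise_selection(X, y, cv=5):
    """Greedy backward elimination: drop a feature whenever its removal
    does not reduce mean cross-validated accuracy of an RBF-kernel SVM."""
    features = list(range(X.shape[1]))
    best_score = cross_val_score(SVC(kernel="rbf"), X[:, features], y, cv=cv).mean()
    improved = True
    while improved and len(features) > 1:
        improved = False
        for f in list(features):
            trial = [g for g in features if g != f]
            score = cross_val_score(SVC(kernel="rbf"), X[:, trial], y, cv=cv).mean()
            if score >= best_score:  # removal did not hurt: accept it
                best_score, features, improved = score, trial, True
                break
    return features, best_score

# Synthetic stand-in for EEG-derived features (4 informative, 3 redundant).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=3, random_state=0)
selected, acc = backward_stepwise_selection(X, y)
```

In the paper's setting, `X` would hold band-power features per electrode rather than synthetic columns; the CV-scored greedy loop is the transferable part.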
Citations
Book ChapterDOI
01 Jan 2008
TL;DR: This paper employs HOG features extracted from all locations of a grid on the image as candidates for the feature vectors, and confirms improved recognition rates through experiments on the MIT pedestrian dataset.
Abstract: Histograms of Oriented Gradients (HOG) is one of the well-known features for object recognition. HOG features are calculated by taking orientation histograms of edge intensity in a local region. N. Dalal et al. proposed an object detection algorithm in which HOG features were extracted from all locations of a dense grid on an image region and the combined features were classified using a linear Support Vector Machine (SVM). In this paper, we employ HOG features extracted from all locations of a grid on the image as candidates for the feature vectors. Principal Component Analysis (PCA) is applied to these HOG feature vectors to obtain the score (PCA-HOG) vectors. A proper subset of the PCA-HOG feature vectors is then selected using the Stepwise Forward Selection (SFS) or Stepwise Backward Selection (SBS) algorithm to improve the generalization performance. The selected PCA-HOG feature vectors are used as the input of a linear SVM to classify the given input as pedestrian/non-pedestrian. The improvement in recognition rates is confirmed through experiments using the MIT pedestrian dataset.
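The PCA-then-linear-SVM scoring step can be sketched as below. This is a hedged illustration, not the cited paper's pipeline: random vectors stand in for real HOG descriptors (which would come from, e.g., `skimage.feature.hog` over grid cells), and the dimensions are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-ins for per-window HOG descriptors (81-dimensional here;
# real descriptors would be computed from image gradients).
X, y = make_classification(n_samples=300, n_features=81, n_informative=10,
                           random_state=0)

# Project descriptors onto principal components ("PCA-HOG" scores),
# then classify pedestrian/non-pedestrian with a linear SVM.
clf = make_pipeline(PCA(n_components=20), LinearSVC(dual=False))
clf.fit(X, y)
acc = clf.score(X, y)  # training accuracy on the synthetic data
```

A stepwise SFS/SBS pass, as in the previous block, would then choose which of the 20 score dimensions to keep.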

118 citations


Cites background from "Stepwise Feature Selection by Cross..."

  • ...It is well known that feature selection is effective for pattern classification as shown in [14]....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a worker-centric, IoT-enabled, unobtrusive framework for monitoring user health, well-being and functional ability, empowered with AI tools, is proposed to improve the prediction of diabetes, scoring an Area Under the ROC Curve (AUC) of 0.884.
Abstract: A steady rise has been observed in the percentage of elderly people who want, and are still able, to contribute to society. Early retirement or exit from the labour market due to health-related issues therefore poses a significant problem. Nowadays, thanks to technological advances and data from various populations, the investigation of risk factors and the screening of health issues are moving towards automation. In this work, a worker-centric, IoT-enabled, unobtrusive framework for monitoring user health, well-being and functional ability, empowered with AI tools, is proposed. Diabetes is a high-prevalence chronic condition with harmful consequences for quality of life and a high mortality rate worldwide, in both developed and developing countries. Its severe impact on personal, social and working life can be considerably reduced if early detection is possible, but most research in this field fails to provide a personalized approach in both the modeling and the prediction process. In this direction, our system performs diabetes risk prediction by applying, evaluating and incorporating specific components of the Knowledge Discovery in Databases (KDD) process: dataset creation, feature selection and classification using different supervised Machine Learning (ML) models. The ensemble WeightedVotingLRRFs ML model is proposed to improve the prediction of diabetes, scoring an Area Under the ROC Curve (AUC) of 0.884. For the weighted voting, the optimal weights are estimated from the corresponding Sensitivity and AUC of each ML model using a bi-objective genetic algorithm. A comparative study is also presented among the Finnish Diabetes Risk Score (FINDRISC) and Leicester risk score systems and several ML models, using inductive and transductive learning. The experiments were conducted using data extracted from the English Longitudinal Study of Ageing (ELSA) database.

46 citations

Journal ArticleDOI
TL;DR: Studies of novel resting-state EEG biomarkers will have to evaluate a range of potential demographic, clinical, and technical confounders, including age, gender, intellectual ability, comorbidity, and medication, before these approaches can be translated into the clinical setting.

38 citations

Journal ArticleDOI
TL;DR: A hybrid feature selection algorithm, FDHSFFS, is chosen for comparative experiments on four UCI datasets with large differences in feature dimension and sample size, using five different cross-validation (CV) methods; the results show that, during feature selection, twofold CV and leave-one-out CV are more suitable for model evaluation on low-dimensional, small-sample datasets.
Abstract: Both wrapper and hybrid methods in feature selection need the intervention of a learning algorithm to train parameters. The preset parameters and dataset are used to construct several sub-optimal models, from which the final model is selected. How should the performance of these sub-optimal models be evaluated, and what effect do different evaluation methods have on the result of feature selection? Aiming at this model-evaluation problem, we chose a hybrid feature selection algorithm, FDHSFFS, and conducted comparative experiments on four UCI datasets with large differences in feature dimension and sample size, using five different cross-validation (CV) methods. The experimental results show that, in the process of feature selection, twofold CV and leave-one-out CV are more suitable for model evaluation on low-dimensional, small-sample datasets, while tenfold nested CV and tenfold CV are more suitable for high-dimensional datasets; tenfold nested CV is close to an unbiased estimate, and different optimal models may choose the same approximately optimal feature subset.
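The contrast between CV schemes can be made concrete with a toy experiment — not the paper's setup: logistic regression on synthetic data stands in for its models, and the comparison only illustrates how 2-fold, 10-fold, and leave-one-out estimates are obtained. Note that LOO per-fold scores are 0/1, one source of the high variance discussed above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=60, n_features=8, random_state=1)
clf = LogisticRegression(max_iter=1000)

# Per-fold accuracy under three cross-validation schemes.
scores = {
    "2-fold":  cross_val_score(clf, X, y, cv=KFold(2, shuffle=True, random_state=0)),
    "10-fold": cross_val_score(clf, X, y, cv=KFold(10, shuffle=True, random_state=0)),
    "LOO":     cross_val_score(clf, X, y, cv=LeaveOneOut()),
}
means = {name: s.mean() for name, s in scores.items()}
```

Each scheme yields an estimate of generalization accuracy from the same 60 samples; comparing `scores[...]` across schemes shows how fold count trades off bias against variance.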

15 citations


Cites background or methods from "Stepwise Feature Selection by Cross..."

  • ...However, a large number of studies [30], [31], [34]–[36] have shown that LOO usually produces high variance, and its model evaluation is not as good as that of 10-fold CV [23]....

    [...]

  • ...For example, in [36], 5-fold CV was applied to improve the generalization performance of an SVM model and to guide the removal of irrelevant and redundant features in a brain-computer interface, with good application results....

    [...]

Proceedings ArticleDOI
01 Nov 2013
TL;DR: This proof-of-concept study suggests that recurrence quantification analysis features computed from resting-state, spontaneous eyes-closed electroencephalographic (EEG) signals may be useful biomarkers for early detection of risk of ASD.
Abstract: Early detection of autism spectrum disorder (ASD) in infants is vital in maximizing the impact and potential long-term outcomes of early delivery of rehabilitative therapies. To date no definitive diagnostic test for ASD exists. Electroencephalography is a noninvasive method used to capture underlying electrical changes in brain activity. This proof-of-concept study suggests that recurrence quantification analysis features computed from resting state spontaneous eyes-closed electroencephalographic (EEG) signals may be useful biomarkers for early detection of risk of ASD.

14 citations


Cites background or methods from "Stepwise Feature Selection by Cross..."

  • ...In the case where K equals the number of samples available in the data, this validation method is referred to as the ‘leave-one-out’ approach (also known as jack-knifing) (Tanaka et al., 2006; Duffy and Als, 2012)....

    [...]

  • ...The generalisation performance of a classifier, using K-fold cross-validation, is estimated according to the following steps (Tanaka et al., 2006)....

    [...]

  • ...The smaller the feature set to be classified, the lower the complexity of the computational burden, which in turn may lead to improved classification performance (Tanaka et al., 2006)....

    [...]

  • ...The drawback of training with large data sets is the cost of a high computational burden – this burden increases with an increase in sample size (Tanaka et al., 2006)....

    [...]

References
Journal ArticleDOI
TL;DR: In this article, a new minimum AIC estimate (MAICE) is introduced for the purpose of statistical identification; it is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure.
Abstract: The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly, and it is pointed out that the hypothesis testing procedure is not adequately defined as a procedure for statistical model identification. The classical maximum likelihood estimation procedure is reviewed, and a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), designed for the purpose of statistical identification, is introduced. When there are several competing models, the MAICE is defined by the model, and the maximum likelihood estimates of its parameters, that give the minimum of AIC, defined by AIC = (-2) log(maximum likelihood) + 2 (number of independently adjusted parameters within the model). MAICE provides a versatile procedure for statistical model identification that is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure. The practical utility of MAICE in time series analysis is demonstrated with some numerical examples.
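The AIC formula above can be exercised directly. The following is a small worked example on synthetic Gaussian data, not from the paper: two candidate models for the same sample are compared, and MAICE picks the one with the smaller AIC.

```python
import numpy as np

def aic(log_likelihood, n_params):
    # AIC = (-2) log(maximum likelihood) + 2 (number of free parameters)
    return -2.0 * log_likelihood + 2.0 * n_params

def gaussian_loglik(x, mu, sigma):
    # Log-likelihood of data x under a Normal(mu, sigma^2) model.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)  # true mean is 2, not 0

# Model A: mean fixed at 0, only sigma fitted (1 free parameter).
sigma_a = np.sqrt(np.mean(x**2))
aic_a = aic(gaussian_loglik(x, 0.0, sigma_a), 1)

# Model B: mean and sigma both fitted (2 free parameters).
aic_b = aic(gaussian_loglik(x, x.mean(), x.std()), 2)

# MAICE selects the model minimizing AIC -- here Model B, since its
# much better fit outweighs the +2 penalty for the extra parameter.
```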

47,133 citations


"Stepwise Feature Selection by Cross..." refers methods in this paper

  • ...Information criteria such as AIC [13] and MDL [14] are also often used, which evaluate the prediction model using the maximum log likelihood calculated from the learning parameters....

    [...]

01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.

26,531 citations


"Stepwise Feature Selection by Cross..." refers methods in this paper

  • ...Instead of linear SVM, we suggest using kernel support vector machines (kernel SVM) [9] as a way of improving accuracy....

    [...]

  • ...Since the choice of a kernel function has an influence on the performance of the constructed SVM classifier, it was necessary to select an appropriate kernel function for classification....

    [...]

  • ...(2) and choosing an appropriate kernel function K, suitable kernel SVMs can be constructed for a given task [12]....

    [...]

  • ...An additional way of improving the classifier will be to use wavelets or kernel feature selection for kernel SVMs....

    [...]

  • ...The proposed algorithm is based on backward stepwise selection with 5-fold cross validation and kernel SVMs....

    [...]

Book
01 Jan 1973

20,541 citations

Book
01 Jan 2000
TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.
Abstract: From the publisher: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequences analysis, etc., and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book and its associated web site will guide practitioners to updated literature, new applications, and on-line software.

13,736 citations

Journal ArticleDOI
Peter D. Welch
TL;DR: In this article, the use of the fast Fourier transform in power spectrum analysis is described, and the method involves sectioning the record and averaging modified periodograms of the sections.
Abstract: The use of the fast Fourier transform in power spectrum analysis is described. Principal advantages of this method are a reduction in the number of computations and in required core storage, and convenient application in nonstationarity tests. The method involves sectioning the record and averaging modified periodograms of the sections.

9,705 citations


"Stepwise Feature Selection by Cross..." refers methods in this paper

  • ...The power spectrum density for each electrode was estimated using the Welch periodogram [8], [6] and was divided into 12 components with a 2 Hz resolution....

    [...]
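As a hedged sketch of the cited computation: with `scipy.signal.welch`, a segment length of `fs / 2` samples yields a 2 Hz frequency resolution, from which twelve 2 Hz-wide band powers can be averaged. The sampling rate, signal, and band edges below are assumptions for illustration, not values from the paper.

```python
import numpy as np
from scipy.signal import welch

fs = 256  # assumed sampling rate in Hz; the paper's actual rate may differ
rng = np.random.default_rng(0)
eeg = rng.standard_normal(10 * fs)  # 10 s of synthetic single-channel "EEG"

# Welch's method: segments of fs/2 samples give a 2 Hz bin spacing
# (frequency resolution = fs / nperseg).
f, psd = welch(eeg, fs=fs, nperseg=fs // 2)

# Average the PSD into twelve 2 Hz-wide band powers covering 0-24 Hz
# (assumed band edges), one feature vector entry per band.
bands = [psd[(f >= lo) & (f < lo + 2)].mean() for lo in range(0, 24, 2)]
```

In the paper's pipeline, the resulting 12 components per electrode would form the candidate feature set fed to the stepwise selection procedure.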