Proceedings ArticleDOI

Stepwise Feature Selection by Cross Validation for EEG-based Brain Computer Interface

TL;DR: A novel method to construct a classifier with improved generalization performance by applying a feature selection method to features calculated from the EEG signals so that unnecessary or redundant features can be removed and only effective features are left for the classification task.
Abstract: The potential of brain-computer interfaces (BCI) in serving a useful purpose, e.g., supporting communication in paralyzed patients, hinges on the quality of the classification of the brain waves. This paper proposes a novel method to construct a classifier with improved generalization performance. A feature selection method is applied to features calculated from the EEG signals so that unnecessary or redundant features can be removed and only effective features are left for the classification task. Kernel support vector machines (kernel SVM) were used as a classifier and the best combinations of features were searched by backward stepwise selection, i.e., by eliminating unnecessary features one by one, and by evaluating the resulting generalization performance through cross validation. Experiments showed that the generalization performance of the classifier constructed from the best set of features was higher than that of the classifier using all features.
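The procedure the abstract describes (backward elimination scored by cross-validated accuracy of a kernel SVM) can be sketched as follows. This is not the authors' code: it is a minimal illustration on synthetic data using scikit-learn, and the greedy acceptance rule (drop a feature whenever mean CV accuracy does not decrease) is one reasonable reading of "eliminating unnecessary features one by one".

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def backward_stepwise_selection(X, y, cv=5):
    """Greedy backward elimination: drop a feature whenever its removal
    does not reduce mean cross-validated accuracy of an RBF-kernel SVM."""
    features = list(range(X.shape[1]))
    best_score = cross_val_score(SVC(kernel="rbf"), X[:, features], y, cv=cv).mean()
    improved = True
    while improved and len(features) > 1:
        improved = False
        for f in list(features):
            trial = [g for g in features if g != f]
            score = cross_val_score(SVC(kernel="rbf"), X[:, trial], y, cv=cv).mean()
            if score >= best_score:  # removal did not hurt: accept it
                best_score, features, improved = score, trial, True
                break
    return features, best_score

# Synthetic stand-in for EEG-derived features (4 informative, 3 redundant).
X, y = make_classification(n_samples=200, n_features=10, n_informative=4,
                           n_redundant=3, random_state=0)
selected, acc = backward_stepwise_selection(X, y)
```

In the paper's setting, `X` would hold band-power features per electrode rather than synthetic columns; the CV-scored greedy loop is the transferable part.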
Citations
Book ChapterDOI
01 Jan 2008
TL;DR: This paper employs HOG features extracted from all locations of a grid on the image as candidates for the feature vectors, and confirms improved recognition rates through experiments on the MIT pedestrian dataset.
Abstract: Histograms of Oriented Gradients (HOG) is one of the well-known features for object recognition. HOG features are calculated by taking orientation histograms of edge intensity in a local region. N. Dalal et al. proposed an object detection algorithm in which HOG features were extracted from all locations of a dense grid on an image region and the combined features were classified using a linear Support Vector Machine (SVM). In this paper, we employ HOG features extracted from all locations of a grid on the image as candidates for the feature vectors. Principal Component Analysis (PCA) is applied to these HOG feature vectors to obtain the score (PCA-HOG) vectors. A proper subset of the PCA-HOG feature vectors is then selected using the Stepwise Forward Selection (SFS) or Stepwise Backward Selection (SBS) algorithm to improve the generalization performance. The selected PCA-HOG feature vectors are used as the input of a linear SVM to classify the given input as pedestrian/non-pedestrian. The improvement in recognition rates is confirmed through experiments using the MIT pedestrian dataset.
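The PCA-then-linear-SVM scoring step can be sketched as below. This is a hedged illustration, not the cited paper's pipeline: random vectors stand in for real HOG descriptors (which would come from, e.g., `skimage.feature.hog` over grid cells), and the dimensions are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Synthetic stand-ins for per-window HOG descriptors (81-dimensional here;
# real descriptors would be computed from image gradients).
X, y = make_classification(n_samples=300, n_features=81, n_informative=10,
                           random_state=0)

# Project descriptors onto principal components ("PCA-HOG" scores),
# then classify pedestrian/non-pedestrian with a linear SVM.
clf = make_pipeline(PCA(n_components=20), LinearSVC(dual=False))
clf.fit(X, y)
acc = clf.score(X, y)  # training accuracy on the synthetic data
```

A stepwise SFS/SBS pass, as in the previous block, would then choose which of the 20 score dimensions to keep.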

118 citations


Cites background from "Stepwise Feature Selection by Cross..."

  • ...It is well known that feature selection is effective for pattern classification as shown in [14]....

    [...]

Journal ArticleDOI
TL;DR: In this paper, a worker-centric, IoT-enabled, unobtrusive framework for monitoring user health, well-being and functional ability, empowered with AI tools, is proposed to improve the prediction of diabetes, scoring an Area Under the ROC Curve (AUC) of 0.884.
Abstract: A steady rise has been observed in the percentage of elderly people who want, and are still able, to contribute to society. Early retirement or exit from the labour market due to health-related issues therefore poses a significant problem. Nowadays, thanks to technological advances and data from various populations, the investigation of risk factors and the screening of health issues are moving towards automation. In this work, a worker-centric, IoT-enabled, unobtrusive framework for monitoring user health, well-being and functional ability, empowered with AI tools, is proposed. Diabetes is a high-prevalence chronic condition with harmful consequences for quality of life and a high mortality rate worldwide, in both developed and developing countries. Its severe impact on personal, social and working life can be considerably reduced if early detection is possible, but most research in this field fails to provide a personalized approach in both the modeling and the prediction process. In this direction, our system performs diabetes risk prediction by applying, evaluating and incorporating specific components of the Knowledge Discovery in Databases (KDD) process: dataset creation, feature selection and classification using different supervised Machine Learning (ML) models. The ensemble WeightedVotingLRRFs ML model is proposed to improve the prediction of diabetes, scoring an Area Under the ROC Curve (AUC) of 0.884. For the weighted voting, the optimal weights are estimated from the corresponding Sensitivity and AUC of each ML model using a bi-objective genetic algorithm. A comparative study is also presented among the Finnish Diabetes Risk Score (FINDRISC) and Leicester risk score systems and several ML models, using inductive and transductive learning. The experiments were conducted using data extracted from the English Longitudinal Study of Ageing (ELSA) database.

46 citations

Journal ArticleDOI
TL;DR: Studies of novel resting-state EEG biomarkers will have to evaluate a range of potential demographic, clinical, and technical confounders, including age, gender, intellectual ability, comorbidity, and medication, before these approaches can be translated into the clinical setting.

38 citations

Journal ArticleDOI
TL;DR: A hybrid feature selection algorithm, FDHSFFS, is chosen for comparative experiments on four UCI datasets with large differences in feature dimension and sample size, using five different cross-validation (CV) methods; the results show that, during feature selection, twofold CV and leave-one-out CV are more suitable for model evaluation on low-dimensional, small-sample datasets.
Abstract: Both wrapper and hybrid methods in feature selection need the intervention of a learning algorithm to train parameters. The preset parameters and dataset are used to construct several sub-optimal models, from which the final model is selected. How should the performance of these sub-optimal models be evaluated, and what effect do different evaluation methods have on the result of feature selection? Aiming at this model-evaluation problem, we chose a hybrid feature selection algorithm, FDHSFFS, and conducted comparative experiments on four UCI datasets with large differences in feature dimension and sample size, using five different cross-validation (CV) methods. The experimental results show that, in the process of feature selection, twofold CV and leave-one-out CV are more suitable for model evaluation on low-dimensional, small-sample datasets, while tenfold nested CV and tenfold CV are more suitable for high-dimensional datasets; tenfold nested CV is close to an unbiased estimate, and different optimal models may choose the same approximately optimal feature subset.
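The contrast between CV schemes can be made concrete with a toy experiment — not the paper's setup: logistic regression on synthetic data stands in for its models, and the comparison only illustrates how 2-fold, 10-fold, and leave-one-out estimates are obtained. Note that LOO per-fold scores are 0/1, one source of the high variance discussed above.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score

X, y = make_classification(n_samples=60, n_features=8, random_state=1)
clf = LogisticRegression(max_iter=1000)

# Per-fold accuracy under three cross-validation schemes.
scores = {
    "2-fold":  cross_val_score(clf, X, y, cv=KFold(2, shuffle=True, random_state=0)),
    "10-fold": cross_val_score(clf, X, y, cv=KFold(10, shuffle=True, random_state=0)),
    "LOO":     cross_val_score(clf, X, y, cv=LeaveOneOut()),
}
means = {name: s.mean() for name, s in scores.items()}
```

Each scheme yields an estimate of generalization accuracy from the same 60 samples; comparing `scores[...]` across schemes shows how fold count trades off bias against variance.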

15 citations


Cites background or methods from "Stepwise Feature Selection by Cross..."

  • ...However, a large number of studies [30], [31], [34]–[36] have shown that LOO usually produces high variance, and its model evaluation is not as good as that of 10-fold CV [23]....

    [...]

  • ...For example, in [36], 5-fold CV was applied to improve the generalization performance of an SVM model and to guide the removal of irrelevant and redundant features in a brain-computer interface, with good application results....

    [...]

Proceedings ArticleDOI
01 Nov 2013
TL;DR: This proof-of-concept study suggests that recurrence quantification analysis features computed from resting-state, spontaneous eyes-closed electroencephalographic (EEG) signals may be useful biomarkers for early detection of risk of ASD.
Abstract: Early detection of autism spectrum disorder (ASD) in infants is vital in maximizing the impact and potential long-term outcomes of early delivery of rehabilitative therapies. To date no definitive diagnostic test for ASD exists. Electroencephalography is a noninvasive method used to capture underlying electrical changes in brain activity. This proof-of-concept study suggests that recurrence quantification analysis features computed from resting state spontaneous eyes-closed electroencephalographic (EEG) signals may be useful biomarkers for early detection of risk of ASD.

14 citations


Cites background or methods from "Stepwise Feature Selection by Cross..."

  • ...In the case where K equals the number of samples available in the data, this validation method is referred to as the ‘leave-one-out’ approach (also known as jack-knifing) (Tanaka et al., 2006; Duffy and Als, 2012)....

    [...]

  • ...The generalisation performance of a classifier, using K-fold cross-validation, is estimated according to the following steps (Tanaka et al., 2006)....

    [...]

  • ...The smaller the feature set to be classified, the lower the complexity of the computational burden, which in turn may lead to improved classification performance (Tanaka et al., 2006)....

    [...]

  • ...The drawback of training with large data sets is the cost of a high computational burden – this burden increases with an increase in sample size (Tanaka et al., 2006)....

    [...]

References
Journal ArticleDOI
TL;DR: In this article, a new minimum AIC estimate (MAICE) is introduced for the purpose of statistical identification; it is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure.
Abstract: The history of the development of statistical hypothesis testing in time series analysis is reviewed briefly, and it is pointed out that the hypothesis testing procedure is not adequately defined as a procedure for statistical model identification. The classical maximum likelihood estimation procedure is reviewed, and a new estimate, the minimum information theoretical criterion (AIC) estimate (MAICE), designed for the purpose of statistical identification, is introduced. When there are several competing models, the MAICE is defined by the model, and the maximum likelihood estimates of its parameters, that give the minimum of AIC, defined by AIC = (-2) log(maximum likelihood) + 2 (number of independently adjusted parameters within the model). MAICE provides a versatile procedure for statistical model identification that is free from the ambiguities inherent in the application of the conventional hypothesis testing procedure. The practical utility of MAICE in time series analysis is demonstrated with some numerical examples.
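The AIC formula above can be exercised directly. The following is a small worked example on synthetic Gaussian data, not from the paper: two candidate models for the same sample are compared, and MAICE picks the one with the smaller AIC.

```python
import numpy as np

def aic(log_likelihood, n_params):
    # AIC = (-2) log(maximum likelihood) + 2 (number of free parameters)
    return -2.0 * log_likelihood + 2.0 * n_params

def gaussian_loglik(x, mu, sigma):
    # Log-likelihood of data x under a Normal(mu, sigma^2) model.
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2 * sigma**2))

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=500)  # true mean is 2, not 0

# Model A: mean fixed at 0, only sigma fitted (1 free parameter).
sigma_a = np.sqrt(np.mean(x**2))
aic_a = aic(gaussian_loglik(x, 0.0, sigma_a), 1)

# Model B: mean and sigma both fitted (2 free parameters).
aic_b = aic(gaussian_loglik(x, x.mean(), x.std()), 2)

# MAICE selects the model minimizing AIC -- here Model B, since its
# much better fit outweighs the +2 penalty for the extra parameter.
```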

47,133 citations


"Stepwise Feature Selection by Cross..." refers methods in this paper

  • ...Information criteria such as AIC [13] and MDL [14] are also often used, which evaluate the prediction model using the maximum log likelihood calculated from the learning parameters....

    [...]

01 Jan 1998
TL;DR: Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.
Abstract: A comprehensive look at learning and generalization theory. The statistical theory of learning and generalization concerns the problem of choosing desired functions on the basis of empirical data. Highly applicable to a variety of computer science and robotics fields, this book offers lucid coverage of the theory as a whole. Presenting a method for determining the necessary and sufficient conditions for consistency of the learning process, the author covers function estimation from small data pools, the application of these estimates to real-life problems, and much more.

26,531 citations


"Stepwise Feature Selection by Cross..." refers methods in this paper

  • ...Instead of linear SVM, we suggest using kernel support vector machines (kernel SVM) [9] as a way of improving accuracy....

    [...]

  • ...Since the choice of a kernel function has an influence on the performance of the constructed SVM classifier, it was necessary to select an appropriate kernel function for classification....

    [...]

  • ...(2) and choosing an appropriate kernel function K, suitable kernel SVMs can be constructed for a given task [12]....

    [...]

  • ...An additional way of improving the classifier will be to use wavelets or kernel feature selection for kernel SVMs....

    [...]

  • ...The proposed algorithm is based on backward stepwise selection with 5-fold cross validation and kernel SVMs....

    [...]

Book
01 Jan 1973

20,541 citations

Book
01 Jan 2000
TL;DR: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory, and will guide practitioners to updated literature, new applications, and on-line software.
Abstract: From the publisher: This is the first comprehensive introduction to Support Vector Machines (SVMs), a new generation learning system based on recent advances in statistical learning theory. SVMs deliver state-of-the-art performance in real-world applications such as text categorisation, hand-written character recognition, image classification, biosequences analysis, etc., and are now established as one of the standard tools for machine learning and data mining. Students will find the book both stimulating and accessible, while practitioners will be guided smoothly through the material required for a good grasp of the theory and its applications. The concepts are introduced gradually in accessible and self-contained stages, while the presentation is rigorous and thorough. Pointers to relevant literature and web sites containing software ensure that it forms an ideal starting point for further study. Equally, the book and its associated web site will guide practitioners to updated literature, new applications, and on-line software.

13,736 citations

Journal ArticleDOI
Peter D. Welch
TL;DR: In this article, the use of the fast Fourier transform in power spectrum analysis is described, and the method involves sectioning the record and averaging modified periodograms of the sections.
Abstract: The use of the fast Fourier transform in power spectrum analysis is described. Principal advantages of this method are a reduction in the number of computations and in required core storage, and convenient application in nonstationarity tests. The method involves sectioning the record and averaging modified periodograms of the sections.

9,705 citations


"Stepwise Feature Selection by Cross..." refers methods in this paper

  • ...The power spectrum density for each electrode was estimated using the Welch periodogram [8], [6] and was divided into 12 components with a 2 Hz resolution....

    [...]
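As a hedged sketch of the cited computation: with `scipy.signal.welch`, a segment length of `fs / 2` samples yields a 2 Hz frequency resolution, from which twelve 2 Hz-wide band powers can be averaged. The sampling rate, signal, and band edges below are assumptions for illustration, not values from the paper.

```python
import numpy as np
from scipy.signal import welch

fs = 256  # assumed sampling rate in Hz; the paper's actual rate may differ
rng = np.random.default_rng(0)
eeg = rng.standard_normal(10 * fs)  # 10 s of synthetic single-channel "EEG"

# Welch's method: segments of fs/2 samples give a 2 Hz bin spacing
# (frequency resolution = fs / nperseg).
f, psd = welch(eeg, fs=fs, nperseg=fs // 2)

# Average the PSD into twelve 2 Hz-wide band powers covering 0-24 Hz
# (assumed band edges), one feature vector entry per band.
bands = [psd[(f >= lo) & (f < lo + 2)].mean() for lo in range(0, 24, 2)]
```

In the paper's pipeline, the resulting 12 components per electrode would form the candidate feature set fed to the stepwise selection procedure.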