Book Chapter

Feature Selection and Analysis on Correlated Breath Data

TL;DR: This chapter studies the classical support vector machine recursive feature elimination (SVM-RFE) algorithm and improves it by incorporating a correlation bias reduction (CBR) strategy into the feature elimination procedure.
Abstract: Feature selection is a useful step in the data analysis procedure. In this chapter, we study the classical support vector machine recursive feature elimination (SVM-RFE) algorithm and improve it by incorporating a correlation bias reduction (CBR) strategy into the feature elimination procedure. Experiments are conducted on a synthetic dataset and two breath analysis datasets. Large and comprehensive sets of transient features are extracted from the sensor responses. The classification accuracy obtained with feature selection demonstrates the efficacy of the proposed SVM-RFE + CBR, which outperforms the original SVM-RFE and other typical algorithms. An ensemble method is further studied to improve the stability of the proposed method. By statistically analyzing the features' rankings, knowledge is obtained that can guide the future design of e-noses and feature extraction algorithms.
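The elimination loop with CBR is easy to sketch. Below is a minimal, hypothetical Python sketch of SVM-RFE with a simplified correlation-bias-reduction rule, assuming scikit-learn and NumPy; the correlation threshold, the one-feature-per-round schedule, and the sparing rule are illustrative simplifications, not the chapter's exact formulation.

# Minimal sketch of SVM-RFE with a simplified correlation-bias-reduction
# (CBR) rule. Assumes scikit-learn and NumPy; `corr_threshold` and the
# per-round elimination count are illustrative, not the chapter's settings.
import numpy as np
from sklearn.svm import LinearSVC

def svm_rfe_cbr(X, y, corr_threshold=0.9, drop_per_round=1):
    n_features = X.shape[1]
    remaining = list(range(n_features))
    ranking = []                                   # least important first
    corr = np.abs(np.corrcoef(X, rowvar=False))    # feature-feature correlations
    while remaining:
        clf = LinearSVC(C=1.0, dual=False).fit(X[:, remaining], y)
        scores = (clf.coef_ ** 2).sum(axis=0)      # SVM-RFE importance |w|^2
        order = np.argsort(scores)                 # least important first
        candidates = [remaining[i] for i in order[:drop_per_round]]
        survivors = [f for f in remaining if f not in candidates]
        # CBR idea (simplified): a low-weight candidate that is highly
        # correlated with a surviving feature may have had its weight diluted
        # by that correlation, so spare it this round.
        spared = [f for f in candidates
                  if survivors and corr[f, survivors].max() > corr_threshold]
        dropped = [f for f in candidates if f not in spared] or candidates
        for f in dropped:
            ranking.append(f)
            remaining.remove(f)
    return ranking[::-1]                           # most important first

The `or candidates` fallback guarantees progress when every candidate is spared; the chapter's actual CBR strategy handles correlated feature groups more carefully than this one-line rule.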
Citations
Journal Article
01 Jan 2021
TL;DR: This study provides some of the first quantitative insights into the complex neural mechanism of exercise intervention for fatigue recovery and points to a new direction for applied research in real-world situations.
Abstract: Accumulating efforts have been made to discover effective solutions for fatigue recovery, with the ultimate aim of reducing the adverse consequences of mental fatigue in real life. The previously reported behavioral benefits of physical exercise on mental fatigue recovery prompted us to investigate the restorative effect and reveal the underlying neural mechanisms. Specifically, we introduced an empirical method to investigate the beneficial effect of physical exercise on the reorganization of EEG functional connectivity (FC) in a two-session experiment: one session consisted of a continuous 30-min psychomotor vigilance task (PVT) (no-intervention session), compared against the same task with a 15-min cycling exercise inserted mid-task (intervention session). EEG FC was obtained from 21 participants and quantitatively assessed via graph theoretical analysis and a classification framework. The findings demonstrated the effectiveness of the exercise intervention on behavioral performance, as shown in improved reaction time and response accuracy. Although we found significant network alterations towards the end of the experiment in both sessions, no significant differences between the two sessions and no interaction between session and time were found in EEG network topology. Further interrogation of functional connectivity through classification analysis showed decreased FC in distributed brain areas, which may lead to the significant reduction of network efficiency in both sessions. Moreover, we showed distinct patterns of FC alterations between the two sessions, indicating different information processing strategies adopted in the intervention session. In sum, these results provide some of the first quantitative insights into the complex neural mechanism of exercise intervention for fatigue recovery and point to a new direction for applied research in real-world situations.
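The graph-theoretic side of this analysis is straightforward to reproduce in outline. A minimal sketch, assuming a precomputed channels-by-channels connectivity matrix and the networkx library; the 20% proportional threshold is an illustrative choice, not the paper's:

# Sketch: global efficiency of a thresholded EEG functional-connectivity
# network. Assumes NumPy and networkx; `fc` stands in for a real
# channels-by-channels connectivity matrix (e.g. coherence values).
import numpy as np
import networkx as nx

def network_efficiency(fc, density=0.2):
    fc = np.abs(np.asarray(fc, dtype=float))
    np.fill_diagonal(fc, 0.0)
    # keep only the strongest `density` fraction of connections
    upper = fc[np.triu_indices_from(fc, k=1)]
    cutoff = np.quantile(upper, 1.0 - density)
    adjacency = (fc >= cutoff).astype(int)
    np.fill_diagonal(adjacency, 0)
    return nx.global_efficiency(nx.from_numpy_array(adjacency))

rng = np.random.default_rng(0)                 # toy stand-in for real EEG FC
m = rng.random((21, 21))
print(network_efficiency((m + m.T) / 2))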

6 citations


Additional excerpts

  • ...The stability and effectiveness of the SVM-RFE+CBR ensemble method have already been verified [38].... (a minimal ensemble-ranking sketch follows these excerpts)

  • ...To remove the substantial irrelevant connectivity and avoid the possible overfitting issue due to the fact that the number of features is much larger than that of samples, linear support vector machine recursive feature elimination (SVM-RFE) with correlation bias reduction (CBR) [37] was utilized....

  • ...Given that correlations of FC in brain network may cause the importance of features to be underestimated, the CBR method was employed to reduce this correlation bias....

  • ...Subsequently, the SVM-RFE+CBR method was applied in all the data, and two ranked feature sets of each session were obtained based on the significance of each feature....

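On the ensemble point raised in the first excerpt: one common way to stabilize RFE rankings is to aggregate them over bootstrap resamples. A minimal sketch, reusing the hypothetical svm_rfe_cbr() from the earlier sketch; mean-rank aggregation is an assumption, not necessarily the chapter's exact scheme.

# Sketch: ensemble feature ranking by averaging ranks over bootstrap
# resamples. Reuses the hypothetical svm_rfe_cbr() defined earlier;
# mean-rank aggregation is one common choice, not necessarily the
# chapter's exact scheme.
import numpy as np

def ensemble_ranking(X, y, n_rounds=50, seed=0):
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    rank_sum = np.zeros(n_features)
    for _ in range(n_rounds):
        idx = rng.choice(n_samples, size=n_samples, replace=True)  # bootstrap
        order = svm_rfe_cbr(X[idx], y[idx])       # most important first
        ranks = np.empty(n_features)
        ranks[order] = np.arange(n_features)      # rank 0 = most important
        rank_sum += ranks
    return np.argsort(rank_sum)                   # sorted by mean rank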
References
Journal Article
TL;DR: Several arguments that support the observed high accuracy of SVMs are reviewed, and numerous examples and proofs of most of the key theorems are given.
Abstract: The tutorial starts with an overview of the concepts of VC dimension and structural risk minimization. We then describe linear Support Vector Machines (SVMs) for separable and non-separable data, working through a non-trivial example in detail. We describe a mechanical analogy, and discuss when SVM solutions are unique and when they are global. We describe how support vector training can be practically implemented, and discuss in detail the kernel mapping technique which is used to construct SVM solutions which are nonlinear in the data. We show how Support Vector Machines can have very large (even infinite) VC dimension by computing the VC dimension for homogeneous polynomial and Gaussian radial basis function kernels. While very high VC dimension would normally bode ill for generalization performance, and while at present there exists no theory which shows that good generalization performance is guaranteed for SVMs, there are several arguments which support the observed high accuracy of SVMs, which we review. Results of some experiments which were inspired by these arguments are also presented. We give numerous examples and proofs of most of the key theorems. There is new material, and I hope that the reader will find that even old material is cast in a fresh light.
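A concrete counterpart to the tutorial's linear-versus-kernel discussion, as a minimal scikit-learn sketch (the toy dataset and hyperparameters are illustrative):

# Sketch: linear vs. RBF-kernel SVM on concentric circles, a dataset that
# is not linearly separable; illustrates the kernel mapping discussed in
# the tutorial. Dataset and hyperparameters are illustrative.
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, noise=0.1, factor=0.4, random_state=0)
for kernel in ("linear", "rbf"):
    acc = cross_val_score(SVC(kernel=kernel, C=1.0), X, y, cv=5).mean()
    print(kernel, round(acc, 2))   # RBF separates the circles; linear cannot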

15,696 citations

Journal Article
TL;DR: The contributions of this special issue cover a wide range of aspects of variable selection: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.
Abstract: Variable and feature selection have become the focus of much research in areas of application for which datasets with tens or hundreds of thousands of variables are available. These areas include text processing of internet documents, gene expression array analysis, and combinatorial chemistry. The objective of variable selection is three-fold: improving the prediction performance of the predictors, providing faster and more cost-effective predictors, and providing a better understanding of the underlying process that generated the data. The contributions of this special issue cover a wide range of aspects of such problems: providing a better definition of the objective function, feature construction, feature ranking, multivariate feature selection, efficient search methods, and feature validity assessment methods.

14,509 citations

Journal Article
TL;DR: In this article, features are selected according to the maximal statistical dependency criterion based on mutual information; because the maximal dependency condition is difficult to implement directly, an equivalent first-order form, the minimal-redundancy-maximal-relevance (mRMR) criterion, is derived and used for incremental feature selection.
Abstract: Feature selection is an important problem for pattern classification systems. We study how to select good features according to the maximal statistical dependency criterion based on mutual information. Because of the difficulty in directly implementing the maximal dependency condition, we first derive an equivalent form, called minimal-redundancy-maximal-relevance criterion (mRMR), for first-order incremental feature selection. Then, we present a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers). This allows us to select a compact set of superior features at very low cost. We perform extensive experimental comparison of our algorithm and other methods using three different classifiers (naive Bayes, support vector machine, and linear discriminant analysis) and four different data sets (handwritten digits, arrhythmia, NCI cancer cell lines, and lymphoma tissues). The results confirm that mRMR leads to promising improvement on feature selection and classification accuracy.
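The first-order incremental selection that mRMR describes can be sketched directly. A minimal version, assuming scikit-learn's mutual-information estimators as stand-ins for the paper's discretized MI, and using the additive "relevance minus redundancy" (MID) form of the criterion:

# Sketch of first-order incremental mRMR selection. Assumes scikit-learn's
# MI estimators as stand-ins for the paper's discretized mutual information;
# uses the additive "relevance minus redundancy" (MID) form.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X, y, k):
    relevance = mutual_info_classif(X, y, random_state=0)   # I(x_j; y)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            # redundancy: mean MI between candidate j and selected features
            red = np.mean([mutual_info_regression(X[:, [s]], X[:, j],
                                                  random_state=0)[0]
                           for s in selected])
            if relevance[j] - red > best_score:
                best, best_score = j, relevance[j] - red
        selected.append(best)
    return selected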

8,078 citations

Journal Article
TL;DR: In this article, a Support Vector Machine (SVM) method based on recursive feature elimination (RFE) was proposed to select a small subset of genes from broad patterns of gene expression data recorded on DNA micro-arrays.
Abstract: DNA micro-arrays now permit scientists to screen thousands of genes simultaneously and determine whether those genes are active, hyperactive or silent in normal or cancerous tissue. Because these new micro-array devices generate bewildering amounts of raw data, new analytical methods must be developed to sort out whether cancer tissues have distinctive signatures of gene expression over normal tissues or other types of cancer tissues. In this paper, we address the problem of selection of a small subset of genes from broad patterns of gene expression data, recorded on DNA micro-arrays. Using available training examples from cancer and normal patients, we build a classifier suitable for genetic diagnosis, as well as drug discovery. Previous attempts to address this problem select genes with correlation techniques. We propose a new method of gene selection utilizing Support Vector Machine methods based on Recursive Feature Elimination (RFE). We demonstrate experimentally that the genes selected by our techniques yield better classification performance and are biologically relevant to cancer. In contrast with the baseline method, our method eliminates gene redundancy automatically and yields better and more compact gene subsets. In patients with leukemia our method discovered 2 genes that yield zero leave-one-out error, while 64 genes are necessary for the baseline method to get the best result (one leave-one-out error). In the colon cancer database, using only 4 genes our method is 98% accurate, while the baseline method is only 86% accurate.
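SVM-RFE as introduced here is available off the shelf; a minimal scikit-learn usage sketch on synthetic data shaped like a microarray problem (dimensions and settings are illustrative):

# Sketch: scikit-learn's RFE with a linear SVM, mirroring the SVM-RFE
# gene-selection setup on synthetic data (dimensions are illustrative).
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# 100 "patients" x 2000 "genes", with only a few informative features
X, y = make_classification(n_samples=100, n_features=2000, n_informative=8,
                           random_state=0)
rfe = RFE(SVC(kernel="linear"), n_features_to_select=8, step=0.1).fit(X, y)
print(sorted(rfe.get_support(indices=True)))    # indices of selected "genes"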

7,939 citations

05 Aug 2003
TL;DR: This work derives an equivalent form, called the minimal-redundancy-maximal-relevance (mRMR) criterion, for first-order incremental feature selection, and presents a two-stage feature selection algorithm by combining mRMR and other more sophisticated feature selectors (e.g., wrappers).

7,075 citations