
Showing papers on "Feature selection published in 2008"


Journal ArticleDOI
TL;DR: An algorithm that automates the purposeful selection of covariates, in which an analyst makes a variable selection decision at each step of the modeling process; it can retain important confounding variables, potentially resulting in a slightly richer model.
Abstract: Background The main problem in many model-building situations is to choose from a large set of covariates those that should be included in the "best" model. A decision to keep a variable in the model might be based on the clinical or statistical significance. There are several variable selection algorithms in existence. Those methods are mechanical and as such carry some limitations. Hosmer and Lemeshow describe a purposeful selection of covariates within which an analyst makes a variable selection decision at each step of the modeling process.

2,577 citations


Journal ArticleDOI
TL;DR: In this article, the authors introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size.
Abstract: Summary. Variable selection plays an important role in high dimensional statistical modelling which nowadays appears in many areas and is key to various scientific discoveries. For problems of large scale or dimensionality p, accuracy of estimation and computational cost are two top concerns. Recently, Candes and Tao have proposed the Dantzig selector using L1-regularization and showed that it achieves the ideal risk up to a logarithmic factor log (p). Their innovative procedure and remarkable result are challenged when the dimensionality is ultrahigh as the factor log (p) can be large and their uniform uncertainty principle can fail. Motivated by these concerns, we introduce the concept of sure screening and propose a sure screening method that is based on correlation learning, called sure independence screening, to reduce dimensionality from high to a moderate scale that is below the sample size. In a fairly general asymptotic framework, correlation learning is shown to have the sure screening property for even exponentially growing dimensionality. As a methodological extension, iterative sure independence screening is also proposed to enhance its finite sample performance. With dimension reduced accurately from high to below sample size, variable selection can be improved on both speed and accuracy, and can then be accomplished by a well-developed method such as smoothly clipped absolute deviation, the Dantzig selector, lasso or adaptive lasso. The connections between these penalized least squares methods are also elucidated.
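
A minimal sketch of the correlation-screening idea behind sure independence screening, assuming standardized predictors and a generic cutoff d (the paper suggests taking d on the order of n/log n); the function and variable names are illustrative:

```python
import numpy as np

def sis(X, y, d):
    """Sure independence screening sketch: rank features by absolute marginal
    correlation with the response and keep the top d."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = (y - y.mean()) / y.std()
    omega = np.abs(Xc.T @ yc) / len(y)        # componentwise marginal correlations
    return np.argsort(omega)[::-1][:d]        # indices of the d most correlated features

# Typical use: reduce p to below n, then fit a penalized method (e.g. lasso) on the survivors.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5000))
y = 2 * X[:, 0] - 3 * X[:, 7] + rng.standard_normal(100)
keep = sis(X, y, d=50)
```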

2,204 citations


Journal ArticleDOI
TL;DR: This paper re-examines the Bayesian paradigm for model selection and proposes an extended family of Bayesian information criteria, which take into account both the number of unknown parameters and the complexity of the model space.
Abstract: SUMMARY The ordinary Bayesian information criterion is too liberal for model selection when the model space is large. In this paper, we re-examine the Bayesian paradigm for model selection and propose an extended family of Bayesian information criteria, which take into account both the number of unknown parameters and the complexity of the model space. Their consistency is established, in particular allowing the number of covariates to increase to infinity with the sample size. Their performance in various situations is evaluated by simulation studies. It is demonstrated that the extended Bayesian information criteria incur a small loss in the positive selection rate but tightly control the false discovery rate, a desirable property in many applications. The extended Bayesian information criteria are extremely useful for variable selection in problems with a moderate sample size but with a huge number of covariates, especially in genome-wide association studies, which are now an active area in genetics research.
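
For a Gaussian linear model, the extended criterion adds a model-space term to the ordinary BIC; a hedged sketch (with gamma = 0 recovering the usual BIC, and the argument names purely illustrative):

```python
import numpy as np
from scipy.special import gammaln

def ebic(rss, n, k, p, gamma=0.5):
    """Extended BIC sketch for a Gaussian linear model with k of p covariates:
    ordinary BIC plus a penalty 2*gamma*log(C(p, k)) on the size of the model space."""
    log_n_models = gammaln(p + 1) - gammaln(k + 1) - gammaln(p - k + 1)  # log C(p, k)
    return n * np.log(rss / n) + k * np.log(n) + 2.0 * gamma * log_n_models
```

Candidate models are then compared by this score, smaller being better.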

1,472 citations


Book
01 Jan 2008
TL;DR: Novel computational approaches for deep learning of behaviors as opposed to just static patterns will be presented, based on structured nonnegative matrix factorizations of matrices that encode observation frequencies of behaviors.
Abstract: Future Directions -- Semi-supervised Multiple Classifier Systems: Background and Research Directions -- Boosting -- Boosting GMM and Its Two Applications -- Boosting Soft-Margin SVM with Feature Selection for Pedestrian Detection -- Observations on Boosting Feature Selection -- Boosting Multiple Classifiers Constructed by Hybrid Discriminant Analysis -- Combination Methods -- Decoding Rules for Error Correcting Output Code Ensembles -- A Probability Model for Combining Ranks -- EER of Fixed and Trainable Fusion Classifiers: A Theoretical Study with Application to Biometric Authentication Tasks -- Mixture of Gaussian Processes for Combining Multiple Modalities -- Dynamic Classifier Integration Method -- Recursive ECOC for Microarray Data Classification -- Using Dempster-Shafer Theory in MCF Systems to Reject Samples -- Multiple Classifier Fusion Performance in Networked Stochastic Vector Quantisers -- On Deriving the Second-Stage Training Set for Trainable Combiners -- Using Independence Assumption to Improve Multimodal Biometric Fusion -- Design Methods -- Half-Against-Half Multi-class Support Vector Machines -- Combining Feature Subsets in Feature Selection -- ACE: Adaptive Classifiers-Ensemble System for Concept-Drifting Environments -- Using Decision Tree Models and Diversity Measures in the Selection of Ensemble Classification Models -- Ensembles of Classifiers from Spatially Disjoint Data -- Optimising Two-Stage Recognition Systems -- Design of Multiple Classifier Systems for Time Series Data -- Ensemble Learning with Biased Classifiers: The Triskel Algorithm -- Cluster-Based Cumulative Ensembles -- Ensemble of SVMs for Incremental Learning -- Performance Analysis -- Design of a New Classifier Simulator -- Evaluation of Diversity Measures for Binary Classifier Ensembles -- Which Is the Best Multiclass SVM Method? An Empirical Study -- Over-Fitting in Ensembles of Neural Network Classifiers Within ECOC Frameworks -- Between Two Extremes: Examining Decompositions of the Ensemble Objective Function -- Data Partitioning Evaluation Measures for Classifier Ensembles -- Dynamics of Variance Reduction in Bagging and Other Techniques Based on Randomisation -- Ensemble Confidence Estimates Posterior Probability -- Applications -- Using Domain Knowledge in the Random Subspace Method: Application to the Classification of Biomedical Spectra -- An Abnormal ECG Beat Detection Approach for Long-Term Monitoring of Heart Patients Based on Hybrid Kernel Machine Ensemble -- Speaker Verification Using Adapted User-Dependent Multilevel Fusion -- Multi-modal Person Recognition for Vehicular Applications -- Using an Ensemble of Classifiers to Audit a Production Classifier -- Analysis and Modelling of Diversity Contribution to Ensemble-Based Texture Recognition Performance -- Combining Audio-Based and Video-Based Shot Classification Systems for News Videos Segmentation -- Designing Multiple Classifier Systems for Face Recognition -- Exploiting Class Hierarchies for Knowledge Transfer in Hyperspectral Data.

1,073 citations


Proceedings ArticleDOI
01 Jun 2008
TL;DR: A novel filter bank common spatial pattern (FBCSP) is proposed to perform autonomous selection of key temporal-spatial discriminative EEG characteristics; using a particular combination of feature selection and classification algorithms, FBCSP yields relatively higher cross-validation accuracies than prevailing approaches.

Abstract: In motor imagery-based brain computer interfaces (BCI), discriminative patterns can be extracted from the electroencephalogram (EEG) using the common spatial pattern (CSP) algorithm. However, the performance of this spatial filter depends on the operational frequency band of the EEG. Thus, setting a broad frequency range, or manually selecting a subject-specific frequency range, are commonly used with the CSP algorithm. To address this problem, this paper proposes a novel filter bank common spatial pattern (FBCSP) to perform autonomous selection of key temporal-spatial discriminative EEG characteristics. After the EEG measurements have been bandpass-filtered into multiple frequency bands, CSP features are extracted from each of these bands. A feature selection algorithm is then used to automatically select discriminative pairs of frequency bands and corresponding CSP features. A classification algorithm is subsequently used to classify the CSP features. A study is conducted to assess the performance of a selection of feature selection and classification algorithms for use with the FBCSP. Extensive experimental results are presented on a publicly available dataset as well as data collected from healthy subjects and unilaterally paralyzed stroke patients. The results show that FBCSP, using a particular combination of feature selection and classification algorithms, yields relatively higher cross-validation accuracies compared to prevailing approaches.
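
A compact sketch of the FBCSP pipeline under stated assumptions: CSP is implemented as a generalized eigendecomposition of the two class covariance matrices, the band edges and filter order are illustrative, and the selector/classifier pairing shown in the trailing comments is just one possible combination:

```python
import numpy as np
from scipy.signal import butter, filtfilt
from scipy.linalg import eigh
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def bandpass(trials, lo, hi, fs):
    # trials: (n_trials, n_channels, n_samples); zero-phase band-pass along time
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, trials, axis=-1)

def csp_filters(trials, labels, n_pairs=2):
    # Average normalized spatial covariance per class, then solve C1 w = lambda (C1 + C2) w
    c1 = np.mean([t @ t.T / np.trace(t @ t.T) for t in trials[labels == 0]], axis=0)
    c2 = np.mean([t @ t.T / np.trace(t @ t.T) for t in trials[labels == 1]], axis=0)
    _, w = eigh(c1, c1 + c2)                      # eigenvalues ascending
    return np.concatenate([w[:, :n_pairs], w[:, -n_pairs:]], axis=1).T

def fbcsp_features(trials, labels, bands, fs):
    feats = []
    for lo, hi in bands:
        filtered = bandpass(trials, lo, hi, fs)
        w = csp_filters(filtered, labels)         # fit on training trials in practice
        z = np.einsum("fc,ncs->nfs", w, filtered) # project each trial onto the CSP filters
        var = z.var(axis=-1)
        feats.append(np.log(var / var.sum(axis=1, keepdims=True)))
    return np.hstack(feats)

# Hypothetical usage on (n_trials, n_channels, n_samples) EEG epochs with binary labels:
# bands = [(4, 8), (8, 12), (12, 16), (16, 20), (20, 24), (24, 28), (28, 32), (32, 36)]
# F = fbcsp_features(trials, labels, bands, fs=250)
# sel = SelectKBest(mutual_info_classif, k=4).fit(F, labels)
# clf = LinearDiscriminantAnalysis().fit(sel.transform(F), labels)
```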

991 citations


Journal ArticleDOI
TL;DR: Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.
Abstract: The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.

949 citations


Journal ArticleDOI
TL;DR: Experimental results demonstrate that the classification accuracy rates of the developed approach surpass those of grid search and many other approaches, and that the developed PSO+SVM approach has a similar result to GA+SVM. Therefore, the PSO+SVM approach is valuable for parameter determination and feature selection in an SVM.
Abstract: Support vector machine (SVM) is a popular pattern classification method with many diverse applications. Kernel parameter setting in the SVM training procedure, along with the feature selection, significantly influences the classification accuracy. This study simultaneously determines the parameter values while discovering a subset of features, without reducing SVM classification accuracy. A particle swarm optimization (PSO) based approach for parameter determination and feature selection of the SVM, termed PSO+SVM, is developed. Several public datasets are employed to calculate the classification accuracy rate in order to evaluate the developed PSO+SVM approach. The developed approach was compared with grid search, which is a conventional method of searching parameter values, and other approaches. Experimental results demonstrate that the classification accuracy rates of the developed approach surpass those of grid search and many other approaches, and that the developed PSO+SVM approach has a similar result to GA+SVM. Therefore, the PSO+SVM approach is valuable for parameter determination and feature selection in an SVM.
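
A minimal sketch of PSO over the RBF-SVM hyperparameters; the paper searches the feature subset simultaneously, but this sketch covers only the parameter part, and the swarm settings, search bounds, and demo dataset are illustrative choices:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score
from sklearn.datasets import load_breast_cancer

def pso_svm(X, y, n_particles=10, n_iter=15, seed=0):
    """PSO over (log2 C, log2 gamma) maximizing 5-fold cross-validation accuracy."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-5.0, -15.0]), np.array([15.0, 3.0])   # illustrative bounds
    pos = rng.uniform(lo, hi, size=(n_particles, 2))
    vel = np.zeros_like(pos)

    def fitness(p):
        return cross_val_score(SVC(C=2.0 ** p[0], gamma=2.0 ** p[1]), X, y, cv=5).mean()

    pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
    gbest = pbest[pbest_fit.argmax()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, 1))
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)
        fit = np.array([fitness(p) for p in pos])
        improved = fit > pbest_fit
        pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
        gbest = pbest[pbest_fit.argmax()].copy()
    return 2.0 ** gbest[0], 2.0 ** gbest[1]

X, y = load_breast_cancer(return_X_y=True)
C, gamma = pso_svm(X, y)
```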

802 citations


Journal ArticleDOI
TL;DR: A neighborhood rough set model is introduced to deal with the problem of heterogeneous feature subset selection, and experimental results show that the neighborhood model-based method is more flexible in dealing with heterogeneous data.

780 citations


Journal ArticleDOI
TL;DR: This work presents a method to adjust SVM parameters before classification, and examines overlapped segmentation and majority voting as two techniques to improve controller performance.
Abstract: This paper proposes and evaluates the application of support vector machine (SVM) to classify upper limb motions using myoelectric signals. It explores the optimum configuration of SVM-based myoelectric control, by suggesting an advantageous data segmentation technique, feature set, model selection approach for SVM, and postprocessing methods. This work presents a method to adjust SVM parameters before classification, and examines overlapped segmentation and majority voting as two techniques to improve controller performance. A SVM, as the core of classification in myoelectric control, is compared with two commonly used classifiers: linear discriminant analysis (LDA) and multilayer perceptron (MLP) neural networks. It demonstrates exceptional accuracy, robust performance, and low computational load. The entropy of the output of the classifier is also examined as an online index to evaluate the correctness of classification; this can be used by online training for long-term myoelectric control operations.
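
The majority-voting post-processing mentioned above can be sketched as a causal sliding vote over the stream of per-segment class decisions (the window length is an illustrative choice):

```python
import numpy as np

def majority_vote(decisions, window=9):
    """Smooth a stream of per-segment class decisions with a sliding majority vote,
    a common post-processing step for myoelectric control (minimal sketch)."""
    smoothed = np.empty_like(decisions)
    for i in range(len(decisions)):
        lo = max(0, i - window + 1)
        vals, counts = np.unique(decisions[lo:i + 1], return_counts=True)
        smoothed[i] = vals[counts.argmax()]      # most frequent class in the window
    return smoothed

print(majority_vote(np.array([0, 0, 1, 0, 0, 2, 0, 0])))
```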

730 citations


Journal ArticleDOI
TL;DR: Two tree-based ensemble classification algorithms are assessed: Adaboost and Random Forest, based on standard classification accuracy, training time and classification stability, and both outperform a neural network classifier in dealing with hyperspectral data.

720 citations


Journal ArticleDOI
TL;DR: This paper derives necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, and proposes an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non-adaptive scheme is not satisfied.

Abstract: We consider the least-square regression problem with regularization by a block l1-norm, that is, a sum of Euclidean norms over spaces of dimensions larger than one. This problem, referred to as the group Lasso, extends the usual regularization by the l1-norm where all spaces have dimension one, where it is commonly referred to as the Lasso. In this paper, we study the asymptotic group selection consistency of the group Lasso. We derive necessary and sufficient conditions for the consistency of group Lasso under practical assumptions, such as model misspecification. When the linear predictors and Euclidean norms are replaced by functions and reproducing kernel Hilbert norms, the problem is usually referred to as multiple kernel learning and is commonly used for learning from heterogeneous data sources and for nonlinear variable selection. Using tools from functional analysis, and in particular covariance operators, we extend the consistency results to this infinite-dimensional case and also propose an adaptive scheme to obtain a consistent model estimate, even when the necessary condition required for the non-adaptive scheme is not satisfied.
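
In the notation of the abstract, the group Lasso estimator solves a least-squares problem penalized by a sum of Euclidean norms over the coefficient groups:

```latex
\min_{\beta \in \mathbb{R}^p} \; \frac{1}{2}\,\lVert y - X\beta \rVert_2^2 \;+\; \lambda \sum_{g=1}^{G} \lVert \beta_g \rVert_2
```

where beta_g is the block of coefficients for group g; when every group has size one, the penalty reduces to the ordinary l1-norm of the Lasso.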

Journal ArticleDOI
TL;DR: This article showed that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias.
Abstract: Meinshausen and Buhlmann [Ann. Statist. 34 (2006) 1436–1462] showed that, for neighborhood selection in Gaussian graphical models, under a neighborhood stability condition, the LASSO is consistent, even when the number of variables is of greater order than the sample size. Zhao and Yu [(2006) J. Machine Learning Research 7 2541–2567] formalized the neighborhood stability condition in the context of linear regression as a strong irrepresentable condition. That paper showed that under this condition, the LASSO selects exactly the set of nonzero regression coefficients, provided that these coefficients are bounded away from zero at a certain rate. In this paper, the regression coefficients outside an ideal model are assumed to be small, but not necessarily zero. Under a sparse Riesz condition on the correlation of design variables, we prove that the LASSO selects a model of the correct order of dimensionality, controls the bias of the selected model at a level determined by the contributions of small regression coefficients and threshold bias, and selects all coefficients of greater order than the bias of the selected model. Moreover, as a consequence of this rate consistency of the LASSO in model selection, it is proved that the sum of error squares for the mean response and the lα-loss for the regression coefficients converge at the best possible rates under the given conditions. An interesting aspect of our results is that the logarithm of the number of variables can be of the same order as the sample size for certain random dependent designs.

Book ChapterDOI
15 Sep 2008
TL;DR: It is shown that ensemble feature selection techniques show great promise for high-dimensional domains with small sample sizes, and provide more robust feature subsets than a single feature selection technique.
Abstract: Robustness or stability of feature selection techniques is a topic of recent interest, and is an important issue when selected feature subsets are subsequently analysed by domain experts to gain more insight into the problem modelled. In this work, we investigate the use of ensemble feature selection techniques, where multiple feature selection methods are combined to yield more robust results. We show that these techniques show great promise for high-dimensional domains with small sample sizes, and provide more robust feature subsets than a single feature selection technique. In addition, we also investigate the effect of ensemble feature selection techniques on classification performance, giving rise to a new model selection strategy.
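
One simple way to realize the ensemble idea is to aggregate the rankings produced by several base selectors; the particular selectors and the mean-rank aggregation below are illustrative (the paper also studies ensembles built from resampled data):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif, f_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

def ensemble_ranking(X, y, seed=0):
    """Combine several feature rankers by averaging their rank positions (0 = best)."""
    scores = [
        mutual_info_classif(X, y, random_state=seed),
        f_classif(X, y)[0],
        RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y).feature_importances_,
    ]
    ranks = np.array([np.argsort(np.argsort(-s)) for s in scores])
    return ranks.mean(axis=0)

# Small high-dimensional example in the spirit of the paper's setting
X, y = make_classification(n_samples=80, n_features=200, n_informative=5, random_state=0)
top10 = np.argsort(ensemble_ranking(X, y))[:10]
```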

Journal ArticleDOI
TL;DR: A network-constrained regularization procedure that efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework is introduced.
Abstract: Motivation: Graphs or networks are common ways of depicting information. In biology in particular, many different biological processes are represented by graphs, such as regulatory networks or metabolic pathways. This kind of a priori information gathered over many years of biomedical research is a useful supplement to the standard numerical genomic data such as microarray gene-expression data. How to incorporate information encoded by the known biological networks or graphs into analysis of numerical data raises interesting statistical challenges. In this article, we introduce a network-constrained regularization procedure for linear regression analysis in order to incorporate the information from these graphs into an analysis of the numerical data, where the network is represented as a graph and its corresponding Laplacian matrix. We define a network-constrained penalty function that penalizes the L1-norm of the coefficients but encourages smoothness of the coefficients on the network. Results: Simulation studies indicated that the method is quite effective in identifying genes and subnetworks that are related to disease and has higher sensitivity than the commonly used procedures that do not use the pathway structure information. Application to one glioblastoma microarray gene-expression dataset identified several subnetworks on several of the Kyoto Encyclopedia of Genes and Genomes (KEGG) transcriptional pathways that are related to survival from glioblastoma, many of which are supported by the published literature. Conclusions: The proposed network-constrained regularization procedure efficiently utilizes the known pathway structures in identifying the relevant genes and the subnetworks that might be related to phenotype in a general regression framework. As more biological networks are identified and documented in databases, the proposed method should find more applications in identifying the subnetworks that are related to diseases and other biological processes. Contact: hongzhe@mail.med.upenn.edu
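
In the notation of the abstract, with L the Laplacian matrix of the known network, the network-constrained criterion penalizes the L1-norm of the coefficients while smoothing them over the graph:

```latex
\min_{\beta} \; \lVert y - X\beta \rVert_2^2 \;+\; \lambda_1 \lVert \beta \rVert_1 \;+\; \lambda_2\, \beta^{\top} L\, \beta
```

so that genes connected in the network are encouraged to take similar coefficient values while most coefficients are still shrunk exactly to zero.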

Journal ArticleDOI
TL;DR: Recursive Feature Elimination is evaluated in terms of sensitivity of discriminative maps (Receiver Operative Characteristic analysis) and generalization performances and compare it to previously used univariate voxel selection strategies based on activation and discrimination measures.
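
A minimal recursive feature elimination sketch using scikit-learn's RFE with a linear SVM; the step size, feature counts, and synthetic data below are illustrative stand-ins for voxel selection:

```python
from sklearn.svm import SVC
from sklearn.feature_selection import RFE
from sklearn.datasets import make_classification

# Recursive Feature Elimination: repeatedly drop the lowest-weight features
# (here 10% per step) of a linear SVM until 500 remain.
X, y = make_classification(n_samples=60, n_features=2000, n_informative=20, random_state=0)
selector = RFE(SVC(kernel="linear"), n_features_to_select=500, step=0.1).fit(X, y)
mask = selector.support_   # boolean map of the retained "voxels"/features
```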

Journal ArticleDOI
TL;DR: Improved binary particle swarm optimization (IBPSO) is used in this study to implement feature selection, and the K-nearest neighbor (K-NN) method serves as an evaluator of the IBPSO for gene expression data classification problems, showing that this method effectively simplifies feature selection and reduces the total number of features needed.

Journal ArticleDOI
01 Sep 2008
TL;DR: Experimental results showed the proposed PSO-SVM model can correctly select the discriminating input features and also achieve high classification accuracy.
Abstract: This study proposed a novel PSO-SVM model that hybridized the particle swarm optimization (PSO) and support vector machines (SVM) to improve the classification accuracy with a small and appropriate feature subset. This optimization mechanism combined the discrete PSO with the continuous-valued PSO to simultaneously optimize the input feature subset selection and the SVM kernel parameter setting. The hybrid PSO-SVM data mining system was implemented via a distributed architecture using the web service technology to reduce the computational time. In a heterogeneous computing environment, the PSO optimization was performed on the application server and the SVM model was trained on the client (agent) computer. The experimental results showed the proposed approach can correctly select the discriminating input features and also achieve high classification accuracy.

Journal ArticleDOI
TL;DR: A new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters, in addition to improving prediction accuracy and interpretation.
Abstract: Variable selection can be challenging, particularly in situations with a large number of predictors with possibly high correlations, such as gene expression data. In this article, a new method called the OSCAR (octagonal shrinkage and clustering algorithm for regression) is proposed to simultaneously select variables while grouping them into predictive clusters. In addition to improving prediction accuracy and interpretation, these resulting groups can then be investigated further to discover what contributes to the group having a similar behavior. The technique is based on penalized least squares with a geometrically intuitive penalty function that shrinks some coefficients to exactly zero. Additionally, this penalty yields exact equality of some coefficients, encouraging correlated predictors that have a similar effect on the response to form predictive clusters represented by a single coefficient. The proposed procedure is shown to compare favorably to the existing shrinkage and variable selection techniques in terms of both prediction error and model complexity, while yielding the additional grouping information.
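
As described above, the OSCAR criterion combines an L1 penalty with a pairwise L-infinity penalty; a standard statement of the constrained form is (with c >= 0 controlling the weight of the pairwise term):

```latex
\min_{\beta} \; \lVert y - X\beta \rVert_2^2 \quad \text{subject to} \quad \sum_{j} \lvert \beta_j \rvert \;+\; c \sum_{j<k} \max\{\lvert \beta_j \rvert, \lvert \beta_k \rvert\} \;\le\; t
```

The max terms make the constraint region octagonal in any pair of coefficients, which is what drives strongly correlated predictors toward exactly equal magnitudes while the L1 part shrinks others to zero.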

Journal ArticleDOI
TL;DR: This study focuses on the integration of two-block data that are measured on the same samples and shows that sparse PLS provides a valuable variable selection tool for highly dimensional data sets.
Abstract: Recent biotechnology advances allow for multiple types of omics data, such as transcriptomic, proteomic or metabolomic data sets to be integrated. The problem of feature selection has been addressed several times in the context of classification, but needs to be handled in a specific manner when integrating data. In this study, we focus on the integration of two-block data that are measured on the same samples. Our goal is to combine integration and simultaneous variable selection of the two data sets in a one-step procedure using a Partial Least Squares regression (PLS) variant to facilitate the biologists' interpretation. A novel computational methodology called "sparse PLS" is introduced for a predictive analysis to deal with these newly arisen problems. The sparsity of our approach is achieved with a Lasso penalization of the PLS loading vectors when computing the Singular Value Decomposition. Sparse PLS is shown to be effective and biologically meaningful. Comparisons with classical PLS are performed on a simulated data set and on real data sets. On one data set, a thorough biological interpretation of the obtained results is provided. We show that sparse PLS provides a valuable variable selection tool for highly dimensional data sets.
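
A sketch of the sparse-PLS idea for a single component: the leading singular vectors of X^T Y are computed with a soft-thresholding (lasso-type) step on the loadings, so only a subset of variables enters the component. The threshold values and names below are illustrative:

```python
import numpy as np

def soft(x, lam):
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def sparse_pls_component(X, Y, lam_x=0.1, lam_y=0.0, n_iter=100):
    """First sparse PLS component via alternating soft-thresholded updates of the
    loading vectors of M = X^T Y (a sketch of lasso-penalized SVD)."""
    M = X.T @ Y
    u, _, vt = np.linalg.svd(M, full_matrices=False)
    u, v = u[:, 0], vt[0]
    for _ in range(n_iter):
        u = soft(M @ v, lam_x);   u /= (np.linalg.norm(u) + 1e-12)
        v = soft(M.T @ u, lam_y); v /= (np.linalg.norm(v) + 1e-12)
    return u, v   # nonzero entries of u mark the selected X-variables

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 100))
Y = X[:, :3] @ rng.standard_normal((3, 5)) + 0.1 * rng.standard_normal((30, 5))
u, v = sparse_pls_component(X - X.mean(0), Y - Y.mean(0), lam_x=2.0)
```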

Proceedings ArticleDOI
05 Jul 2008
TL;DR: In this article, the authors consider the least-square linear regression problem with regularization by the l1-norm, a problem usually referred to as the Lasso, and present a detailed asymptotic analysis of model consistency.
Abstract: We consider the least-square linear regression problem with regularization by the l1-norm, a problem usually referred to as the Lasso. In this paper, we present a detailed asymptotic analysis of model consistency of the Lasso. For various decays of the regularization parameter, we compute asymptotic equivalents of the probability of correct model selection (i.e., variable selection). For a specific rate decay, we show that the Lasso selects all the variables that should enter the model with probability tending to one exponentially fast, while it selects all other variables with strictly positive probability. We show that this property implies that if we run the Lasso for several bootstrapped replications of a given sample, then intersecting the supports of the Lasso bootstrap estimates leads to consistent model selection. This novel variable selection algorithm, referred to as the Bolasso, is compared favorably to other linear regression methods on synthetic data and datasets from the UCI machine learning repository.
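
A minimal Bolasso sketch: fit the Lasso on bootstrap replications of the sample and keep only the variables selected in every run; the regularization level, number of replications, and synthetic data are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression
from sklearn.utils import resample

def bolasso_support(X, y, alpha=0.1, n_boot=128, seed=0):
    """Intersect the Lasso supports obtained on bootstrap replications."""
    support = np.ones(X.shape[1], dtype=bool)
    for b in range(n_boot):
        Xb, yb = resample(X, y, random_state=seed + b)
        coef = Lasso(alpha=alpha, max_iter=10000).fit(Xb, yb).coef_
        support &= coef != 0
    return np.flatnonzero(support)

X, y = make_regression(n_samples=200, n_features=30, n_informative=5, noise=1.0, random_state=0)
print(bolasso_support(X, y))
```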

Journal ArticleDOI
TL;DR: The experimental results show that the neighborhood-based feature selection algorithm is able to delete most of the redundant and irrelevant features, and the classification accuracies of the neighborhood classifier are superior to those of K-NN and CART in both the original feature spaces and the reduced feature subspaces, and slightly lower than those of SVM.

Abstract: The K-nearest neighbor classifier (K-NN) is widely discussed and applied in pattern recognition and machine learning; however, little has been reported on the neighborhood classifier, a similar lazy classifier that uses local information to recognize a new test sample. In this paper, we introduce the neighborhood rough set model as a uniform framework to understand and implement neighborhood classifiers. This algorithm integrates attribute reduction techniques with classification learning. We study the influence of the three norms on attribute reduction and classification, and compare the neighborhood classifier with K-NN, CART and SVM. The experimental results show that the neighborhood-based feature selection algorithm is able to delete most of the redundant and irrelevant features. The classification accuracies of the neighborhood classifier are superior to those of K-NN and CART in both the original feature spaces and the reduced feature subspaces, and slightly lower than those of SVM.
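
The neighborhood classifier described above can be sketched as a majority vote over all training samples falling inside a delta-neighborhood of the query; the radius and the 2-norm below are illustrative (the paper studies the effect of different norms):

```python
import numpy as np

def neighborhood_classify(X_train, y_train, x, delta=0.15):
    """Predict by majority vote over training samples within distance delta of x;
    fall back to the single nearest neighbor if the neighborhood is empty."""
    d = np.linalg.norm(X_train - x, axis=1)   # Euclidean (2-norm) neighborhood
    idx = np.flatnonzero(d <= delta)
    if idx.size == 0:
        idx = np.array([d.argmin()])
    vals, counts = np.unique(y_train[idx], return_counts=True)
    return vals[counts.argmax()]

Xtr = np.array([[0.10, 0.20], [0.15, 0.22], [0.80, 0.90]])
ytr = np.array([0, 0, 1])
print(neighborhood_classify(Xtr, ytr, np.array([0.12, 0.21])))   # -> 0
```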

Journal ArticleDOI
01 Oct 2008
TL;DR: This work investigates the role of sparsity and localized features in a biologically-inspired model of visual object classification and demonstrates the value of retaining some position and scale information above the intermediate feature level.
Abstract: We investigate the role of sparsity and localized features in a biologically-inspired model of visual object classification. As in the model of Serre, Wolf, and Poggio, we first apply Gabor filters at all positions and scales; feature complexity and position/scale invariance are then built up by alternating template matching and max pooling operations. We refine the approach in several biologically plausible ways. Sparsity is increased by constraining the number of feature inputs, lateral inhibition, and feature selection. We also demonstrate the value of retaining some position and scale information above the intermediate feature level. Our final model is competitive with current computer vision algorithms on several standard datasets, including the Caltech 101 object categories and the UIUC car localization task. The results further the case for biologically-motivated approaches to object classification.

Journal ArticleDOI
Yaochu Jin1, Bernhard Sendhoff1
01 May 2008
TL;DR: An overview of the existing research on multiobjective machine learning, focusing on supervised learning is provided, and a number of case studies are provided to illustrate the major benefits of the Pareto-based approach to machine learning.
Abstract: Machine learning is inherently a multiobjective task. Traditionally, however, either only one of the objectives is adopted as the cost function or multiple objectives are aggregated to a scalar cost function. This can be mainly attributed to the fact that most conventional learning algorithms can only deal with a scalar cost function. Over the last decade, efforts on solving machine learning problems using the Pareto-based multiobjective optimization methodology have gained increasing impetus, particularly due to the great success of multiobjective optimization using evolutionary algorithms and other population-based stochastic search methods. It has been shown that Pareto-based multiobjective learning approaches are more powerful compared to learning algorithms with a scalar cost function in addressing various topics of machine learning, such as clustering, feature selection, improvement of generalization ability, knowledge extraction, and ensemble generation. One common benefit of the different multiobjective learning approaches is that a deeper insight into the learning problem can be gained by analyzing the Pareto front composed of multiple Pareto-optimal solutions. This paper provides an overview of the existing research on multiobjective machine learning, focusing on supervised learning. In addition, a number of case studies are provided to illustrate the major benefits of the Pareto-based approach to machine learning, e.g., how to identify interpretable models and models that can generalize on unseen data from the obtained Pareto-optimal solutions. Three approaches to Pareto-based multiobjective ensemble generation are compared and discussed in detail. Finally, potentially interesting topics in multiobjective machine learning are suggested.

Journal ArticleDOI
TL;DR: Most methods improve on the naïve complete‐case analysis for variable selection, but importantly the type 1 error is only preserved if selection is based on RR, which is the recommended approach.
Abstract: Multiple imputation is a popular technique for analysing incomplete data. Given the imputed data and a particular model, Rubin's rules (RR) for estimating parameters and standard errors are well established. However, there are currently no guidelines for variable selection in multiply imputed data sets. The usual practice is to perform variable selection amongst the complete cases, a simple but inefficient and potentially biased procedure. Alternatively, variable selection can be performed by repeated use of RR, which is more computationally demanding. An approximation can be obtained by a simple 'stacked' method that combines the multiply imputed data sets into one and uses a weighting scheme to account for the fraction of missing data in each covariate. We compare these and other approaches using simulations based around a trial in community psychiatry. Most methods improve on the naive complete-case analysis for variable selection, but importantly the type 1 error is only preserved if selection is based on RR, which is our recommended approach.
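
For reference, Rubin's rules pool an estimate computed on each of the m imputed data sets by averaging the estimates and combining the within- and between-imputation variance:

```latex
\bar{\theta} = \frac{1}{m}\sum_{j=1}^{m}\hat{\theta}_j, \qquad
T = \bar{W} + \Bigl(1 + \tfrac{1}{m}\Bigr)B, \qquad
\bar{W} = \frac{1}{m}\sum_{j=1}^{m} W_j, \qquad
B = \frac{1}{m-1}\sum_{j=1}^{m}\bigl(\hat{\theta}_j - \bar{\theta}\bigr)^2
```

where W_j is the within-imputation variance of the j-th estimate; variable selection based on RR repeatedly applies this pooling across the candidate models.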

Journal ArticleDOI
TL;DR: The proposed approach to intelligent fault diagnosis based on statistics analysis, an improved distance evaluation technique and adaptive neuro-fuzzy inference system (ANFIS) can reliably recognise different fault categories and severities.
Abstract: This paper presents a new approach to intelligent fault diagnosis based on statistics analysis, an improved distance evaluation technique and adaptive neuro-fuzzy inference system (ANFIS). The approach consists of three stages. First, different features, including time-domain statistical characteristics, frequency-domain statistical characteristics and empirical mode decomposition (EMD) energy entropies, are extracted to acquire more fault characteristic information. Second, an improved distance evaluation technique is proposed, and with it, the most superior features are selected from the original feature set. Finally, the most superior features are fed into ANFIS to identify different abnormal cases. The proposed approach is applied to fault diagnosis of rolling element bearings, and testing results show that the proposed approach can reliably recognise different fault categories and severities. Moreover, the effectiveness of the proposed feature selection method is also demonstrated by the testing results.

Proceedings Article
13 Jul 2008
TL;DR: A novel algorithm is proposed to efficiently find the global optimal feature subset such that the subset-level score is maximized, and extensive experiments demonstrate the effectiveness of the proposed algorithm in comparison with the traditional methods for feature selection.
Abstract: Fisher score and Laplacian score are two popular feature selection algorithms, both of which belong to the general graph-based feature selection framework. In this framework, a feature subset is selected based on the corresponding score (subset-level score), which is calculated in a trace ratio form. Since the number of all possible feature subsets is very huge, it is often prohibitively expensive in computational cost to search in a brute force manner for the feature subset with the maximum subset-level score. Instead of calculating the scores of all the feature subsets, traditional methods calculate the score for each feature, and then select the leading features based on the rank of these feature-level scores. However, selecting the feature subset based on the feature-level score cannot guarantee the optimum of the subset-level score. In this paper, we directly optimize the subset-level score, and propose a novel algorithm to efficiently find the global optimal feature subset such that the subset-level score is maximized. Extensive experiments demonstrate the effectiveness of our proposed algorithm in comparison with the traditional methods for feature selection.
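
In this framework the subset-level score of a candidate feature subset can be written as a trace ratio; sketched here in generic notation, where W is a column-selection matrix of 0/1 indicators for the chosen features and A, B are the score-specific matrices (e.g. between- and within-class scatter for the Fisher score):

```latex
\mathrm{score}(W) \;=\; \frac{\operatorname{tr}\!\bigl(W^{\top} A\, W\bigr)}{\operatorname{tr}\!\bigl(W^{\top} B\, W\bigr)}
```

Maximizing this ratio directly over subsets is what the proposed algorithm does, rather than ranking features by their individual feature-level scores.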

Journal ArticleDOI
01 Sep 2008
TL;DR: Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and other approaches, and the SA-SVM is thus useful for parameter determination and feature selection in the SVM.
Abstract: Support vector machine (SVM) is a novel pattern classification method that is valuable in many applications. Kernel parameter setting in the SVM training process, along with the feature selection, significantly affects classification accuracy. The objective of this study is to obtain the better parameter values while also finding a subset of features that does not degrade the SVM classification accuracy. This study develops a simulated annealing (SA) approach for parameter determination and feature selection in the SVM, termed SA-SVM. To measure the proposed SA-SVM approach, several datasets in UCI machine learning repository are adopted to calculate the classification accuracy rate. The proposed approach was compared with grid search which is a conventional method of performing parameter setting, and various other methods. Experimental results indicate that the classification accuracy rates of the proposed approach exceed those of grid search and other approaches. The SA-SVM is thus useful for parameter determination and feature selection in the SVM.

Book
29 Sep 2008
TL;DR: Computational Intelligence and Feature Selection provides a high level audience with both the background and fundamental ideas behind feature selection with an emphasis on those techniques based on rough and fuzzy sets, including their hybridizations.
Abstract: Computational Intelligence and Feature Selection provides a high-level audience with both the background and fundamental ideas behind feature selection, with an emphasis on those techniques based on rough and fuzzy sets, including their hybridizations. It introduces set theory, fuzzy set theory, rough set theory, and fuzzy-rough set theory, and illustrates the power and efficacy of the feature selection described through the use of real-world applications and worked examples. Program files implementing major algorithms covered, together with the necessary instructions and datasets, are available on the Web.

Journal ArticleDOI
TL;DR: A general framework, incremental tensor analysis (ITA), which efficiently computes a compact summary for high-order and high-dimensional data, and also reveals the hidden correlations is introduced.
Abstract: How do we find patterns in author-keyword associations, evolving over time? Or in data cubes (tensors), with product-branch-customer sales information? And more generally, how do we summarize high-order data cubes (tensors)? How do we incrementally update these patterns over time? Matrix decompositions, like principal component analysis (PCA) and variants, are invaluable tools for mining, dimensionality reduction, feature selection, and rule identification in numerous settings like streaming data, text, graphs, social networks, and many more. However, they have only two orders (i.e., matrices, like author and keyword in the previous example). We propose to envision such higher-order data as tensors, and tap the vast literature on the topic. However, these methods do not necessarily scale up, let alone operate on semi-infinite streams. Thus, we introduce a general framework, incremental tensor analysis (ITA), which efficiently computes a compact summary for high-order and high-dimensional data, and also reveals the hidden correlations. Three variants of ITA are presented: (1) dynamic tensor analysis (DTA); (2) streaming tensor analysis (STA); and (3) window-based tensor analysis (WTA). In particular, we explore several fundamental design trade-offs such as space efficiency, computational cost, approximation accuracy, time dependency, and model complexity. We implement all our methods and apply them in several real settings, such as network anomaly detection, multiway latent semantic indexing on citation networks, and correlation study on sensor measurements. Our empirical studies show that the proposed methods are fast and accurate and that they find interesting patterns and outliers on the real datasets.

Journal ArticleDOI
TL;DR: It is shown that a behavior model trained using an unlabeled data set is superior to those trained using the same but labeled data set in detecting anomaly from an unseen video, and the online LRT-based behavior recognition approach is advantageous over the commonly used Maximum Likelihood method in differentiating ambiguities among different behavior classes observed online.
Abstract: This paper aims to address the problem of modeling video behavior captured in surveillance videos for the applications of online normal behavior recognition and anomaly detection. A novel framework is developed for automatic behavior profiling and online anomaly sampling/detection without any manual labeling of the training data set. The framework consists of the following key components: 1) A compact and effective behavior representation method is developed based on discrete-scene event detection. The similarity between behavior patterns are measured based on modeling each pattern using a Dynamic Bayesian Network (DBN). 2) The natural grouping of behavior patterns is discovered through a novel spectral clustering algorithm with unsupervised model selection and feature selection on the eigenvectors of a normalized affinity matrix. 3) A composite generative behavior model is constructed that is capable of generalizing from a small training set to accommodate variations in unseen normal behavior patterns. 4) A runtime accumulative anomaly measure is introduced to detect abnormal behavior, whereas normal behavior patterns are recognized when sufficient visual evidence has become available based on an online Likelihood Ratio Test (LRT) method. This ensures robust and reliable anomaly detection and normal behavior recognition at the shortest possible time. The effectiveness and robustness of our approach is demonstrated through experiments using noisy and sparse data sets collected from both indoor and outdoor surveillance scenarios. In particular, it is shown that a behavior model trained using an unlabeled data set is superior to those trained using the same but labeled data set in detecting anomaly from an unseen video. The experiments also suggest that our online LRT-based behavior recognition approach is advantageous over the commonly used Maximum Likelihood (ML) method in differentiating ambiguities among different behavior classes observed online.