
Showing papers on "Feature selection published in 1995"


Journal ArticleDOI
TL;DR: In this article, a Bayesian approach to hypothesis testing, model selection, and accounting for model uncertainty is presented; implementing it is straightforward through the use of the simple and accurate BIC approximation, and it can be done using the output from standard software.
Abstract: It is argued that P-values and the tests based upon them give unsatisfactory results, especially in large samples. It is shown that, in regression, when there are many candidate independent variables, standard variable selection procedures can give very misleading results. Also, by selecting a single model, they ignore model uncertainty and so underestimate the uncertainty about quantities of interest. The Bayesian approach to hypothesis testing, model selection, and accounting for model uncertainty is presented. Implementing this is straightforward through the use of the simple and accurate BIC approximation, and it can be done using the output from standard software. Specific results are presented for most of the types of model commonly used in sociology. It is shown that this approach overcomes the difficulties with P-values and standard model selection procedures based on them. It also allows easy comparison of nonnested models, and permits the quantification of the evidence for a null hypothesis of interest, such as a convergence theory or a hypothesis about societal norms.
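As an illustration of the BIC-based workflow the abstract describes, here is a minimal sketch (ours, not the authors') that scores two candidate linear regressions using only ordinary least-squares output; all variable names are illustrative.

```python
import numpy as np

def bic_linear(y, X):
    """BIC for a linear model fit by ordinary least squares:
    n*log(RSS/n) + k*log(n), up to an additive constant."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    return n * np.log(rss / n) + k * np.log(n)

# Compare two candidate models: lower BIC = stronger evidence.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 200))
y = 1.5 * x1 + rng.normal(size=200)
X_small = np.column_stack([np.ones(200), x1])       # true model
X_big = np.column_stack([np.ones(200), x1, x2])     # adds an irrelevant term
print(bic_linear(y, X_small), bic_linear(y, X_big))  # the smaller model should win
```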

6,100 citations


01 Mar 1995
TL;DR: This thesis applies neural network feature selection techniques to multivariate time series data to improve prediction of a target time series; results indicate that the Stochastics and RSI indicators yield better predictions than the moving averages.
Abstract: This thesis applies neural network feature selection techniques to multivariate time series data to improve prediction of a target time series. Two approaches to feature selection are used. First, a subset enumeration method is used to determine which financial indicators are most useful for aiding in prediction of the S&P 500 futures daily price. The candidate indicators evaluated include RSI, Stochastics and several moving averages. Results indicate that the Stochastics and RSI indicators yield better predictions than the moving averages. The second approach to feature selection is calculation of individual saliency metrics. A new decision boundary-based individual saliency metric and a classifier-independent saliency metric are developed and tested. Ruck's saliency metric, the decision boundary-based saliency metric, and the classifier-independent saliency metric are compared for a data set consisting of the RSI and Stochastics indicators as well as delayed closing price values. The decision boundary-based metric and the Ruck metric results are similar, but the classifier-independent metric agrees with neither of the other metrics. The nine most salient features, determined by the decision boundary-based metric, are used to train a neural network and the results are presented and compared to other published results.
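For readers unfamiliar with saliency metrics of this kind, below is a hedged sketch of a Ruck-style derivative saliency measure, estimated here by finite differences; the stand-in "network" `f` and all names are illustrative, and the thesis's decision boundary-based and classifier-independent metrics are not reproduced.

```python
import numpy as np

def ruck_style_saliency(predict, X, eps=1e-4):
    """Approximate saliency of each input as the average absolute partial
    derivative of the model output, estimated by central finite differences
    over the training set (one common reading of Ruck's metric)."""
    n, d = X.shape
    saliency = np.zeros(d)
    for j in range(d):
        Xp, Xm = X.copy(), X.copy()
        Xp[:, j] += eps
        Xm[:, j] -= eps
        saliency[j] = np.mean(np.abs(predict(Xp) - predict(Xm)) / (2 * eps))
    return saliency

# Toy usage with a stand-in "network": only the first feature matters.
f = lambda X: np.tanh(3.0 * X[:, 0])
X = np.random.default_rng(1).normal(size=(500, 4))
print(ruck_style_saliency(f, X))  # first entry should dominate
```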

1,545 citations


Proceedings ArticleDOI
05 Nov 1995
TL;DR: Chi2 is a simple and general algorithm that uses the χ² statistic to discretize numeric attributes repeatedly until some inconsistencies are found in the data, and achieves feature selection via discretization.
Abstract: Discretization can turn numeric attributes into discrete ones. Feature selection can eliminate some irrelevant attributes. This paper describes Chi2, a simple and general algorithm that uses the χ² statistic to discretize numeric attributes repeatedly until some inconsistencies are found in the data, and achieves feature selection via discretization. The empirical results demonstrate that Chi2 is effective in feature selection and discretization of numeric and ordinal attributes.
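Chi2 builds on bottom-up χ²-based interval merging (with an automatically adjusted significance level and an inconsistency check that are omitted here). The simplified ChiMerge-style pass below, with illustrative names, shows only the core merging step: a numeric attribute whose values all collapse into a single interval carries no class information and can be discarded.

```python
import numpy as np
from scipy.stats import chi2 as chi2_dist

def chi2_adjacent(counts_a, counts_b):
    """Pearson chi-square statistic for two adjacent intervals (rows)
    over the per-class counts (columns)."""
    table = np.array([counts_a, counts_b], dtype=float)
    row = table.sum(axis=1, keepdims=True)
    col = table.sum(axis=0, keepdims=True)
    expected = row * col / table.sum()
    expected[expected == 0] = 1e-9          # guard against division by zero
    return np.sum((table - expected) ** 2 / expected)

def merge_pass(intervals, threshold):
    """One bottom-up pass: repeatedly merge the adjacent pair with the
    smallest chi-square value while it stays below the threshold."""
    while len(intervals) > 1:
        stats = [chi2_adjacent(intervals[i], intervals[i + 1])
                 for i in range(len(intervals) - 1)]
        i = int(np.argmin(stats))
        if stats[i] >= threshold:
            break
        intervals[i] = [a + b for a, b in zip(intervals[i], intervals[i + 1])]
        del intervals[i + 1]
    return intervals

# Each interval holds per-class counts for a 2-class problem (df = classes - 1).
intervals = [[5, 0], [4, 1], [0, 6], [1, 5]]
print(merge_pass(intervals, threshold=chi2_dist.ppf(0.90, df=1)))
```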

960 citations


Book
01 Aug 1995
TL;DR: Descriptive statistics; the acquisition and enhancement of data; feature selection and extraction; pattern recognition - unsupervised analysis; pattern recognition - supervised learning; calibration and regression analysis; matrix tools and operations.
Abstract: Descriptive statistics; the acquisition and enhancement of data; feature selection and extraction; pattern recognition - unsupervised analysis; pattern recognition - supervised learning; calibration and regression analysis; matrix tools and operations.

429 citations


Proceedings Article
20 Aug 1995
TL;DR: This work introduces compound operators that dynamically change the topology of the search space to better utilize the information available from the evaluation of feature subsets and shows that compound operators unify previous approaches that deal with relevant and irrelevant features.
Abstract: In the wrapper approach to feature subset selection, a search for an optimal set of features is made using the induction algorithm as a black box. The estimated future performance of the algorithm is the heuristic guiding the search. Statistical methods for feature subset selection including forward selection, backward elimination, and their stepwise variants can be viewed as simple hill-climbing techniques in the space of feature subsets. We utilize best-first search to find a good feature subset and discuss overfitting problems that may be associated with searching too many feature subsets. We introduce compound operators that dynamically change the topology of the search space to better utilize the information available from the evaluation of feature subsets. We show that compound operators unify previous approaches that deal with relevant and irrelevant features. The improved feature subset selection yields significant improvements for real-world datasets when using the ID3 and the Naive-Bayes induction algorithms.
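Below is a minimal sketch of the simplest wrapper variant mentioned above: greedy forward selection guided by cross-validated accuracy with a Naive-Bayes inducer. The paper's best-first search and compound operators are not reproduced; names and parameters are illustrative.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

def forward_selection(X, y, estimator=None, cv=5):
    """Greedy forward selection: at each step add the feature that most
    improves cross-validated accuracy; stop when no feature helps."""
    estimator = estimator or GaussianNB()
    remaining = list(range(X.shape[1]))
    selected, best_score = [], -np.inf
    while remaining:
        scores = [(cross_val_score(estimator, X[:, selected + [j]], y, cv=cv).mean(), j)
                  for j in remaining]
        score, j = max(scores)
        if score <= best_score:
            break
        selected.append(j)
        remaining.remove(j)
        best_score = score
    return selected, best_score
```

Backward elimination is the mirror image (start with all features, drop the least useful), and both are hill-climbers in the space of feature subsets, as the abstract notes.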

358 citations


Journal ArticleDOI
TL;DR: In this article, a predictive Bayesian viewpoint is advocated to avoid the specification of prior probabilities for the candidate models and the detailed interpretation of the parameters in each model, and using criteria derived from a certain predictive density and a prior specification that emphasizes the observables, they implement the proposed methodology for three common problems arising in normal linear models: variable subset selection, selection of a transformation of predictor variables and estimation of a parametric variance function.
Abstract: We consider the problem of selecting one model from a large class of plausible models. A predictive Bayesian viewpoint is advocated to avoid the specification of prior probabilities for the candidate models and the detailed interpretation of the parameters in each model. Using criteria derived from a certain predictive density and a prior specification that emphasizes the observables, we implement the proposed methodology for three common problems arising in normal linear models: variable subset selection, selection of a transformation of predictor variables and estimation of a parametric variance function. Interpretation of the relative magnitudes of the criterion values for various models is facilitated by a calibration of the criteria. Relationships between the proposed criteria and other well-known criteria are examined.

337 citations


Journal ArticleDOI
TL;DR: This method of determining input features is shown to result in more accurate, faster-training multilayer perceptron classifiers.

161 citations


Proceedings Article
J. Bala, J. Huang, H. Vafaie, K. De Jong, Harry Wechsler
20 Aug 1995
TL;DR: A hybrid learning methodology that integrates genetic algorithms (GAs) and decision tree learning (ID3) in order to evolve optimal subsets of discriminatory features for robust pattern classification is introduced.
Abstract: This paper introduces a hybrid learning methodology that integrates genetic algorithms (GAs) and decision tree learning (ID3) in order to evolve optimal subsets of discriminatory features for robust pattern classification. A GA is used to search the space of all possible subsets of a large set of candidate discrimination features. For a given feature subset, ID3 is invoked to produce a decision tree. The classification performance of the decision tree on unseen data is used as a measure of fitness for the given feature set, which, in turn, is used by the GA to evolve better feature sets. This GA-ID3 process iterates until a feature subset is found with satisfactory classification performance. Experimental results are presented which illustrate the feasibility of our approach on difficult problems involving recognizing visual concepts in satellite and facial image data. The results also show improved classification performance and reduced description complexity when compared against standard methods for feature selection.
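A toy sketch of the GA-wrapper idea follows, using scikit-learn's DecisionTreeClassifier as a stand-in for ID3 and cross-validated accuracy as the fitness of a feature-subset bit string. The genetic operators are deliberately simplified and all parameters are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def ga_feature_search(X, y, pop=20, gens=30, p_mut=0.05, rng=None):
    """Toy GA over feature-subset bit strings; fitness is the
    cross-validated accuracy of a decision tree (stand-in for ID3)."""
    rng = rng or np.random.default_rng(0)
    d = X.shape[1]
    genomes = rng.integers(0, 2, size=(pop, d))

    def fitness(g):
        if g.sum() == 0:
            return 0.0
        return cross_val_score(DecisionTreeClassifier(), X[:, g.astype(bool)], y, cv=3).mean()

    for _ in range(gens):
        scores = np.array([fitness(g) for g in genomes])
        order = np.argsort(scores)[::-1]
        parents = genomes[order[: pop // 2]]                   # truncation selection
        cut = rng.integers(1, d, size=pop // 2)
        children = np.array([np.concatenate([parents[i][:c],
                                             parents[(i + 1) % len(parents)][c:]])
                             for i, c in enumerate(cut)])       # one-point crossover
        children ^= (rng.random(children.shape) < p_mut)        # bit-flip mutation
        genomes = np.vstack([parents, children])
    scores = np.array([fitness(g) for g in genomes])
    return genomes[int(np.argmax(scores))]                      # best feature mask found
```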

139 citations


Journal ArticleDOI
TL;DR: This work shows that, in the authors' pattern classification problem, using a feature selection step reduced the number of features used, reduced the processing time requirements, and gave results comparable to the full set of features.
Abstract: In pattern classification problems, the choice of variables to include in the feature vector is a difficult one. The authors have investigated the use of stepwise discriminant analysis as a feature selection step in the problem of segmenting digital chest radiographs. In this problem, locally calculated features are used to classify pixels into one of several anatomic classes. The feature selection step was used to choose a subset of features which gave performance equivalent to the entire set of candidate features, while utilizing less computational resources. The impact of using the reduced/selected feature set on classifier performance is evaluated for two classifiers: a linear discriminator and a neural network. The results from the reduced/selected feature set were compared to that of the full feature set as well as a randomly selected reduced feature set. The results of the different feature sets were also compared after applying an additional postprocessing step which used a rule-based spatial information heuristic to improve the classification results. This work shows that, in the authors' pattern classification problem, using a feature selection step reduced the number of features used, reduced the processing time requirements, and gave results comparable to the full set of features.
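As a much simpler stand-in for the stepwise discriminant analysis used in the paper, the sketch below ranks features by a univariate Fisher ratio; it illustrates the general idea of discriminant-based feature screening only, not the authors' procedure.

```python
import numpy as np

def fisher_ratios(X, y):
    """Univariate Fisher ratio per feature: between-class variance of the
    class means over pooled within-class variance."""
    classes = np.unique(y)
    overall = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        between += len(Xc) * (Xc.mean(axis=0) - overall) ** 2
        within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    return between / np.maximum(within, 1e-12)

# Keep, say, the top-k features by ratio before training the pixel classifier.
```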

134 citations


Journal ArticleDOI
TL;DR: In this paper, a comparison of several calibration methods (principal component regression (PCR), partial least squares, multiple linear regression), with and without feature selection, applied to near-infrared spectroscopic data is presented for a pharmaceutical application.

108 citations


Journal ArticleDOI
TL;DR: In this paper, the authors used near-infrared spectroscopy to discriminate between different dosage strengths of tablets in blister packs. Three data transformation methods were studied; the second derivative appears to be the most effective transformation.

Book ChapterDOI
William W. Cohen
09 Jul 1995
TL;DR: It is shown that FOIL usually forms classifiers with lower error rates and higher rates of precision and recall with a relational encoding than with a propositional encoding, and its performance can be improved by relation selection, a first order analog of feature selection.
Abstract: We evaluate the first order learning system FOIL on a series of text categorization problems. It is shown that FOIL usually forms classifiers with lower error rates and higher rates of precision and recall with a relational encoding than with a propositional encoding. We show that FOIL's performance can be improved by relation selection, a first order analog of feature selection. Relation selection improves FOIL's performance as measured by any of recall, precision, F-measure, or error rate. With an appropriate level of relation selection, FOIL appears to be competitive with or superior to existing propositional techniques.

Journal ArticleDOI
TL;DR: A new method of feature selection based on the approximation of class conditional densities by a mixture of parameterized densities of a special type, suitable especially for multimodal data, is presented.

Journal ArticleDOI
TL;DR: In this paper, a method of estimating linear model dimension and variable selection is proposed based on a new class of penalty functions and a procedure of sorting covariates based on t-statistics.
Abstract: A method of estimating linear model dimension and variable selection is proposed. This new criterion, which generalizes the Cp criterion, the Akaike information criterion (AIC), the Bayes information criterion, and the φ criterion and is consistent under certain conditions, is based on a new class of penalty functions and a procedure of sorting covariates based on t-statistics. In the course of introducing this method, we discuss the important role of the penalty function in the consistency of model dimension estimation and in variable selection. The proposed method requires less computation than resampling-based methods that search over all subsets of covariates for the true model. Simulation results show that the new method is superior to the Cp criterion and AIC in finite-sample situations as well.
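For orientation (notation ours, not the paper's), the classical criteria being generalized all trade goodness of fit against a penalty on the model dimension k, for n observations and residual sum of squares RSS_k; the paper's contribution is to replace the fixed per-parameter penalty with a broader class of penalty functions.

```latex
C_p = \frac{\mathrm{RSS}_k}{\hat{\sigma}^2} + 2k - n,
\qquad
\mathrm{AIC} = n \log\!\left(\frac{\mathrm{RSS}_k}{n}\right) + 2k,
\qquad
\mathrm{BIC} = n \log\!\left(\frac{\mathrm{RSS}_k}{n}\right) + k \log n .
```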


Proceedings ArticleDOI
05 Nov 1995
TL;DR: The approach involves the use of genetic algorithms as a "front end" to a traditional tree induction system (ID3) in order to find the best feature set to be used by the induction system.
Abstract: This paper describes an approach being explored to improve the usefulness of machine learning techniques to classify complex, real world data. The approach involves the use of genetic algorithms as a "front end" to a traditional tree induction system (ID3) in order to find the best feature set to be used by the induction system. This approach has been implemented and tested on difficult texture classification problems. The results are encouraging and indicate significant advantages of the presented approach.

Book ChapterDOI
09 Jul 1995
TL;DR: An inductive learning method based on linear programming that predicts recurrence times using censored training examples, that is, examples in which the available training output may be only a lower bound on the “right answer.”
Abstract: This paper introduces the Recurrence Surface Approximation, an inductive learning method based on linear programming that predicts recurrence times using censored training examples, that is, examples in which the available training output may be only a lower bound on the “right answer.” This approach is augmented with a feature selection method that chooses an appropriate feature set within the context of the linear programming generalizer. Computational results in the field of breast cancer prognosis are shown. A straightforward translation of the prediction method to an artificial neural network model is also proposed.

Proceedings ArticleDOI
09 Apr 1995
TL;DR: This work demonstrates a technique called "sensitivity-based pruning" (SBP) that removes irrelevant input variables from a nonlinear forecasting or regression model; it makes use of a saliency measure computed for each input variable and uses estimates of prediction risk to determine the number of input variables to prune.
Abstract: Selecting a "best subset" of input variables is a critical issue in forecasting. This is especially true when the number of available input series is large, and an exhaustive search through all combinations of variables is computationally infeasible. Inclusion of irrelevant variables not only doesn't help prediction, but can reduce forecasting accuracy through added noise or systematic bias. We demonstrate a technique called "sensitivity-based pruning" (SBP) that removes irrelevant input variables from a nonlinear forecasting or regression model. The technique makes use of a saliency measure computed for each input variable and uses estimates of prediction risk for determining the number of input variables to prune. We present preliminary results of the SBP technique applied to neural network predictors of a key business cycle measure, the US Index of Industrial Production.

Dissertation
01 Jan 1995
TL;DR: Wavelet neural networks are introduced as a new class of elliptic basis function neural networks and wavelet networks, and are applied to the numerical modeling and classification of EEGs, finding them to be ideally suited for problems of EEG analysis.
Abstract: Wavelet neural networks (WNNs) are introduced as a new class of elliptic basis function neural networks and wavelet networks, and are applied to the numerical modeling and classification of EEGs. The implementation of the networks is achieved in two possibly cyclical stages of structure and parameter identification. For structure identification, two methods are developed: one generic, based on data clusterings, and one specific, using wavelet analysis. For parameter identification, two methods are also implemented: the Levenberg-Marquardt algorithm and a genetic algorithm of ranking type. The problem of model generalization is considered from both a cross-validation and a regularization point of view. For the latter, a corrected average squared error (CASE) is derived as a new model selection criterion that does not rely on assumptions about error distributions or modeling paradigms. For EEG modeling, the nonlinear dynamics framework is employed in the reconstruction of state-spaces via the embedding scheme. Preprocessing for the resulting state-vector is introduced in terms of decorrelation and compression. The naive application of chaos theory to EEGs is shown to be useful in feature extraction, but not in corroborating theories about the nature of EEGs. For the latter, the concept of modeling resolution is introduced. It is shown that the chaos-in-the-brain question becomes meaningful only as a function of modeling resolution. For EEG classification, a general WNN classification system is implemented as a cascade of synergistic feature selection, WNN nonlinear discrimination, and decision logic. A feature library is described including raw and model-based features, ranging from traditional measures to chaotic indicators. Training for maximum-likelihood classification is shown to be inductively feasible via a decoder-type WNN classifier adjusted with nonanalytic methods. WNNs were found to be ideally suited for problems of EEG analysis due to the long-duration/low-frequency and short-duration/high-frequency structure of EEG signals.

Journal ArticleDOI
TL;DR: This paper presents some of the computer vision techniques employed to automatically select features, measure features' displacements, and evaluate measurements during robotic visual servoing tasks; the most robust proved to be the Sum-of-Squared Differences (SSD) optical flow technique.
Abstract: This paper presents some of the computer vision techniques that were employed in order to automatically select features, measure features' displacements, and evaluate measurements during robotic visual servoing tasks. We experimented with many different techniques, but the most robust proved to be the Sum-of-Squared Differences (SSD) optical flow technique. In addition, several techniques for the evaluation of the measurements are presented. One important characteristic of these techniques is that they can also be used for the selection of features for tracking in conjunction with several numerical criteria that guarantee the robustness of the servoing. These techniques are important aspects of our work since they can be used either on-line or off-line. An extension of the SSD measure to color images is presented and the results from the application of these techniques to real images are discussed. Finally, the derivation of depth maps through the controlled motion of the hand-eye system is outlined and the important role of the automatic feature selection algorithm in the accurate computation of the depth-related parameters is highlighted.
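A brute-force sketch of SSD displacement measurement for a single tracked window follows (illustrative only; the paper's servoing pipeline, confidence measures and color extension are not reproduced).

```python
import numpy as np

def ssd_displacement(prev, curr, y, x, win=7, search=10):
    """Track the window centered at (y, x) in `prev` by finding the
    integer displacement in `curr` that minimizes the sum of squared
    differences. Assumes (y, x) is far enough from the image border
    for the reference window to fit."""
    h = win // 2
    template = prev[y - h:y + h + 1, x - h:x + h + 1].astype(float)
    best, best_d = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            patch = curr[y + dy - h:y + dy + h + 1,
                         x + dx - h:x + dx + h + 1].astype(float)
            if patch.shape != template.shape:
                continue                      # candidate window fell outside the image
            ssd = np.sum((patch - template) ** 2)
            if ssd < best:
                best, best_d = ssd, (dy, dx)
    return best_d, best                       # displacement and residual SSD
```

The same residual can be reused as a rough confidence score when choosing which windows are worth tracking, which is the spirit of the automatic feature selection the abstract mentions.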

01 Apr 1995
TL;DR: This paper suggests a very simple feature subset selection algorithm that gives good performance across different supervised learning schemes and compares well against one of the most common methods for feature subset selection.
Abstract: It has been our experience that in order to obtain useful results using supervised learning of real-world datasets it is necessary to perform feature subset selection and to perform many experiments using computed aggregates from the most relevant features. It is, therefore, important to look for selection algorithms that work quickly and accurately so that these experiments can be performed in a reasonable length of time, preferably interactively. This paper suggests a method to achieve this using a very simple algorithm that gives good performance across different supervised learning schemes and when compared to one of the most common methods for feature subset selection.

Journal ArticleDOI
Rong Chen1
TL;DR: A digression concept is introduced and two simple algorithms to classify the observations without knowing the threshold variable are proposed and used with several graphical procedures to search for the most suitable threshold variable.
Abstract: An open-loop threshold autoregressive model switches among autoregressive regimes according to the value of a threshold variable Zt. The main difficulty in building such a model is that the threshold variable Zt is usually unknown. In practice, there may exist many possible candidates for the threshold variable Zt. It is difficult and tedious, if not impossible, to search for the best among all the candidates using standard model selection procedures. In this paper, we introduce a digression concept and propose two simple algorithms to classify the observations without knowing the threshold variable. The classification is then used with several graphical procedures to search for the most suitable threshold variable. Simulated and real examples are included to illustrate the proposed procedures.

Proceedings Article
20 Aug 1995
TL;DR: It is proved that the stepwise backward selection algorithm finds a small subset of relevant features that are ideally sufficient and necessary to define target concepts with respect to a given threshold.
Abstract: In this paper, we investigate enhancements to an upper classifier - a decision algorithm generated by an upper classification method, which is one of the classification methods in rough set theory. Specifically, we consider two enhancements. First, we present a stepwise backward feature selection algorithm to preprocess a given set of features. This is important because rough classification methods are incapable of removing superfluous features. We prove that the stepwise backward selection algorithm finds a small subset of relevant features that are ideally sufficient and necessary to define target concepts with respect to a given threshold. This threshold value indicates an acceptable degradation in the quality of an upper classifier. Second, to make an upper classifier adaptive, we associate it with some kind of frequency information, which we call incremental information. An extended decision table is used to represent an adaptive upper classifier. It is also used for interpreting an upper classifier either deterministically or nondeterministically.
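A simplified reading of the backward selection idea is sketched below: keep dropping features while a rough-set quality-of-classification measure stays within a threshold of the full-feature value. The purity-based quality measure and all names are illustrative, not the paper's exact definitions.

```python
from collections import defaultdict

def approximation_quality(rows, labels, feats):
    """Rough-set style quality of classification: fraction of objects whose
    equivalence class (on the chosen features) is pure in the decision."""
    groups = defaultdict(set)
    for row, lab in zip(rows, labels):
        groups[tuple(row[f] for f in feats)].add(lab)
    pure = sum(1 for row, lab in zip(rows, labels)
               if len(groups[tuple(row[f] for f in feats)]) == 1)
    return pure / len(rows)

def backward_select(rows, labels, threshold=0.0):
    """Stepwise backward elimination: drop a feature as long as the quality
    does not fall more than `threshold` below the full-set quality."""
    feats = list(range(len(rows[0])))
    full = approximation_quality(rows, labels, feats)
    changed = True
    while changed and len(feats) > 1:
        changed = False
        for f in list(feats):
            rest = [g for g in feats if g != f]
            if approximation_quality(rows, labels, rest) >= full - threshold:
                feats = rest
                changed = True
                break
    return feats
```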

Patent
02 Mar 1995
TL;DR: In this paper, an image recognition and classification system includes a preprocessor in which a "top-down" method is used to extract features from an image; an associative learning neural network system, which groups the features into patterns and classifies the patterns; and a feedback mechanism which improves system performance by tuning preprocessor scale, feature detection, and feature selection.
Abstract: An image recognition and classification system includes a preprocessor in which a "top-down" method is used to extract features from an image; an associative learning neural network system, which groups the features into patterns and classifies the patterns; and a feedback mechanism which improves system performance by tuning preprocessor scale, feature detection, and feature selection.

Journal ArticleDOI
TL;DR: This review focuses on the last decade's development of the computational stereopsis for recovering three-dimensional information, and special attention is paid to parallelism as a way to achieve the required level of performance.
Abstract: This review focuses on the last decade's development of the computational stereopsis for recovering three-dimensional information. The main components of the stereo analysis are exposed: image acquisition and camera modeling, feature selection, feature matching and disparity interpretation. A brief survey is given of the well known feature selection approaches and the estimation parameters for this selection are mentioned. The difficulties in identifying correspondent locations in the two images are explained. Methods as to how effectively to constrain the search for the correct solution of the correspondence problem are discussed, as are strategies for the whole matching process. Reasons for the occurrence of matching errors are considered. Some recently proposed approaches, employing new ideas in the modeling of stereo matching in terms of energy minimization, are described. Acknowledging the importance of computation time for real-time applications, special attention is paid to parallelism as a way to achieve the required level of performance. The development of trinocular stereo analysis as an alternative to the conventional binocular one is described. Finally, a classification based on the test images for verification of the stereo matching algorithms is supplied.

Proceedings ArticleDOI
26 Jun 1995
TL;DR: It is shown that optimal electrode positions as well as frequency bands are strongly dependent on each subject and that subject-specific feature selection is very important for BCI systems.
Abstract: This paper describes a simple but very powerful method for feature selection. The Distinction Sensitive Learning Vector Quantizer (DSLVQ) is a learning classifier which focuses on relevant features according to its own instance-based classifications. Two different experiments describe the application of DSLVQ as a feature selector for an EEG-based Brain Computer Interface (BCI) system. It is shown that optimal electrode positions as well as frequency bands are strongly dependent on each subject and that subject-specific feature selection is very important for BCI systems.

Journal ArticleDOI
TL;DR: Several methods have been proposed in recent years for selecting variables in such mixed-variable discrimination situations, and a brief review of the possibilities can be found in this paper, where some of the methods are simply variations on the same basic underlying model (the location model).

Proceedings ArticleDOI
22 Oct 1995
TL;DR: Using a technique called projection pursuit, a pre-processing dimensional reduction method has been developed based on the optimization of a projection index; a method to estimate an initial value that can more quickly lead to the global maximum is presented.
Abstract: Supervised classification techniques use labeled samples to train the classifier. Often the number of such samples is limited, thus limiting the precision with which class characteristics can be estimated. As the number of spectral bands becomes large, the limitation on performance imposed by the limited number of training samples can become severe. Such consequences suggest the value of reducing the dimensionality by a pre-processing method that takes advantage of the asymptotic normality of projected data. Using a technique called projection pursuit, a pre-processing dimensional reduction method has been developed based on the optimization of a projection index. A method to estimate an initial value that can more quickly lead to the global maximum is presented for projection pursuit using the Bhattacharyya distance as the projection index.
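A hedged sketch of the projection index for the two-class, one-dimensional case follows: the Bhattacharyya distance between the projected class distributions under a Gaussian assumption, maximized here by a crude random search rather than the paper's optimization with an informed starting value.

```python
import numpy as np

def bhattacharyya_1d(a, b):
    """Bhattacharyya distance between two classes after projection,
    treating each projected class as (approximately) Gaussian."""
    m1, m2 = a.mean(), b.mean()
    v1, v2 = a.var(), b.var()
    return (0.25 * (m1 - m2) ** 2 / (v1 + v2)
            + 0.5 * np.log((v1 + v2) / (2 * np.sqrt(v1 * v2))))

def projection_index(w, X1, X2):
    w = w / np.linalg.norm(w)
    return bhattacharyya_1d(X1 @ w, X2 @ w)

# Crude search over random directions, standing in for a proper optimizer.
rng = np.random.default_rng(0)
X1 = rng.normal([0, 0, 0], 1.0, size=(200, 3))
X2 = rng.normal([2, 0, 0], 1.0, size=(200, 3))
dirs = rng.normal(size=(500, 3))
best = max(dirs, key=lambda w: projection_index(w, X1, X2))
print(best / np.linalg.norm(best))  # should point roughly along the first axis (up to sign)
```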

Posted Content
TL;DR: In this article, a Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and preference for certain models is interpreted as prior information, which may then be incorporated in a model selection procedure.
Abstract: In data sets with many predictors, algorithms for identifying a good subset of predictors are often used. Most such algorithms do not account for any relationships between predictors. For example, stepwise regression might select a model containing an interaction AB but neither main effect A nor B. This paper develops mathematical representations of this and other relations between predictors, which may then be incorporated in a model selection procedure. A Bayesian approach that goes beyond the standard independence prior for variable selection is adopted, and preference for certain models is interpreted as prior information. Priors relevant to arbitrary interactions and polynomials, dummy variables for categorical factors, competing predictors, and restrictions on the size of the models are developed. Since the relations developed are for priors, they may be incorporated in any Bayesian variable selection algorithm for any type of linear model. The application of the methods here is illustrated via the Stochastic Search Variable Selection algorithm of George and McCulloch (1993), which is modified to utilize the new priors. The performance of the approach is illustrated with two constructed examples and a computer performance dataset. Keywords: Model Selection, Prior Distributions, Interaction, Dummy Variable
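As a toy illustration of one such relation (ours, not the paper's construction), here is a strong-heredity indicator prior that assigns zero mass to any model containing an interaction without both parent main effects; in a stochastic-search setting it would simply enter as a factor in the model prior.

```python
def heredity_prior(model, interactions):
    """Strong-heredity prior: an interaction term gets prior support only
    if both of its parent main effects are in the model; otherwise zero."""
    included = set(model)
    for term, (a, b) in interactions.items():
        if term in included and not ({a, b} <= included):
            return 0.0
    return 1.0

# Example: the model {AB} without A or B gets zero prior mass.
interactions = {"AB": ("A", "B")}
print(heredity_prior({"A", "B", "AB"}, interactions))  # 1.0
print(heredity_prior({"AB"}, interactions))            # 0.0
```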

Proceedings ArticleDOI
27 Nov 1995
TL;DR: A hierarchical clustering mechanism, arboART, builds a tree-structured graph of the sample classification results and finds the features of each cluster; it is applied to automatic rule generation for Kansei engineering expert systems.
Abstract: A hierarchical clustering mechanism based on ART-type neural networks is designed for analyzing multidimensional data and for feature selection. Prototypes of clusters obtained from an ART's top-down vectors are sent to another ART. Several ART networks with different similarity criteria are used for cluster combination. This scheme of hierarchical clustering (arboART) makes it possible to build a tree-structured graph of the sample classification results and to find the features of each cluster. arboART is applied to automatic rule generation for Kansei engineering expert systems. Results of a color evaluation experiment analyzed with arboART and a comparison with conventional multivariate analysis are shown.