
Showing papers on "Feature selection published in 1990"


01 Jan 1990
TL;DR: A technique has been developed which analyzes the weights in a multilayer perceptron to determine which features the network finds important and which are unimportant, and the saliency measure is used to compare the results of two different training rules on a target recognition problem.
Abstract: The problem of selecting the best set of features for target recognition using a multilayer perceptron is addressed in this paper. A technique has been developed which analyzes the weights in a multilayer perceptron to determine which features the network finds important and which are unimportant. A brief introduction to the use of multilayer perceptrons for classification and the training rules available is followed by the mathematical development of the saliency measure for multilayer perceptrons. The technique is applied to two different image databases and is found to be consistent with statistical techniques and independent of the network initial conditions. The saliency measure is then used to compare the results of two different training rules on a target recognition problem.
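The paper derives its saliency measure from the trained weights; as a rough illustration of that idea (not the authors' exact formula), the sketch below scores each input of a single-hidden-layer perceptron by propagating absolute weight magnitudes through to the outputs. The sklearn model and this particular scoring rule are assumptions for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def weight_saliency(mlp: MLPClassifier) -> np.ndarray:
    """Crude per-input saliency for a single-hidden-layer MLP: propagate
    absolute weight magnitudes from each input through to the outputs."""
    w_in, w_out = mlp.coefs_                     # (n_features, n_hidden), (n_hidden, n_outputs)
    hidden_strength = np.abs(w_out).sum(axis=1)  # how strongly each hidden unit drives the outputs
    return np.abs(w_in) @ hidden_strength        # one saliency score per input feature

# usage: rank features and keep the most salient ones
# mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000).fit(X_train, y_train)
# ranking = np.argsort(weight_saliency(mlp))[::-1]
```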

262 citations


Journal ArticleDOI
TL;DR: Taking advantage of the orthogonality property, a systematic feature selection method for choosing an appropriate number of the Zernike features is developed, based on computing a measure of the information content differences of features of different classes.

166 citations


Proceedings Article
01 Oct 1990
TL;DR: The results suggest that genetic algorithms are becoming practical for pattern classification problems as faster serial and parallel computers are developed.
Abstract: Genetic algorithms were used to select and create features and to select reference exemplar patterns for machine vision and speech pattern classification tasks. For a complex speech recognition task, genetic algorithms required no more computation time than traditional approaches to feature selection but reduced the number of input features required by a factor of five (from 153 to 33 features). On a difficult artificial machine-vision task, genetic algorithms were able to create new features (polynomial functions of the original features) which reduced classification error rates from 19% to almost 0%. Neural net and k nearest neighbor (KNN) classifiers were unable to provide such low error rates using only the original features. Genetic algorithms were also used to reduce the number of reference exemplar patterns for a KNN classifier. On a 338 training pattern vowel-recognition problem with 10 classes, genetic algorithms reduced the number of stored exemplars from 338 to 43 without significantly increasing classification error rate. In all applications, genetic algorithms were easy to apply and found good solutions in many fewer trials than would be required by exhaustive search. Run times were long, but not unreasonable. These results suggest that genetic algorithms are becoming practical for pattern classification problems as faster serial and parallel computers are developed.
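A minimal sketch of GA-based feature-subset selection in the spirit described: bit-string individuals encode which features are kept, and fitness is cross-validated KNN accuracy minus a small per-feature cost. The population size, mutation rate, and penalty term are illustrative assumptions, not the paper's settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def fitness(mask, X, y, penalty=0.01):
    """Cross-validated KNN accuracy minus a small cost per selected feature."""
    if mask.sum() == 0:
        return -np.inf
    acc = cross_val_score(KNeighborsClassifier(3), X[:, mask.astype(bool)], y, cv=3).mean()
    return acc - penalty * mask.sum()

def ga_select(X, y, pop=20, gens=30, p_mut=0.05):
    """Evolve bit-strings that mark which features are kept."""
    n = X.shape[1]
    population = rng.integers(0, 2, size=(pop, n))
    for _ in range(gens):
        scores = np.array([fitness(ind, X, y) for ind in population])
        parents = population[np.argsort(scores)[::-1][:pop // 2]]    # truncation selection
        children = []
        for _ in range(pop - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n)                                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            child ^= (rng.random(n) < p_mut).astype(child.dtype)      # bit-flip mutation
            children.append(child)
        population = np.vstack([parents] + children)
    best = max(population, key=lambda ind: fitness(ind, X, y))
    return best.astype(bool)                                          # boolean feature mask
```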

82 citations


Proceedings ArticleDOI
J. Sheinvald1, Byron Dom1, W. Niblack1
16 Jun 1990
TL;DR: In this paper, an information-theoretic approach is used to derive a new feature selection criterion capable of detecting features that are totally useless by fitting a probability model to a given training set of classified feature-vectors using the minimum description length criterion (MDLC) for model selection.
Abstract: An information-theoretic approach is used to derive a new feature selection criterion capable of detecting features that are totally useless. Since the number of useless features is initially unknown, traditional class-separability and distance measures are not capable of coping with this problem. The useless feature-subset is detected by fitting a probability model to a given training set of classified feature-vectors using the minimum-description-length criterion (MDLC) for model selection. The resulting criterion for the Gaussian case is a simple closed-form expression, having a plausible geometric interpretation, and is proved to be consistent, i.e., it yields the true useless subset with probability 1 as the size of the training set grows to infinity. Simulations show excellent results compared to the cross-validation method and other information-theoretic criteria, even for small-sized training sets.
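A rough per-feature illustration of the MDL idea (not the paper's closed-form joint criterion): a feature is flagged as useless when a single pooled Gaussian describes it more compactly, in two-part description length, than class-conditional Gaussians do.

```python
import numpy as np

def gaussian_dl(x, n_total):
    """Two-part description length of 1-D samples under a fitted Gaussian
    (negative log-likelihood plus (k/2) log n for its k = 2 parameters)."""
    var = max(x.var(), 1e-12)
    nll = 0.5 * np.sum(np.log(2 * np.pi * var) + (x - x.mean()) ** 2 / var)
    return nll + 0.5 * 2 * np.log(n_total)

def mdl_useless_features(X, y):
    """Indices of features better described by one pooled Gaussian than by
    class-conditional Gaussians, i.e. features carrying no class information."""
    n = len(y)
    useless = []
    for j in range(X.shape[1]):
        dl_pooled = gaussian_dl(X[:, j], n)
        dl_class = sum(gaussian_dl(X[y == c, j], n) for c in np.unique(y))
        if dl_pooled <= dl_class:
            useless.append(j)
    return useless
```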

42 citations


Proceedings ArticleDOI
17 Jun 1990
TL;DR: On a difficult artificial machine-vision task, genetic algorithms were able to create new features (polynomial functions of the original features) which dramatically reduced classification error rates.
Abstract: Genetic algorithms were used for feature selection and creation in two pattern-classification problems. On a machine-vision inspection task, it was found that genetic algorithms performed no better than conventional approaches to feature selection but required much more computation. On a difficult artificial machine-vision task, genetic algorithms were able to create new features (polynomial functions of the original features) which dramatically reduced classification error rates. Neural network and nearest-neighbor classifiers were unable to provide such low error rates using only the original features.
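As a stand-in for the genetic search over created polynomial features, the hedged sketch below greedily appends pairwise products of the original features whenever they improve a nearest-neighbor cross-validation score; the greedy loop and KNN settings are illustrative assumptions, not the paper's method.

```python
import numpy as np
from itertools import combinations_with_replacement
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def create_polynomial_features(X, y, max_new=5):
    """Greedily append pairwise products (and squares) of the original features
    whenever they raise a nearest-neighbour cross-validation score."""
    knn = KNeighborsClassifier(3)
    best = cross_val_score(knn, X, y, cv=3).mean()
    for i, j in combinations_with_replacement(range(X.shape[1]), 2):
        if max_new == 0:
            break
        X_try = np.column_stack([X, X[:, i] * X[:, j]])   # candidate polynomial feature
        score = cross_val_score(knn, X_try, y, cv=3).mean()
        if score > best:
            X, best, max_new = X_try, score, max_new - 1  # keep it
    return X, best
```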

34 citations


Journal ArticleDOI
TL;DR: It is shown that the branch and bound algorithm guarantees the optimal feature subset without evaluating all possible feature subsets, if the criterion function used satisfies the ‘monotonicity’ property.
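A compact sketch of the branch-and-bound idea: because a monotone criterion can only decrease when features are removed, any partial subset scoring no better than the best size-k subset found so far can be pruned together with all of its subsets. The criterion is left as a user-supplied function; the recursion scheme is an illustrative implementation, not the paper's.

```python
import numpy as np

def branch_and_bound(n_features, criterion, k):
    """Best size-k feature subset under a monotone criterion J
    (J(S) <= J(T) whenever S is a subset of T)."""
    best = {"score": -np.inf, "subset": None}

    def search(current, start):
        score = criterion(current)
        if score <= best["score"]:
            return                                   # monotonicity: no subset of `current` can beat the bound
        if len(current) == k:
            best["score"], best["subset"] = score, current
            return
        for i in range(start, len(current)):         # branch: discard one more feature
            if len(current) - 1 >= k:
                search(current[:i] + current[i + 1:], i)

    search(tuple(range(n_features)), 0)
    return best["subset"], best["score"]

# a typical monotone criterion is a class-separability measure such as the
# Mahalanobis distance between class means restricted to the candidate subset
```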

24 citations




Journal ArticleDOI
TL;DR: When both the multinormal-distribution procedure and Hawkins' procedure were applied, the two subsets produced did not differ greatly; Hawkins' procedure seems the better method for detecting outliers.

8 citations



Journal ArticleDOI
Manabu
TL;DR: An optimum feature selection method which is applicable to arbitrary (nonlinear) decision functions is presented and numerical examples of feature selection for a linear and a quadratic decision function are presented.
Abstract: Feature selection is one of the most important processes in the design of pattern classifiers. This paper presents an optimum feature selection method which is applicable to arbitrary (nonlinear) decision functions. It is assumed that a finite number of training samples (a training set) is given for each pattern class, and the decision function is designed based on the training sets. The training sets are edited by removing the samples which are classified incorrectly by the decision function. Then the feature selection problem is transformed into a modified zero-one integer program. In this method, under a chosen permissible error, a minimum feature subset can be found which is combinatorially optimal. Numerical examples of feature selection for a linear and a quadratic decision function are presented.
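The paper solves a modified zero-one integer program; as a small illustrative stand-in, the sketch below simply enumerates subsets by increasing size and returns the first one whose re-fitted linear decision function stays within a permissible number of training errors. The LDA classifier and brute-force search are assumptions for illustration, not the paper's formulation.

```python
import numpy as np
from itertools import combinations
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def minimal_subset(X, y, max_errors=0):
    """Smallest feature subset whose re-fitted linear decision function
    misclassifies at most `max_errors` training samples."""
    d = X.shape[1]
    for size in range(1, d + 1):                      # smallest subsets first
        for subset in combinations(range(d), size):
            cols = list(subset)
            clf = LinearDiscriminantAnalysis().fit(X[:, cols], y)
            if np.sum(clf.predict(X[:, cols]) != y) <= max_errors:
                return subset                          # first hit is a minimum-size subset
    return tuple(range(d))
```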


Proceedings ArticleDOI
01 Mar 1990
TL;DR: An information fusion approach is presented for mapping a multiple dimensional feature space into a lower dimensional decision space with simplified decision boundaries by measuring differences in probability density functions of features.
Abstract: An information fusion approach is presented for mapping a multiple dimensional feature space into a lower dimensional decision space with simplified decision boundaries. A new statistic, called the tie statistic, is used to perform the mapping by measuring differences in probability density functions of features. These features are then evaluated based on the separation of the decision classes using a parametric beta representation for the tie statistic. The feature evaluation and fusion methods are applied to perform texture recognition.

Proceedings ArticleDOI
16 Jun 1990
TL;DR: A study on applying the discrete pseudo-Wigner distribution to classification of the plasma cortisol short-time series of two disease categories is presented and shows a great potential for using Wigner spectra for feature extraction.
Abstract: A study on applying the discrete pseudo-Wigner distribution to classification of plasma cortisol short-time series from two disease categories is presented. Autocomponent selection in the Wigner distribution has been studied with regard to feature selection for pattern recognition. Three energy density features are selected along with the mean level of each time series. These features allow a simple linear classification of the available data. The results show great potential for using Wigner spectra for feature extraction.
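A hedged sketch of computing a discrete pseudo-Wigner distribution from which energy-density features could be taken; the analytic-signal step, the window choice, and the use of the magnitude of the lag-domain DFT are simplifying assumptions rather than the study's exact procedure.

```python
import numpy as np
from scipy.signal import hilbert

def pseudo_wigner(x, win_len=15):
    """Discrete pseudo-Wigner distribution of a real 1-D signal (sketch).
    Returns a (len(x), win_len) array of time-frequency energy estimates."""
    z = hilbert(x)                        # analytic signal reduces interference terms
    half = win_len // 2
    w = np.hanning(win_len)               # lag window -- the "pseudo" (smoothed) part
    zp = np.concatenate([np.zeros(half), z, np.zeros(half)])
    m = np.arange(-half, half + 1)
    pwd = np.zeros((len(x), win_len))
    for n in range(len(x)):
        c = n + half                      # position of time n in the padded signal
        kernel = w * zp[c + m] * np.conj(zp[c - m])
        pwd[n] = np.abs(np.fft.fft(kernel))   # magnitude of the lag-domain DFT
    return pwd

# energy-density features could then be, e.g., sums of pwd over selected frequency bands
```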

01 Apr 1990
TL;DR: Variable selection techniques in stepwise regression analysis are discussed, and general and specific criticisms of the stepwise method in the literature are outlined and misuses of the method are considered.
Abstract: Variable selection techniques in stepwise regression analysis are discussed. In stepwise regression, variables are added or deleted from a model in sequence to produce a final "good" or "best" predictive model. Stepwise computer programs are discussed and four different variable selection strategies are described. These strategies include the forward method, backward method, forward stepwise method, and backward stepwise method. General and specific criticisms of the stepwise method in the literature are outlined, and misuses of the method are considered. Although most statisticians would agree that stepwise methods should not be used when an explanatory model is desired, it is common to see research articles where explanatory interpretations are given to a model that is called a "prediction" model. Stepwise methods should not be used to determine the number of variables in the final model. When model selection is being performed, the stepwise method can be helpful if: the initial choice of variables is conducted based on theory, the defaults are not used automatically, more than one run is done using different variable selection methods, and the final model is chosen through an intelligent process rather than automatically using the final model generated by the computer program. Three tables and outlines of the models described are included.
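A minimal sketch of the forward strategy using AIC as the entry criterion; the criterion choice is an assumption, since stepwise programs also use F-to-enter thresholds and other defaults of the kind the paper warns about relying on automatically.

```python
import numpy as np

def forward_stepwise(X, y, max_vars=None):
    """Forward selection for linear regression: repeatedly add the variable
    that most lowers AIC, stopping when no candidate improves it."""
    n, d = X.shape
    max_vars = max_vars or d
    selected, remaining = [], list(range(d))

    def aic(cols):
        Z = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        rss = np.sum((y - Z @ beta) ** 2)
        return n * np.log(rss / n) + 2 * (len(cols) + 1)

    best = aic(selected)
    while remaining and len(selected) < max_vars:
        score, var = min((aic(selected + [c]), c) for c in remaining)
        if score >= best:
            break                                   # no remaining variable helps
        selected.append(var)
        remaining.remove(var)
        best = score
    return selected
```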

Journal ArticleDOI
TL;DR: In this article, a nonparametric algorithm is proposed for the selection of variables in allocatory discriminant analysis, which is based on the ability to reuse calculations for the inverse of a nonsingular matrix.
Abstract: A computationally efficient nonparametric algorithm is proposed for the selection of variables in allocatory discriminant analysis. The efficiency of the algorithm derives from an ability to reuse calculations for the inverse of a nonsingular matrix. A subset of the original variables is found for which the leave-one-out estimate of the conditional probability of misclassification is never significantly greater than the estimated conditional probability of misclassification for the full set of predictor variables.
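The paper's efficiency comes from reusing matrix-inverse calculations; the simpler sketch below only illustrates the selection logic: backward elimination of variables for as long as a leave-one-out error estimate (here from a 1-nearest-neighbour rule, an assumed stand-in for the paper's allocation rule) does not rise above the full-set estimate by more than a slack term.

```python
import numpy as np

def loo_error_1nn(X, y):
    """Leave-one-out misclassification rate of a 1-nearest-neighbour rule."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(D, np.inf)           # a sample may not be its own neighbour
    return np.mean(y[np.argmin(D, axis=1)] != y)

def backward_select(X, y, slack=0.0):
    """Drop variables while the LOO error stays within `slack` of the full-set estimate."""
    full_err = loo_error_1nn(X, y)
    keep = list(range(X.shape[1]))
    improved = True
    while improved and len(keep) > 1:
        improved = False
        for j in list(keep):
            trial = [c for c in keep if c != j]
            if loo_error_1nn(X[:, trial], y) <= full_err + slack:
                keep, improved = trial, True
                break
    return keep
```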

Book ChapterDOI
01 Jan 1990
TL;DR: The problem of variable selection in principal component analysis (PCA) has been studied by several authors, but as yet no selection procedures are found in classical statistical software packages.

Abstract: The problem of variable selection in Principal Component Analysis (PCA) has been studied by several authors [1], but as yet no selection procedures are found in the classical statistical software packages. Such selection procedures are found, on the other hand, for linear regression or discriminant analysis, because the selection criteria are based on well-known quantities such as the multiple correlation coefficient or the average prediction error.
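One simple selection criterion of the kind such procedures might use, sketched under the assumption that variables are ranked by their squared loadings on the leading principal components; this is illustrative and not a specific procedure from the chapter.

```python
import numpy as np

def rank_variables_by_pca(X, n_components=2):
    """Rank variables by their squared loadings on the leading principal components."""
    Xc = X - X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    lead = eigvecs[:, np.argsort(eigvals)[::-1][:n_components]]   # top components
    contribution = (lead ** 2).sum(axis=1)                        # per-variable contribution
    return np.argsort(contribution)[::-1]                         # most important first
```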

Proceedings ArticleDOI
23 May 1990
TL;DR: In this paper, a variable selection procedure maximizes the dominance of the subsystem of the selected variables, hence justifying the proposed selection, and prior model reduction can significantly improve variable selection since it eliminates the negative effects of inappropriate modelling.
Abstract: The procedure for selecting input and output variables, proposed by Keller and Bonvin [4] is analysed using the principle of internal dominance proposed by Moore [6]. Using that principle, a dominance condition for the subsystem of the selected variables over the subsystem of the neglected variables is derived. It can be shown that the proposed variable selection procedure maximizes the dominance of the subsystem of the selected variables, hence justifying the proposed selection. The investigation further shows that prior model reduction can significantly improve variable selection since it eliminates the negative effects of inappropriate modelling.

Proceedings ArticleDOI
11 Mar 1990
TL;DR: A new feature selection approach is presented for using parallel distributed processing to identify a three-dimensional object from a two-dimensional image recorded at an arbitrary viewing angle and range and compares favorably with established feature selection approaches.
Abstract: A new feature selection approach is presented for using parallel distributed processing to identify a three-dimensional object from a two-dimensional image recorded at an arbitrary viewing angle and range. One vector of 32 feature variables is used to describe a two-dimensional binary image. The feature variables are based on counts of nearest neighbor conjuncts, which reflect shape and area differences among airplanes. Thirteen standardized airplanes are used in the experiment in order to compare the results with established feature selection approaches. Results based on the new approach compare favorably with results from traditional approaches. In addition, a relatively fast compact parallel hardware design and data structure are presented and compared with traditional algorithms.
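A guess at the flavour of nearest-neighbour conjunct counts: counts of adjacent foreground-pixel pairs in four directions of a binary silhouette. The paper's exact 32-variable definition is not reproduced here.

```python
import numpy as np

def conjunct_counts(img):
    """Counts of nearest-neighbour conjuncts (adjacent pairs of foreground
    pixels) in a binary image, taken in four directions."""
    b = np.asarray(img, dtype=bool)
    return np.array([
        np.sum(b[:, :-1] & b[:, 1:]),     # horizontal pairs
        np.sum(b[:-1, :] & b[1:, :]),     # vertical pairs
        np.sum(b[:-1, :-1] & b[1:, 1:]),  # diagonal pairs
        np.sum(b[:-1, 1:] & b[1:, :-1]),  # anti-diagonal pairs
    ])
```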

Journal ArticleDOI
TL;DR: A ranking scheme for features based on a feature’s calculated “performance potential” is outlined, made up of a number of performance measures: extraction time, memory requirements, variance, covariance and classification success.
Abstract: Feature selection is an important phase in most pattern recognition problems, especially when the space of the extracted features is very large. Feature selection methods attempt to reduce the feature space to satisfy certain objectives. We propose the concept of defining a performance potential as a measure of the effectiveness of the set of selected features. This paper begins by outlining a ranking scheme for features based on a feature’s calculated “performance potential”. The performance potential is made up of a number of performance measures: extraction time, memory requirements, variance, covariance and classification success. An adaptive scheme is proposed to process a number of initial features and arrive at the “best” subset based on their performance potential. The approach is applied to a texture analysis problem. The results of the testing of the approach point to conclusions concerning its effectiveness.
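A hedged sketch of combining several performance measures into a single "performance potential" per feature; the min-max normalisation and the example weights are assumptions, not the paper's weighting scheme.

```python
import numpy as np

def performance_potential(measures, weights):
    """Combine per-feature performance measures into a single ranking score.
    `measures` maps measure names to arrays of per-feature values; `weights`
    gives their importance (negative for costs such as extraction time)."""
    names = sorted(measures)
    M = np.column_stack([measures[k] for k in names])
    M = (M - M.min(axis=0)) / np.ptp(M, axis=0).clip(min=1e-12)   # normalise to [0, 1]
    w = np.array([weights[k] for k in names])
    return M @ w                                                  # one potential per feature

# illustrative call (hypothetical weights):
# score = performance_potential(
#     {"extraction_time": t, "memory": mem, "classification_success": acc},
#     {"extraction_time": -0.2, "memory": -0.1, "classification_success": 0.7})
```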

01 Mar 1990
TL;DR: The purpose of this thesis was to revise and extend the capabilities of existing software for selecting the significant control variables of a simulation model, based on a newly developed selection criterion, and validate the effectiveness of the new selection criterion by comparison to results derived using other common selection criteria.
Abstract: : The purpose of this thesis was three-fold. The first purpose was to revise and extend the capabilities of existing software for selecting the significant control variables of a simulation model, based on a newly developed selection criterion. The second purpose was to compare the results obtained using the revised software employing two different selection procedures. And the third purpose was then to validate the effectiveness of the new selection criterion by comparison to results derived using other common selection criteria. Extensive revision of the existing software was necessary to prepare it for use. Initially, the software was revised to extend its adaptability to evaluating new data and to increase user friendliness. Next, a new procedure was added to the software to permit it to evaluate data using a Stepwise (Forward Selection) procedure. Previously, the software only performed analysis of the data through an Enumerated Subsets approach. After revision of the software was complete, it was renamed the Variable Subset Selection Program (VSSP). Once the VSSP was ready, it was used to evaluate two types of data. The first type of data was created using a known stochastic structure. Three sets of this data was used, each set using a different covariance structure between the responses and control variables. The second type of data was created from an untested simulation model.

Proceedings ArticleDOI
03 Apr 1990
TL;DR: A procedure for feature selection in isolated word recognition is discussed and the speech recognition results show a significant improvement in the recognition performance with a digit database and the confusable E-set.
Abstract: A procedure for feature selection in isolated word recognition is discussed. The feature selection is performed in two steps. The first step takes into account the temporal correlation among feature vectors in order to obtain a transformation matrix which projects the initial template of N feature vectors to a new space where they are uncorrelated. This step gives a new template of M feature vectors, where M >
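A sketch of the first step under the assumption that the temporal decorrelation is a Karhunen-Loeve projection of the N template vectors onto the leading eigenvectors of their temporal covariance; the transformation matrix in the paper may be derived differently.

```python
import numpy as np

def temporal_decorrelation(template, M):
    """Project a template of N feature vectors (rows) onto the M leading
    eigenvectors of their temporal covariance, giving M uncorrelated vectors."""
    centered = template - template.mean(axis=0)
    C = centered @ centered.T / template.shape[0]        # N x N temporal covariance
    eigvals, eigvecs = np.linalg.eigh(C)
    top = eigvecs[:, np.argsort(eigvals)[::-1][:M]]      # N x M projection matrix
    return top.T @ template                              # M new feature vectors
```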

Proceedings ArticleDOI
01 Mar 1990
TL;DR: This paper describes a system which has been developed based on the conjecture that humans learn to recognize objects by incrementally modifying and extending an internal representation based on the characteristics which distinguish objects from the rest of the environment.
Abstract: It is conjectured that humans learn to recognize objects by incrementally modifying and extending an internal representation based on the characteristics which distinguish objects from the rest of the environment. As new objects are encountered, it is often required to recall similar yet distinct objects and determine what differentiates the new objects from the old. Sometimes all that is required is to refine the allowable range for a particular feature, i.e., use a higher level of precision. Other times a previously useful feature must be discarded for a more powerful one in order to perform efficient recognition. This paper describes a system which has been developed based on this conjecture. Initially, a decision tree is created for recognizing a set of training objects using an automatically selected subset of extractable features. Factors involved in the creation include the cost of extraction and comparison of features, their discriminating strength within the domain, and the stability or separability of classes of objects using the features. The system then allows incremental, local modification of the tree to accommodate new objects or instances of old objects which were incorrectly identified. Various strategies for tree modification have been implemented, some of which guarantee the correct recognition of objects previously recognized and others which require some degree of retraining to maintain perfect recollection. Strategy selection is based on the technique which minimizes a metric based on the increased cost and complexity of the tree and the potential decrease in the stability of recognition.

01 Oct 1990
TL;DR: The PNN method has been successfully used to distinguish amongst the resonant sounds of five thin metal gongs of different regular shapes having the same areas and thicknesses and can be usefully generalised to other similar classification problems.
Abstract: This paper describes a general multiclass classification algorithm called the Probabilistic Neural Network (PNN) (Specht, 1988). Its decision surfaces approach the Bayes-optimal boundaries by non-parametric probability density function (PDF) estimation as the number of training samples grows. Theoretical and practical aspects of the PNN classification method are discussed, as well as its advantages and disadvantages in comparison to the Backpropagation Network (BN). The algorithm has been implemented primarily as a research tool for feature selection and classification for general Pattern Recognition (PR) problems. It will also be used for classifier-directed signal processing tasks. The method has been successfully used to distinguish amongst the resonant sounds of five thin metal gongs of different regular shapes having the same areas and thicknesses. This example application includes a description of data sampling, primary analysis, feature selection and classification which can be usefully generalised to other similar classification problems. Some other PNN applications investigated by the author and others mentioned in the literature are described to show some of the PNN's features and uses. These include a Bayes-optimal maximum-likelihood signal detector, ship hull classification from sonar signals and electrocardiogram classification from QRS complexes.
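A minimal PNN decision rule in the sense of Specht (1988): a Gaussian Parzen-window estimate of each class-conditional PDF, with classification to the largest estimate. The single shared smoothing parameter sigma is an assumption.

```python
import numpy as np

def pnn_predict(X_train, y_train, X_test, sigma=0.5):
    """PNN decision rule: Parzen (Gaussian-kernel) estimate of each class PDF
    at every test point, classify to the class with the largest estimate."""
    classes = np.unique(y_train)
    scores = np.zeros((len(X_test), len(classes)))
    for ci, c in enumerate(classes):
        Xc = X_train[y_train == c]
        d2 = ((X_test[:, None, :] - Xc[None, :, :]) ** 2).sum(-1)   # squared distances to exemplars
        scores[:, ci] = np.exp(-d2 / (2 * sigma ** 2)).mean(axis=1)
    return classes[np.argmax(scores, axis=1)]
```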

Proceedings ArticleDOI
05 Feb 1990
TL;DR: The Kittler-Young (K-Y) transform is a nonparametric method for feature extraction that ensures the information of class variances and mean squares is utilized optimally in feature selection.
Abstract: The Kittler-Young (K-Y) transform is a nonparametric method for feature extraction. The important property of the K-Y transform is that the information of class variances and mean squares is utilized optimally in feature selection. A joint transform correlator (JTC) is used to extract the features of the K-Y transform from input images optically. Making use of these features, classifications are performed on a microcomputer.

Journal ArticleDOI
TL;DR: A new method is proposed for determining feature selection ordering and a stopping rule which focuses on the remaining patterns at each stage, and which maximizes the value of mutual information between the user's responses and the required pattern.
Abstract: In designing systems with a human-computer interface with a minimum number of interactions, there are two issues to consider: the determination of a proper sequence of questions by the user, and proper termination by the computer system, based on previous instructions. In short, these issues are those of feature selection ordering and a stopping rule for pattern recognition processes. Conventional treatments of these problems have been investigated from the viewpoint of an average error rate. When the number of patterns is large, however, and many interactions are required before termination, the average error rate is not an effective indicator at intermediate stages from the point of view of user-interface efficiency. In this paper, a new method is proposed for determining feature selection ordering and a stopping rule which focuses on the remaining patterns at each stage, and which maximizes the value of mutual information between the user's responses and the required pattern. Important properties associated with this scheme have been demonstrated while evaluating its performance via computer simulation. One is that the average information gain at each stage decreases monotonically, and another is that this scheme produces the minimum error rate.
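A toy sketch of the question-ordering idea for binary attributes: with a uniform prior over the remaining candidate patterns, the mutual information between an answer and the target reduces to the entropy of the answer, so the next question is the attribute whose yes/no split of the remaining candidates is most informative. The attribute encoding and the stopping convention are assumptions.

```python
import numpy as np

def next_question(candidate_rows, attributes):
    """Pick the binary attribute whose yes/no answer is most informative about
    which remaining candidate pattern is the target (uniform prior, so the
    mutual information equals the entropy of the answer)."""
    best_attr, best_bits = None, 0.0
    for a in range(attributes.shape[1]):
        p = attributes[candidate_rows, a].mean()   # P(answer = yes | remaining candidates)
        if p in (0.0, 1.0):
            continue                               # answer predetermined: zero information
        bits = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
        if bits > best_bits:
            best_attr, best_bits = a, bits
    return best_attr                               # None signals that no question helps

# ask the chosen attribute, keep only candidates consistent with the answer,
# and repeat until a single pattern remains (the stopping rule)
```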

Journal ArticleDOI
TL;DR: The goal in this work has been to use variables’ contribution to clustering tendency to distinguish those that contribute to clusters from those variables that do not, and to choose the smallest subsets of variables that will support clustering.
Abstract: Methods for feature selection in cluster analysis are not yet well established, although research has demonstrated clearly that extraneous descriptors can mask natural clusters in data. The goal in...