
Showing papers on "Feature selection published in 1988"


Journal ArticleDOI
TL;DR: In this paper, a review of feature selection for multidimensional pattern classification is presented, and the potential benefits of Monte Carlo approaches such as simulated annealing and genetic algorithms are compared.
Abstract: We review recent research on methods for selecting features for multidimensional pattern classification. These methods include nonmonotonicity-tolerant branch-and-bound search and beam search. We describe the potential benefits of Monte Carlo approaches such as simulated annealing and genetic algorithms. We compare these methods to facilitate the planning of future research on feature selection.
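The search strategies surveyed in this review lend themselves to a brief illustration. The following Python sketch is a generic simulated-annealing search over feature subsets; the toy criterion, the single-feature flip move, and the cooling schedule are illustrative assumptions and are not taken from the paper.

```python
import random
import math

def anneal_feature_selection(n_features, score, n_iter=2000,
                             t0=1.0, cooling=0.995, seed=0):
    """Generic simulated-annealing search over feature subsets.

    `score(subset)` should return a larger value for better subsets; the
    move set and cooling schedule here are illustrative choices only.
    """
    rng = random.Random(seed)
    current = set(rng.sample(range(n_features), n_features // 2))
    best, best_val = set(current), score(current)
    cur_val, temp = best_val, t0
    for _ in range(n_iter):
        candidate = set(current)
        f = rng.randrange(n_features)
        candidate.symmetric_difference_update({f})   # flip one feature in/out
        cand_val = score(candidate)
        if cand_val >= cur_val or rng.random() < math.exp((cand_val - cur_val) / temp):
            current, cur_val = candidate, cand_val
            if cur_val > best_val:
                best, best_val = set(current), cur_val
        temp *= cooling
    return best, best_val

# Toy criterion: prefer subsets that contain features 0-4 and stay small.
def subset_score(subset):
    return sum(1.0 for f in subset if f < 5) - 0.1 * len(subset)

print(anneal_feature_selection(20, subset_score))
```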

366 citations



Proceedings ArticleDOI
14 Nov 1988
TL;DR: It is proved that this algorithm can guarantee the globally best subset without exhaustive enumeration for any criterion that satisfies monotonicity.
Abstract: A heuristic search strategy taken from the field of artificial intelligence is applied to feature selection. An algorithm called BFF for feature selection is proposed. It is proved that this algorithm can guarantee the globally best subset without exhaustive enumeration for any criterion that satisfies monotonicity. It is shown that the number of subsets evaluated by BFF is less (even much less) than that needed by the branch and bound algorithm, an optimal feature selection algorithm proposed by P.M. Narendra and K. Fukunaga (1977).
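The flavour of a best-first search over feature subsets under a monotone criterion can be sketched as below; the toy weight-sum criterion and the stopping rule are illustrative assumptions, and this simplified sketch should not be read as the paper's BFF algorithm itself.

```python
import heapq
from itertools import count

def best_first_feature_search(all_features, criterion, target_size):
    """Best-first search over feature subsets, in the spirit of BFF.

    The search starts from the full feature set and removes one feature per
    expansion.  `criterion(subset)` is assumed monotone: removing features
    never increases its value.  Under that assumption the first subset of
    `target_size` popped from the queue is the best subset of that size.
    """
    tie = count()
    start = frozenset(all_features)
    heap = [(-criterion(start), next(tie), start)]   # max-heap via negation
    seen = {start}
    evaluated = 1
    while heap:
        neg_val, _, subset = heapq.heappop(heap)
        if len(subset) == target_size:
            return set(subset), -neg_val, evaluated
        for f in subset:
            child = subset - {f}
            if child not in seen and len(child) >= target_size:
                seen.add(child)
                evaluated += 1
                heapq.heappush(heap, (-criterion(child), next(tie), child))
    raise ValueError("target_size larger than the feature set")

# Toy monotone criterion: total weight of the retained features.
weights = {0: 5.0, 1: 3.0, 2: 2.5, 3: 0.4, 4: 0.3, 5: 0.1}
subset, value, n_eval = best_first_feature_search(
    weights, lambda s: sum(weights[f] for f in s), target_size=3)
print(subset, value, n_eval)   # best size-3 subset, its value, subsets evaluated
```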

101 citations


Journal ArticleDOI
TL;DR: A computable measure was developed that can be used to discriminate between attributes on the basis of their potential value in the formation of decision rules by the inductive learning process and a significant reduction in the number of attributes to be considered was achieved for a complex medical domain.
Abstract: A computable measure was developed that can be used to discriminate between attributes on the basis of their potential value in the formation of decision rules by the inductive learning process. This relevance measure is the product of extensions to an information-theoretic foundation that address the particular characteristics of a class of inductive learning algorithms. The measure is also conceptually compatible with approaches from pattern recognition. It is described in the context of a generalized model of the expertise development process, and an experiment is presented in which a significant reduction in the number of attributes to be considered was achieved for a complex medical domain.
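The relevance measure itself is specific to the paper, but the information-theoretic starting point it extends can be illustrated with a plain information-gain ranking of attributes; the tiny data set and the gain formula below are generic, hypothetical examples rather than the authors' measure.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """Reduction in class entropy obtained by splitting on one attribute."""
    base = entropy(labels)
    n = len(examples)
    remainder = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == value]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Tiny illustrative data set: rank attributes by information gain.
examples = [{"fever": "yes", "rash": "no"}, {"fever": "yes", "rash": "yes"},
            {"fever": "no", "rash": "no"}, {"fever": "no", "rash": "yes"}]
labels = ["ill", "ill", "healthy", "healthy"]
ranking = sorted(examples[0], key=lambda a: information_gain(examples, labels, a),
                 reverse=True)
print(ranking)   # attributes ordered by decreasing relevance to the class
```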

90 citations


Journal ArticleDOI
TL;DR: In this paper, a criterion is developed to distinguish significant from nonsignificant variables for use in LP models based on the “jackknife” methodology.
Abstract: There are numerous variable selection rules in classical discriminant analysis. These rules enable a researcher to distinguish significant variables from nonsignificant ones and thus provide a parsimonious classification model based solely on significant variables. Prominent among such rules are the forward and backward stepwise variable selection criteria employed in statistical software packages such as Statistical Package for the Social Sciences and BMDP Statistical Software. No such criterion currently exists for linear programming (LP) approaches to discriminant analysis. In this paper, a criterion is developed to distinguish significant from nonsignificant variables for use in LP models. This criterion is based on the “jackknife” methodology. Examples are presented to illustrate implementation of the proposed criterion.
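The LP formulation is not reproduced here; the sketch below only illustrates the underlying jackknife idea, with an ordinary least-squares discriminant standing in (as an assumption) for the LP weights. The pseudo-value construction and the t-like statistic are the textbook jackknife recipe, not necessarily the paper's exact criterion.

```python
import numpy as np

def jackknife_significance(X, y):
    """Jackknife pseudo-value t-statistics for each variable's weight.

    X: (n, p) predictor matrix; y: (n,) class labels coded +1 / -1.
    A least-squares linear discriminant stands in for the LP model here.
    """
    def fit(Xs, ys):
        A = np.column_stack([np.ones(len(ys)), Xs])
        coef, *_ = np.linalg.lstsq(A, ys, rcond=None)
        return coef[1:]                           # drop the intercept

    n, p = X.shape
    full = fit(X, y)
    pseudo = np.empty((n, p))
    for i in range(n):
        mask = np.arange(n) != i
        loo = fit(X[mask], y[mask])
        pseudo[i] = n * full - (n - 1) * loo      # jackknife pseudo-values
    mean = pseudo.mean(axis=0)
    se = pseudo.std(axis=0, ddof=1) / np.sqrt(n)
    return mean / se                              # large |t| -> likely significant

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = np.sign(2 * X[:, 0] - X[:, 1] + 0.3 * rng.normal(size=60))
print(np.round(jackknife_significance(X, y), 2))
```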

25 citations


Journal ArticleDOI
TL;DR: In this article, the statistical implications of using model selection criteria in the pure noise case, the asymptotic properties of alternative variable selection procedures and an abbreviated computational solution are investigated.

24 citations


Journal ArticleDOI
TL;DR: In this article, a Monte Carlo study was conducted to examine the performance of several strategies for estimating the squared cross-validity coefficient of a sample regression equation in the context of best subset regression, and the results indicated that sample size may play a much greater role in validity estimation than is true in situations where selection has not occurred.
Abstract: A Monte Carlo study was conducted to examine the performance of several strategies for estimating the squared cross-validity coefficient of a sample regression equation in the context of best subset regression. Data were simulated for populations and experimental designs likely to be encountered in practice. The results indicated that a formula presented by Stein (1960) could be expected to yield estimates as good as or better than cross-validation, or several other formula estimators, for the populations considered. Further, the results suggest that sample size may play a much greater role in validity estimation in subset selection than is true in situations where selection has not occurred. Index terms: Best subset regression, Cross-validity coefficient, Multiple regression, Predictive validity, Variable selection.
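As a rough illustration of the kind of formula estimator examined, the snippet below computes a commonly cited shrinkage expression attributed to Stein (1960) next to a simple split-sample cross-validity estimate; the exact formula used in the study is not quoted from the paper, so the expression here should be treated as an assumption.

```python
import numpy as np

def stein_cross_validity(r2, n, p):
    """One commonly cited form of a Stein (1960) shrinkage estimator of the
    squared cross-validity coefficient (an assumption, not quoted from the
    paper): shrinks the sample R^2 toward its expected out-of-sample value."""
    return 1 - ((n - 1) / (n - p - 1)) * ((n - 2) / (n - p - 2)) * ((n + 1) / n) * (1 - r2)

def split_sample_cross_validity(X, y, seed=0):
    """Fit on a random half, then square the correlation of predictions with y
    on the held-out half."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half = len(y) // 2
    tr, te = idx[:half], idx[half:]
    A = np.column_stack([np.ones(len(tr)), X[tr]])
    coef, *_ = np.linalg.lstsq(A, y[tr], rcond=None)
    pred = np.column_stack([np.ones(len(te)), X[te]]) @ coef
    return np.corrcoef(pred, y[te])[0, 1] ** 2

rng = np.random.default_rng(1)
n, p = 80, 4
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
r2 = np.corrcoef(A @ coef, y)[0, 1] ** 2
print(round(r2, 3), round(stein_cross_validity(r2, n, p), 3),
      round(split_sample_cross_validity(X, y), 3))
```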

24 citations


Journal ArticleDOI
TL;DR: In this paper, a set of data arising from an attitudinal investigation of meat products was used for variable selection in principal component analysis (PCA) and generalized Procrustes analysis (GPA).
Abstract: In many attitudinal investigations, particularly those involving free-choice profiling, a very large list of variables or features can emerge. Ordination using generalized Procrustes analysis provides a common base for comparing assessors, but the derived configurations are often high-dimensional and difficult to summarize. This problem can be rectified by selecting a small subset of the original set of variables. Methods of variable selection in principal component analysis can be adapted easily for such purposes, but there is no guarantee with these methods that overall data structure is preserved. A recently introduced variable selection procedure that does aim to preserve the data structure as much as possible would seem to be more appropriate. All methods are described and applied to a set of data arising from an attitudinal investigation of meat products. The results indicate that variable selection should be more widely encouraged.
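A minimal sketch of the structure-preserving idea is shown below: each candidate variable subset is scored by how closely the principal-component configuration computed from that subset matches the configuration from all variables, using a Procrustes disparity. The exhaustive subset loop, the two-dimensional configurations, and the synthetic data are illustrative simplifications, not the procedure applied in the paper.

```python
import numpy as np
from itertools import combinations
from scipy.spatial import procrustes

def pca_scores(X, k=2):
    """First k principal-component scores of a column-centred data matrix."""
    Xc = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ vt[:k].T

def best_subset_by_procrustes(X, subset_size, k=2):
    """Pick the variable subset whose PCA configuration is closest (lowest
    Procrustes disparity) to the configuration obtained from all variables."""
    reference = pca_scores(X, k)
    best, best_disp = None, np.inf
    for subset in combinations(range(X.shape[1]), subset_size):
        _, _, disparity = procrustes(reference, pca_scores(X[:, subset], k))
        if disparity < best_disp:
            best, best_disp = subset, disparity
    return best, best_disp

rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 2))
X = np.column_stack([latent[:, 0], latent[:, 1],
                     latent[:, 0] + 0.05 * rng.normal(size=40),   # near-duplicate
                     rng.normal(size=40)])                         # pure noise
print(best_subset_by_procrustes(X, subset_size=2))
```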

19 citations


Proceedings ArticleDOI
08 Aug 1988
TL;DR: This paper proposes a heuristic-statistical criterion, symmetrical τ.
Abstract: Many machine learning methods have been developed for constructing decision trees from collections of examples. When they are applied to complicated real-world problems they often suffer from difficulties of coping with multi-valued or continuous features and noisy or conflicting data. To cope with these difficulties, a key issue is a powerful feature-selection criterion. After a brief review of the main existing criteria, this paper proposes a heuristic-statistical criterion, symmetrical τ. This overcomes a number of the weaknesses of previous feature selection methods. Illustrative examples are presented.

14 citations



Proceedings ArticleDOI
W.E. Blanz
14 Nov 1988
TL;DR: A method of feature selection is presented that has linear computational complexity, and ways are shown how to use it when the type of the probability density function is unknown.
Abstract: A method of feature selection is presented that has linear computational complexity, and ways are shown how to use it when the type of the probability density function is unknown. There is no claim that the procedures for nonparametric probability density function estimation are applicable to any conceivable distribution, but the lower bound on classifier performance estimation makes the presented measure applicable in most practical cases. Simulations with synthetic test data as well as references to applications with real-world data demonstrate the applicability of the measure discussed.

01 Aug 1988
TL;DR: It was shown that the infinite clipped versions of the first 16 optimal features had excellent classification performance and the overall probability of correct classification is over 90 percent while providing for a reduced downlink data rate by a factor of 10.
Abstract: The High Resolution Imaging Spectrometer (HIRIS) is designed to acquire images simultaneously in 192 spectral bands in the 0.4 to 2.5 micrometers wavelength region. It will make possible the collection of essentially continuous reflectance spectra at a spectral resolution sufficient to extract significantly enhanced amounts of information from return signals as compared to existing systems. The advantages of such high dimensional data come at a cost of increased system and data complexity. For example, since the finer the spectral resolution, the higher the data rate, it becomes impractical to design the sensor to be operated continuously. It is essential to find new ways to preprocess the data which reduce the data rate while at the same time maintaining the information content of the high dimensional signal produced. Four spectral feature design techniques are developed from the Weighted Karhunen-Loeve Transforms: (1) non-overlapping band feature selection algorithm; (2) overlapping band feature selection algorithm; (3) Walsh function approach; and (4) infinite clipped optimal function approach. The infinite clipped optimal function approach is chosen since the features are easiest to find and their classification performance is the best. After the preprocessed data has been received at the ground station, canonical analysis is further used to find the best set of features under the criterion that maximal class separability is achieved. Both 100 dimensional vegetation data and 200 dimensional soil data were used to test the spectral feature design system. It was shown that the infinite clipped versions of the first 16 optimal features had excellent classification performance. The overall probability of correct classification is over 90 percent while providing for a reduced downlink data rate by a factor of 10.
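A much-simplified sketch of the last of these techniques is given below: it computes a Karhunen-Loeve (principal-component) basis for the band covariance and then "infinitely clips" each basis vector to a ±1 sign pattern, so that a feature becomes a signed sum of band values. The random spectra, band count, and number of retained features are placeholders, not HIRIS parameters.

```python
import numpy as np

def clipped_kl_features(spectra, n_features):
    """Karhunen-Loeve basis of the band covariance, then infinite clipping:
    each eigenvector is replaced by its sign pattern (+1/-1 weights), so a
    feature is just a signed sum of (centred) band values."""
    Xc = spectra - spectra.mean(axis=0)              # centre each band
    cov = np.cov(Xc, rowvar=False)                   # bands x bands covariance
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1][:n_features]   # leading components first
    optimal = eigvecs[:, order]                      # optimal KL band weights
    clipped = np.sign(optimal)                       # infinite-clipped weights
    return Xc @ optimal, Xc @ clipped

rng = np.random.default_rng(0)
bands, pixels = 32, 500                              # placeholder sizes
spectra = rng.normal(size=(pixels, bands)).cumsum(axis=1)  # smooth-ish spectra
kl, clipped = clipped_kl_features(spectra, n_features=4)
# Correlation between each optimal feature and its clipped counterpart.
print([round(np.corrcoef(kl[:, i], clipped[:, i])[0, 1], 2) for i in range(4)])
```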

Proceedings ArticleDOI
08 Aug 1988
TL;DR: In this paper, dynamic stability assessment of electrical power systems is formulated as a pattern recognition problem so that it is fast and accurate enough for on-line applications.
Abstract: In this paper, dynamic stability assessment of electrical power systems is formulated as a pattern recognition problem so that it is fast and accurate enough for on-line applications. The problem concerned is the potentially hazardous dynamic instability of the interconnected power system between Hong Kong and China. New techniques, such as the Branch and Bound Search, are employed to achieve optimality in Feature Selection. In Feature Extraction where further dimensionality reduction is made, matrix augmentation is used to extract information content in class-centralised vectors as well as in class-mean vectors. Finally, for improvement of classification efficiency, an innovative approach making use of partitioning and decision tree search is pursued to tackle the non-linearly separable patterns which exist uniquely in the study of dynamic stability.

Journal Article
TL;DR: In this paper, a new technique based on the fuzzy integral is introduced, which combines objective evidence with the importance of that feature set for recognition purposes, in order to deal with uncertainty in class definition.

Journal ArticleDOI
TL;DR: In this paper, three successive feature reduction methods are employed to select good features for the automatic visual inspection of solder joints, and the remaining 11 features are tested and shown to be superior to state-of-the-art identification methods.
Abstract: In this paper, three successive feature reduction methods are employed to select good features for the automatic visual inspection of solder joints. This reduction strategy includes (1) a stability test to remove the features with unstable performance, (2) a separability examination to select the features with good classification capabilities, (3) a correlation analysis to delete the redundant features. Three sets of features are implemented in this feature reduction work: (1) a circular sub-area feature set is related to the intensity conditions within distinct areas in the joint image, (2) a moment of inertia feature set is based upon the intensity of pixels and their relative position in the image plane, (3) a surface curvature feature set analyses the three-dimensional joint topology. Initially 50 features are formulated based on the above strategy. The reduction technique deletes 39 features from this set because of instability, poor performance, and high correlation with other features. Finally, the remaining 11 features are tested and shown to be superior to state-of-the-art identification methods.
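The three successive stages can be mimicked with a generic sketch such as the one below; the thresholds, the Fisher-ratio separability score, the synthetic data, and the stand-in repeated-measurement matrix are illustrative assumptions, not the tests or values used in the paper.

```python
import numpy as np

def three_stage_reduction(X_repeat, X, y, cv_max=0.2, sep_min=0.5, corr_max=0.9):
    """Generic three-stage feature reduction.

    1) stability:    drop features whose coefficient of variation across
                     repeated measurements is too high;
    2) separability: keep features with a sufficient two-class Fisher ratio;
    3) correlation:  among the survivors, drop features highly correlated
                     with an already-kept feature.
    """
    keep = []
    for j in range(X.shape[1]):
        cv = X_repeat[:, j].std() / (abs(X_repeat[:, j].mean()) + 1e-12)
        if cv > cv_max:
            continue                                   # unstable feature
        a, b = X[y == 0, j], X[y == 1, j]
        fisher = (a.mean() - b.mean()) ** 2 / (a.var() + b.var() + 1e-12)
        if fisher < sep_min:
            continue                                   # poor separability
        if any(abs(np.corrcoef(X[:, j], X[:, k])[0, 1]) > corr_max for k in keep):
            continue                                   # redundant feature
        keep.append(j)
    return keep

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
good = rng.normal(y, 0.5, size=(100,))
X = np.column_stack([good, good + 0.01 * rng.normal(size=100),   # redundant copy
                     rng.normal(size=100)])                       # uninformative
X_repeat = np.tile(X.mean(axis=0), (10, 1)) + 0.05 * rng.normal(size=(10, 3))
print(three_stage_reduction(X_repeat, X, y))
```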

Proceedings ArticleDOI
01 Jan 1988
TL;DR: A principal feature of this method is the synergistic interaction between the tracker, the segmenter and the classifier to eliminate uninteresting objects, to improve estimates of the background and noise, to guide threshold selection, and to influence feature selection for classification.
Abstract: This paper describes a method of detecting, tracking and identifying moving objects in video scenes. The method is based on an adaptive change detector which detects and tracks moving objects and extracts silhouettes of the objects from the background so they can be classified by shape. The adaptive change detector uses estimates of image noise and contrast to dynamically adjust decision thresholds. A principal feature of this method is the synergistic interaction between the tracker, the segmenter and the classifier to eliminate uninteresting objects, to improve estimates of the background and noise, to guide threshold selection, and to influence feature selection for classification.

Book ChapterDOI
TL;DR: Discriminant analysis based on a set of learning objects that is not a random sample of the universe of objects is discussed, and the use of fuzzy labels is advocated.
Abstract: Discriminant analysis based on a set of learning objects that is not a random sample of the universe of objects is discussed. The use of fuzzy labels is advocated. Possibilities for evaluation and feature selection are investigated.

Proceedings ArticleDOI
23 May 1988
TL;DR: A bit-mapped classifier system is described as a simple, effective method of distinguishing among classifiers based on descriptive features, and a discussion of sequential information acquisition is included to show the value of an expert system that can both forward and backward chain.
Abstract: A bit-mapped classifier system is described as a simple, effective method of distinguishing among classifiers based on descriptive features. Simple knowledge-representation techniques are used to encode a knowledge base in the domain of product performance agreements (warranties). The user interface is a simple display of features requiring a yes or no answer to indicate applicability. Feature selection is performed in a way that ensures each classifier can be distinguished from the others. A scoring routine based on features absent, features present, and 'don't care' tracks the classifiers that best fit the specified description. After starting in a data-driven mode (forward chaining), the system switches to a hypothesis-driven mode and thus economizes on the acquisition of information. A discussion of sequential information acquisition is included to show the value of an expert system that can both forward and backward chain. The system establishes a threshold score for each classifier so that when sufficient points are accumulated, the winning classifier is declared. An explanation of the selection is given by listing the features that the user designated.
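A toy version of the bit-mapped scoring idea can be written with integer bitmasks, as below; the feature vocabulary, scoring weights, threshold, and the two warranty-style classes are invented for illustration and are not taken from the described system.

```python
# Each classifier stores two bitmasks over the feature vocabulary:
# `required` bits that should be present and `forbidden` bits that should be
# absent; every other feature is "don't care".
FEATURES = ["on_site_service", "parts_included", "labour_included", "transferable"]
BIT = {name: 1 << i for i, name in enumerate(FEATURES)}

CLASSIFIERS = {                      # hypothetical warranty classes
    "full_warranty": {"required": BIT["parts_included"] | BIT["labour_included"],
                      "forbidden": 0},
    "parts_only":    {"required": BIT["parts_included"],
                      "forbidden": BIT["labour_included"]},
}

def score(observed, classifier, present_w=2, absent_w=1):
    """Points for required features observed plus forbidden features absent;
    'don't care' bits contribute nothing."""
    req, forb = classifier["required"], classifier["forbidden"]
    hits = bin(observed & req).count("1") * present_w
    misses = bin(~observed & forb).count("1") * absent_w
    return hits + misses

def classify(observed, threshold=3):
    """Return the best-scoring classifier once its score reaches the threshold."""
    scores = {name: score(observed, c) for name, c in CLASSIFIERS.items()}
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else None), scores

answers = BIT["parts_included"] | BIT["labour_included"]   # user answered yes to these
print(classify(answers))
```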

Book ChapterDOI
28 Mar 1988
TL;DR: A new technique based on the fuzzy integral is introduced which combines objective evidence with the importance of that feature set for recognition purposes and attempts to use the strongest measurements first in the object classification.
Abstract: Dealing with uncertainty is a common problem in pattern recognition. Rarely do object descriptions from different classes fall into totally disjoint regions of feature space. This uncertainty in class definition can be handled in several ways. In this paper we present several approaches to the incorporation of fuzzy set information into pattern recognition. We then introduce a new technique based on the fuzzy integral which combines objective evidence with the importance of that feature set for recognition purposes. In effect, the fuzzy integral performs a local feature selection, in that it attempts to use the strongest measurements first in the object classification. Algorithm performance is illustrated on real and synthetic data sets.
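A compact sketch of the combination step is shown below; it evaluates the standard Sugeno fuzzy integral with the fuzzy measure values of the nested source sets supplied directly, an assumption made to keep the example short (the paper's construction of the measure and the per-class details are not reproduced).

```python
def sugeno_integral(evidence, measure):
    """Sugeno fuzzy integral of per-feature evidence h(x_i) in [0, 1].

    `evidence` maps each feature/source to its support for the class;
    `measure` maps frozensets of sources to the fuzzy measure g(A) of that
    set (assumed here to be given for every nested set that is needed).
    """
    # Sort sources by decreasing evidence; the coalition A_i is the top-i set.
    ordered = sorted(evidence, key=evidence.get, reverse=True)
    value, coalition = 0.0, frozenset()
    for source in ordered:
        coalition = coalition | {source}
        value = max(value, min(evidence[source], measure[coalition]))
    return value

# Hypothetical numbers: three features voting for one class.
evidence = {"area": 0.9, "elongation": 0.6, "intensity": 0.3}
measure = {frozenset({"area"}): 0.4,
           frozenset({"area", "elongation"}): 0.75,
           frozenset({"area", "elongation", "intensity"}): 1.0}
print(sugeno_integral(evidence, measure))   # strongest consistent support
```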

Journal ArticleDOI
J. Bigham1
TL;DR: A new algorithm for the inductive inference of pattern recognition rules from a class of examples and another class of counter-examples is given, which is an extension to that of Michalski and incorporates ideas from fuzzy set theory.
Abstract: A new algorithm for the inductive inference of pattern recognition rules from a class of examples and another class of counter-examples is given. An important feature of the algorithm is that a discriminant for the class of examples is generated in a format which allows immediate interpretation into the language of the application domain expert. The approach is an extension of that of Michalski and incorporates ideas from fuzzy set theory. It is aimed at applications where the observable variables are such that the sample space of each class overlaps. In such cases an absolutely complete and consistent descriptor of the class of examples can be overly complicated or even impossible. By allowing suitable classification errors simpler descriptors can be constructed. The heart of the algorithm involves generalisation of examples of the concept using the counter-examples to limit the degree of generalisation. Generalisation procedures for nominal, ordinal and hierarchical variable types are described, and procedures for generating linguistic descriptors of the class of examples at different levels of generality are given. Because the search tree involved in the algorithm is potentially very large, a form of utility function is constructed which allows a branch and bound approach to pruning the search tree and to feature selection. The algorithm has been tested and validated on a complex data set consisting of observations on 1000 head-injured patients. A particular feature of this data set is the large proportion of missing values. This data set was also of special interest, since it has been used by others as a vehicle for comparing several statistical discrimination techniques. The proposed method for transformation to linguistic output is also demonstrated for this data set.

01 Jan 1988
TL;DR: A heuristic-statistical criterion, symmetrical τ, developed from a measure of association used in statistics, namely Goodman-Kruskal's asymmetrical τ, overcomes a number of the weaknesses of previous feature selection methods.
Abstract: Many machine learning methods have been developed for constructing decision trees from collections of examples. When they are applied to complicated real-world problems they often suffer from difficulties of coping with multi-valued or continuous features and noisy or conflicting data. To cope with these difficulties, a key issue is a powerful feature-selection criterion. After a brief review of the main existing criteria, this paper proposes a heuristic-statistical criterion, symmetrical τ, developed from a measure of association used in statistics, namely Goodman-Kruskal's asymmetrical τ. This overcomes a number of the weaknesses of previous feature selection methods. Illustrative examples are presented.
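The Goodman-Kruskal asymmetrical τ that the criterion builds on can be computed as below; the symmetric combination shown (averaging the two asymmetric directions) is only one plausible reading of "symmetrical τ" and should be treated as an assumption rather than the paper's exact definition.

```python
from collections import Counter

def gk_tau(pairs):
    """Goodman-Kruskal asymmetrical tau: proportional reduction in the error
    of predicting the class given the feature value.  `pairs` is a list of
    (feature_value, class_label) tuples."""
    n = len(pairs)
    joint = Counter(pairs)
    feat = Counter(a for a, _ in pairs)
    cls = Counter(c for _, c in pairs)
    explained = sum(cnt ** 2 / (n * feat[a]) for (a, _), cnt in joint.items())
    baseline = sum(cnt ** 2 for cnt in cls.values()) / n ** 2
    return (explained - baseline) / (1 - baseline)

def symmetrical_tau(pairs):
    """One plausible symmetric combination (an assumption, not necessarily the
    paper's definition): average the two asymmetric directions."""
    forward = gk_tau(pairs)
    backward = gk_tau([(c, a) for a, c in pairs])
    return (forward + backward) / 2

data = [("high", "fail"), ("high", "fail"), ("high", "pass"),
        ("low", "pass"), ("low", "pass"), ("low", "pass")]
print(round(gk_tau(data), 3), round(symmetrical_tau(data), 3))
```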

Proceedings ArticleDOI
25 Oct 1988
TL;DR: A new practicable statistical approach to goodness-of-fit testing is proposed which is based on the notion of sufficiency and provides a unified, efficient approach to the problem of test construction in the presence of nuisance parameters.
Abstract: The objective of this paper is to focus attention on a new practicable statistical approach to goodness-of-fit testing which is based on the notion of sufficiency and provides a unified, efficient approach to the problem of test construction in the presence of nuisance parameters. The general strategy of the approach is to transform a set of random variables into a smaller set of random variables that are independently and identically distributed uniform on the interval (0,1), i.e. i.i.d. U(0,1), under the null hypothesis H0. Under the alternative hypothesis this set of random variables will, in general, not be i.i.d. U(0,1). In other words, the composite hypotheses are replaced by equivalent simple ones. Any statistic that measures a distance from uniformity in the transformed sample can be used as a test statistic; for instance, standard goodness-of-fit procedures such as those based on the Kolmogorov-Smirnov and Cramér-von Mises statistics can be used. The results obtained are applicable to feature selection and pattern recognition. According to the proposed approach, the best subset of feature measurements is the subset which maximizes the likelihood function of the statistic that measures a distance from uniformity in the transformed sample. Examples are given for the sake of illustration.
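As a naive illustration of the transformation idea, the snippet below maps a sample to the unit interval through a fitted normal CDF and measures its distance from uniformity with a Kolmogorov-Smirnov statistic; the plug-in transform is an assumption for brevity and is not the sufficiency-based construction the paper develops for handling nuisance parameters.

```python
import numpy as np
from scipy import stats

def uniformity_distance(sample):
    """Plug-in probability integral transform followed by a KS distance.

    Note: estimating the normal parameters from the same sample makes the
    transform only approximately U(0,1) under the null; the paper's
    sufficiency-based construction is designed to avoid exactly this issue.
    """
    mu, sigma = sample.mean(), sample.std(ddof=1)
    u = stats.norm.cdf(sample, loc=mu, scale=sigma)   # transformed sample
    return stats.kstest(u, "uniform").statistic       # distance from U(0,1)

rng = np.random.default_rng(0)
gaussian_feature = rng.normal(2.0, 1.5, size=200)     # fits the model well
skewed_feature = rng.exponential(1.0, size=200)       # violates the model
print(round(uniformity_distance(gaussian_feature), 3),
      round(uniformity_distance(skewed_feature), 3))
```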


Proceedings ArticleDOI
14 Nov 1988
TL;DR: For the case of a large character set, a novel definition of separability of classes is proposed that can be used to evaluate error bounds and to choose a subset of features with minimum loss of information.
Abstract: Based on the reduction of entropy, the estimation of error bounds is analysed. This estimation is simple in computation and is associated with a scatter matrix of classes. For the case of a large character set, a novel definition of separability of classes is proposed. It is simple in computation and can be used to evaluate error bounds and to choose a subset of features with minimum loss of information.
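The paper's separability definition is not reproduced here; the sketch below instead shows a conventional scatter-matrix separability score, trace(S_w^{-1} S_b), of the kind such criteria relate to, offered only for illustration.

```python
import numpy as np

def scatter_separability(X, y):
    """trace(Sw^-1 Sb): a standard scatter-matrix class-separability score for
    the feature columns in X (larger means better separated classes)."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    p = X.shape[1]
    Sw = np.zeros((p, p))
    Sb = np.zeros((p, p))
    for c in classes:
        Xc = X[y == c]
        diff = Xc - Xc.mean(axis=0)
        Sw += diff.T @ diff                               # within-class scatter
        m = (Xc.mean(axis=0) - overall_mean).reshape(-1, 1)
        Sb += len(Xc) * (m @ m.T)                         # between-class scatter
    return np.trace(np.linalg.solve(Sw, Sb))

rng = np.random.default_rng(0)
y = np.repeat([0, 1, 2], 40)
X = np.column_stack([y + 0.4 * rng.normal(size=120),      # discriminative feature
                     rng.normal(size=120)])               # uninformative feature
print(round(scatter_separability(X[:, [0]], y), 2),
      round(scatter_separability(X[:, [1]], y), 2))
```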

Book ChapterDOI
04 Jul 1988
TL;DR: The phase of the study reported here addresses the problems and successes of initial efforts towards a practical application of neural network computational concepts with the main focus on those approaches which have been found to be particularly effective in obtaining adequate estimates of effective linear functions for classification of binary patterns.
Abstract: Estimates of connection strengths are obtained to derive the weights used in building a linear classifier of complex binary patterns. Use of bootstrap resampling methods permits small, large and highly disparate-in-size training sets to be utilized with equal ease. Using ideas suggested by a method proposed for learning in neural net systems [Hopfield 1982] [Cruz-Young et al 1986], it is now possible to obtain connection values without prohibitive or arbitrarily terminated computations. The weight values so derived are identified as equivalent with those at the middle layer of three-layer neural net models. The model serving as the springboard for this study was developed by Kanerva [1986–87], and functions as a distributed sparse memory or DSM. In the various implementations of connection-based memories, both linear and nonlinear relationships within patterns play a role in the classification process. Experimental data used in this research were derived from psychological profiles which result from the coding of responses elicited by question-like items. Traditionally, such items are usually designed and chosen to be linear predictors of class membership or performance. The phase of the study reported here addresses the problems and successes of initial efforts towards a practical application of neural network computational concepts. The main focus is on those approaches which have been found to be particularly effective in obtaining adequate estimates of effective linear functions for classification of binary patterns with a number of hits ranging from 64 to 256.

Proceedings ArticleDOI
14 Nov 1988
TL;DR: A statistical method is suggested for data classification using the interclass correlation coefficient (ICC) statistics as a criterion function.
Abstract: A statistical method is suggested for data classification using the interclass correlation coefficient (ICC) statistics as a criterion function. The theory of the statistics is presented and its characteristics have been investigated. An algorithm for data classification using ICC is also described.


01 Jan 1988
TL;DR: The homogeneity coefficient is an L-type criterion that can be used to measure the degree of linear dependence among a set of measurements, and a feature selection procedure based on it is described.
Abstract: The homogeneity coefficient is an L-type criterion which can be used to measure the degree of linear dependence among a set of measurements. Properties of the criterion are analysed in the paper, and a feature selection procedure based on this coefficient is described.

01 Jan 1988
TL;DR: In this paper, dynamic stability assessment of electrical power systems is formulated as a pattern recognition problem so that it is fast and accurate enough for on-line applications.
Abstract: In this paper, dynamic stability assessment of electrical power systems is formulated as a pattern recognition problem so that it is fast and accurate enough for on-line applications. The problem concerned is the potentially hazardous dynamic instability of the interconnected power system between Hong Kong and China. New techniques, such as the Branch and Bound Search, are employed to achieve optimality in Feature Selection. In Feature Extraction, where further dimensionality reduction is made, matrix augmentation is used to extract

Proceedings ArticleDOI
Jakub Segen
14 Nov 1988
TL;DR: A clustering technique for binary vectors is described that permits the overlapping of clusters (clumping) and selects for each cluster a subset of relevant features, which represents a tradeoff between the fit of data to clusters and the simplicity of cluster configuration.
Abstract: A clustering technique for binary vectors is described that permits the overlapping of clusters (clumping) and selects for each cluster a subset of relevant features. This technique does not require the user to set any parameters (e.g. number of clusters, degree of overlap). The clustering problem is rigorously defined as that of the minimization of a cost function. This preference criterion, called the 'minimal representation criterion', represents a tradeoff between the fit of data to clusters and the simplicity of cluster configuration and may be considered a quantitative Occam's razor. The central component of the presented method is an iterative algorithm that converges to a local minimum of the cost function.
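A toy version of the cost-function trade-off can be written down directly; the bit-counting scheme below (prototype bits plus exception bits) is an MDL-flavoured illustration of "fit plus configuration simplicity" and is only an assumption about the general form, not the paper's minimal representation criterion itself.

```python
import numpy as np

def representation_cost(X, assignment, prototypes):
    """Toy 'minimal representation' style cost for a binary clustering:
    bits to state each cluster prototype plus one bit for every position
    where a vector disagrees with its cluster's prototype."""
    prototype_bits = prototypes.size                  # one bit per prototype entry
    exception_bits = int(np.sum(X != prototypes[assignment]))
    return prototype_bits + exception_bits

X = np.array([[1, 1, 0, 0], [1, 1, 1, 0], [1, 1, 0, 0],
              [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 1, 1]])

# One cluster: a single prototype has to explain every vector.
cost_one = representation_cost(X, np.zeros(6, dtype=int), np.array([[1, 1, 1, 0]]))

# Two clusters: more configuration to describe, but far fewer exceptions.
cost_two = representation_cost(X, np.array([0, 0, 0, 1, 1, 1]),
                               np.array([[1, 1, 0, 0], [0, 0, 1, 1]]))

print(cost_one, cost_two)   # the lower total cost wins the trade-off
```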