Showing papers on "Feature selection" published in 1994


Proceedings ArticleDOI
21 Jun 1994
TL;DR: A feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world are proposed.
Abstract: No feature-based vision system can work unless good features can be identified and tracked from frame to frame. Although tracking itself is by and large a solved problem, selecting features that can be tracked well and correspond to physical points in the world is still hard. We propose a feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world. These methods are based on a new tracking algorithm that extends previous Newton-Raphson style search methods to work under affine image transformations. We test performance with several simulations and experiments.

8,432 citations
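
For reference, the selection criterion described above amounts to accepting a window as a feature when the smaller eigenvalue of its 2x2 gradient (structure) matrix is large. The NumPy/SciPy sketch below is a minimal, unoptimized illustration of that test; the window size, the quality fraction, and the function name are illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def good_features(image, win=7, quality=0.01):
    """Rank pixels by the min-eigenvalue criterion: the smaller eigenvalue of the
    2x2 gradient (structure) matrix summed over a window around each pixel."""
    img = image.astype(float)
    gy, gx = np.gradient(img)                      # image gradients (central differences)
    k = np.ones((win, win))
    gxx = convolve2d(gx * gx, k, mode="same")      # windowed products of gradients
    gyy = convolve2d(gy * gy, k, mode="same")
    gxy = convolve2d(gx * gy, k, mode="same")
    # Smaller eigenvalue of [[gxx, gxy], [gxy, gyy]] at every pixel.
    tr = gxx + gyy
    det = gxx * gyy - gxy * gxy
    lam_min = tr / 2 - np.sqrt(np.maximum(tr * tr / 4 - det, 0.0))
    # Keep pixels whose smaller eigenvalue exceeds a fraction of the best one.
    mask = lam_min > quality * lam_min.max()
    ys, xs = np.nonzero(mask)
    order = np.argsort(-lam_min[ys, xs])
    return list(zip(xs[order], ys[order]))          # candidate feature points, best first
```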


Journal ArticleDOI
TL;DR: Sequential search methods characterized by a dynamically changing number of features included or eliminated at each step, henceforth "floating" methods, are presented and are shown to give very good results and to be computationally more effective than the branch and bound method.

3,104 citations
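
A compact sketch of the floating idea: after each greedy inclusion, features are conditionally excluded as long as the smaller subset beats the best subset of that size found so far. This is a simplified reading of SFFS, not the published algorithm; `score` stands for any subset criterion (e.g., cross-validated accuracy) and `k_target` is the desired subset size.

```python
def sffs(features, score, k_target):
    """Sequential forward floating selection (simplified sketch)."""
    selected, best_by_size = [], {}
    while len(selected) < k_target:
        # Inclusion step: add the single feature that helps most.
        remaining = [f for f in features if f not in selected]
        selected.append(max(remaining, key=lambda f: score(selected + [f])))
        best_by_size[len(selected)] = max(best_by_size.get(len(selected), float("-inf")),
                                          score(selected))
        # Conditional exclusion ("floating") step: while dropping the least useful
        # feature beats the best subset of that smaller size seen so far, drop it.
        while len(selected) > 2:
            worst = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if score(reduced) > best_by_size.get(len(reduced), float("-inf")):
                selected = reduced
                best_by_size[len(reduced)] = score(reduced)
            else:
                break
    return selected
```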


ReportDOI
01 Nov 1994
TL;DR: This paper describes the problem of selecting relevant features for use in machine learning in terms of heuristic search through a space of feature sets, and identifies four dimensions along which approaches to the problem can vary.
Abstract: In this paper, we review the problem of selecting relevant features for use in machine learning. We describe this problem in terms of heuristic search through a space of feature sets, and we identify four dimensions along which approaches to the problem can vary. We consider recent work on feature selection in terms of this framework, then close with some challenges for future work in the area.

1. The Problem of Irrelevant Features

accuracy) to grow slowly with the number of irrelevant attributes. Theoretical results for algorithms that search restricted hypothesis spaces are encouraging. For instance, the worst-case number of errors made by Littlestone's (1987) WINNOW method grows only logarithmically with the number of irrelevant features. Pazzani and Sarrett's (1992) average-case analysis for WHOLIST, a simple conjunctive algorithm, and Langley and Iba's (1993) treatment of the naive Bayesian classifier, suggest that their sample complexities grow at most linearly with the number of irrelevant features. However, the theoretical results are less optimistic for induction methods that search a larger space of concept descriptions. For example, Langley and Iba's (1993) average-case analysis of simple nearest neighbor indicates that its sample complexity grows exponentially with the number of irrelevant attributes, even for conjunctive target concepts. Experimental studies of nearest neighbor are consistent with this conclusion, and other experiments suggest that similar results hold even for induction algorithms that explicitly select features. For example, the sample complexity for decision-tree methods appears to grow linearly with the number of irrelevants for conjunctive concepts, but exponentially for parity concepts, since the evaluation metric cannot distinguish relevant from irrelevant features in the latter situation (Langley & Sage, in press). Results of this sort have encouraged machine learning researchers to explore more sophisticated methods for selecting relevant features. In the sections that follow, we present a general framework for this task, and then consider some recent examples of work on this important problem.

735 citations
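
Although the excerpt does not enumerate the four dimensions, one common way to read such a framework is as a generic search whose behaviour is fixed by a starting point, a successor (search-organization) function, an evaluation function, and a halting criterion. The skeleton below makes those choices explicit parameters; all names are illustrative, not the paper's terminology.

```python
def search_feature_subsets(all_features, start, successors, evaluate, should_stop):
    """Generic greedy search over feature subsets, parameterized by the choices a
    feature-selection method must make: where to start, how to move through the
    space, how to evaluate a state, and when to halt."""
    current = frozenset(start)
    current_score = evaluate(current)
    while not should_stop(current, current_score):
        candidates = [(evaluate(s), s) for s in successors(current, all_features)]
        if not candidates:
            break
        best_score, best = max(candidates, key=lambda t: t[0])
        if best_score <= current_score:
            break                       # local optimum under this evaluation function
        current, current_score = best, best_score
    return current

# Example instantiation of the search-organization choice: forward selection,
# which starts from the empty set and adds one feature at a time.
forward = lambda s, feats: [s | {f} for f in feats if f not in s]
```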


Book ChapterDOI
10 Jul 1994
TL;DR: Experiments suggest hillclimbing in attribute space can yield substantial improvements in generalization performance, and a caching scheme is presented that makes attribute hillclimbing more practical computationally.
Abstract: Many real-world domains bless us with a wealth of attributes to use for learning. This blessing is often a curse: most inductive methods generalize worse given too many attributes than if given a good subset of those attributes. We examine this problem for two learning tasks taken from a calendar scheduling domain. We show that ID3/C4.5 generalizes poorly on these tasks if allowed to use all available attributes. We examine five greedy hillclimbing procedures that search for attribute sets that generalize well with ID3/C4.5. Experiments suggest hillclimbing in attribute space can yield substantial improvements in generalization performance. We present a caching scheme that makes attribute hillclimbing more practical computationally. We also compare the results of hillclimbing in attribute space with FOCUS and RELIEF on the two tasks.

572 citations
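
A hedged sketch of the kind of procedure described: hillclimbing over attribute subsets with scikit-learn's DecisionTreeClassifier standing in for ID3/C4.5, plus a dictionary cache so that attribute sets visited more than once are not re-evaluated. The neighbourhood (flip one attribute), the backward starting point, and the cross-validation settings are illustrative and do not reproduce the authors' five procedures.

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def hillclimb_attributes(X, y, cv=5):
    """Greedy hillclimbing in attribute space: repeatedly move to the neighbouring
    attribute set (one attribute added or dropped) with the best cross-validated
    accuracy; a cache avoids re-evaluating previously scored attribute sets."""
    n = X.shape[1]
    cache = {}

    def evaluate(subset):
        key = frozenset(subset)
        if key not in cache:
            cols = sorted(subset)
            cache[key] = (cross_val_score(DecisionTreeClassifier(), X[:, cols], y, cv=cv).mean()
                          if cols else 0.0)
        return cache[key]

    current = set(range(n))                 # backward variant: start from all attributes
    score = evaluate(current)
    while True:
        neighbours = [current ^ {a} for a in range(n)]     # flip one attribute in or out
        best_score, best_nb = max((evaluate(nb), nb) for nb in neighbours)
        if best_score <= score:
            break
        current, score = best_nb, best_score
    return sorted(current), score
```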


Book ChapterDOI
10 Jul 1994
TL;DR: On four datasets, it is shown that only three or four prototypes sufficed to give predictive accuracy equal or superior to a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater.
Abstract: With the goal of reducing computational costs without sacrificing accuracy, we describe two algorithms to find sets of prototypes for nearest neighbor classification. Here, the term “prototypes” refers to the reference instances used in a nearest neighbor computation — the instances with respect to which similarity is assessed in order to assign a class to a new data item. Both algorithms rely on stochastic techniques to search the space of sets of prototypes and are simple to implement. The first is a Monte Carlo sampling algorithm; the second applies random mutation hill climbing. On four datasets we show that only three or four prototypes sufficed to give predictive accuracy equal or superior to a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater. We briefly investigate how random mutation hill climbing may be applied to select features and prototypes simultaneously. Finally, we explain the performance of the sampling algorithm on these datasets in terms of a statistical measure of the extent of clustering displayed by the target classes.

510 citations
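
The second of the two algorithms, random mutation hill climbing, can be sketched roughly as follows: keep a small set of prototype indices, mutate one slot at a time, and accept the mutation if 1-NN accuracy does not drop. The parameter values and the use of training-set accuracy as the objective are simplifying assumptions.

```python
import numpy as np

def rmhc_prototypes(X, y, n_protos=4, iters=2000, rng=np.random.default_rng(0)):
    """Random mutation hill climbing over sets of prototype indices for a
    1-nearest-neighbour classifier: mutate one prototype at a time and keep the
    mutation only if accuracy does not decrease."""
    def accuracy(idx):
        # Classify every instance by its nearest prototype.
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        pred = y[np.asarray(idx)][d.argmin(axis=1)]
        return (pred == y).mean()

    current = list(rng.choice(len(X), size=n_protos, replace=False))
    best = accuracy(current)
    for _ in range(iters):
        candidate = current.copy()
        slot = rng.integers(n_protos)                   # which prototype slot to replace
        candidate[slot] = int(rng.integers(len(X)))     # random replacement instance
        acc = accuracy(candidate)
        if acc >= best:                                 # accept ties to keep moving
            current, best = candidate, acc
    return current, best
```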


Proceedings ArticleDOI
09 Oct 1994
TL;DR: The recently developed "floating search" algorithms are presented and modified to a more compact form facilitating their direct comparison with the well known (l,r) search.
Abstract: In this paper the recently developed "floating search" algorithms are presented and modified to a more compact form facilitating their direct comparison with the well known (l,r) search. The properties of the floating search methods are investigated, especially with respect to their tolerance to nonmonotonic criteria. Their computational efficiency is demonstrated by results on real data of high dimensionality.

368 citations
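
For comparison with the floating methods, here is a bare-bones version of the classical (l, r) search they are measured against: l forward steps followed by r backward steps, with l and r fixed in advance (forward variant, l > r). `score` is again any subset criterion; the backward (l < r) variant is omitted.

```python
def plus_l_take_away_r(features, score, k_target, l=2, r=1):
    """Classical (l, r) search: add the l best features one at a time, then remove
    the r least useful, repeating until k_target features remain. Unlike the
    floating methods, l and r are fixed in advance."""
    assert l > r, "forward variant assumes net growth per cycle"
    selected = []
    while len(selected) != k_target:
        for _ in range(l):                              # plus-l forward steps
            if len(selected) == k_target:
                break
            pool = [f for f in features if f not in selected]
            selected.append(max(pool, key=lambda f: score(selected + [f])))
        if len(selected) == k_target:
            break
        for _ in range(r):                              # take-away-r backward steps
            selected.remove(max(selected,
                                key=lambda f: score([g for g in selected if g != f])))
    return selected
```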


Journal ArticleDOI
TL;DR: A comparison of the results for the Selwood data set with those obtained by other groups shows that more relevant models are derived by the evolutionary approach than by other methods.
Abstract: In QSAR studies of large data sets, variable selection and model building is a difficult, time-consuming and ambiguous procedure. While most often stepwise regression procedures are applied for this purpose, other strategies, like neural networks, cluster significance analysis or genetic algorithms have been used. A simple and efficient evolutionary strategy, including iterative mutation and selection, but avoiding crossover of regression models, is described in this work. The MUSEUM (Mutation and Selection Uncover Models) algorithm starts from a model containing any number of randomly chosen variables. Random mutation, first by addition or elimination of only one or very few variables, afterwards by simultaneous random additions, eliminations and/or exchanges of several variables at a time, leads to new models which are evaluated by an appropriate fitness function. In contrast to common genetic algorithm procedures, only the “fittest” model is stored and used for further mutation and selection, leading to better and better models. In the last steps of mutation, all variables inside the model are eliminated and all variables outside the model are added, one by one, to control whether this systematic strategy detects any mutation which still improves the model. After every generation of a better model, a new random mutation procedure starts from this model. In the very last step, variables not significant at the 95% level are eliminated, starting with the least significant variable. In this manner, “stable” models are produced, containing only significant variables. A comparison of the results for the Selwood data set (n = 31 compounds, k = 53 variables) with those obtained by other groups shows that more relevant models are derived by the evolutionary approach than by other methods.

259 citations
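
A stripped-down sketch of the mutation-and-selection strategy: a single "fittest" subset is kept, a few variables are randomly added, eliminated, or exchanged, and the mutant replaces the incumbent only if its fitness improves. Cross-validated R² of ordinary least squares is used here as a stand-in fitness function, and the paper's systematic final sweep and significance-based elimination step are omitted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def museum_like(X, y, generations=500, rng=np.random.default_rng(0)):
    """Evolutionary variable selection in the spirit of MUSEUM: no crossover, only
    the fittest model is retained, and mutants are accepted only on improvement."""
    n_vars = X.shape[1]

    def fitness(mask):
        if mask.sum() == 0:
            return -np.inf
        return cross_val_score(LinearRegression(), X[:, mask], y,
                               cv=5, scoring="r2").mean()

    best = rng.random(n_vars) < 0.2            # random starting model
    best_fit = fitness(best)
    for _ in range(generations):
        mutant = best.copy()
        flips = rng.choice(n_vars, size=rng.integers(1, 4), replace=False)
        mutant[flips] = ~mutant[flips]          # add, drop, or exchange a few variables
        f = fitness(mutant)
        if f > best_fit:                        # only the fittest model is kept
            best, best_fit = mutant, f
    return np.flatnonzero(best), best_fit
```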


Book ChapterDOI
TL;DR: In this paper, the authors compared the performance of the (l, r) search algorithm with the genetic approach to feature subset search in high-dimensional spaces and found that the properties inferred for these techniques from medium scale experiments involving up to a few tens of dimensions extend to dimensionalities of one order of magnitude higher.
Abstract: The combinatorial search problem arising in feature selection in high dimensional spaces is considered. Recently developed techniques based on the classical sequential methods and the (l, r) search called Floating search algorithms are compared against the Genetic approach to feature subset search. Both approaches have been designed with the view to give a good compromise between efficiency and effectiveness for large problems. The purpose of this paper is to investigate the applicability of these techniques to high dimensional problems of feature selection. The aim is to establish whether the properties inferred for these techniques from medium scale experiments involving up to a few tens of dimensions extend to dimensionalities of one order of magnitude higher. Further, relative merits of these techniques vis-a-vis such high dimensional problems are explored and the possibility of exploiting the best aspects of these methods to create a composite feature selection procedure with superior properties is considered.

252 citations


Journal ArticleDOI
TL;DR: A modified PLS algorithm is introduced with the goal of achieving improved prediction ability, based on dimension‐wise selective reweighting of single elements in the PLS weight vector w that leads to rotation of the classical PLS solution.
Abstract: A modified PLS algorithm is introduced with the goal of achieving improved prediction ability. The method, denoted IVS-PLS, is based on dimension-wise selective reweighting of single elements in the PLS weight vector w. Cross-validation, a criterion for the estimation of predictive quality, is used for guiding the selection procedure in the modelling stage. A threshold that controls the size of the selected values in w is put inside a cross-validation loop. This loop is repeated for each dimension and the results are interpreted graphically. The manipulation of w leads to rotation of the classical PLS solution. The results of IVS-PLS are different from simply selecting X-variables prior to modelling. The theory is explained and the algorithm is demonstrated for a simulated data set with 200 variables and 40 objects, representing a typical spectral calibration situation with four analytes. Improvements of up to 70% in external PRESS over the classical PLS algorithm are shown to be possible.

207 citations
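
To make the "dimension-wise reweighting of w" concrete, the sketch below implements a minimal PLS1 (NIPALS) in which small elements of each component's weight vector are zeroed before the scores are formed, which rotates the solution away from classical PLS. The hard threshold and the fixed number of components are simplifications; the actual IVS-PLS tunes the selection per dimension inside a cross-validation loop, which is not reproduced here.

```python
import numpy as np

def pls1_thresholded(X, y, n_components=4, threshold=0.3):
    """PLS1 regression (NIPALS) with hard thresholding of each weight vector w:
    elements smaller than `threshold` times the largest |w| are zeroed before the
    component scores are formed."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w[np.abs(w) < threshold * np.abs(w).max()] = 0.0   # dimension-wise selection
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        q = (yk @ t) / tt
        Xk = Xk - np.outer(t, p)                            # deflation
        yk = yk - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)    # coefficients for the centred X
    return B, x_mean, y_mean               # predict: (X_new - x_mean) @ B + y_mean
```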


Journal ArticleDOI
TL;DR: While the neural network method performed slightly better than the other two methods at the basic level, the inclusion of the variable selection principle led to similar performance indices for all three methods.

198 citations


Journal ArticleDOI
TL;DR: A development of a previous genetic algorithm is presented so that a full validation of the results can be obtained; the algorithm has also been shown to perform very well as an outlier detector, allowing easy identification of outliers even in cases where the ‘classical’ techniques fail.
Abstract: Genetic algorithms have proved to be a very efficient method for the feature selection problem. However, as for every other method, if the validation of the results is performed in an incomplete way, erroneous conclusions can be drawn. In this paper a development of a previous genetic algorithm is presented so that a full validation of the results can be obtained. Furthermore, this algorithm has also been shown to perform very well as an outlier detector, allowing easy identification of the presence of outliers even in cases where the ‘classical’ techniques fail.
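
As a rough illustration of GA-based feature selection with an honest validation step (not the authors' published algorithm), the sketch below evolves binary chromosomes whose fitness is cross-validated accuracy on a training split only, then re-checks the winning subset on a held-out split; the classifier choice, population size, and genetic operators are arbitrary.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

def ga_select(X, y, pop=30, gens=40, p_mut=0.05, rng=np.random.default_rng(0)):
    """GA feature selection: binary chromosomes mark selected columns; fitness is
    cross-validated accuracy on a training split, and the winning subset is
    re-evaluated on a held-out test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    n = X.shape[1]

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        return cross_val_score(KNeighborsClassifier(), X_tr[:, mask], y_tr, cv=5).mean()

    population = rng.random((pop, n)) < 0.5
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in population])
        # Tournament selection, one-point crossover, bit-flip mutation.
        parents = population[np.array([max(rng.choice(pop, 2), key=lambda i: scores[i])
                                       for _ in range(pop)])]
        children = parents.copy()
        cuts = rng.integers(1, n, size=pop // 2)
        for k, c in enumerate(cuts):
            children[2 * k, c:], children[2 * k + 1, c:] = \
                parents[2 * k + 1, c:].copy(), parents[2 * k, c:].copy()
        children ^= rng.random((pop, n)) < p_mut
        population = children
    best = max(population, key=fitness)
    held_out = KNeighborsClassifier().fit(X_tr[:, best], y_tr).score(X_te[:, best], y_te)
    return np.flatnonzero(best), held_out
```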

Journal ArticleDOI
TL;DR: A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees and it is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
Abstract: A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
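
The bias can be reproduced with a small simulation in the spirit of the paper: for pure-noise attributes, the average information gain grows with the number of attribute values, while a chi-square test of independence stays calibrated. The sample size, numbers of levels, and trial count below are arbitrary.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

def info_gain(x, y):
    """Information gain of discrete attribute x about class y."""
    def entropy(v):
        p = np.bincount(v) / len(v)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    gain = entropy(y)
    for val in np.unique(x):
        sub = y[x == val]
        gain -= len(sub) / len(y) * entropy(sub)
    return gain

n, trials = 200, 500
y = rng.integers(0, 2, n)                      # random binary class
for levels in (2, 5, 10, 20):                  # attribute with more and more values
    gains, pvals = [], []
    for _ in range(trials):
        x = rng.integers(0, levels, n)         # pure-noise attribute
        gains.append(info_gain(x, y))
        table = np.array([[np.sum((x == v) & (y == c)) for c in (0, 1)]
                          for v in range(levels)])
        table = table[table.sum(axis=1) > 0]   # drop unused levels
        pvals.append(chi2_contingency(table)[1])
    print(f"{levels:2d} values: mean gain {np.mean(gains):.4f}, "
          f"mean chi-square p {np.mean(pvals):.2f}")
```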

01 Jan 1994
TL;DR: A set of such algorithms that use case-based classifiers is described and empirically compared, and novel extensions of backward sequential selection are introduced that allow it to scale to this task.
Abstract: Accurate weather prediction is crucial for many activities, including Naval operations. Researchers within the meteorological division of the Naval Research Laboratory have developed and fielded several expert systems for problems such as fog and turbulence forecasting, and tropical storm movement. They are currently developing an automated system for satellite image interpretation, part of which involves cloud classification. Their cloud classification database contains 204 high-level features but only a few thousand instances. The predictive accuracy of classifiers can be improved on this task by employing a feature selection algorithm. We explain why non-parametric case-based classifiers are excellent choices for use in feature selection algorithms. We then describe a set of such algorithms that use case-based classifiers, empirically compare them, and introduce novel extensions of backward sequential selection that allow it to scale to this task. Several of the approaches we tested located feature subsets that attain significantly higher accuracies than those found in previously published research, and some did so with fewer features.
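
For orientation, here is a plain backward-sequential-selection sketch with a 1-NN (case-based) classifier: it starts from all features and greedily drops those whose removal hurts cross-validated accuracy least. The `step` parameter, which drops several features per iteration, only hints at the kind of scaling extension the paper introduces; it is not the authors' method.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def backward_sequential_selection(X, y, min_features=5, step=1):
    """Backward sequential selection with a case-based (1-NN) classifier: start
    from all features and repeatedly drop the `step` features whose removal hurts
    cross-validated accuracy least, stopping once removal starts to hurt."""
    selected = list(range(X.shape[1]))

    def acc(cols):
        return cross_val_score(KNeighborsClassifier(n_neighbors=1),
                               X[:, cols], y, cv=5).mean()

    best = acc(selected)
    while len(selected) > min_features:
        # Score each feature by the accuracy obtained when it is removed.
        drop_scores = [(acc([c for c in selected if c != f]), f) for f in selected]
        drop_scores.sort(reverse=True)
        to_drop = [f for _, f in drop_scores[:step]]
        reduced = [c for c in selected if c not in to_drop]
        reduced_acc = acc(reduced)
        if reduced_acc + 1e-9 < best:
            break
        selected, best = reduced, reduced_acc
    return selected, best
```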

Journal ArticleDOI
TL;DR: A novel stereo matching algorithm that integrates learning, feature selection, and surface reconstruction is presented, along with a self-diagnostic method for determining when a priori knowledge is necessary for finding the correct match.
Abstract: We present a novel stereo matching algorithm which integrates learning, feature selection, and surface reconstruction. First, a new instance based learning (IBL) algorithm is used to generate an approximation to the optimal feature set for matching. In addition, the importance of two separate kinds of knowledge, image dependent knowledge and image independent knowledge, is discussed. Second, we develop an adaptive method for refining the feature set. This adaptive method analyzes the feature error to locate areas of the image that would lead to false matches. Then these areas are used to guide the search through feature space towards maximizing the class separation distance between the correct match and the false matches. Third, we introduce a self-diagnostic method for determining when a priori knowledge is necessary for finding the correct match. If a priori knowledge is necessary, we use a surface reconstruction model to discriminate between match possibilities. Our algorithm is comprehensively tested against fixed feature set algorithms and against a traditional pyramid algorithm. Finally, we present and discuss extensive empirical results of our algorithm based on a large set of real images.

Journal ArticleDOI
TL;DR: Viewing the analysis of designed experiments as a model selection problem, the use of a predictive Bayesian criterion based on the predictive density of a replicate experiment (PDRE) is introduced and compared with the usual F statistic for two nested models.
Abstract: Viewing the analysis of designed experiments as a model selection problem, we introduce the use of a predictive Bayesian criterion in this context based on the predictive density of a replicate experiment (PDRE). A calibration of the criterion is provided to assist in the model choice. The relationships of the proposed criterion to other prevalent criteria, such as AIC, BIC, and Mallows's Cp, are given. An information theoretic criterion based on the PDRE's of two competing models is also introduced and compared with the usual F statistic for two nested models. Examples are given to illustrate the proposed methodology.

Journal ArticleDOI
TL;DR: Three experiments reported here employed synthetic data sets, constructed so as to have the precise properties required to test specific hypotheses, and showed that the performance decrement typical of random attribute selection increased with the number of available pure-noise attributes.
Abstract: Recent work by Mingers and by Buntine and Niblett on the performance of various attribute selection measures has addressed the topic of random selection of attributes in the construction of decision trees. This article is concerned with the mechanisms underlying the relative performance of conventional and random attribute selection measures. The three experiments reported here employed synthetic data sets, constructed so as to have the precise properties required to test specific hypotheses. The principal underlying idea was that the performance decrement typical of random attribute selection is due to two factors. First, there is a greater chance that informative attributes will be omitted from the subset selected for the final tree. Second, there is a greater risk of overfitting, which is caused by attributes of little or no value in discriminating between classes being “locked in” to the tree structure, near the root. The first experiment showed that the performance decrement increased with the number of available pure-noise attributes. The second experiment indicated that there was little decrement when all the attributes were of equal importance in discriminating between classes. The third experiment showed that a rather greater performance decrement (than in the second experiment) could be expected if the attributes were all informative, but to different degrees.

01 Jan 1994
TL;DR: This work formulate the search for a feature subset as an abstract search problem with probabilistic estimates, and shows how recent feature subset selection algorithms in the machine learning literature fit into this search problem as simple hill climbing approaches.
Abstract: Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading off accuracy of estimates for increased state exploration. We show how recent feature subset selection algorithms in the machine learning literature fit into this search problem as simple hill climbing approaches, and conduct a small experiment using a best-first search technique.
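
One way to go beyond simple hill climbing, as the abstract suggests, is best-first search over subsets: always expand the most promising unexpanded subset and stop after a fixed number of non-improving expansions. The sketch below is a generic version in which `evaluate` (e.g., cross-validated accuracy of the induction algorithm restricted to that subset) is supplied by the caller; it does not model the probabilistic estimates discussed in the paper.

```python
import heapq
from itertools import count

def best_first_selection(n_features, evaluate, max_stale=5):
    """Best-first search over feature subsets: expand the most promising
    unexpanded subset (not merely the current one, as hill climbing does), and
    stop after `max_stale` expansions that fail to improve on the best subset."""
    start = frozenset()
    start_score = evaluate(start)
    tie = count()                                         # heap tie-breaker
    open_heap = [(-start_score, next(tie), start)]
    closed = {start}
    best, best_score, stale = start, start_score, 0
    while open_heap and stale < max_stale:
        neg_score, _, state = heapq.heappop(open_heap)
        if -neg_score > best_score:
            best, best_score, stale = state, -neg_score, 0
        else:
            stale += 1
        for f in range(n_features):                       # children: toggle one feature
            child = state ^ {f}
            if child not in closed:
                closed.add(child)
                heapq.heappush(open_heap, (-evaluate(child), next(tie), child))
    return best, best_score
```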

Proceedings ArticleDOI
09 Oct 1994
TL;DR: A node-pruning procedure is presented to remove the least salient nodes and create a parsimonious network; the optimal/suboptimal subset of features is simultaneously selected by the network.
Abstract: Proposes a node saliency measure and a backpropagation type of algorithm to compute the node saliencies. A node-pruning procedure is then presented to remove the least salient nodes and create a parsimonious network. The optimal/suboptimal subset of features is simultaneously selected by the network. The performance of the proposed approach for feature selection is compared with Whitney's feature selection method. One advantage of the node-pruning procedure over classical feature selection methods is that it can simultaneously "optimize" both the feature set and the classifier, while classical feature selection methods select the "best" subset of features with respect to a fixed classifier.
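
The paper's saliency measure is computed by a backpropagation-type algorithm that is not reproduced here; the sketch below uses a much simpler proxy (the sum of squared first-layer weights leaving each input node) purely to illustrate the pattern of training a network, pruning the least salient inputs, and retraining, so that the feature set and the classifier are handled in the same framework.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def prune_least_salient_inputs(X, y, keep=10, hidden=(16,)):
    """Train an MLP, score each input node by a simple saliency proxy (sum of
    squared outgoing first-layer weights), keep the most salient inputs, and
    retrain on the reduced feature set."""
    net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000,
                        random_state=0).fit(X, y)
    W1 = net.coefs_[0]                          # shape: (n_inputs, n_hidden)
    saliency = (W1 ** 2).sum(axis=1)            # proxy saliency per input node
    keep_idx = np.sort(np.argsort(saliency)[::-1][:keep])
    pruned_net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000,
                               random_state=0).fit(X[:, keep_idx], y)
    return keep_idx, pruned_net
```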

Proceedings ArticleDOI
01 Jan 1994
TL;DR: Two feature selection techniques and a multilayer perceptron (MLP) neural network (NN) have been used in this study for human chromosome classification; the "knock-out" algorithm emphasized the significance of the centromeric index and of the chromosome length as features in chromosome classification.
Abstract: Two feature selection techniques and a multilayer perceptron (MLP) neural network (NN) have been used in this study for human chromosome classification. The first technique is the "knock-out" algorithm and the second is principal component analysis (PCA). The "knock-out" algorithm emphasized the significance of the centromeric index and of the chromosome length as features in chromosome classification. The PCA technique demonstrated the importance of retaining most of the image information whenever small training sets are used. However, the use of large training sets enables considerable data compression. Both techniques yield the benefit of using only about 70% of the available features to get almost the ultimate classification performance.

01 Jan 1994
TL;DR: Two methods of finding relevant attributes, FOCUS and RELIEF, are tested to see how the attributes they select perform with ID3/C4.5 on two learning problems from a calendar scheduling domain.
Abstract: Eliminating irrelevant attributes prior to induction boosts the performance of many learning algorithms. Relevance, however, is no guarantee of usefulness to a particular learner. We test two methods of finding relevant attributes, FOCUS and RELIEF, to see how the attributes they select perform with ID3/C4.5 on two learning problems from a calendar scheduling domain. A more direct attribute selection procedure, hillclimbing in attribute space, finds superior attribute sets.
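
For reference, RELIEF itself is easy to state: for each sampled instance, find its nearest neighbour of the same class (hit) and of the other class (miss), reward features that separate the miss, and penalize features that differ on the hit. The sketch below is a basic two-class version using Manhattan distance on [0, 1]-scaled features; the sample count is arbitrary.

```python
import numpy as np

def relief(X, y, n_samples=200, rng=np.random.default_rng(0)):
    """Basic RELIEF relevance weights for a two-class problem."""
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)  # scale to [0, 1]
    n, d = X.shape
    w = np.zeros(d)
    for i in rng.choice(n, size=min(n_samples, n), replace=False):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                                   # exclude the instance itself
        same, other = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))      # nearest same-class neighbour
        miss = np.argmin(np.where(other, dist, np.inf))    # nearest other-class neighbour
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / min(n_samples, n)    # threshold these weights to pick relevant features
```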

Book ChapterDOI
26 May 1994
TL;DR: A method called Optimal Cell Damage (OCD) is derived, which evaluates the usefulness of input variables in a multi-layer network and prunes the least useful; selection is achieved during training of the classifier, ensuring that the selected set of variables matches the classifier complexity.
Abstract: Neural Networks -NN- have been used in a large variety of real-world applications. In those, one could measure a potentially large number N of variables Xi; probably not all Xi are equally informative: if one could select n ≪ N “best” variables Xi, then one could reduce the amount of data to gather and process; hence reduce costs. Variable selection is thus an important issue in Pattern Recognition and Regression. It is also a complex problem; one needs a criterion to measure the value of a subset of variables and that value will of course depend on the predictor or classifier further used. Conventional variable selection techniques are based upon statistical or heuristics tools [Fukunaga, 90]: the major difficulty comes from the intrinsic combinatorics of the problem. In this paper we show how to use NNs for variable selection with a criterion based upon the evaluation of a variable usefulness. Various methods have been proposed to assess the value of a weight (e.g. saliency [Le Cun et al. 90] in the Optimal Brain-Damage -OBD- procedure): along similar ideas, we derive a method, called Optimal Cell Damage -OCD-, which evaluates the usefulness of input variables in a Multi-Layer Network and prunes the least useful. Variable selection is thus achieved during training of the classifier, ensuring that the selected set of variables matches the classifier complexity. Variable selection is thus viewed here as an extension of weight pruning. One can also use a regularization approach to variable selection, which we will discuss elsewhere [Cibas et al., 94]. We illustrate our method on two relatively small problems: prediction of a synthetic time series and classification of waveforms [Breiman et al., 84], representative of relatively hard problems.

Proceedings ArticleDOI
01 Jan 1994
TL;DR: Two feature selection methods, a distinction-sensitive learning vector quantizer (DSLVQ) and a genetic algorithm (GA) approach, are applied to multichannel electroencephalogram (EEG) patterns, showing the importance of methods automatically selecting the most distinctive out of a number of available features.
Abstract: Two feature selection methods, a distinction-sensitive learning vector quantizer (DSLVQ) and a genetic algorithm (GA) approach, are applied to multichannel electroencephalogram (EEG) patterns. It is shown how DSLVQ adjusts the influence of different input features according to their relevance for classification. Using a weighted distance function DSLVQ thereby performs feature selection along with classification. The results are compared with those of a GA which minimizes the number of features taken for classification while maximizing classification performance. The multichannel EEG patterns used in this paper stem from a study for the construction of a brain-computer interface, which is a system designed for handicapped persons to help them use their EEG for control of their environment. For such a system, reliable EEG classification, i.e. differentiation of several distinctive EEG patterns, is vital. In practice the number of electrodes for EEG recordings can be high (up to 56 and more) and different frequency bands and time intervals for each electrode can be used for classification simultaneously. This shows the importance of methods automatically selecting the most distinctive out of a number of available features.

Journal ArticleDOI
TL;DR: A novel adaptive algorithm, termed the learning vector classifier (LVC), is compared with standard K-means and LVQ2 classifiers; LVC is a supervised learning classifier that improves performance by increasing the resolution of the decision boundaries.
Abstract: A feature set that captures the dynamics of formant transitions prior to closure in a VCV environment is used to characterize and classify the unvoiced stop consonants. The feature set is derived from a time-varying, data-selective model for the speech signal. Its performance is compared with that of comparable formant data from a standard delta-LPC-based model. The different feature sets are evaluated on a database composed of eight talkers. A 40% reduction in classification error rate is obtained by means of the time-varying model. The performance of three different classifiers is discussed. A novel adaptive algorithm, termed the learning vector classifier (LVC), is compared with standard K-means and LVQ2 classifiers. LVC is a supervised learning classifier that improves performance by increasing the resolution of the decision boundaries. Error rates obtained for the three-way (p, t, and k) classification task using LVC and the time-varying analysis are comparable to those of techniques that make use of additional discriminating information contained in the burst. Further improvements are expected when an expanded time-varying feature set is utilized, coupled with information from the burst.

01 Jan 1994
TL;DR: A novel method for generating ROC curves for backpropagation ANNs is introduced; the method for deriving an ROC for a backpropagation ANN classifier provides superior performance to the currently used methods.
Abstract: In this work we have examined the application of computer image analysis techniques to digitized mammographic images for the purpose of detecting two types of mammographic abnormalities, namely clustered microcalcifications and spiculated lesions. We have used three separate image data sets for the experiments. Two of the data sets have been used in the previous work of other researchers, thus permitting a direct comparison to their results. Our algorithms are region-oriented approaches. Thus, classification is performed on a region of interest, or object, that is segmented from the image. A set of features is computed for each object, and statistical classification is used to label the objects as either normal or abnormal. We have examined the performance of five classifiers. They are: a Linear Bayesian (LC), a Quadratic Bayesian (QC), a K-Nearest Neighbor (KNN), a Binary Decision Tree (BDT), and an Artificial Neural Network (ANN) classifier. Receiver Operating Characteristic (ROC) analysis is utilized to evaluate the performance of the classifiers. We introduce a novel method for generating ROC curves for backpropagation ANNs. We introduce a method (QESFS) for automatic feature selection for determining a nearly optimal feature vector from a pool of many features. The conclusions that can be drawn from this work are: (1) The method introduced here for deriving an ROC for a backpropagation ANN classifier provides superior performance to the currently used methods. (2) The QESFS approach to feature selection introduced here provides a computationally practical way of choosing what should be a nearly optimal feature vector. (3) The apparently most useful features for the problems addressed in this work are texture, shape, and contrast. (4) Feature vectors of moderate size (6 to 8 features) are appropriate for the problems addressed here. (5) The KNN classifier provides generally superior performance for the problems addressed here (relative to the other classifiers: LC, QC, BDT, ANN). (6) Spiculated detection appears to require at least 280 microns per pixel of spatial resolution. (7) Microcalcification detection does appear to require at least 50 microns per pixel of spatial resolution. (8) A KNN classifier appears to make use of information contained in the 10th bit or greater of intensity resolution for spiculated lesion detection. A QC classifier for the same problem appears to need only 8 bits to achieve only slightly lower performance. (9) Both QC and KNN appear to require 10 bits or more of intensity resolution for microcalcification detection. (10) The region-oriented approaches explored here appear to lag behind the pixel-oriented approaches explored by other researchers. Possible explanations for this involve the different volumes of training data possible in the two approaches when the total number of images is still "small", the ability of the pixel-oriented approach to defer declaring an object until after individual pixels of the object have been examined, and the need of the region-oriented approaches to be very liberal in the initial segmentation stage in order to maintain high sensitivity. (Abstract shortened by UMI.)

Book ChapterDOI
TL;DR: This work presents an alternative approach that uses a new metric for discriminatory power called relative feature importance (RFI), which ranks features by discriminatory power based on their distributional structure, without parametric assumptions.
Abstract: Feature selection for classification is the process of determining the discriminatory power of features. Discriminatory power is the relative usefulness of a feature for classification. Traditional feature selection techniques have defined discriminatory power in terms of a particular type of classifier. We present an alternative approach that uses a new metric for discriminatory power called relative feature importance (RFI). RFI ranks features by discriminatory power based on their distributional structure, without parametric assumptions. Because computing RFI is not, in general, practicable, we also present a technique that efficiently estimates RFI. This hybrid technique, the genetic neural RFI estimator (GENFIE), has accurately estimated RFI in preliminary studies.

Journal ArticleDOI
TL;DR: A new method is presented to approximate class conditional densities by a mixture of parameterized densities to facilitate simultaneous decision rule inference and selection of discriminative features which characterize the image entities to be classified.

Proceedings ArticleDOI
27 Jun 1994
TL;DR: A Distinction Sensitive Learning Vector Quantizer (DSLVQ), based on the LVQ3 algorithm, is introduced which automatically adjusts the influence of the input features according to their observed relevance for classification.
Abstract: A Distinction Sensitive Learning Vector Quantizer (DSLVQ), based on the LVQ3 algorithm, is introduced which automatically adjusts the influence of the input features according to their observed relevance for classification. DSLVQ is less sensitive to noisy features than standard LVQ and its importance adjustments are transparent and can be exploited for input data feature selection. As an example, the algorithm is applied to the classification of two artificial data sets: Breiman's (1984) waveform data and Kohonen's "hard" classification task.
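
The excerpt does not give the exact DSLVQ update rule, so the sketch below should be read only as a plausible weighted-distance LVQ1-style variant that captures the flavour: prototypes move as usual, while a feature-weight vector is nudged toward features on which the nearest same-class prototype is closer than the nearest other-class prototype, then renormalized. It is not the published DSLVQ algorithm, and all parameter values are arbitrary.

```python
import numpy as np

def weighted_lvq(X, y, protos_per_class=3, epochs=30, alpha=0.05, beta=0.01,
                 rng=np.random.default_rng(0)):
    """LVQ with a learned feature-weight vector: prototypes move as in LVQ1, and
    feature weights are increased for features that separate the nearest wrong-class
    prototype from the nearest correct-class prototype. The weights double as
    feature-relevance scores."""
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    P, P_lab = [], []
    for c in classes:                                   # init prototypes per class
        idx = rng.choice(np.flatnonzero(y == c), size=protos_per_class, replace=False)
        P.append(X[idx]); P_lab.append(np.full(protos_per_class, c))
    P, P_lab = np.vstack(P), np.concatenate(P_lab)
    w = np.ones(X.shape[1]) / X.shape[1]                # feature influence weights

    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            d = ((P - X[i]) ** 2 * w).sum(axis=1)       # weighted squared distance
            nearest = d.argmin()
            sign = 1.0 if P_lab[nearest] == y[i] else -1.0
            P[nearest] += sign * alpha * (X[i] - P[nearest])       # LVQ1 update
            # Feature-weight update: compare per-feature gaps to the nearest
            # correct-class and nearest wrong-class prototypes.
            same = np.where(P_lab == y[i], d, np.inf).argmin()
            diff = np.where(P_lab != y[i], d, np.inf).argmin()
            gap = np.abs(X[i] - P[diff]) - np.abs(X[i] - P[same])
            w = np.clip(w + beta * gap, 1e-6, None)
            w /= w.sum()
    return P, P_lab, w
```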

Book ChapterDOI
01 May 1994
TL;DR: A new method for selecting features, or deciding on splitting points in inductive learning, is proposed to take the positions of examples into account instead of just considering the numbers of examples from different classes that fall on different sides of a splitting rule.
Abstract: We propose a new method for selecting features, or deciding on splitting points in inductive learning. Its main innovation is to take the positions of examples into account instead of just considering the numbers of examples from different classes that fall on different sides of a splitting rule. The method gives rise to a family of feature selection techniques. We demonstrate the promise of the developed method with initial empirical experiments in connection with top-down induction of decision trees.

Book ChapterDOI
01 Jan 1994
TL;DR: The Pretended simplicity theorem based on the mutual neighborhood graph (MNG) defined on the CSM is presented, and the feature selection method is realized in terms of the MNG.
Abstract: This paper presents the Cartesian space model (CSM) which is a mathematical model to treat symbolic data. Then, as a similar theorem to the Theorem of the ugly duckling by Watanabe, we present the Pretended simplicity theorem based on the mutual neighborhood graph (MNG) defined on the CSM. Our feature selection method is realized in terms of the MNG. We present a parity problem in order to illustrate the effectiveness of our feature selection method.