Showing papers on "Feature selection" published in 1994


Proceedings ArticleDOI
21 Jun 1994
TL;DR: A feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world are proposed.
Abstract: No feature-based vision system can work unless good features can be identified and tracked from frame to frame. Although tracking itself is by and large a solved problem, selecting features that can be tracked well and correspond to physical points in the world is still hard. We propose a feature selection criterion that is optimal by construction because it is based on how the tracker works, and a feature monitoring method that can detect occlusions, disocclusions, and features that do not correspond to points in the world. These methods are based on a new tracking algorithm that extends previous Newton-Raphson style search methods to work under affine image transformations. We test performance with several simulations and experiments.

8,432 citations
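
For reference, the selection criterion described above amounts to accepting a window as a feature when the smaller eigenvalue of its 2x2 gradient (structure) matrix is large. The NumPy/SciPy sketch below is a minimal, unoptimized illustration of that test; the window size, the quality fraction, and the function name are illustrative choices, not the paper's implementation.

```python
import numpy as np
from scipy.signal import convolve2d

def good_features(image, win=7, quality=0.01):
    """Rank pixels by the min-eigenvalue criterion: the smaller eigenvalue of the
    2x2 gradient (structure) matrix summed over a window around each pixel."""
    img = image.astype(float)
    gy, gx = np.gradient(img)                      # image gradients (central differences)
    k = np.ones((win, win))
    gxx = convolve2d(gx * gx, k, mode="same")      # windowed products of gradients
    gyy = convolve2d(gy * gy, k, mode="same")
    gxy = convolve2d(gx * gy, k, mode="same")
    # Smaller eigenvalue of [[gxx, gxy], [gxy, gyy]] at every pixel.
    tr = gxx + gyy
    det = gxx * gyy - gxy * gxy
    lam_min = tr / 2 - np.sqrt(np.maximum(tr * tr / 4 - det, 0.0))
    # Keep pixels whose smaller eigenvalue exceeds a fraction of the best one.
    mask = lam_min > quality * lam_min.max()
    ys, xs = np.nonzero(mask)
    order = np.argsort(-lam_min[ys, xs])
    return list(zip(xs[order], ys[order]))          # candidate feature points, best first
```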


Journal ArticleDOI
TL;DR: Sequential search methods characterized by a dynamically changing number of features included or eliminated at each step, henceforth "floating" methods, are presented and are shown to give very good results and to be computationally more effective than the branch and bound method.

3,104 citations
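
A compact sketch of the floating idea: after each greedy inclusion, features are conditionally excluded as long as the smaller subset beats the best subset of that size found so far. This is a simplified reading of SFFS, not the published algorithm; `score` stands for any subset criterion (e.g., cross-validated accuracy) and `k_target` is the desired subset size.

```python
def sffs(features, score, k_target):
    """Sequential forward floating selection (simplified sketch)."""
    selected, best_by_size = [], {}
    while len(selected) < k_target:
        # Inclusion step: add the single feature that helps most.
        remaining = [f for f in features if f not in selected]
        selected.append(max(remaining, key=lambda f: score(selected + [f])))
        best_by_size[len(selected)] = max(best_by_size.get(len(selected), float("-inf")),
                                          score(selected))
        # Conditional exclusion ("floating") step: while dropping the least useful
        # feature beats the best subset of that smaller size seen so far, drop it.
        while len(selected) > 2:
            worst = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != worst]
            if score(reduced) > best_by_size.get(len(reduced), float("-inf")):
                selected = reduced
                best_by_size[len(reduced)] = score(reduced)
            else:
                break
    return selected
```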


ReportDOI
01 Nov 1994
TL;DR: This paper describes the problem of selecting relevant features for use in machine learning in terms of heuristic search through a space of feature sets, and identifies four dimensions along which approaches to the problem can vary.
Abstract: In this paper, we review the problem of selecting relevant features for use in machine learning. We describe this problem in terms of heuristic search through a space of feature sets, and we identify four dimensions along which approaches to the problem can vary. We consider recent work on feature selection in terms of this framework, then close with some challenges for future work in the area.

1. The Problem of Irrelevant Features

accuracy) to grow slowly with the number of irrelevant attributes. Theoretical results for algorithms that search restricted hypothesis spaces are encouraging. For instance, the worst-case number of errors made by Littlestone's (1987) WINNOW method grows only logarithmically with the number of irrelevant features. Pazzani and Sarrett's (1992) average-case analysis for WHOLIST, a simple conjunctive algorithm, and Langley and Iba's (1993) treatment of the naive Bayesian classifier, suggest that their sample complexities grow at most linearly with the number of irrelevant features. However, the theoretical results are less optimistic for induction methods that search a larger space of concept descriptions. For example, Langley and Iba's (1993) average-case analysis of simple nearest neighbor indicates that its sample complexity grows exponentially with the number of irrelevant attributes, even for conjunctive target concepts. Experimental studies of nearest neighbor are consistent with this conclusion, and other experiments suggest that similar results hold even for induction algorithms that explicitly select features. For example, the sample complexity for decision-tree methods appears to grow linearly with the number of irrelevants for conjunctive concepts, but exponentially for parity concepts, since the evaluation metric cannot distinguish relevant from irrelevant features in the latter situation (Langley & Sage, in press). Results of this sort have encouraged machine learning researchers to explore more sophisticated methods for selecting relevant features. In the sections that follow, we present a general framework for this task, and then consider some recent examples of work on this important problem.

735 citations
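
Although the excerpt does not enumerate the four dimensions, one common way to read such a framework is as a generic search whose behaviour is fixed by a starting point, a successor (search-organization) function, an evaluation function, and a halting criterion. The skeleton below makes those choices explicit parameters; all names are illustrative, not the paper's terminology.

```python
def search_feature_subsets(all_features, start, successors, evaluate, should_stop):
    """Generic greedy search over feature subsets, parameterized by the choices a
    feature-selection method must make: where to start, how to move through the
    space, how to evaluate a state, and when to halt."""
    current = frozenset(start)
    current_score = evaluate(current)
    while not should_stop(current, current_score):
        candidates = [(evaluate(s), s) for s in successors(current, all_features)]
        if not candidates:
            break
        best_score, best = max(candidates, key=lambda t: t[0])
        if best_score <= current_score:
            break                       # local optimum under this evaluation function
        current, current_score = best, best_score
    return current

# Example instantiation of the search-organization choice: forward selection,
# which starts from the empty set and adds one feature at a time.
forward = lambda s, feats: [s | {f} for f in feats if f not in s]
```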


Book ChapterDOI
10 Jul 1994
TL;DR: Experiments suggest hillclimbing in attribute space can yield substantial improvements in generalization performance, and a caching scheme is presented that makes attribute hillclimbing more practical computationally.
Abstract: Many real-world domains bless us with a wealth of attributes to use for learning. This blessing is often a curse: most inductive methods generalize worse given too many attributes than if given a good subset of those attributes. We examine this problem for two learning tasks taken from a calendar scheduling domain. We show that ID3/C4.5 generalizes poorly on these tasks if allowed to use all available attributes. We examine five greedy hillclimbing procedures that search for attribute sets that generalize well with ID3/C4.5. Experiments suggest hillclimbing in attribute space can yield substantial improvements in generalization performance. We present a caching scheme that makes attribute hillclimbing more practical computationally. We also compare the results of hillclimbing in attribute space with FOCUS and RELIEF on the two tasks.

572 citations
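
A hedged sketch of the kind of procedure described: hillclimbing over attribute subsets with scikit-learn's DecisionTreeClassifier standing in for ID3/C4.5, plus a dictionary cache so that attribute sets visited more than once are not re-evaluated. The neighbourhood (flip one attribute), the backward starting point, and the cross-validation settings are illustrative and do not reproduce the authors' five procedures.

```python
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

def hillclimb_attributes(X, y, cv=5):
    """Greedy hillclimbing in attribute space: repeatedly move to the neighbouring
    attribute set (one attribute added or dropped) with the best cross-validated
    accuracy; a cache avoids re-evaluating previously scored attribute sets."""
    n = X.shape[1]
    cache = {}

    def evaluate(subset):
        key = frozenset(subset)
        if key not in cache:
            cols = sorted(subset)
            cache[key] = (cross_val_score(DecisionTreeClassifier(), X[:, cols], y, cv=cv).mean()
                          if cols else 0.0)
        return cache[key]

    current = set(range(n))                 # backward variant: start from all attributes
    score = evaluate(current)
    while True:
        neighbours = [current ^ {a} for a in range(n)]     # flip one attribute in or out
        best_score, best_nb = max((evaluate(nb), nb) for nb in neighbours)
        if best_score <= score:
            break
        current, score = best_nb, best_score
    return sorted(current), score
```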


Book ChapterDOI
10 Jul 1994
TL;DR: On four datasets, it is shown that only three or four prototypes sufficed to give predictive accuracy equal or superior to a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater.
Abstract: With the goal of reducing computational costs without sacrificing accuracy, we describe two algorithms to find sets of prototypes for nearest neighbor classification. Here, the term “prototypes” refers to the reference instances used in a nearest neighbor computation — the instances with respect to which similarity is assessed in order to assign a class to a new data item. Both algorithms rely on stochastic techniques to search the space of sets of prototypes and are simple to implement. The first is a Monte Carlo sampling algorithm; the second applies random mutation hill climbing. On four datasets we show that only three or four prototypes sufficed to give predictive accuracy equal or superior to a basic nearest neighbor algorithm whose run-time storage costs were approximately 10 to 200 times greater. We briefly investigate how random mutation hill climbing may be applied to select features and prototypes simultaneously. Finally, we explain the performance of the sampling algorithm on these datasets in terms of a statistical measure of the extent of clustering displayed by the target classes.

510 citations
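
The second of the two algorithms, random mutation hill climbing, can be sketched roughly as follows: keep a small set of prototype indices, mutate one slot at a time, and accept the mutation if 1-NN accuracy does not drop. The parameter values and the use of training-set accuracy as the objective are simplifying assumptions.

```python
import numpy as np

def rmhc_prototypes(X, y, n_protos=4, iters=2000, rng=np.random.default_rng(0)):
    """Random mutation hill climbing over sets of prototype indices for a
    1-nearest-neighbour classifier: mutate one prototype at a time and keep the
    mutation only if accuracy does not decrease."""
    def accuracy(idx):
        # Classify every instance by its nearest prototype.
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        pred = y[np.asarray(idx)][d.argmin(axis=1)]
        return (pred == y).mean()

    current = list(rng.choice(len(X), size=n_protos, replace=False))
    best = accuracy(current)
    for _ in range(iters):
        candidate = current.copy()
        slot = rng.integers(n_protos)                   # which prototype slot to replace
        candidate[slot] = int(rng.integers(len(X)))     # random replacement instance
        acc = accuracy(candidate)
        if acc >= best:                                 # accept ties to keep moving
            current, best = candidate, acc
    return current, best
```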


Proceedings ArticleDOI
09 Oct 1994
TL;DR: The recently developed "floating search" algorithms are presented and modified to a more compact form facilitating their direct comparison with the well known (l,r) search.
Abstract: In this paper the recently developed "floating search" algorithms are presented and modified to a more compact form facilitating their direct comparison with the well known (l,r) search. The properties of the floating search methods are investigated, especially with respect to their tolerance to nonmonotonic criteria. Their computational efficiency is demonstrated by results on real data of high dimensionality.

368 citations
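
For comparison with the floating methods, here is a bare-bones version of the classical (l, r) search they are measured against: l forward steps followed by r backward steps, with l and r fixed in advance (forward variant, l > r). `score` is again any subset criterion; the backward (l < r) variant is omitted.

```python
def plus_l_take_away_r(features, score, k_target, l=2, r=1):
    """Classical (l, r) search: add the l best features one at a time, then remove
    the r least useful, repeating until k_target features remain. Unlike the
    floating methods, l and r are fixed in advance."""
    assert l > r, "forward variant assumes net growth per cycle"
    selected = []
    while len(selected) != k_target:
        for _ in range(l):                              # plus-l forward steps
            if len(selected) == k_target:
                break
            pool = [f for f in features if f not in selected]
            selected.append(max(pool, key=lambda f: score(selected + [f])))
        if len(selected) == k_target:
            break
        for _ in range(r):                              # take-away-r backward steps
            selected.remove(max(selected,
                                key=lambda f: score([g for g in selected if g != f])))
    return selected
```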


Journal ArticleDOI
TL;DR: A comparison of the results for the Selwood data set with those obtained by other groups shows that more relevant models are derived by the evolutionary approach than by other methods.
Abstract: In QSAR studies of large data sets, variable selection and model building is a difficult, time-consuming and ambiguous procedure. While most often stepwise regression procedures are applied for this purpose, other strategies, like neural networks, cluster significance analysis or genetic algorithms have been used. A simple and efficient evolutionary strategy, including iterative mutation and selection, but avoiding crossover of regression models, is described in this work. The MUSEUM (Mutation and Selection Uncover Models) algorithm starts from a model containing any number of randomly chosen variables. Random mutation, first by addition or elimination of only one or very few variables, afterwards by simultaneous random additions, eliminations and/or exchanges of several variables at a time, leads to new models which are evaluated by an appropriate fitness function. In contrast to common genetic algorithm procedures, only the “fittest” model is stored and used for further mutation and selection, leading to better and better models. In the last steps of mutation, all variables inside the model are eliminated and all variables outside the model are added, one by one, to control whether this systematic strategy detects any mutation which still improves the model. After every generation of a better model, a new random mutation procedure starts from this model. In the very last step, variables not significant at the 95% level are eliminated, starting with the least significant variable. In this manner, “stable” models are produced, containing only significant variables. A comparison of the results for the Selwood data set (n = 31 compounds, k = 53 variables) with those obtained by other groups shows that more relevant models are derived by the evolutionary approach than by other methods.

259 citations
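
A stripped-down sketch of the mutation-and-selection strategy: a single "fittest" subset is kept, a few variables are randomly added, eliminated, or exchanged, and the mutant replaces the incumbent only if its fitness improves. Cross-validated R² of ordinary least squares is used here as a stand-in fitness function, and the paper's systematic final sweep and significance-based elimination step are omitted.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def museum_like(X, y, generations=500, rng=np.random.default_rng(0)):
    """Evolutionary variable selection in the spirit of MUSEUM: no crossover, only
    the fittest model is retained, and mutants are accepted only on improvement."""
    n_vars = X.shape[1]

    def fitness(mask):
        if mask.sum() == 0:
            return -np.inf
        return cross_val_score(LinearRegression(), X[:, mask], y,
                               cv=5, scoring="r2").mean()

    best = rng.random(n_vars) < 0.2            # random starting model
    best_fit = fitness(best)
    for _ in range(generations):
        mutant = best.copy()
        flips = rng.choice(n_vars, size=rng.integers(1, 4), replace=False)
        mutant[flips] = ~mutant[flips]          # add, drop, or exchange a few variables
        f = fitness(mutant)
        if f > best_fit:                        # only the fittest model is kept
            best, best_fit = mutant, f
    return np.flatnonzero(best), best_fit
```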


Book ChapterDOI
TL;DR: In this paper, the authors compared the performance of the (l, r) search algorithm with the genetic approach to feature subset search in high-dimensional spaces and found that the properties inferred for these techniques from medium scale experiments involving up to a few tens of dimensions extend to dimensionalities of one order of magnitude higher.
Abstract: The combinatorial search problem arising in feature selection in high dimensional spaces is considered. Recently developed techniques based on the classical sequential methods and the (l, r) search called Floating search algorithms are compared against the Genetic approach to feature subset search. Both approaches have been designed with the view to give a good compromise between efficiency and effectiveness for large problems. The purpose of this paper is to investigate the applicability of these techniques to high dimensional problems of feature selection. The aim is to establish whether the properties inferred for these techniques from medium scale experiments involving up to a few tens of dimensions extend to dimensionalities of one order of magnitude higher. Further, relative merits of these techniques vis-a-vis such high dimensional problems are explored and the possibility of exploiting the best aspects of these methods to create a composite feature selection procedure with superior properties is considered.

252 citations


Journal ArticleDOI
TL;DR: A modified PLS algorithm is introduced with the goal of achieving improved prediction ability, based on dimension‐wise selective reweighting of single elements in the PLS weight vector w that leads to rotation of the classical PLS solution.
Abstract: A modified PLS algorithm is introduced with the goal of achieving improved prediction ability. The method, denoted IVS-PLS, is based on dimension-wise selective reweighting of single elements in the PLS weight vector w. Cross-validation, a criterion for the estimation of predictive quality, is used for guiding the selection procedure in the modelling stage. A threshold that controls the size of the selected values in w is put inside a cross-validation loop. This loop is repeated for each dimension and the results are interpreted graphically. The manipulation of w leads to rotation of the classical PLS solution. The results of IVS-PLS are different from simply selecting X-variables prior to modelling. The theory is explained and the algorithm is demonstrated for a simulated data set with 200 variables and 40 objects, representing a typical spectral calibration situation with four analytes. Improvements of up to 70% in external PRESS over the classical PLS algorithm are shown to be possible.

207 citations
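
To make the "dimension-wise reweighting of w" concrete, the sketch below implements a minimal PLS1 (NIPALS) in which small elements of each component's weight vector are zeroed before the scores are formed, which rotates the solution away from classical PLS. The hard threshold and the fixed number of components are simplifications; the actual IVS-PLS tunes the selection per dimension inside a cross-validation loop, which is not reproduced here.

```python
import numpy as np

def pls1_thresholded(X, y, n_components=4, threshold=0.3):
    """PLS1 regression (NIPALS) with hard thresholding of each weight vector w:
    elements smaller than `threshold` times the largest |w| are zeroed before the
    component scores are formed."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk
        w[np.abs(w) < threshold * np.abs(w).max()] = 0.0   # dimension-wise selection
        w /= np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        p = Xk.T @ t / tt
        q = (yk @ t) / tt
        Xk = Xk - np.outer(t, p)                            # deflation
        yk = yk - q * t
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = np.array(W).T, np.array(P).T, np.array(Q)
    B = W @ np.linalg.solve(P.T @ W, Q)    # coefficients for the centred X
    return B, x_mean, y_mean               # predict: (X_new - x_mean) @ B + y_mean
```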


Journal ArticleDOI
TL;DR: While the neural network method performed slightly better than the other two methods at the basic level, the inclusion of the variable selection principle led to similar performance indices for all three methods.

198 citations


Journal ArticleDOI
TL;DR: A development of a previous genetic algorithm is presented so that a full validation of the results can be obtained; the algorithm has also been shown to perform very well as an outlier detector, allowing easy identification of outliers even in cases where the ‘classical’ techniques fail.
Abstract: Genetic algorithms have proved to be a very efficient method for the feature selection problem. However, as for every other method, if the validation of the results is performed in an incomplete way, erroneous conclusions can be drawn. In this paper a development of a previous genetic algorithm is presented so that a full validation of the results can be obtained. Furthermore, this algorithm has also been shown to perform very well as an outlier detector, allowing easy identification of the presence of outliers even in cases where the ‘classical’ techniques fail.
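
As a rough illustration of GA-based feature selection with an honest validation step (not the authors' published algorithm), the sketch below evolves binary chromosomes whose fitness is cross-validated accuracy on a training split only, then re-checks the winning subset on a held-out split; the classifier choice, population size, and genetic operators are arbitrary.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

def ga_select(X, y, pop=30, gens=40, p_mut=0.05, rng=np.random.default_rng(0)):
    """GA feature selection: binary chromosomes mark selected columns; fitness is
    cross-validated accuracy on a training split, and the winning subset is
    re-evaluated on a held-out test split."""
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    n = X.shape[1]

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        return cross_val_score(KNeighborsClassifier(), X_tr[:, mask], y_tr, cv=5).mean()

    population = rng.random((pop, n)) < 0.5
    for _ in range(gens):
        scores = np.array([fitness(ind) for ind in population])
        # Tournament selection, one-point crossover, bit-flip mutation.
        parents = population[np.array([max(rng.choice(pop, 2), key=lambda i: scores[i])
                                       for _ in range(pop)])]
        children = parents.copy()
        cuts = rng.integers(1, n, size=pop // 2)
        for k, c in enumerate(cuts):
            children[2 * k, c:], children[2 * k + 1, c:] = \
                parents[2 * k + 1, c:].copy(), parents[2 * k, c:].copy()
        children ^= rng.random((pop, n)) < p_mut
        population = children
    best = max(population, key=fitness)
    held_out = KNeighborsClassifier().fit(X_tr[:, best], y_tr).score(X_te[:, best], y_te)
    return np.flatnonzero(best), held_out
```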

Journal ArticleDOI
TL;DR: A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees and it is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
Abstract: A fresh look is taken at the problem of bias in information-based attribute selection measures, used in the induction of decision trees. The approach uses statistical simulation techniques to demonstrate that the usual measures such as information gain, gain ratio, and a new measure recently proposed by Lopez de Mantaras (1991) are all biased in favour of attributes with large numbers of values. It is concluded that approaches which utilise the chi-square distribution are preferable because they compensate automatically for differences between attributes in the number of levels they take.
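
The bias can be reproduced with a small simulation in the spirit of the paper: for pure-noise attributes, the average information gain grows with the number of attribute values, while a chi-square test of independence stays calibrated. The sample size, numbers of levels, and trial count below are arbitrary.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

def info_gain(x, y):
    """Information gain of discrete attribute x about class y."""
    def entropy(v):
        p = np.bincount(v) / len(v)
        p = p[p > 0]
        return -(p * np.log2(p)).sum()
    gain = entropy(y)
    for val in np.unique(x):
        sub = y[x == val]
        gain -= len(sub) / len(y) * entropy(sub)
    return gain

n, trials = 200, 500
y = rng.integers(0, 2, n)                      # random binary class
for levels in (2, 5, 10, 20):                  # attribute with more and more values
    gains, pvals = [], []
    for _ in range(trials):
        x = rng.integers(0, levels, n)         # pure-noise attribute
        gains.append(info_gain(x, y))
        table = np.array([[np.sum((x == v) & (y == c)) for c in (0, 1)]
                          for v in range(levels)])
        table = table[table.sum(axis=1) > 0]   # drop unused levels
        pvals.append(chi2_contingency(table)[1])
    print(f"{levels:2d} values: mean gain {np.mean(gains):.4f}, "
          f"mean chi-square p {np.mean(pvals):.2f}")
```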

01 Jan 1994
TL;DR: A set of such algorithms that use case-based classifiers is described and empirically compared, and novel extensions of backward sequential selection are introduced that allow it to scale to this task.
Abstract: Accurate weather prediction is crucial for many activities, including Naval operations. Researchers within the meteorological division of the Naval Research Laboratory have developed and fielded several expert systems for problems such as fog and turbulence forecasting, and tropical storm movement. They are currently developing an automated system for satellite image interpretation, part of which involves cloud classification. Their cloud classification database contains 204 high-level features but only a few thousand instances. The predictive accuracy of classifiers can be improved on this task by employing a feature selection algorithm. We explain why non-parametric case-based classifiers are excellent choices for use in feature selection algorithms. We then describe a set of such algorithms that use case-based classifiers, empirically compare them, and introduce novel extensions of backward sequential selection that allow it to scale to this task. Several of the approaches we tested located feature subsets that attain significantly higher accuracies than those found in previously published research, and some did so with fewer features.
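
For orientation, here is a plain backward-sequential-selection sketch with a 1-NN (case-based) classifier: it starts from all features and greedily drops those whose removal hurts cross-validated accuracy least. The `step` parameter, which drops several features per iteration, only hints at the kind of scaling extension the paper introduces; it is not the authors' method.

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def backward_sequential_selection(X, y, min_features=5, step=1):
    """Backward sequential selection with a case-based (1-NN) classifier: start
    from all features and repeatedly drop the `step` features whose removal hurts
    cross-validated accuracy least, stopping once removal starts to hurt."""
    selected = list(range(X.shape[1]))

    def acc(cols):
        return cross_val_score(KNeighborsClassifier(n_neighbors=1),
                               X[:, cols], y, cv=5).mean()

    best = acc(selected)
    while len(selected) > min_features:
        # Score each feature by the accuracy obtained when it is removed.
        drop_scores = [(acc([c for c in selected if c != f]), f) for f in selected]
        drop_scores.sort(reverse=True)
        to_drop = [f for _, f in drop_scores[:step]]
        reduced = [c for c in selected if c not in to_drop]
        reduced_acc = acc(reduced)
        if reduced_acc + 1e-9 < best:
            break
        selected, best = reduced, reduced_acc
    return selected, best
```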

Journal ArticleDOI
TL;DR: A novel stereo matching algorithm that integrates learning, feature selection, and surface reconstruction is presented, along with a self-diagnostic method for determining when a priori knowledge is necessary for finding the correct match.
Abstract: We present a novel stereo matching algorithm which integrates learning, feature selection, and surface reconstruction. First, a new instance based learning (IBL) algorithm is used to generate an approximation to the optimal feature set for matching. In addition, the importance of two separate kinds of knowledge, image dependent knowledge and image independent knowledge, is discussed. Second, we develop an adaptive method for refining the feature set. This adaptive method analyzes the feature error to locate areas of the image that would lead to false matches. Then these areas are used to guide the search through feature space towards maximizing the class separation distance between the correct match and the false matches. Third, we introduce a self-diagnostic method for determining when a priori knowledge is necessary for finding the correct match. If a priori knowledge is necessary, we use a surface reconstruction model to discriminate between match possibilities. Our algorithm is comprehensively tested against fixed feature set algorithms and against a traditional pyramid algorithm. Finally, we present and discuss extensive empirical results of our algorithm based on a large set of real images.

Journal ArticleDOI
TL;DR: Viewing the analysis of designed experiments as a model selection problem, the use of a predictive Bayesian criterion based on the predictive density of a replicate experiment (PDRE) is introduced and compared with the usual F statistic for two nested models.
Abstract: Viewing the analysis of designed experiments as a model selection problem, we introduce the use of a predictive Bayesian criterion in this context based on the predictive density of a replicate experiment (PDRE). A calibration of the criterion is provided to assist in the model choice. The relationships of the proposed criterion to other prevalent criteria, such as AIC, BIC, and Mallows's Cp, are given. An information theoretic criterion based on the PDRE's of two competing models is also introduced and compared with the usual F statistic for two nested models. Examples are given to illustrate the proposed methodology.

Journal ArticleDOI
TL;DR: Three experiments reported here employed synthetic data sets, constructed so as to have the precise properties required to test specific hypotheses, and showed that the performance decrement typical of random attribute selection increased with the number of available pure-noise attributes.
Abstract: Recent work by Mingers and by Buntine and Niblett on the performance of various attribute selection measures has addressed the topic of random selection of attributes in the construction of decision trees. This article is concerned with the mechanisms underlying the relative performance of conventional and random attribute selection measures. The three experiments reported here employed synthetic data sets, constructed so as to have the precise properties required to test specific hypotheses. The principal underlying idea was that the performance decrement typical of random attribute selection is due to two factors. First, there is a greater chance that informative attributes will be omitted from the subset selected for the final tree. Second, there is a greater risk of overfitting, which is caused by attributes of little or no value in discriminating between classes being “locked in” to the tree structure, near the root. The first experiment showed that the performance decrement increased with the number of available pure-noise attributes. The second experiment indicated that there was little decrement when all the attributes were of equal importance in discriminating between classes. The third experiment showed that a rather greater performance decrement (than in the second experiment) could be expected if the attributes were all informative, but to different degrees.

01 Jan 1994
TL;DR: This work formulate the search for a feature subset as an abstract search problem with probabilistic estimates, and shows how recent feature subset selection algorithms in the machine learning literature fit into this search problem as simple hill climbing approaches.
Abstract: Irrelevant features and weakly relevant features may reduce the comprehensibility and accuracy of concepts induced by supervised learning algorithms. We formulate the search for a feature subset as an abstract search problem with probabilistic estimates. Searching a space using an evaluation function that is a random variable requires trading off accuracy of estimates for increased state exploration. We show how recent feature subset selection algorithms in the machine learning literature fit into this search problem as simple hill climbing approaches, and conduct a small experiment using a best-first search technique.
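
One way to go beyond simple hill climbing, as the abstract suggests, is best-first search over subsets: always expand the most promising unexpanded subset and stop after a fixed number of non-improving expansions. The sketch below is a generic version in which `evaluate` (e.g., cross-validated accuracy of the induction algorithm restricted to that subset) is supplied by the caller; it does not model the probabilistic estimates discussed in the paper.

```python
import heapq
from itertools import count

def best_first_selection(n_features, evaluate, max_stale=5):
    """Best-first search over feature subsets: expand the most promising
    unexpanded subset (not merely the current one, as hill climbing does), and
    stop after `max_stale` expansions that fail to improve on the best subset."""
    start = frozenset()
    start_score = evaluate(start)
    tie = count()                                         # heap tie-breaker
    open_heap = [(-start_score, next(tie), start)]
    closed = {start}
    best, best_score, stale = start, start_score, 0
    while open_heap and stale < max_stale:
        neg_score, _, state = heapq.heappop(open_heap)
        if -neg_score > best_score:
            best, best_score, stale = state, -neg_score, 0
        else:
            stale += 1
        for f in range(n_features):                       # children: toggle one feature
            child = state ^ {f}
            if child not in closed:
                closed.add(child)
                heapq.heappush(open_heap, (-evaluate(child), next(tie), child))
    return best, best_score
```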

Proceedings ArticleDOI
09 Oct 1994
TL;DR: A node-pruning procedure is presented to remove the least salient nodes and create a parsimonious network; the optimal/suboptimal subset of features is simultaneously selected by the network.
Abstract: Proposes a node saliency measure and a backpropagation type of algorithm to compute the node saliencies. A node-pruning procedure is then presented to remove the least salient nodes and create a parsimonious network. The optimal/suboptimal subset of features is simultaneously selected by the network. The performance of the proposed approach for feature selection is compared with Whitney's feature selection method. One advantage of the node-pruning procedure over classical feature selection methods is that it can simultaneously "optimize" both the feature set and the classifier, while classical feature selection methods select the "best" subset of features with respect to a fixed classifier.
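
The paper's saliency measure is computed by a backpropagation-type algorithm that is not reproduced here; the sketch below uses a much simpler proxy (the sum of squared first-layer weights leaving each input node) purely to illustrate the pattern of training a network, pruning the least salient inputs, and retraining, so that the feature set and the classifier are handled in the same framework.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

def prune_least_salient_inputs(X, y, keep=10, hidden=(16,)):
    """Train an MLP, score each input node by a simple saliency proxy (sum of
    squared outgoing first-layer weights), keep the most salient inputs, and
    retrain on the reduced feature set."""
    net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000,
                        random_state=0).fit(X, y)
    W1 = net.coefs_[0]                          # shape: (n_inputs, n_hidden)
    saliency = (W1 ** 2).sum(axis=1)            # proxy saliency per input node
    keep_idx = np.sort(np.argsort(saliency)[::-1][:keep])
    pruned_net = MLPClassifier(hidden_layer_sizes=hidden, max_iter=2000,
                               random_state=0).fit(X[:, keep_idx], y)
    return keep_idx, pruned_net
```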

Proceedings ArticleDOI
01 Jan 1994
TL;DR: Two feature selection techniques and a multilayer perceptron (MLP) neural network (NN) have been used in this study for human chromosome classification; the "knock-out" algorithm emphasized the significance of the centromeric index and of the chromosome length as features in chromosome classification.
Abstract: Two feature selection techniques and a multilayer perceptron (MLP) neural network (NN) have been used in this study for human chromosome classification. The first technique is the "knock-out" algorithm and the second is principal component analysis (PCA). The "knock-out" algorithm emphasized the significance of the centromeric index and of the chromosome length as features in chromosome classification. The PCA technique demonstrated the importance of retaining most of the image information whenever small training sets are used. However, the use of large training sets enables considerable data compression. Both techniques yield the benefit of using only about 70% of the available features to get almost the ultimate classification performance.

01 Jan 1994
TL;DR: Two methods of finding relevant attributes, FOCUS and RELIEF, are tested to see how the attributes they select perform with ID3/C4.5 on two learning problems from a calendar scheduling domain.
Abstract: Eliminating irrelevant attributes prior to induction boosts the performance of many learning algorithms. Relevance, however, is no guarantee of usefulness to a particular learner. We test two methods of finding relevant attributes, FOCUS and RELIEF, to see how the attributes they select perform with ID3/C4.5 on two learning problems from a calendar scheduling domain. A more direct attribute selection procedure, hillclimbing in attribute space, finds superior attribute sets.
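
For reference, RELIEF itself is easy to state: for each sampled instance, find its nearest neighbour of the same class (hit) and of the other class (miss), reward features that separate the miss, and penalize features that differ on the hit. The sketch below is a basic two-class version using Manhattan distance on [0, 1]-scaled features; the sample count is arbitrary.

```python
import numpy as np

def relief(X, y, n_samples=200, rng=np.random.default_rng(0)):
    """Basic RELIEF relevance weights for a two-class problem."""
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)  # scale to [0, 1]
    n, d = X.shape
    w = np.zeros(d)
    for i in rng.choice(n, size=min(n_samples, n), replace=False):
        dist = np.abs(X - X[i]).sum(axis=1)
        dist[i] = np.inf                                   # exclude the instance itself
        same, other = (y == y[i]), (y != y[i])
        hit = np.argmin(np.where(same, dist, np.inf))      # nearest same-class neighbour
        miss = np.argmin(np.where(other, dist, np.inf))    # nearest other-class neighbour
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / min(n_samples, n)    # threshold these weights to pick relevant features
```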

Book ChapterDOI
26 May 1994
TL;DR: A method called Optimal Cell Damage (OCD) is derived, which evaluates the usefulness of input variables in a multi-layer network and prunes the least useful; selection is achieved during training of the classifier, ensuring that the selected set of variables matches the classifier complexity.
Abstract: Neural Networks -NN- have been used in a large variety of real-world applications. In those, one could measure a potentially large number N of variables Xi; probably not all Xi are equally informative: if one could select n ≪ N “best” variables Xi, then one could reduce the amount of data to gather and process; hence reduce costs. Variable selection is thus an important issue in Pattern Recognition and Regression. It is also a complex problem; one needs a criterion to measure the value of a subset of variables and that value will of course depend on the predictor or classifier further used. Conventional variable selection techniques are based upon statistical or heuristics tools [Fukunaga, 90]: the major difficulty comes from the intrinsic combinatorics of the problem. In this paper we show how to use NNs for variable selection with a criterion based upon the evaluation of a variable usefulness. Various methods have been proposed to assess the value of a weight (e.g. saliency [Le Cun et al. 90] in the Optimal Brain-Damage -OBD- procedure): along similar ideas, we derive a method, called Optimal Cell Damage -OCD-, which evaluates the usefulness of input variables in a Multi-Layer Network and prunes the least useful. Variable selection is thus achieved during training of the classifier, ensuring that the selected set of variables matches the classifier complexity. Variable selection is thus viewed here as an extension of weight pruning. One can also use a regularization approach to variable selection, which we will discuss elsewhere [Cibas et al., 94]. We illustrate our method on two relatively small problems: prediction of a synthetic time series and classification of waveforms [Breiman et al., 84], representative of relatively hard problems.

Proceedings ArticleDOI
01 Jan 1994
TL;DR: Two feature selection methods, a distinction-sensitive learning vector quantizer (DSLVQ) and a genetic algorithm (GA) approach, are applied to multichannel electroencephalogram (EEG) patterns, showing the importance of methods automatically selecting the most distinctive out of a number of available features.
Abstract: Two feature selection methods, a distinction-sensitive learning vector quantizer (DSLVQ) and a genetic algorithm (GA) approach, are applied to multichannel electroencephalogram (EEG) patterns. It is shown how DSLVQ adjusts the influence of different input features according to their relevance for classification. Using a weighted distance function DSLVQ thereby performs feature selection along with classification. The results are compared with those of a GA which minimizes the number of features taken for classification while maximizing classification performance. The multichannel EEG patterns used in this paper stem from a study for the construction of a brain-computer interface, which is a system designed for handicapped persons to help them use their EEG for control of their environment. For such a system, reliable EEG classification, i.e. differentiation of several distinctive EEG patterns, is vital. In practice the number of electrodes for EEG recordings can be high (up to 56 and more) and different frequency bands and time intervals for each electrode can be used for classification simultaneously. This shows the importance of methods automatically selecting the most distinctive out of a number of available features.

Journal ArticleDOI
TL;DR: A novel adaptive algorithm, termed the learning vector classifier (LVC), is compared with standard K-means and LVQ2 classifiers; LVC is a supervised learning classifier that improves performance by increasing the resolution of the decision boundaries.
Abstract: A feature set that captures the dynamics of formant transitions prior to closure in a VCV environment is used to characterize and classify the unvoiced stop consonants. The feature set is derived from a time-varying, data-selective model for the speech signal. Its performance is compared with that of comparable formant data from a standard delta-LPC-based model. The different feature sets are evaluated on a database composed of eight talkers. A 40% reduction in classification error rate is obtained by means of the time-varying model. The performance of three different classifiers is discussed. A novel adaptive algorithm, termed the learning vector classifier (LVC), is compared with standard K-means and LVQ2 classifiers. LVC is a supervised learning classifier that improves performance by increasing the resolution of the decision boundaries. Error rates obtained for the three-way (p, t, and k) classification task using LVC and the time-varying analysis are comparable to those of techniques that make use of additional discriminating information contained in the burst. Further improvements are expected when an expanded time-varying feature set is utilized, coupled with information from the burst.

01 Jan 1994
TL;DR: A novel method for generating ROC curves for backpropagation ANNs is introduced; the method for deriving an ROC for a backpropagation ANN classifier provides superior performance to the currently used methods.
Abstract: In this work we have examined the application of computer image analysis techniques to digitized mammographic images for the purpose of detecting two types of mammographic abnormalities, namely clustered microcalcifications and spiculated lesions. We have used three separate image data sets for the experiments. Two of the data sets have been used in the previous work of other researchers, thus permitting a direct comparison to their results. Our algorithms are region-oriented approaches. Thus, classification is performed on a region of interest, or object, that is segmented from the image. A set of features is computed for each object, and statistical classification is used to label the objects as either normal or abnormal. We have examined the performance of five classifiers. They are: a Linear Bayesian (LC), a Quadratic Bayesian (QC), a K-Nearest Neighbor (KNN), a Binary Decision Tree (BDT), and an Artificial Neural Network (ANN) classifier. Receiver Operating Characteristic (ROC) analysis is utilized to evaluate the performance of the classifiers. We introduce a novel method for generating ROC curves for backpropagation ANNs. We introduce a method (QESFS) for automatic feature selection for determining a nearly optimal feature vector from a pool of many features. The conclusions that can be drawn from this work are: (1) The method introduced here for deriving an ROC for a backpropagation ANN classifier provides superior performance to the currently used methods. (2) The QESFS approach to feature selection introduced here provides a computationally practical way of choosing what should be a nearly optimal feature vector. (3) The apparently most useful features for the problems addressed in this work are texture, shape, and contrast. (4) Feature vectors of moderate size (6 to 8 features) are appropriate for the problems addressed here. (5) The KNN classifier provides generally superior performance for the problems addressed here (relative to the other classifiers: LC, QC, BDT, ANN). (6) Spiculated detection appears to require at least 280 microns per pixel of spatial resolution. (7) Microcalcification detection does appear to require at least 50 microns per pixel of spatial resolution. (8) A KNN classifier appears to make use of information contained in the 10th bit or greater of intensity resolution for spiculated lesion detection. A QC classifier for the same problem appears to need only 8 bits to achieve only slightly lower performance. (9) Both QC and KNN appear to require 10 bits or more of intensity resolution for microcalcification detection. (10) The region-oriented approaches explored here appear to lag behind the pixel-oriented approaches explored by other researchers. Possible explanations for this involve the different volumes of training data possible in the two approaches when the total number of images is still "small", the ability of the pixel-oriented approach to defer declaring an object until after individual pixels of the object have been examined, and the need of the region-oriented approaches to be very liberal in the initial segmentation stage in order to maintain high sensitivity. (Abstract shortened by UMI.)

Book ChapterDOI
TL;DR: This work presents an alternative approach that uses a new metric for discriminatory power called relative feature importance (RFI), which ranks features by discriminatory power based on their distributional structure, without parametric assumptions.
Abstract: Feature selection for classification is the process of determining the discriminatory power of features. Discriminatory power is the relative usefulness of a feature for classification. Traditional feature selection techniques have defined discriminatory power in terms of a particular type of classifier. We present an alternative approach that uses a new metric for discriminatory power called relative feature importance (RFI). RFI ranks features by discriminatory power based on their distributional structure, without parametric assumptions. Because computing RFI is not, in general, practicable, we also present a technique that efficiently estimates RFI. This hybrid technique, the genetic neural RFI estimator (GENFIE), has accurately estimated RFI in preliminary studies.

Journal ArticleDOI
TL;DR: A new method is presented to approximate class conditional densities by a mixture of parameterized densities to facilitate simultaneous decision rule inference and selection of discriminative features which characterize the image entities to be classified.

Proceedings ArticleDOI
27 Jun 1994
TL;DR: A Distinction Sensitive Learning Vector Quantizer (DSLVQ), based on the LVQ3 algorithm, is introduced which automatically adjusts the influence of the input features according to their observed relevance for classification.
Abstract: A Distinction Sensitive Learning Vector Quantizer (DSLVQ), based on the LVQ3 algorithm, is introduced which automatically adjusts the influence of the input features according to their observed relevance for classification. DSLVQ is less sensitive to noisy features than standard LVQ and its importance adjustments are transparent and can be exploited for input data feature selection. As an example, the algorithm is applied to the classification of two artificial data sets: Breiman's (1984) waveform data and Kohonen's "hard" classification task.
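
The excerpt does not give the exact DSLVQ update rule, so the sketch below should be read only as a plausible weighted-distance LVQ1-style variant that captures the flavour: prototypes move as usual, while a feature-weight vector is nudged toward features on which the nearest same-class prototype is closer than the nearest other-class prototype, then renormalized. It is not the published DSLVQ algorithm, and all parameter values are arbitrary.

```python
import numpy as np

def weighted_lvq(X, y, protos_per_class=3, epochs=30, alpha=0.05, beta=0.01,
                 rng=np.random.default_rng(0)):
    """LVQ with a learned feature-weight vector: prototypes move as in LVQ1, and
    feature weights are increased for features that separate the nearest wrong-class
    prototype from the nearest correct-class prototype. The weights double as
    feature-relevance scores."""
    X = np.asarray(X, dtype=float)
    classes = np.unique(y)
    P, P_lab = [], []
    for c in classes:                                   # init prototypes per class
        idx = rng.choice(np.flatnonzero(y == c), size=protos_per_class, replace=False)
        P.append(X[idx]); P_lab.append(np.full(protos_per_class, c))
    P, P_lab = np.vstack(P), np.concatenate(P_lab)
    w = np.ones(X.shape[1]) / X.shape[1]                # feature influence weights

    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            d = ((P - X[i]) ** 2 * w).sum(axis=1)       # weighted squared distance
            nearest = d.argmin()
            sign = 1.0 if P_lab[nearest] == y[i] else -1.0
            P[nearest] += sign * alpha * (X[i] - P[nearest])       # LVQ1 update
            # Feature-weight update: compare per-feature gaps to the nearest
            # correct-class and nearest wrong-class prototypes.
            same = np.where(P_lab == y[i], d, np.inf).argmin()
            diff = np.where(P_lab != y[i], d, np.inf).argmin()
            gap = np.abs(X[i] - P[diff]) - np.abs(X[i] - P[same])
            w = np.clip(w + beta * gap, 1e-6, None)
            w /= w.sum()
    return P, P_lab, w
```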

Book ChapterDOI
01 May 1994
TL;DR: A new method for selecting features, or deciding on splitting points in inductive learning, is proposed to take the positions of examples into account instead of just considering the numbers of examples from different classes that fall on different sides of a splitting rule.
Abstract: We propose a new method for selecting features, or deciding on splitting points in inductive learning. Its main innovation is to take the positions of examples into account instead of just considering the numbers of examples from different classes that fall on different sides of a splitting rule. The method gives rise to a family of feature selection techniques. We demonstrate the promise of the developed method with initial empirical experiments in connection with top-down induction of decision trees.

Book ChapterDOI
01 Jan 1994
TL;DR: The Pretended simplicity theorem based on the mutual neighborhood graph (MNG) defined on the CSM is presented, and the feature selection method is realized in terms of the MNG.
Abstract: This paper presents the Cartesian space model (CSM) which is a mathematical model to treat symbolic data. Then, as a similar theorem to the Theorem of the ugly duckling by Watanabe, we present the Pretended simplicity theorem based on the mutual neighborhood graph (MNG) defined on the CSM. Our feature selection method is realized in terms of the MNG. We present a parity problem in order to illustrate the effectiveness of our feature selection method.