
Showing papers on "Feature selection published in 1998"


01 Jan 1998
TL;DR: This thesis addresses the problem of feature selection for machine learning through a correlation based approach with CFS (Correlation based Feature Selection), an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy.
Abstract: A central problem in machine learning is identifying a representative set of features from which to construct a classification model for a particular task. This thesis addresses the problem of feature selection for machine learning through a correlation based approach. The central hypothesis is that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. A feature evaluation formula, based on ideas from test theory, provides an operational definition of this hypothesis. CFS (Correlation based Feature Selection) is an algorithm that couples this evaluation formula with an appropriate correlation measure and a heuristic search strategy. CFS was evaluated by experiments on artificial and natural datasets. Three machine learning algorithms were used: C4.5 (a decision tree learner), IB1 (an instance based learner), and naive Bayes. Experiments on artificial datasets showed that CFS quickly identifies and screens irrelevant, redundant, and noisy features, and identifies relevant features as long as their relevance does not strongly depend on other features. On natural domains, CFS typically eliminated well over half the features. In most cases, classification accuracy using the reduced feature set equaled or bettered accuracy using the complete feature set. Feature selection degraded machine learning performance in cases where some features were eliminated which were highly predictive of very small areas of the instance space. Further experiments compared CFS with a wrapper—a well known approach to feature selection that employs the target learning algorithm to evaluate feature sets. In many cases CFS gave comparable results to the wrapper, and in general, outperformed the wrapper on small datasets. CFS executes many times faster than the wrapper, which allows it to scale to larger datasets. Two methods of extending CFS to handle feature interaction are presented and experimentally evaluated. The first considers pairs of features and the second incorporates feature weights calculated by the RELIEF algorithm. Experiments on artificial domains showed that both methods were able to identify interacting features. On natural domains, the pairwise method gave more reliable results than using weights provided by RELIEF.
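For concreteness, a minimal sketch of the subset-merit heuristic described above, where the "goodness" of a subset grows with the average feature-class correlation and shrinks with the average feature-feature correlation. Pearson correlation is used here for simplicity; the thesis works with measures suited to discrete data (such as symmetrical uncertainty), and the function name and arguments are illustrative.

```python
import numpy as np

def cfs_merit(X, y, subset):
    """Correlation-based merit of a feature subset: reward strong
    feature-class correlation, penalize feature-feature redundancy."""
    k = len(subset)
    if k == 0:
        return 0.0
    # average absolute feature-class correlation
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    # average absolute feature-feature correlation over all pairs
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
        r_ff = np.mean([abs(np.corrcoef(X[:, a], X[:, b])[0, 1]) for a, b in pairs])
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)
```

A heuristic search (the thesis uses best-first search) then looks for the subset that maximizes this merit.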

3,533 citations


Book
31 Jul 1998
TL;DR: Feature Selection for Knowledge Discovery and Data Mining offers an overview of the methods developed since the 1970s, provides a general framework for examining and categorizing them, and suggests guidelines for how to use different methods under various circumstances.
Abstract: From the Publisher: With advanced computer technologies and their omnipresent usage, data accumulates at a speed that outpaces the human capacity to process it. To meet this growing challenge, the research community of knowledge discovery from databases emerged. The key issue studied by this community is, in layman's terms, to make advantageous use of large stores of data. In order to make raw data useful, it is necessary to represent, process, and extract knowledge for various applications. Feature Selection for Knowledge Discovery and Data Mining offers an overview of the methods developed since the 1970s and provides a general framework in order to examine these methods and categorize them. This book employs simple examples to show the essence of representative feature selection methods and compares them using data sets with combinations of intrinsic properties according to the objective of feature selection. In addition, the book suggests guidelines for how to use different methods under various circumstances and points out new challenges in this exciting area of research. Feature Selection for Knowledge Discovery and Data Mining is intended to be used by researchers in machine learning, data mining, knowledge discovery, and databases as a toolbox of relevant tools that help in solving large real-world problems. This book is also intended to serve as a reference book or secondary text for courses on machine learning, data mining, and databases.

1,867 citations


Journal ArticleDOI
TL;DR: The authors' approach uses a genetic algorithm to select subsets of attributes or features to represent the patterns to be classified, achieving multicriteria optimization in terms of generalization accuracy and costs associated with the features.
Abstract: Practical pattern-classification and knowledge-discovery problems require the selection of a subset of attributes or features to represent the patterns to be classified. The authors' approach uses a genetic algorithm to select such subsets, achieving multicriteria optimization in terms of generalization accuracy and costs associated with the features.
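As a rough illustration of the multicriteria idea, the sketch below scores a binary chromosome by cross-validated accuracy minus a penalty proportional to the total cost of the selected features. The choice of classifier, the 5-fold protocol, and the `cost_weight` trade-off are assumptions for illustration, not the authors' exact formulation.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def fitness(mask, X, y, costs, cost_weight=0.1):
    """Fitness of a binary chromosome: generalization accuracy minus a
    penalty proportional to the cost of the selected features."""
    selected = np.flatnonzero(mask)
    if selected.size == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(), X[:, selected], y, cv=5).mean()
    return acc - cost_weight * costs[selected].sum() / costs.sum()

X, y = make_classification(n_samples=200, n_features=15, random_state=0)
costs = np.random.default_rng(0).uniform(1, 10, size=15)   # per-feature costs
mask = np.random.default_rng(1).integers(0, 2, size=15)    # one candidate chromosome
print(fitness(mask, X, y, costs))
```

A GA would evolve a population of such masks under this fitness function.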

1,465 citations


Proceedings Article
24 Jul 1998
TL;DR: Numerical tests on 6 public data sets show that classifiers trained by the concave minimization approach and those trained by a support vector machine have comparable 10-fold cross-validation correctness.
Abstract: Computational comparison is made between two feature selection approaches for finding a separating plane that discriminates between two point sets in an n-dimensional feature space that utilizes as few of the n features (dimensions) as possible. In the concave minimization approach [19, 5] a separating plane is generated by minimizing a weighted sum of distances of misclassified points to two parallel planes that bound the sets and which determine the separating plane midway between them. Furthermore, the number of dimensions of the space used to determine the plane is minimized. In the support vector machine approach [27, 7, 1, 10, 24, 28], in addition to minimizing the weighted sum of distances of misclassified points to the bounding planes, we also maximize the distance between the two bounding planes that generate the separating plane. Computational results show that feature suppression is an indirect consequence of the support vector machine approach when an appropriate norm is used. Numerical tests on 6 public data sets show that classifiers trained by the concave minimization approach and those trained by a support vector machine have comparable 10-fold cross-validation correctness. However, in all data sets tested, the classifiers obtained by the concave minimization approach selected fewer problem features than those trained by a support vector machine.
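The feature suppression attributed above to "an appropriate norm" can be illustrated with a 1-norm-penalized linear SVM: the penalty drives many coefficients of the separating plane's normal vector to exactly zero. This scikit-learn sketch on synthetic data only illustrates that effect; it is not the authors' formulation or experiments.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=30, n_informative=4,
                           n_redundant=2, random_state=0)
# A 1-norm penalty on the plane's normal vector zeroes out many coefficients,
# so unused features are suppressed as a side effect of training.
clf = LinearSVC(penalty="l1", loss="squared_hinge", dual=False, C=0.5).fit(X, y)
kept = np.flatnonzero(np.abs(clf.coef_[0]) > 1e-8)
print(f"{kept.size} of {X.shape[1]} features retained:", kept)
```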

1,074 citations



Journal ArticleDOI
TL;DR: A new approach is proposed, and the positive and negative aspects of the application of GA in selecting variables for a partial least squares (PLS) model are taken into account, showing that this technique almost always produces very good results.

685 citations


Journal ArticleDOI
TL;DR: The concept of GDF offers a unified framework under which complex and highly irregular modeling procedures can be analyzed in the same way as classical linear models and many difficult problems can be solved easily.
Abstract: In the theory of linear models, the concept of degrees of freedom plays an important role. This concept is often used for measurement of model complexity, for obtaining an unbiased estimate of the error variance, and for comparison of different models. I have developed a concept of generalized degrees of freedom (GDF) that is applicable to complex modeling procedures. The definition is based on the sum of the sensitivity of each fitted value to perturbation in the corresponding observed value. The concept is nonasymptotic in nature and does not require analytic knowledge of the modeling procedures. The concept of GDF offers a unified framework under which complex and highly irregular modeling procedures can be analyzed in the same way as classical linear models. By using this framework, many difficult problems can be solved easily. For example, one can now measure the number of observations used in a variable selection process. Different modeling procedures, such as a tree-based regression and a ...
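The definition above (the sum over observations of the sensitivity of each fitted value to a perturbation of the corresponding observed value) suggests a simple Monte Carlo estimate: perturb the responses several times, refit, and regress each fitted value on its own perturbation. The perturbation scale `tau`, the number of replicates, and the least-squares slope estimate are illustrative choices, not the article's exact procedure.

```python
import numpy as np

def estimate_gdf(fit_predict, X, y, n_rep=50, tau=0.5, seed=0):
    """Monte Carlo estimate of generalized degrees of freedom (GDF):
    sum of sensitivities of fitted values to perturbations of the responses."""
    rng = np.random.default_rng(seed)
    n = len(y)
    deltas = rng.normal(0.0, tau, size=(n_rep, n))            # perturbations of y
    fits = np.array([fit_predict(X, y + d) for d in deltas])  # refit for each draw
    # slope of each fitted value with respect to its own perturbation
    sens = [np.polyfit(deltas[:, i], fits[:, i], 1)[0] for i in range(n)]
    return float(np.sum(sens))

# e.g. GDF of a tree-based regression procedure (illustrative):
#   from sklearn.tree import DecisionTreeRegressor
#   gdf = estimate_gdf(lambda X, y: DecisionTreeRegressor(max_depth=3)
#                      .fit(X, y).predict(X), X, y)
```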

525 citations


01 Jan 1998
TL;DR: A new feature selection algorithm is described that uses a correlation based heuristic to determine the “goodness” of feature subsets, and its effectiveness is evaluated with three common machine learning algorithms.
Abstract: Machine learning algorithms automatically extract knowledge from machine readable information. Unfortunately, their success is usually dependent on the quality of the data that they operate on. If the data is inadequate, or contains extraneous and irrelevant information, machine learning algorithms may produce less accurate and less understandable results, or may fail to discover anything of use at all. Feature subset selectors are algorithms that attempt to identify and remove as much irrelevant and redundant information as possible prior to learning. Feature subset selection can result in enhanced performance, a reduced hypothesis search space, and, in some cases, reduced storage requirements. This paper describes a new feature selection algorithm that uses a correlation based heuristic to determine the “goodness” of feature subsets, and evaluates its effectiveness with three common machine learning algorithms. Experiments using a number of standard machine learning data sets are presented. Feature subset selection gave significant improvement for all three algorithms.
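To show how such a subset-goodness heuristic is typically coupled with a search strategy, here is a greedy forward hill-climb over subsets scored by an arbitrary `score` function. This is a simplification: the algorithm described in this paper uses best-first search, which allows limited backtracking.

```python
def forward_select(n_features, score):
    """Greedy forward search: repeatedly add the feature that most improves
    the subset score; stop when no single addition helps."""
    subset, best = [], float("-inf")
    improved = True
    while improved and len(subset) < n_features:
        improved = False
        top_score, top_f = max((score(subset + [f]), f)
                               for f in range(n_features) if f not in subset)
        if top_score > best:
            subset, best, improved = subset + [top_f], top_score, True
    return subset, best

# e.g. with the correlation-based merit sketched earlier on this page:
#   selected, merit = forward_select(X.shape[1], lambda s: cfs_merit(X, y, s))
```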

515 citations


Proceedings ArticleDOI
27 Aug 1998
TL;DR: A data mining framework for constructing intrusion detection models to mine system audit data for consistent and useful patterns of program and user behavior, and use the set of relevant system features presented in the patterns to compute classifiers that can recognize anomalies and known intrusions.
Abstract: In this paper we discuss a data mining framework for constructing intrusion detection models. The key ideas are to mine system audit data for consistent and useful patterns of program and user behavior, and use the set of relevant system features presented in the patterns to compute (inductively learned) classifiers that can recognize anomalies and known intrusions. Our past experiments showed that classifiers can be used to detect intrusions, provided that sufficient audit data is available for training and the right set of system features are selected. We propose to use the association rules and frequent episodes computed from audit data as the basis for guiding the audit data gathering and feature selection processes. We modify these two basic algorithms to use axis attribute(s) as a form of item constraints to compute only the relevant ("useful") patterns, and an iterative level-wise approximate mining procedure to uncover the low frequency (but important) patterns. We report our experiments in using these algorithms on real-world audit data.
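The "axis attribute" item constraint can be pictured as a simple relevance filter applied to candidate patterns during level-wise mining: a pattern is kept only if it references at least one designated axis attribute. The attribute names below are illustrative, not drawn from the paper's audit data.

```python
def is_relevant(pattern, axis_attrs):
    """Keep a candidate pattern only if it mentions an axis attribute."""
    return any(attr in axis_attrs for attr, _value in pattern)

# illustrative (attribute, value) pairs from connection-level audit records
pattern = [("service", "http"), ("flag", "SF"), ("dst_bytes", "high")]
print(is_relevant(pattern, axis_attrs={"service"}))   # True -> keep this pattern
```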

299 citations


Journal ArticleDOI
TL;DR: A very sparse data structure, the ADtree, is provided to minimize memory use and it is empirically demonstrated that tractably-sized data structures can be produced for large real-world datasets by using a sparse tree structure that never allocates memory for counts of zero.
Abstract: This paper introduces new algorithms and data structures for quick counting for machine learning datasets. We focus on the counting task of constructing contingency tables, but our approach is also applicable to counting the number of records in a dataset that match conjunctive queries. Subject to certain assumptions, the costs of these operations can be shown to be independent of the number of records in the dataset and loglinear in the number of non-zero entries in the contingency table. We provide a very sparse data structure, the ADtree, to minimize memory use. We provide analytical worst-case bounds for this structure for several models of data distribution. We empirically demonstrate that tractably-sized data structures can be produced for large real-world datasets by (a) using a sparse tree structure that never allocates memory for counts of zero, (b) never allocating memory for counts that can be deduced from other counts, and (c) not bothering to expand the tree fully near its leaves. We show how the ADtree can be used to accelerate Bayes net structure finding algorithms, rule learning algorithms, and feature selection algorithms, and we provide a number of empirical results comparing ADtree methods against traditional direct counting approaches. We also discuss the possible uses of ADtrees in other machine learning methods, and discuss the merits of ADtrees in comparison with alternative representations such as kd-trees, R-trees and Frequent Sets.
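Not the ADtree itself, but a sketch of idea (a) above: store a contingency table sparsely, so that cells with a count of zero are never allocated. The record fields are illustrative; the real ADtree additionally shares counts between queries and avoids expanding the tree near its leaves.

```python
from collections import Counter

def sparse_contingency(records, attrs):
    """Contingency table over `attrs` stored sparsely: zero cells are absent."""
    return Counter(tuple(rec[a] for a in attrs) for rec in records)

records = [{"colour": "red",  "shape": "square", "label": 1},
           {"colour": "red",  "shape": "round",  "label": 0},
           {"colour": "blue", "shape": "round",  "label": 0}]
print(sparse_contingency(records, ("colour", "label")))
# Counter({('red', 1): 1, ('red', 0): 1, ('blue', 0): 1}) -- no zero cells stored
```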

266 citations


Journal ArticleDOI
TL;DR: The results of this study indicate the potential of using combined morphological and texture features for computer-aided classification of microcalcifications.
Abstract: We are developing computerized feature extraction and classification methods to analyze malignant and benign microcalcifications on digitized mammograms. Morphological features that described the size, contrast, and shape of microcalcifications and their variations within a cluster were designed to characterize microcalcifications segmented from the mammographic background. Texture features were derived from the spatial gray-level dependence (SGLD) matrices constructed at multiple distances and directions from tissue regions containing microcalcifications. A genetic algorithm (GA) based feature selection technique was used to select the best feature subset from the multi-dimensional feature spaces. The GA-based method was compared to the commonly used feature selection method based on the stepwise linear discriminant analysis (LDA) procedure. Linear discriminant classifiers using the selected features as input predictor variables were formulated for the classification task. The discriminant scores output from the classifiers were analyzed by receiver operating characteristic (ROC) methodology and the classification accuracy was quantified by the area, A_z, under the ROC curve. We analyzed a data set of 145 mammographic microcalcification clusters in this study. It was found that the feature subsets selected by the GA-based method are comparable to or slightly better than those selected by the stepwise LDA method. The texture features (A_z = 0.84) were more effective than morphological features (A_z = 0.79) in distinguishing malignant and benign microcalcifications. The highest classification accuracy (A_z = 0.89) was obtained in the combined texture and morphological feature space. The improvement was statistically significant in comparison to classification in either the morphological (p = 0.002) or the texture (p = 0.04) feature space alone. The classifier using the best feature subset from the combined feature space and an appropriate decision threshold could correctly identify 35% of the benign clusters without missing a malignant cluster. When the average discriminant score from all views of the same cluster was used for classification, the A_z value increased to 0.93 and the classifier could identify 50% of the benign clusters at 100% sensitivity for malignancy. Alternatively, if the minimum discriminant score from all views of the same cluster was used, the A_z value would be 0.90 and a specificity of 32% would be obtained at 100% sensitivity. The results of this study indicate the potential of using combined morphological and texture features for computer-aided classification of microcalcifications.

Book ChapterDOI
01 Jan 1998
TL;DR: Improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes, thus producing more comprehensible models.
Abstract: In the feature subset selection problem, a learning algorithm is faced with the problem of selecting a relevant subset of features upon which to focus its attention, while ignoring the rest. To achieve the best possible performance with a particular learning algorithm on a particular training set, a feature subset selection method should consider how the algorithm and the training set interact. We explore the relation between optimal feature subset selection and relevance. The wrapper method searches for an optimal feature subset tailored to a particular algorithm and a domain. We compare the wrapper approach to induction without feature subset selection and to Relief, a filter approach to feature subset selection. Improvement in accuracy is achieved for some datasets for the two families of induction algorithms used: decision trees and Naive-Bayes. In addition, the feature subsets selected by the wrapper are significantly smaller than the original subsets used by the learning algorithms, thus producing more comprehensible models.
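The defining step of the wrapper is that a candidate subset is scored by the performance of the very learning algorithm that will use it. Below is a minimal sketch with cross-validated accuracy; the synthetic data, the 5-fold protocol, and the scikit-learn learners stand in for the decision-tree and Naive-Bayes families named above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=1)

def wrapper_score(subset, learner):
    """Wrapper evaluation: cross-validated accuracy of the target learner
    restricted to the candidate feature subset."""
    return cross_val_score(learner, X[:, list(subset)], y, cv=5).mean() if subset else 0.0

# Any search strategy (forward, best-first, ...) can use wrapper_score as its objective.
for learner in (DecisionTreeClassifier(random_state=0), GaussianNB()):
    print(type(learner).__name__, round(wrapper_score([0, 3, 7], learner), 3))
```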

Journal ArticleDOI
TL;DR: Experimental results suggest that the probabilistic algorithm is effective in obtaining optimal/suboptimal feature subsets and its incremental version expedites feature selection further when the number of patterns is large and can scale up without sacrificing the quality of selected features.
Abstract: Feature selection is a problem of finding relevant features. When the number of features of a dataset is large and its number of patterns is huge, an effective method of feature selection can help in dimensionality reduction. An incremental probabilistic algorithm is designed and implemented as an alternative to the exhaustive and heuristic approaches. Theoretical analysis is given to support the idea of the probabilistic algorithm in finding an optimal or near-optimal subset of features. Experimental results suggest that (1) the probabilistic algorithm is effective in obtaining optimal/suboptimal feature subsets; (2) its incremental version expedites feature selection further when the number of patterns is large and can scale up without sacrificing the quality of selected features.
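A hedged sketch of a probabilistic, random-subset search of this kind over discrete-valued features, using an inconsistency criterion as the quality test. The threshold, iteration budget, and function names are illustrative and not the paper's exact algorithm (in particular, its incremental version re-checks candidate subsets on growing portions of the data).

```python
import random

def inconsistency_rate(X, y, subset):
    """Fraction of patterns whose class labels disagree while the patterns
    match on the selected (discrete-valued) features."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(tuple(row[j] for j in subset), []).append(label)
    bad = sum(len(lbls) - max(lbls.count(c) for c in set(lbls))
              for lbls in groups.values())
    return bad / len(y)

def probabilistic_select(X, y, n_features, max_tries=1000, threshold=0.0, seed=0):
    """Las Vegas-style search: draw random subsets, keep the smallest one
    whose inconsistency rate stays within the threshold."""
    rng = random.Random(seed)
    best = list(range(n_features))
    for _ in range(max_tries):
        cand = rng.sample(range(n_features), rng.randint(1, len(best)))
        if inconsistency_rate(X, y, cand) <= threshold and len(cand) <= len(best):
            best = sorted(cand)
    return best
```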

Book ChapterDOI
21 Apr 1998
TL;DR: An experimental comparison on real-world data collected from Web users shows that the characteristics of the problem domain and of the machine learning algorithm should be considered when a feature scoring measure is selected.
Abstract: This paper describes several known and some new methods for feature subset selection on large text data. An experimental comparison on real-world data collected from Web users shows that the characteristics of the problem domain and of the machine learning algorithm should be considered when a feature scoring measure is selected. Our problem domain consists of hyperlinks given in the form of small documents represented as word vectors. In our learning experiments a naive Bayesian classifier was used on the text data. The best performance was achieved by the feature selection methods based on the feature scoring measure called Odds ratio, which is known from information retrieval.
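The Odds ratio score mentioned above can be computed per word roughly as follows: the log of the odds of the word occurring in positive documents against the odds of it occurring in negative ones. Representing documents as word sets and clipping probabilities for smoothing are simplifications made here for illustration.

```python
import math

def odds_ratio(word, pos_docs, neg_docs, eps=1e-6):
    """log[ P(w|pos)(1 - P(w|neg)) / ((1 - P(w|pos)) P(w|neg)) ], clipped."""
    p_pos = min(max(sum(word in d for d in pos_docs) / len(pos_docs), eps), 1 - eps)
    p_neg = min(max(sum(word in d for d in neg_docs) / len(neg_docs), eps), 1 - eps)
    return math.log((p_pos * (1 - p_neg)) / ((1 - p_pos) * p_neg))

pos = [{"hotel", "paris", "book"}, {"flight", "paris"}]
neg = [{"python", "code"}, {"code", "bug", "book"}]
print(sorted(["paris", "code", "book"], key=lambda w: -odds_ratio(w, pos, neg)))
```

Words with the highest scores are then kept as features for the classifier.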

Journal ArticleDOI
TL;DR: Computational tests of three approaches to feature selection algorithm via concave minimization on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage method for reducing neural network complexity.
Abstract: The problem of discriminating between two finite point sets in n-dimensional feature space by a separating plane that utilizes as few of the features as possible is formulated as a mathematical program with a parametric objective function and linear constraints. The step function that appears in the objective function can be approximated by a sigmoid or by a concave exponential on the nonnegative real line, or it can be treated exactly by considering the equivalent linear program with equilibrium constraints. Computational tests of these three approaches on publicly available real-world databases have been carried out and compared with an adaptation of the optimal brain damage method for reducing neural network complexity. One feature selection algorithm via concave minimization reduced cross-validation error on a cancer prognosis database by 35.4% while reducing problem features from 32 to 4.

Proceedings Article
24 Jul 1998
TL;DR: A rigorous bound for generalization error under feature selection in the wrapper model suggests that, in the presence of many "irrelevant" features, the main source of error in wrapper-model feature selection is overfitting of the hold-out or cross-validation data.
Abstract: We consider feature selection in the "wrapper" model of feature selection. This typically involves an NP-hard optimization problem that is approximated by heuristic search for a "good" feature subset. First considering the idealization where this optimization is performed exactly, we give a rigorous bound for generalization error under feature selection. The search heuristics typically used are then immediately seen as trying to achieve the error given in our bounds, and succeeding to the extent that they succeed in solving the optimization. The bound suggests that, in the presence of many "irrelevant" features, the main source of error in wrapper model feature selection is from "overfitting" hold-out or cross-validation data. This motivates a new algorithm that, again under the idealization of performing search exactly, has sample complexity (and error) that grows logarithmically in the number of "irrelevant" features, which means it can tolerate having a number of "irrelevant" features exponential in the number of training examples, and search heuristics are again seen to be directly trying to reach this bound. Experimental results on a problem using simulated data show the new algorithm having much higher tolerance to irrelevant features than the standard wrapper model. Lastly, we also discuss ramifications that sample complexity logarithmic in the number of irrelevant features might have for feature design in actual applications of learning.

Journal ArticleDOI
TL;DR: A learning algorithm based on soft consistency and completeness conditions is proposed that combines rule and feature selection in a single process; it is tested on different databases.

Journal ArticleDOI
TL;DR: It has been found that variable selection by simulated annealing (SA) enhances the model's robustness with respect to model transfer and also improves its predictive ability.

Proceedings ArticleDOI
06 Jan 1998
TL;DR: This study uses real-world financial credit-risk data to evaluate several feature selection methods as to their effectiveness in preprocessing input data in data mining systems.
Abstract: Recent advances in computing technology in terms of speed, cost, as well as access to tremendous amounts of computing power and the ability to process huge amounts of data in reasonable time have spurred increased interest in data mining applications. Machine learning has been one of the methods used in most of these data mining applications. The data used as input to any of these learning systems are the primary source of knowledge in terms of what is learned by these systems. There have been relatively few studies on preprocessing data used as input in these data mining systems. In this study, we evaluate several feature selection methods as to their effectiveness in preprocessing input data. We use real-world financial credit-risk data in evaluating these systems.

Book ChapterDOI
01 Jan 1998
TL;DR: This chapter introduces a categorization framework for feature weighting approaches used in lazy similarity learners and briefly surveys some examples in each category.
Abstract: Learning algorithms differ in the degree to which they process their inputs prior to their use in performance tasks. Many algorithms eagerly compile input samples and use only the compilations to make decisions. Others are lazy: they perform less precompilation and use the input samples to guide decision making. The performance of many lazy learners significantly degrades when samples are defined by features containing little or misleading information. Distinguishing feature relevance is a critical issue for these algorithms, and many solutions have been developed that assign weights to features. This chapter introduces a categorization framework for feature weighting approaches used in lazy similarity learners and briefly surveys some examples in each category.
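To make the role of feature weights in a lazy learner concrete, here is a small weighted-distance k-nearest-neighbour sketch. The Euclidean metric, majority voting, and the assumption that labels are small non-negative integers are illustrative choices, not any particular surveyed method.

```python
import numpy as np

def weighted_knn_predict(x, X_train, y_train, weights, k=3):
    """Instance-based prediction with a feature-weighted distance: weights
    near zero effectively suppress uninformative or misleading features."""
    d = np.sqrt((((X_train - x) * weights) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]
    return np.bincount(y_train[nearest]).argmax()   # majority vote among neighbours

X_train = np.array([[1.0, 9.0], [1.1, 0.2], [5.0, 5.0], [0.9, 7.5]])
y_train = np.array([1, 0, 0, 1])
print(weighted_knn_predict(np.array([1.0, 8.0]), X_train, y_train,
                           weights=np.array([1.0, 0.0])))   # 2nd feature ignored
```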

Journal ArticleDOI
TL;DR: A new chemometric method based on the selection of the most important variables in discriminant partial least-squares (VS-DPLS) analysis is described, a simple extension of DPLS where a small number of elements in the weight vector w is retained for each factor.
Abstract: Variable selection enhances the understanding and interpretability of multivariate classification models. A new chemometric method based on the selection of the most important variables in discriminant partial least-squares (VS-DPLS) analysis is described. The suggested method is a simple extension of DPLS where a small number of elements in the weight vector w is retained for each factor. The optimal number of DPLS factors is determined by cross-validation. The new algorithm is applied to four different high-dimensional spectral data sets with excellent results. Spectral profiles from Fourier transform infrared spectroscopy and pyrolysis mass spectrometry are used. To investigate the uniqueness of the selected variables an iterative VS-DPLS procedure is performed. At each iteration, the previously found selected variables are removed to see if a new VS-DPLS classification model can be constructed using a different set of variables. In this manner, it is possible to determine regions rather than individual variables that are important for a successful classification.
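The selection step itself (retaining only a small number of elements of the weight vector w for each factor) can be sketched as a simple truncation. This omits the DPLS fitting and the cross-validated choice of the number of factors and retained elements.

```python
import numpy as np

def truncate_weights(w, n_keep):
    """Zero all but the n_keep largest-magnitude entries of a weight vector."""
    w = np.asarray(w, dtype=float)
    keep = np.argsort(np.abs(w))[-n_keep:]
    out = np.zeros_like(w)
    out[keep] = w[keep]
    return out

print(truncate_weights([0.02, -0.9, 0.1, 0.55, -0.03], n_keep=2))
# [ 0.   -0.9   0.    0.55  0.  ]
```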

01 Dec 1998
TL;DR: This dissertation addresses the problems of information access on the Internet with a system for topical information space navigation that combines the query-based and taxonomic approaches, and develops several Machine Learning methods to allow document collections to be automatically organized at a topical level.
Abstract: The explosion of on-line information has given rise to many query-based search engines (such as Alta Vista) and manually constructed topic hierarchies (such as Yahoo!). But with the current growth rate in the amount of information, query results grow incomprehensibly large and manual classification in topic hierarchies creates an immense information bottleneck. Therefore, these tools are rapidly becoming inadequate for addressing users' information needs. In this dissertation, we address these problems with a system for topical information space navigation that combines the query-based and taxonomic approaches. Our system, named SONIA (Service for Organizing Networked Information Autonomously), is implemented as part of the Stanford Digital Libraries testbed. It enables the creation of dynamic hierarchical document categorizations based on the full-text of articles. Using probability theory as a formal foundation, we develop several Machine Learning methods to allow document collections to be automatically organized at a topical level. First, to generate such topical hierarchies, we employ a novel probabilistic clustering scheme that outperforms traditional methods used in both Information Retrieval and Probabilistic Reasoning. Furthermore, we develop methods for classifying new articles into such automatically generated, or existing manually generated, hierarchies. In contrast to standard classification approaches which do not make use of the taxonomic relations in a topic hierarchy, our method explicitly uses the existing hierarchical relationships between topics, leading to improvements in classification accuracy. Much of this improvement is derived from the fact that the classification decisions in such a hierarchy can be made by considering only the presence (or absence) of a small number of features (words) in each document. The choice of relevant words is made using a novel information theoretic algorithm for feature selection. Many of the components developed as part of SONIA are also general enough that they have been successfully applied to data mining problems in domains other than text. The integration of hierarchical clustering and classification will allow large amounts of information to be organized and presented to users in an individualized and comprehensible way. By alleviating the information bottleneck, we hope to help users with the problems of information access on the Internet.

Journal ArticleDOI
TL;DR: Wavelet regression is used as an extension of the more traditional Fourier regression (where the modelling is performed in the frequency domain without taking into consideration any of the information in the time domain), and truncation of weight vectors in PLS was the most effective method for selecting variables.

Journal ArticleDOI
TL;DR: The nearest-neighbour criterion has been used to estimate the predictive accuracy of the classification based on the selected features, and it was found that the classification according to the first nearest neighbour is correct for 80% of the test samples.
Abstract: MOTIVATION: Most of the existing methods for genetic sequence classification are based on a computer search for homologies in nucleotide or amino acid sequences. The standard sequence alignment programs scale very poorly as the number of sequences increases or the degree of sequence identity is <30%. Some new computationally inexpensive methods based on nucleotide or amino acid compositional analysis have been proposed, but prediction results are still unsatisfactory and depend on the features chosen to represent the sequences. RESULTS: In this paper, a feature selection method based on the Gamma (or near-neighbour) test is proposed. If there is a continuous or smooth map from feature space to the classification target values, the Gamma test gives an estimate for the mean-squared error of the classification, despite the fact that one has no a priori knowledge of the smooth mapping. We can search a large space of possible feature combinations for a combination that gives the smallest estimated mean-squared error using a genetic algorithm. The method was used for feature selection and classification of the large subunits of rRNA according to RDP (Ribosomal Database Project) phylogenetic classes. The sequences were represented by dinucleotide frequency distribution. The nearest-neighbour criterion has been used to estimate the predictive accuracy of the classification based on the selected features. For the examples discussed, we found that the classification according to the first nearest neighbour is correct for 80% of the test samples. If we consider the set of the 10 nearest neighbours, then 94% of the test samples are classified correctly. AVAILABILITY: The principal novel component of this method is the Gamma test and this can be downloaded compiled for Unix Sun 4, Windows 95 and MS-DOS from http://www.cs.cf.ac.uk/ec/ CONTACT: s.margetts@cs.cf.ac.uk
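A compact sketch of the Gamma test as described above: regress half the mean squared output difference against the mean squared input distance over increasing near-neighbour orders, and read the noise-variance estimate off the intercept. The use of scikit-learn's neighbour search and the choice of `p_max` are assumptions; in the paper, a genetic algorithm searches feature combinations using this estimate as its objective.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def gamma_test(X, y, p_max=10):
    """Gamma (near-neighbour) test: the intercept of gamma vs. delta estimates
    the best mean-squared error any smooth model of these features can reach."""
    dist, idx = NearestNeighbors(n_neighbors=p_max + 1).fit(X).kneighbors(X)
    deltas, gammas = [], []
    for p in range(1, p_max + 1):               # column 0 is the point itself
        deltas.append(np.mean(dist[:, p] ** 2))
        gammas.append(np.mean((y[idx[:, p]] - y) ** 2) / 2.0)
    slope, intercept = np.polyfit(deltas, gammas, 1)
    return intercept
```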

Journal ArticleDOI
TL;DR: The present work shows the great potential of GAs for feature selection (dimensionality reduction) problems, tested on a practical pattern recognition problem that consisted of discriminating between four seed species by artificial vision.
Abstract: Genetic algorithms (GAs) are efficient search methods based on the paradigm of natural selection and population genetics. A simple GA was applied for selecting the optimal feature subset among an initial feature set of larger size. The performances were tested on a practical pattern recognition problem, which consisted of the discrimination between four seed species (two cultivated and two adventitious seed species) by artificial vision. A set of 73 features, describing size, shape and texture, were extracted from colour images in order to characterise each seed. The goal of the GA was to select the best subset of features which gave the highest classification rates when using the nearest neighbour as a classification method. The selected features were represented by binary chromosomes which had 73 elements. The number of selected features was directly related to the probability of initialisation of the population at the first generation of the GA. When this probability was fixed to 0.1, the GA selected about five features. The classification performances increased with the number of generations. For example, 6.25% of the seeds were misclassified by using five features at generation 140, whereas another subset of the same size led to 3% misclassification at generation 400. The present work shows the great potential of GAs for feature selection (dimensionality reduction) problems. © 1998 SCI. (J Sci Food Agric 76, 77-86 (1998)). Key words: feature selection; genetic algorithm; seed; colour image analysis; classification; discrimination

Journal ArticleDOI
01 May 1998
TL;DR: An unsupervised discovery method with biases geared toward partitioning objects into clusters that improve interpretability is described, and it is demonstrated that interpretability, from a problem-solving viewpoint, is addressed by the intraclass and interclass measures.
Abstract: The data exploration task can be divided into three interrelated subtasks: 1) feature selection, 2) discovery, and 3) interpretation. This paper describes an unsupervised discovery method with biases geared toward partitioning objects into clusters that improve interpretability. The algorithm ITERATE employs: 1) a data ordering scheme and 2) an iterative redistribution operator to produce maximally cohesive and distinct clusters. Cohesion or intraclass similarity is measured in terms of the match between individual objects and their assigned cluster prototype. Distinctness or interclass dissimilarity is measured by an average of the variance of the distribution match between clusters. The authors demonstrate that interpretability, from a problem-solving viewpoint, is addressed by the intraclass and interclass measures. Empirical results demonstrate the properties of the discovery algorithm and its applications to problem solving.

Journal ArticleDOI
TL;DR: A data-driven constructive-induction method that uses multiple operators to improve the representation space and is able to increase predictive accuracy by up to 29% in their test cases.
Abstract: An inductive learning program's ability to find an accurate hypothesis can depend on the quality of the representation space. The authors have developed a data-driven constructive-induction method that uses multiple operators to improve the representation space. They have applied it to two real-world problems. Constructive-induction integrates ideas and methods previously considered separate: attribute selection, construction, and abstraction. By integrating these methods into AQ17-DCI, they were able to increase predictive accuracy by up to 29% in their test cases.

Journal ArticleDOI
TL;DR: A neuro-fuzzy methodology is described which involves connectionist minimization of a fuzzy feature evaluation index with unsupervised training; a set of optimal weighting coefficients, expressed in terms of network parameters and representing individual feature importance, is obtained.

Journal ArticleDOI
TL;DR: This article provides an overview of the methods and techniques for statistical pattern recognition that, based on the user's level of knowledge of a problem, can reduce the problem's dimensionality.
Abstract: Choosing the best method for feature selection depends on the extent of a-priori knowledge of the problem. We present two basic approaches. One involves computationally effective floating-search methods; the other trades off the requirement for a-priori information for the requirement of sufficient data to represent the distributions involved. We've developed methods for statistical pattern recognition that, based on the user's level of knowledge of a problem, can reduce the problem's dimensionality. We believe that these methods can enrich the methodology of subset selection for other fields of AI. This article provides an overview of our methods and techniques, focusing on the basic principles and their potential use.
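One of the floating-search methods referred to above (sequential floating forward selection) can be sketched as follows: after every forward step, features are conditionally removed for as long as doing so beats the best score previously recorded for the smaller subset size. The bookkeeping and termination details are simplified here.

```python
def sffs(n_features, score, target_size):
    """Sequential floating forward selection (simplified sketch)."""
    subset, best_at = [], {}
    while len(subset) < target_size:
        # forward step: add the single most useful feature
        add = max((f for f in range(n_features) if f not in subset),
                  key=lambda f: score(subset + [f]))
        subset.append(add)
        best_at[len(subset)] = max(best_at.get(len(subset), float("-inf")), score(subset))
        # floating (backward) steps: drop features while that improves matters
        while len(subset) > 2:
            drop = max(subset, key=lambda f: score([g for g in subset if g != f]))
            reduced = [g for g in subset if g != drop]
            if score(reduced) > best_at.get(len(reduced), float("-inf")):
                subset = reduced
                best_at[len(subset)] = score(subset)
            else:
                break
    return subset
```

Any subset criterion, such as the wrapper or filter scores discussed elsewhere on this page, can serve as `score`.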

Book ChapterDOI
21 Apr 1998
TL;DR: A new measure is employed in this work that is monotonic and fast to compute; the search for relevant features according to this measure is guaranteed to be complete but not exhaustive.
Abstract: Feature selection is a problem of choosing a subset of relevant features. In general, only exhaustive search can bring about the optimal subset. With a monotonic measure, exhaustive search can be avoided without sacrificing optimality. Unfortunately, most error- or distance-based measures are not monotonic. A new measure is employed in this work that is monotonic and fast to compute. The search for relevant features according to this measure is guaranteed to be complete but not exhaustive. Experiments are conducted for verification.
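A sketch of why monotonicity permits a complete but not exhaustive search: starting from the full feature set and removing features breadth-first, any subset whose measure already violates the threshold can be pruned together with all of its own subsets, since a monotonic measure guarantees they are no better. The breadth-first organization and the `inconsistency` callable are illustrative assumptions, not the paper's exact procedure.

```python
def complete_search(n_features, inconsistency, threshold):
    """Find a smallest subset whose (monotonic) inconsistency stays within
    the threshold, pruning whole branches that already violate it."""
    full = frozenset(range(n_features))
    best, queue, seen = full, [full], {full}
    while queue:
        current = queue.pop(0)
        if inconsistency(current) > threshold:
            continue                     # prune: every subset of current is no better
        if len(current) < len(best):
            best = current
        for f in current:                # expand by removing one feature at a time
            child = current - {f}
            if child and child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(best)
```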