Open Access Proceedings Article

Further Research on Feature Selection and Classification Using Genetic Algorithms

TLDR
This paper summarizes work on an approach that combines feature selection and data classification, using Genetic Algorithms together with a K-nearest neighbor algorithm to optimize classification by searching for an optimal feature weighting, essentially warping the feature space to coalesce individuals within groups and to separate groups from one another.
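The "warping" idea can be illustrated with a minimal sketch, assuming a weighted Euclidean distance in which each feature axis is scaled by its weight before a nearest-neighbor vote. The function names, the toy data, and the weight values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance after scaling each feature axis by its weight."""
    return math.sqrt(sum((w * (x - y)) ** 2 for w, x, y in zip(weights, a, b)))

def knn_predict(query, samples, labels, weights, k=1):
    """Majority vote among the k nearest samples in the weighted feature space."""
    ranked = sorted(
        range(len(samples)),
        key=lambda i: weighted_distance(query, samples[i], weights),
    )
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: feature 0 separates the groups, feature 1 is noise.
samples = [(0.0, 0.0), (0.2, 10.0), (0.1, 20.0),
           (1.0, 5.0), (1.1, 15.0), (0.9, 25.0)]
labels = ["A", "A", "A", "B", "B", "B"]
query = (0.1, 5.2)  # belongs with group "A" according to feature 0

# Equal weights: the noisy feature dominates the distance.
unweighted = knn_predict(query, samples, labels, weights=(1.0, 1.0))
# Zero weight on the noisy feature: group "A" coalesces along feature 0.
warped = knn_predict(query, samples, labels, weights=(1.0, 0.0))
print(unweighted, warped)
```

A weight of zero removes a feature entirely, so feature selection appears here as a special case of feature weighting.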
Abstract
This paper summarizes work on an approach that combines feature selection and data classification using Genetic Algorithms. First, it describes our use of Genetic Algorithms combined with a K-nearest neighbor algorithm to optimize classification by searching for an optimal feature weighting, essentially warping the feature space to coalesce individuals within groups and to separate groups from one another. This approach has proven especially useful with large data sets where standard feature selection techniques are computationally expensive. Second, it describes our implementation of the approach in a parallel processing environment, giving nearly linear speed-up in processing time. Third, it will summarize our present results in using the technique to discover the relative importance of features in large biological test sets. Finally, it will indicate areas for future research.
1 The Problem
We live in the age of information, where data is plentiful to the extent that we are typically unable to process all of it usefully. Computer science has been challenged to discover approaches that can sort through the mountains of data available and discover the essential features needed to answer a specific question. These approaches must be able to process large quantities of data, in reasonable time and in the presence of "noisy" data, i.e., irrelevant or erroneous data.
Consider a typical example in biology. Researchers in the Center for Microbial Ecology (CME) have selected soil samples from three environments found in agriculture. The environments were: near the roots of a crop (rhizosphere), away from the influence of the crop roots (non-rhizosphere), and from a fallow field (crop residue). The CME researchers wished to investigate whether samples from those three environments could be distinguished.
In particular, they wanted to see if diversity decreased in the rhizosphere as a result of the symbiotic relationship between the roots and their near-neighbor microbes, and if so in what ways. Their first experiments used the Biolog© test as the discriminator. Biolog consists of a plate of 96 wells, with a different substrate in each well. These substrates (various sugars, amino acids and other nutrients) are assimilated by some microbes and not by others. If the microbial sample processes the substrate in the well, that well changes color, which can be recorded photometrically. Thus large numbers of samples can be processed and characterized based on the substrates they can assimilate. The CME researchers applied the Biolog test to 3 sets of 100 samples …
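The search for a feature weighting described in the abstract can be sketched as a small genetic algorithm whose fitness is the leave-one-out accuracy of a weighted K-nearest neighbor classifier. This is a sketch under stated assumptions, not the authors' implementation: the selection scheme (truncation with one elite), one-point crossover, Gaussian mutation, weight range [0, 1], population size, and the synthetic data generator are all hypothetical choices made for illustration.

```python
import random
from collections import Counter

def loo_accuracy(weights, samples, labels, k=3):
    """Leave-one-out accuracy of KNN under per-feature weights (the GA fitness)."""
    correct = 0
    for i, query in enumerate(samples):
        dists = []
        for j, other in enumerate(samples):
            if j == i:
                continue  # hold out the query itself
            d = sum((w * (x - y)) ** 2 for w, x, y in zip(weights, query, other))
            dists.append((d, labels[j]))
        dists.sort(key=lambda pair: pair[0])
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(samples)

def evolve_weights(samples, labels, pop_size=20, generations=30, seed=0):
    """Evolve a weight vector in [0, 1]^n; all operator choices are assumptions."""
    rng = random.Random(seed)
    n = len(samples[0])

    def fitness(w):
        return loo_accuracy(w, samples, labels)

    # Start from the unweighted baseline plus random weight vectors.
    population = [[1.0] * n]
    population += [[rng.uniform(0.0, 1.0) for _ in range(n)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]      # truncation selection
        children = [scored[0][:]]              # elitism: keep the best as-is
        while len(children) < pop_size:
            mom, dad = rng.sample(parents, 2)
            cut = rng.randrange(1, n)          # one-point crossover
            child = mom[:cut] + dad[cut:]
            idx = rng.randrange(n)             # mutate one weight, clamp to [0, 1]
            child[idx] = min(1.0, max(0.0, child[idx] + rng.gauss(0.0, 0.2)))
            children.append(child)
        population = children
    return max(population, key=fitness)

def make_toy_data(n_per_class=15, seed=1):
    """Synthetic data: feature 0 is informative, features 1 and 2 are noise."""
    rng = random.Random(seed)
    samples, labels = [], []
    for label, center in (("group_a", 0.0), ("group_b", 1.0)):
        for _ in range(n_per_class):
            samples.append((rng.gauss(center, 0.2),
                            rng.uniform(-5.0, 5.0),
                            rng.uniform(-5.0, 5.0)))
            labels.append(label)
    return samples, labels

samples, labels = make_toy_data()
best = evolve_weights(samples, labels)
baseline_acc = loo_accuracy([1.0, 1.0, 1.0], samples, labels)
best_acc = loo_accuracy(best, samples, labels)
print("weights:", [round(w, 2) for w in best])
print("baseline accuracy:", baseline_acc, "evolved accuracy:", best_acc)
```

Each fitness evaluation in a generation is independent of the others, which is the property a parallel implementation such as the one mentioned in the abstract can exploit. Because the unweighted baseline is in the initial population and the best individual is always carried forward, the evolved weights in this sketch never score below the baseline on the training data.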

Citations
Journal Article

A survey on feature selection methods

TL;DR: The objective is to provide a generic introduction to variable elimination, which can be applied to a wide array of machine learning problems, with a focus on filter, wrapper, and embedded methods.
Journal Article

Feature selection: evaluation, application, and small sample performance

TL;DR: This work studies the problem of choosing an optimal feature set for land use classification based on SAR satellite images using four different texture models, and shows that pooling features derived from different texture models, followed by feature selection, results in a substantial improvement in classification accuracy.
Journal Article

Feature subset selection using a genetic algorithm

TL;DR: The authors' approach uses a genetic algorithm to select subsets of attributes or features to represent the patterns to be classified, achieving multicriteria optimization in terms of generalization accuracy and costs associated with the features.
Journal Article

Dimensionality reduction using genetic algorithms

TL;DR: This work presents a new approach to feature extraction in which feature selection and extraction and classifier training are performed simultaneously using a genetic algorithm, and employs this technique in combination with the k nearest neighbor classification rule.

Special Issue on data mining and knowledge discovery with evolutionary algorithms

TL;DR: This book integrates two areas of computer science, namely data mining and evolutionary algorithms, and emphasizes the importance of discovering comprehensible, interesting knowledge, which is potentially useful for intelligent decision making.