Further Research on Feature Selection and Classification Using Genetic Algorithms

Open Access Proceedings Article


William F. Punch, +5 more

- pp 557-564


TLDR

This paper summarizes work on an approach that combines feature selection and data classification using Genetic Algorithms combined with a K-nearest neighbor algorithm to optimize classifications by searching for an optimal feature weighting, essentially warping the feature space to coalesce individuals within groups and to separate groups from one another.

Abstract:

This paper summarizes work on an approach that combines feature selection and data classification using Genetic Algorithms. First, it describes our use of Genetic Algorithms combined with a K-nearest neighbor algorithm to optimize classification by searching for an optimal feature weighting, essentially warping the feature space to coalesce individuals within groups and to separate groups from one another. This approach has proven especially useful with large data sets where standard feature selection techniques are computationally expensive. Second, it describes our implementation of the approach in a parallel processing environment, giving nearly linear speed-up in processing time. Third, it will summarize our present results in using the technique to discover the relative importance of features in large biological test sets. Finally, it will indicate areas for future research.

1 The Problem

We live in the age of information where data is plentiful, to the extent that we are typically unable to process all of it usefully. Computer science has been challenged to discover approaches that can sort through the mountains of data available and discover the essential features needed to answer a specific question. These approaches must be able to process large quantities of data, in reasonable time and in the presence of "noisy" data, i.e., irrelevant or erroneous data.

Consider a typical example in biology. Researchers in the Center for Microbial Ecology (CME) have selected soil samples from three environments found in agriculture. The environments were: near the roots of a crop (rhizosphere), away from the influence of the crop roots (non-rhizosphere), and from a fallow field (crop residue). The CME researchers wished to investigate whether samples from those three environments could be distinguished. In particular, they wanted to see if diversity decreased in the rhizosphere as a result of the symbiotic relationship between the roots and its near-neighbor microbes, and if so in what ways.

Their first experiments used the Biolog test as the discriminator. Biolog consists of a plate of 96 wells, with a different substrate in each well. These substrates (various sugars, amino acids and other nutrients) are assimilated by some microbes and not by others. If the microbial sample processes the substrate in the well, that well changes color which can be recorded photometrically. Thus large numbers of samples can be processed and characterized based on the substrates they can assimilate. The CME researchers applied the Biolog test to 3 sets of 100 samples …
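
The feature-weighting idea summarized in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the GA operators (truncation selection, one-point crossover, Gaussian mutation), the weight range [0, 1], the parameter values, and the toy data are all assumptions chosen for brevity. The sketch evolves one weight per feature and scores each weight vector by the leave-one-out accuracy of a K-nearest neighbor classifier that uses a weighted distance, so a low weight shrinks a feature's influence and a high weight stretches it.

```python
import random
from collections import Counter


def weighted_knn_predict(train_X, train_y, x, weights, k=3):
    """Majority vote among the k nearest neighbors under a weighted squared distance."""
    dists = sorted(
        (sum(w * (a - b) ** 2 for w, a, b in zip(weights, row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def loo_accuracy(X, y, weights, k=3):
    """Leave-one-out accuracy of the weighted KNN; used here as the GA fitness."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += weighted_knn_predict(rest_X, rest_y, X[i], weights, k) == y[i]
    return hits / len(X)


def evolve_weights(X, y, pop_size=20, generations=30, k=3,
                   mutation_rate=0.2, seed=0):
    """Evolve a feature-weight vector (assumed operators; needs >= 2 features)."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.random() for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # Truncation selection: keep the better half unchanged (elitism).
        ranked = sorted(pop, key=lambda w: loo_accuracy(X, y, w, k), reverse=True)
        elite = ranked[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Gaussian mutation, clamped to the assumed [0, 1] weight range.
            child = [
                min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                if rng.random() < mutation_rate else g
                for g in child
            ]
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda w: loo_accuracy(X, y, w, k))


def make_toy_data(n_per_class=15, seed=1):
    """Hypothetical data: feature 0 separates the classes, feature 1 is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in ((0, 0.0), (1, 1.0)):
        for i in range(n_per_class):
            X.append([center + 0.01 * i, rng.uniform(-5, 5)])
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    best = evolve_weights(X, y)
    print("evolved weights:", best)
    print("leave-one-out accuracy:", loo_accuracy(X, y, best))
```

In this toy setup a weight vector that suppresses the noisy second feature classifies every sample correctly, which is the kind of "warping" the abstract describes. Each fitness evaluation is independent of the others, which is one reason an approach like this lends itself to parallel evaluation.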

