
Showing papers on "C4.5 algorithm" published in 1997


01 Jan 1997
TL;DR: The Cultural Algorithm approach often produced substantial reductions in tree size as well as increases in accuracy over the other two approaches, particularly in situations where there was a good deal of variability in site description, which suggests that Evolutionary-enhanced decision tree approaches are practical solutions to complex data discovery problems.
Abstract: Decision trees have recently been used in data mining applications to extract new concepts. While current decision tree algorithms exhibit many improvements over earlier versions, they still have difficulty generating optimal trees when attributes vary widely in their number of possible outcomes. Quinlan's gain-ratio measure has been used to reduce the bias towards variables with many categories that was inherent in earlier approaches. Utgoff has incorporated this measure into ITI, an incremental learning algorithm based on decision tree restructuring. One of its operations is to locally "pull up" attributes that are useful in processing a new case. Here we extend the ITI algorithm to allow the global "pulling up" of building blocks of variables that provide optimal partitioning when used at higher levels of the tree. These variables may, in fact, not be characterized by many choices. To this end, ITI is embedded in an evolutionary algorithm, the Cultural Algorithm, which guides the generation of subsets of possible attributes that are most likely to produce optimal trees. It uses an Evolutionary Programming (EP) algorithm as the population component and icons that represent generalizations of successful individuals in the Belief Space. This approach is compared with ITI alone and with the EP algorithm on its own in learning 26 spatio-temporally related concepts concerning site location and evidence for warfare. The tests were run on a large-scale spatio-temporal database consisting of the results of a long-term survey of archeological sites in the Valley of Oaxaca, Mexico, between 9000 BC and 1500 AD, the time of the Spanish conquest. Compared with EP alone and ITI alone, Cultural Algorithms produced the tree with the best performance score in all 26 concept formation problems.
The Cultural Algorithm approach often produced substantial reductions in tree size as well as increases in accuracy over the other two approaches, particularly in situations where there was a good deal of variability in site description. In situations where there was reduced variability, ITI by itself was able to produce some innovative new results relating to state formation in the valley. It was also shown that, for both of the evolutionary extensions given here, the overhead in CPU time was substantially less than the worst case would require. This suggests that evolutionary-enhanced decision tree approaches are practical solutions to complex data discovery problems.
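The bias that the abstract attributes to earlier split measures, and the way the gain-ratio measure reduces it, can be illustrated with a minimal sketch. This is not the paper's code or the ITI implementation: the function names `entropy` and `gain_ratio` and the toy data are assumptions made for illustration, using the standard textbook definition of gain ratio (information gain divided by split information).

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of splitting `labels` on the attribute `values`.

    Hypothetical helper: information gain divided by split information,
    where split information is the entropy of the attribute itself.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Expected label entropy after partitioning on the attribute.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(values)
    # An attribute with a single value carries no split information.
    return info_gain / split_info if split_info > 0 else 0.0


# Toy data (invented): both attributes separate the classes perfectly,
# but `site_id` does so only by giving every case its own category.
labels = [1, 1, 0, 0]
site_id = ["a", "b", "c", "d"]
fortified = ["yes", "yes", "no", "no"]

ratio_many = gain_ratio(site_id, labels)
ratio_binary = gain_ratio(fortified, labels)
```

Both attributes have the same raw information gain here, yet the four-valued attribute receives a lower gain ratio than the binary one because its split information is larger. That penalty on many-valued attributes is the effect the abstract refers to.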

14 citations