
Showing papers on "C4.5 algorithm published in 2005"


Journal ArticleDOI
01 Sep 2005
TL;DR: The authors propose an efficient algorithm that uses an extended decision tree data structure, constructs any node common to multiple decision trees only once, and report results demonstrating its efficiency.
Abstract: A shortcoming of univariate decision tree learners is that they do not learn intermediate concepts and select only one of the input features in the branching decision at each intermediate tree node. It has been empirically demonstrated that cascading other classification methods, which learn intermediate concepts, with decision tree learners can alleviate such representational bias of decision trees and potentially improve classification performance. However, a more complex model that fits training data better may not necessarily perform better on unseen data, commonly referred to as the overfitting problem. To find the most appropriate degree of such cascade generalization, a decision forest (i.e., a set of decision trees with other classification models cascaded to different degrees) needs to be generated, from which the best decision tree can then be identified. In this paper, the authors propose an efficient algorithm for generating such decision forests. The algorithm uses an extended decision tree data structure and constructs any node that is common to multiple decision trees only once. The authors have empirically evaluated the algorithm using 32 data sets for classification problems from the University of California, Irvine (UCI) machine learning repository and report results demonstrating the efficiency of the algorithm.
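The cascading step the abstract describes can be sketched as one level of cascade generalization. The snippet below is an illustrative reconstruction, not the authors' algorithm: scikit-learn's logistic regression stands in for the cascaded base learner, and its class-probability outputs are appended as intermediate-concept features before a tree is grown.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

def cascade_features(X, y, base=None):
    """One level of cascade generalization: fit a base learner and
    append its class-probability outputs to X as new features."""
    base = base or LogisticRegression(max_iter=1000)
    base.fit(X, y)
    return np.hstack([X, base.predict_proba(X)]), base

# synthetic data with an oblique (non-axis-parallel) class boundary,
# which a univariate tree cannot represent compactly on its own
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

X_ext, base = cascade_features(X, y)          # 4 original + 2 probability features
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_ext, y)
```

A shallow tree over the extended features can branch on the base learner's probability column, effectively reusing the linear intermediate concept.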

20 citations


Proceedings ArticleDOI
28 Mar 2005
TL;DR: A new construction algorithm for binary oblique decision tree classifiers, MESODT, which uses multimembered evolution strategies integrated with the perceptron algorithm as the optimization method to find the split that minimizes the evaluation function at each node of a decision tree.
Abstract: A new construction algorithm for binary oblique decision tree classifier, MESODT, is described. Multimembered evolution strategies (μ,λ) integrated with the perceptron algorithm is adopted as the optimization algorithm to find the appropriate split that minimizes the evaluation function at each node of a decision tree. To better explore the benefits of this optimization algorithm, two splitting rules, the criterion based on the concept of degree of linear separability, and one of the traditional impurity measures -- information gain, are each applied to MESODT. The experiments conducted on public and artificial domains demonstrate that the trees generated by MESODT have, in most cases, higher accuracy and smaller size than the classical oblique decision trees (OC1) and axis-parallel decision trees (See5.0). Comparison with (1+1) evolution strategies is also described.
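A (μ,λ) evolution strategy searching for an oblique split can be sketched as follows. This is a simplified illustration rather than MESODT itself: the mutation step size is fixed (real evolution strategies usually self-adapt it), the perceptron integration is omitted, and Gini impurity stands in for the paper's evaluation functions.

```python
import numpy as np

def split_impurity(w, X, y):
    """Weighted Gini impurity of the binary split sign(X @ w[:-1] + w[-1])."""
    side = (X @ w[:-1] + w[-1]) > 0
    total = 0.0
    for mask in (side, ~side):
        n = mask.sum()
        if n == 0:
            continue
        p = np.bincount(y[mask], minlength=2) / n
        total += (n / len(y)) * (1.0 - np.sum(p ** 2))
    return total

def mu_lambda_es(X, y, mu=5, lam=20, gens=60, sigma=0.5, seed=0):
    """(mu, lambda) evolution strategy over oblique split parameters:
    lam offspring are mutated from mu parents; only offspring survive
    (comma selection), ranked by split impurity."""
    rng = np.random.default_rng(seed)
    pop = rng.normal(size=(mu, X.shape[1] + 1))       # weights + bias
    for _ in range(gens):
        parents = pop[rng.integers(0, mu, size=lam)]
        offspring = parents + sigma * rng.normal(size=parents.shape)
        scores = np.array([split_impurity(o, X, y) for o in offspring])
        pop = offspring[np.argsort(scores)[:mu]]      # keep the mu best offspring
    return min(pop, key=lambda w: split_impurity(w, X, y))

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # oblique boundary x0 + x1 = 0
w = mu_lambda_es(X, y)
```

An axis-parallel split on either coordinate alone cannot separate this data cleanly, which is the motivation for searching over oblique hyperplanes.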

4 citations


Journal Article
TL;DR: A case study shows that the mining algorithm classifies employment data correctly and finds valuable results for analysis and decision making.
Abstract: This paper presents a data mining model to deal with the employment of university graduates. The decision tree is a very effective means of classification, and the model is built according to the characteristics of employment data and the C4.5 algorithm. The C4.5 algorithm improves on ID3, the core algorithm of decision tree induction, and is well suited here for its simple construction, high processing speed, and easy implementation. The model includes preprocessing of the employment data, selection of decision attributes, implementation of the mining algorithm, and extraction of rules from the decision tree. The rules point out which decision attributes determine the employment classification. A case study shows that this mining algorithm classifies employment data correctly and finds valuable results for analysis and decision making.
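C4.5's attribute-selection criterion, the gain ratio (information gain normalized by split information), can be illustrated directly. The employment-style attribute and values below are invented for the example, not taken from the paper's data.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def gain_ratio(feature, labels):
    """C4.5's splitting criterion: information gain / split information."""
    values, counts = np.unique(feature, return_counts=True)
    weights = counts / counts.sum()
    cond = sum(w * entropy(labels[feature == v]) for v, w in zip(values, weights))
    gain = entropy(labels) - cond
    split_info = -np.sum(weights * np.log2(weights))
    return gain / split_info if split_info > 0 else 0.0

# hypothetical attribute: student's major vs. placed (1) / not placed (0)
major  = np.array(["cs", "cs", "cs", "art", "art", "art"])
placed = np.array([1, 1, 1, 0, 0, 1])
ratio = gain_ratio(major, placed)
```

The normalization by split information is what distinguishes C4.5 from ID3's plain information gain: it penalizes attributes with many distinct values, which would otherwise look artificially informative.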

3 citations


Journal Article
TL;DR: The experimental results show that the decision tree algorithms ID3 and C4.5 applied to medical image data mining perform well in accuracy, verifying the great potential of data mining in assisting medical treatment.
Abstract: Aim: To study the application of decision tree algorithms to medical image data mining. Methods: Decision tree algorithms are applied to the mining of mammography classification data, and a medical image classifier based on a decision tree algorithm is proposed. Results: The decision tree algorithms ID3 and C4.5 are implemented for medical image data mining, and experimental results are given. Conclusion: The experimental results show that the system performs well in accuracy, verifying the great potential of data mining in assisting medical treatment.
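ID3, the simpler of the two algorithms the paper implements, can be sketched as a short recursive build over categorical attributes using plain information gain. The attribute names and records below are hypothetical stand-ins, not the paper's mammography features.

```python
import math
from collections import Counter

def entropy(rows, target):
    """Entropy of the target column over a list of dict records."""
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def id3(rows, attrs, target):
    """Minimal ID3: recursively split on the max-information-gain attribute."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                 # pure node -> leaf
        return labels[0]
    if not attrs:                             # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    def gain(a):
        remainder = sum(
            len(sub) / len(rows) * entropy(sub, target)
            for v in set(r[a] for r in rows)
            for sub in [[r for r in rows if r[a] == v]]
        )
        return entropy(rows, target) - remainder
    best = max(attrs, key=gain)
    return {best: {
        v: id3([r for r in rows if r[best] == v],
               [a for a in attrs if a != best], target)
        for v in set(r[best] for r in rows)
    }}

# toy mammography-style records (hypothetical attributes and values)
data = [
    {"density": "high", "margin": "spiculated", "label": "malignant"},
    {"density": "high", "margin": "smooth",     "label": "benign"},
    {"density": "low",  "margin": "smooth",     "label": "benign"},
    {"density": "low",  "margin": "spiculated", "label": "malignant"},
]
tree = id3(data, ["density", "margin"], "label")
```

Here "margin" perfectly predicts the label while "density" carries no information, so ID3 builds a one-level tree on "margin". C4.5 extends this scheme with the gain ratio, continuous-attribute thresholds, and pruning.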

3 citations


Journal Article
TL;DR: According to the characteristics and essence of data mining, and taking advantage of the Bayesian method, a new classification method named the BD1.0 algorithm was presented, which combines prior information with the information gain method of decision trees.
Abstract: According to the characteristics and essence of data mining, and taking advantage of the Bayesian method, a new classification method named the BD1.0 algorithm was presented. This method combines prior information with the information gain method of decision trees. The design and analysis of the algorithm are also introduced. The experimental results show that the algorithm can deal with dirty data, such as incomplete or inconsistent data, and that it is more accurate than using the Bayesian method or a decision tree alone. Its time complexity is approximately that of the C4.5 algorithm.
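The abstract does not specify how the prior and the information gain are combined, so the sketch below shows one plausible reading for illustration only, not the BD1.0 algorithm itself: attributes are screened by information gain, and a naive Bayes model with Laplace (Dirichlet-prior) smoothing is fitted on the survivors.

```python
import numpy as np
from sklearn.naive_bayes import CategoricalNB

def info_gain(x, y):
    """Information gain of a categorical feature x for labels y."""
    def H(v):
        _, c = np.unique(v, return_counts=True)
        p = c / c.sum()
        return -(p * np.log2(p)).sum()
    return H(y) - sum((x == u).mean() * H(y[x == u]) for u in np.unique(x))

rng = np.random.default_rng(0)
n = 300
informative = rng.integers(0, 2, n)       # feature the label depends on
noise = rng.integers(0, 3, n)             # independent noise feature
y = informative                           # label copies the informative feature
X = np.column_stack([informative, noise])

gains = np.array([info_gain(X[:, j], y) for j in range(X.shape[1])])
keep = gains > 0.1                        # drop near-zero-gain attributes
# alpha=1.0 is the Laplace/Dirichlet prior on the conditional probabilities
model = CategoricalNB(alpha=1.0).fit(X[:, keep], y)
```

The smoothing prior is what lets the Bayesian side tolerate sparse or incomplete value combinations, while the gain screen discards attributes that contribute nothing to classification.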

3 citations


01 Jan 2005
TL;DR: Choi et al. analyze the efficiency of the C4.5 algorithm in classifying industrial accident data, identify potential weak points in disaster risk grouping, and report accuracy (%) and error rate (%) as measures of the created trees.
Abstract: The decision tree algorithm is one of the data mining techniques that groups or predicts sub-groups within a population of interest. This technique can analyze the characteristics of each group and can be used to detect differences among types of industrial accidents. This paper uses the C4.5 algorithm for the feature analysis. The data set consists of 24,887 records selected from a total of 25,159 collected over a 2-year observation of industrial accidents in Korea. For the purpose of this paper, one target value and eight independent variables are detailed by type of industrial accident. After grouping, there are 222 tree nodes in total, of which 151 are leaf nodes. The paper provides accuracy (%) and error rate (%) as measures of the created trees. The objective of this paper is to analyze the efficiency of the C4.5 algorithm in classifying industrial accident data and thereby identify potential weak points in disaster risk grouping.
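The evaluation quantities the paper reports — accuracy (%), error rate (%), and the tree's node and leaf counts — can be computed along these lines. The data here is synthetic, and scikit-learn's CART learner stands in for C4.5.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))        # eight independent variables, as in the study
y = (X[:, 0] > 0).astype(int)        # one target value (synthetic stand-in)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

accuracy = 100.0 * tree.score(X_te, y_te)    # accuracy (%) on held-out data
error_rate = 100.0 - accuracy                # error rate (%)
n_nodes = tree.tree_.node_count              # total tree nodes
n_leaves = tree.get_n_leaves()               # leaf nodes
```

Since scikit-learn trees are strictly binary, node and leaf counts satisfy n_nodes = 2 * n_leaves - 1; C4.5 trees can have multiway splits on categorical attributes, so the paper's 222 nodes / 151 leaves need not obey that relation.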

2 citations


Book ChapterDOI
06 Jul 2005
TL;DR: A novel decision tree algorithm, called Real-Coded Genetic Algorithm-based Linear Decision Tree Algorithm with k-D Trees (RCGA-based LDT with kDT), is proposed that speeds up the construction of linear decision trees without sacrificing the quality of the constructed decision trees.
Abstract: Although genetic algorithm-based decision tree algorithms are applied successfully in various classification tasks, their execution times are quite long on large datasets. A novel decision tree algorithm, called Real-Coded Genetic Algorithm-based Linear Decision Tree Algorithm with k-D Trees (RCGA-based LDT with kDT), is proposed. In the proposed algorithm, a k-D tree is built when a new node of a linear decision tree is created. The use of k-D trees speeds up the construction of linear decision trees without sacrificing the quality of the constructed decision trees.
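The k-D tree ingredient can be sketched independently of the genetic algorithm: a median-split build plus a branch-and-bound nearest-neighbour query. This is a minimal illustration of the data structure, not the paper's implementation or its specific use inside linear-split fitness evaluation.

```python
import numpy as np

def build_kdtree(points, depth=0):
    """Recursively build a k-D tree, splitting on axes in round-robin
    order at the median point."""
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def nearest(node, target, depth=0, best=None):
    """Branch-and-bound nearest-neighbour search: descend the near side,
    then visit the far side only if it could still contain a closer point."""
    if node is None:
        return best
    if best is None or np.linalg.norm(node["point"] - target) < np.linalg.norm(best - target):
        best = node["point"]
    axis = depth % len(target)
    diff = target[axis] - node["point"][axis]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    best = nearest(node[near], target, depth + 1, best)
    if abs(diff) < np.linalg.norm(best - target):   # splitting plane within reach
        best = nearest(node[far], target, depth + 1, best)
    return best

rng = np.random.default_rng(0)
pts = rng.normal(size=(200, 2))
tree = build_kdtree(pts)
target = np.array([0.25, -0.1])
nn = nearest(tree, target)
```

The pruning test is what yields the expected O(log n) query time that makes repeated spatial lookups cheap during tree construction.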

2 citations


Journal Article
TL;DR: The algorithm adopts a new pruning method, predictive pruning, which uses variable precision positive regions to revise how an attribute partitions the data set at a tree node before the attribute-selection calculation, thus more effectively eliminating the effect of noisy data on attribute choice and leaf generation.
Abstract: Because classical classification algorithms, such as the value-reduction algorithm based on rough sets, are not suitable for large data sets, this paper proposes a decision tree algorithm based on rough sets. The algorithm takes a novel measure, the attribute classification rough degree, as the heuristic for choosing the attribute at a tree node; this measure captures an attribute's contribution to classification more comprehensively than other rough-set measures and is simpler to calculate than information gain or information gain ratio. The algorithm adopts a new pruning method, predictive pruning, which uses variable precision positive regions to revise how an attribute partitions the data set at a tree node before the attribute-selection calculation, thus more effectively eliminating the effect of noisy data on attribute choice and leaf generation. The algorithm uses a simple and efficient method, tightly integrated with the tree-building procedure, to handle inconsistent data, so it can deal with both consistent and inconsistent data efficiently. Mining results on 6 data sets from the UCI machine learning repository show that the trees generated by the algorithm are smaller than those generated by ID3 and of the same scale as those generated by the decision tree algorithm that uses information gain ratio as its heuristic. The algorithm can directly generate decision trees or classification rule sets, is easy to implement with database technology, and is therefore suitable for large data sets.
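The rough-set notions the abstract builds on — the positive region of a decision attribute and the dependency degree γ(C, D) — can be computed directly from a decision table. The toy table below is illustrative; the paper's own measure (attribute classification rough degree) and variable precision extension are not reproduced here.

```python
from collections import defaultdict

def positive_region(rows, cond_attrs, dec_attr):
    """Indices of objects whose condition-attribute equivalence class is
    consistent, i.e. all members share one decision value."""
    blocks = defaultdict(list)
    for i, r in enumerate(rows):
        blocks[tuple(r[a] for a in cond_attrs)].append(i)
    pos = set()
    for members in blocks.values():
        if len({rows[i][dec_attr] for i in members}) == 1:
            pos.update(members)
    return pos

def dependency_degree(rows, cond_attrs, dec_attr):
    """gamma(C, D) = |POS_C(D)| / |U|: the fraction of objects classified
    consistently by the condition attributes."""
    return len(positive_region(rows, cond_attrs, dec_attr)) / len(rows)

# toy decision table; rows 2 and 3 agree on (a, b) but disagree on d,
# so they form an inconsistent equivalence class
table = [
    {"a": 0, "b": 0, "d": "no"},
    {"a": 0, "b": 1, "d": "yes"},
    {"a": 1, "b": 0, "d": "yes"},
    {"a": 1, "b": 0, "d": "no"},
]
```

Objects outside the positive region are exactly the inconsistent data the abstract refers to; a variable precision positive region relaxes the "all members share one decision value" test to a majority threshold so that a few noisy objects do not shrink it.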

2 citations


Journal Article
TL;DR: An improved, C4.5-based algorithm is introduced that adds boosting techniques, thus improving accuracy, and is used to analyze supermarket customer data.
Abstract: The key to the management of a modern company is customer value analysis, and classification algorithms handle this analysis very well. The decision tree algorithm, especially C4.5, is an important kind of classification algorithm; however, the C4.5 algorithm has some shortcomings in accuracy. An improved, C4.5-based algorithm is introduced that adds boosting techniques and thus improves accuracy. The paper uses the new algorithm to analyze supermarket customer data. The experiment shows that the accuracy of the improved algorithm is better than that of C4.5.
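The boosting idea can be sketched with AdaBoost over shallow trees. Scikit-learn's CART stumps stand in for C4.5 here, and the data is synthetic, so this illustrates the technique rather than reproducing the paper's algorithm or experiment.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in for customer records with a binary class
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# a single weak tree vs. 50 boosted weak trees
stump = DecisionTreeClassifier(max_depth=1, random_state=0)
boosted = AdaBoostClassifier(n_estimators=50, random_state=0)  # boosts depth-1 trees by default

acc_single = cross_val_score(stump, X, y, cv=5).mean()
acc_boosted = cross_val_score(boosted, X, y, cv=5).mean()
```

Boosting reweights the training set after each round so that subsequent trees focus on the examples the ensemble still misclassifies, which is the mechanism behind the accuracy gain the abstract reports.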

1 citation


Proceedings ArticleDOI
07 Nov 2005
TL;DR: In this paper, the idea of an algorithm for building a decision tree is introduced by comparing the information gain (entropy) approach with the theory of rough sets, and the method of constructing decision trees is discussed.
Abstract: Classification is an important problem in data mining. Given a database of records, each with a class label, a classifier generates a concise and meaningful description for each class that can be used to classify subsequent records. A number of popular classifiers construct decision trees to generate class models. In this paper, the idea of an algorithm for building a decision tree is introduced by comparison with the information gain (entropy) algorithm, and the method of constructing decision trees according to the theory of rough sets is discussed. The construction process of the decision tree is illustrated with a surface-modeling example. Compared with the ID3 algorithm, the complexity of the decision tree is decreased, its construction is optimized, and better data mining rules can be built.