
Showing papers on "C4.5 algorithm published in 1996"


Book Chapter DOI
25 Mar 1996
TL;DR: Issues in building a scalable classifier are discussed and the design of SLIQ, a new classifier that uses a novel pre-sorting technique in the tree-growth phase to enable classification of disk-resident datasets is presented.
Abstract: Classification is an important problem in the emerging field of data mining. Although classification has been studied extensively in the past, most of the classification algorithms are designed only for memory-resident data, thus limiting their suitability for data mining large data sets. This paper discusses issues in building a scalable classifier and presents the design of SLIQ, a new classifier. SLIQ is a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree growing strategy to enable classification of disk-resident datasets. SLIQ also uses a new tree-pruning algorithm that is inexpensive and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale for large data sets and classify data sets irrespective of the number of classes, attributes, and examples (records), thus making it an attractive tool for data mining.

860 citations
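
The key scalability device in SLIQ is the pre-sorting step: each numeric attribute is sorted exactly once before tree growth, into its own list of (value, record id) pairs, while a separate class list records each training record's label and the tree node it currently belongs to. The sketch below only illustrates that data layout; the class and function names are invented for the example, and this is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List

# Illustrative sketch of SLIQ-style pre-sorted attribute lists.
# Each numeric attribute is sorted once before tree growth; the class list
# keeps, for every record, its label and the leaf it currently falls in.

@dataclass
class AttributeEntry:
    value: float      # attribute value
    record_id: int    # index into the class list

@dataclass
class ClassListEntry:
    label: str        # class label of the record
    node: int         # id of the leaf the record currently belongs to

def build_attribute_lists(records: List[dict],
                          numeric_attrs: List[str]) -> Dict[str, List[AttributeEntry]]:
    """Sort each numeric attribute once (the pre-sorting step)."""
    attribute_lists = {}
    for attr in numeric_attrs:
        entries = [AttributeEntry(r[attr], i) for i, r in enumerate(records)]
        entries.sort(key=lambda e: e.value)   # one sort per attribute, done up front
        attribute_lists[attr] = entries
    return attribute_lists

def build_class_list(records: List[dict], label_attr: str) -> List[ClassListEntry]:
    """All records start out assigned to the root node (node id 0)."""
    return [ClassListEntry(r[label_attr], node=0) for r in records]

# During breadth-first growth, candidate thresholds for each node can then be
# evaluated by a single scan of each attribute list, looking up labels and
# current nodes through record_id, without re-sorting at every node.
```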


Journal Article DOI
TL;DR: A decision tree is a formalism for expressing a mapping from attribute values to classes and is either a leaf node labeled with a class or a structure consisting of a test node linked to two or more subtrees.
Abstract: Inductive inference is the process of moving from concrete examples to general models. In one form, the goal is to learn how to classify objects or situations by analyzing a set of instances whose classes are known. Classes here are mutually exclusive labels such as medical diagnoses, qualitative economic projections, image categories, or failure modes. Instances are typically represented as attribute-value vectors that give the numerical or nominal values of a fixed collection of properties. Learning input consists of a set of such vectors, each belonging to a known class, and the output consists of a mapping from attribute values to classes. This mapping should accurately classify both the given instances and other unseen instances.

A decision tree is a formalism for expressing such mappings. A tree is either a leaf node labeled with a class or a structure consisting of a test node linked to two or more subtrees. A test node computes some outcome based on the attribute values of an instance, where each possible outcome is associated with one of the subtrees. An instance is classified by starting at the root node of the tree. If this node is a test, the outcome for the instance is determined and the process continues using the appropriate subtree. When a leaf is eventually encountered, its label gives the predicted class of the instance.

A decision tree can be constructed from a set of instances by a divide-and-conquer strategy. If all the instances belong to the same class, the tree is a leaf with that class as label. Otherwise, a test is chosen that has different outcomes for at least two of the instances, which are partitioned according to this outcome. The tree has as its root a node specifying the test and, for each outcome in turn, the corresponding subtree is obtained by applying the same procedure to the subset of instances with that outcome.

From a geometric perspective, a set of x attributes defines an x-dimensional description space in which each instance is a point. Partitioning a set of instances according to test outcome corresponds to inserting decision surfaces in this space. In many systems, each test is constrained to reference the value of a single attribute A, so that a test outcome might be A = c for some value c of a nominal attribute, or A < t in which a numeric attribute is compared to a threshold t. Surfaces produced by single-attribute tests are hyperplanes orthogonal to the tested attribute A; any decision tree using only such tests will partition the description space into hyperrectangles, each associated with a class. More complex tests can be used to reduce the number of times that the instances are subdivided, thus avoiding the data fragmentation problem. For multivalued nominal attributes, the simplest generalization is to subset tests A ∈ {c1, c2, ...}. Other sophisticated tests use more than one attribute, for example, logical combinations, sets of conditions, or linear combinations of numeric attributes (the latter permitting decision surfaces that are arbitrary hyperplanes). Unless there are identical attribute-value vectors labeled with different classes, any choice of tests leads to a tree that is consistent with the instances.

297 citations
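
The classification procedure described in the abstract (start at the root, follow the outcome of each test, stop at a leaf) is easy to make concrete. The sketch below uses the single-attribute tests mentioned above, A = c for nominal attributes and A < t for numeric ones; the node classes and the classify function are illustrative rather than taken from any particular system.

```python
from dataclasses import dataclass
from typing import Dict, Union

# A tree is either a leaf labeled with a class, or a test node whose
# outcomes each lead to a subtree.

@dataclass
class Leaf:
    label: str

@dataclass
class NumericTest:                      # outcome: value < threshold, or not
    attribute: str
    threshold: float
    below: Union["NumericTest", "NominalTest", Leaf]
    at_or_above: Union["NumericTest", "NominalTest", Leaf]

@dataclass
class NominalTest:                      # one subtree per nominal value
    attribute: str
    branches: Dict[str, Union["NumericTest", "NominalTest", Leaf]]

def classify(node, instance: dict) -> str:
    """Start at the root and follow test outcomes until a leaf is reached."""
    while not isinstance(node, Leaf):
        if isinstance(node, NumericTest):
            v = instance[node.attribute]
            node = node.below if v < node.threshold else node.at_or_above
        else:  # NominalTest
            node = node.branches[instance[node.attribute]]
    return node.label

# Example: a two-level tree over one numeric and one nominal attribute.
tree = NumericTest(
    attribute="temperature", threshold=25.0,
    below=Leaf("no"),
    at_or_above=NominalTest("outlook", {"sunny": Leaf("yes"), "rain": Leaf("no")}),
)
print(classify(tree, {"temperature": 30.0, "outlook": "sunny"}))  # -> "yes"
```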


Proceedings Article
04 Aug 1996
TL;DR: Given a set of training examples S and a tree-structured attribute x, an efficient algorithm is introduced for finding a multiple-split test defined on x that maximizes Quinlan's gain-ratio measure.
Abstract: Given a set of training examples S and a tree-structured attribute x, the goal in this work is to find a multiple-split test defined on x that maximizes Quinlan's gain-ratio measure. The number of possible such multiple-split tests grows exponentially in the size of the hierarchy associated with the attribute. It is, therefore, impractical to enumerate and evaluate all these tests in order to choose the best one. We introduce an efficient algorithm for solving this problem that guarantees maximizing the gain-ratio over all possible tests. For a training set of m examples and an attribute hierarchy of height d, our algorithm runs in time proportional to dm, which makes it efficient enough for practical use.

11 citations
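
For context, Quinlan's gain ratio divides the information gain of a candidate test by its split information, i.e. the entropy of the sizes of the resulting partition, which penalizes tests with many outcomes. The sketch below shows that standard computation for an arbitrary multi-way split; the paper's efficient search over tree-structured attributes is not reproduced here.

```python
import math
from collections import Counter
from typing import Hashable, List, Sequence

def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a list of class labels."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def gain_ratio(labels: Sequence[Hashable], partition: List[List[int]]) -> float:
    """Quinlan's gain ratio for a multi-way split.

    `partition` lists, for each outcome of the test, the indices of the
    training examples that fall under that outcome.
    """
    n = len(labels)
    # Information gain: entropy before the split minus weighted entropy after.
    before = entropy(labels)
    after = sum(len(part) / n * entropy([labels[i] for i in part])
                for part in partition)
    gain = before - after
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum((len(part) / n) * math.log2(len(part) / n)
                      for part in partition if part)
    return gain / split_info if split_info > 0 else 0.0

# Example: six examples, a candidate test with two outcomes.
labels = ["yes", "yes", "no", "no", "yes", "no"]
print(gain_ratio(labels, [[0, 1, 2], [3, 4, 5]]))
```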


01 Jan 1996
TL;DR: A new boosting algorithm of Freund and Schapire is used to improve the performance of an ensemble of decision trees constructed using the information ratio criterion of Quinlan's C4.5 algorithm; by combining the fast tree ensemble with a slower but more accurate neural network, the authors obtain a speedup of a factor of eight over the neural network while achieving a much lower error rate than the tree ensemble alone.
Abstract: A new boosting algorithm of Freund and Schapire is used to improve the performance of an ensemble of decision trees which are constructed using the information ratio criterion of Quinlan’s C4.5 algorithm. This boosting algorithm iteratively constructs a series of decision trees, each decision tree being trained and pruned on examples that have been filtered by previously trained trees. Examples that have been incorrectly classified by the previous trees in the ensemble are resampled with higher probability to give a new probability distribution for the next tree in the ensemble to train on. By combining the very fast decision tree ensemble with a more accurate (but slower) neural network, we are able to obtain a speed up of a factor of eight over the neural network and yet achieve a much lower error rate than the tree ensemble alone.

6 citations
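
The resampling scheme described here, in which misclassified examples receive higher probability when the training set for the next tree is drawn, is boosting by resampling in the style of Freund and Schapire's AdaBoost. The sketch below shows that loop with scikit-learn's DecisionTreeClassifier standing in for the pruned C4.5-style trees and with NumPy arrays assumed for X and y; it illustrates the general scheme, not the authors' code.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_trees(X, y, n_rounds=10, random_state=0):
    """Boosting by resampling (AdaBoost.M1-style), sketched for illustration.

    Each round trains a tree on a sample drawn according to the current
    weights, then raises the relative weight of misclassified examples.
    Returns the trees and their voting weights.
    """
    rng = np.random.default_rng(random_state)
    n = len(y)
    weights = np.full(n, 1.0 / n)
    trees, alphas = [], []
    for _ in range(n_rounds):
        # Resample the training set according to the current distribution.
        idx = rng.choice(n, size=n, replace=True, p=weights)
        tree = DecisionTreeClassifier().fit(X[idx], y[idx])
        miss = tree.predict(X) != y
        eps = weights[miss].sum()
        if eps == 0 or eps >= 0.5:      # stop if the tree is perfect or too weak
            break
        beta = eps / (1.0 - eps)
        # Down-weight correct examples (equivalently, up-weight the errors).
        weights[~miss] *= beta
        weights /= weights.sum()
        trees.append(tree)
        alphas.append(np.log(1.0 / beta))
    return trees, alphas

def predict(trees, alphas, X, classes):
    """Weighted vote of the boosted trees."""
    votes = np.zeros((len(X), len(classes)))
    for tree, alpha in zip(trees, alphas):
        pred = tree.predict(X)
        for j, c in enumerate(classes):
            votes[:, j] += alpha * (pred == c)
    return np.array(classes)[votes.argmax(axis=1)]
```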