ID3 algorithm

About: The ID3 algorithm is a research topic. Over its lifetime, 2,309 publications have appeared within this topic, receiving 115,546 citations in total. The topic is also known as: Iterative Dichotomiser 3.


Papers
Journal Article
TL;DR: The paper considers a number of different measures and experimentally examines their behavior in four domains, showing that the choice of measure affects the size of a tree but not its accuracy, which remains the same even when attributes are selected randomly.
Abstract: One approach to induction is to develop a decision tree from a set of examples. When used with noisy rather than deterministic data, the method involves three main stages – creating a complete tree able to classify all the examples, pruning this tree to give statistical reliability, and processing the pruned tree to improve understandability. This paper is concerned with the first stage – tree creation – which relies on a measure for “goodness of split,” that is, how well the attributes discriminate between classes. Some problems encountered at this stage are missing data and multi-valued attributes. The paper considers a number of different measures and experimentally examines their behavior in four domains. The results show that the choice of measure affects the size of a tree but not its accuracy, which remains the same even when attributes are selected randomly.

502 citations
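
The "goodness of split" measures this paper compares are alternatives to the entropy-based information gain that ID3 itself uses. As a baseline for comparison, here is a minimal sketch of that gain measure in plain Python (the function names and the dict-per-example representation are illustrative, not from the paper):

from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy of a sequence of class labels.
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute, labels):
    # Entropy reduction from partitioning the examples on one attribute.
    # `examples` is a list of dicts (attribute name -> value); `labels`
    # is the parallel list of class labels.
    partitions = {}
    for example, label in zip(examples, labels):
        partitions.setdefault(example[attribute], []).append(label)
    n = len(labels)
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder

ID3 splits each node on the attribute with the largest gain; the paper's finding is that substituting a different measure, or even picking attributes at random, mainly changes the size of the resulting tree rather than its accuracy.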

Journal Article
TL;DR: In this article, a data mining model for the higher education system in a university is presented; the classification task is used to evaluate students' performance, and since there are many approaches to data classification, the decision tree method is used here.
Abstract: The main objective of higher education institutions is to provide quality education to their students. One way to achieve the highest level of quality in a higher education system is by discovering knowledge for prediction regarding enrolment of students in a particular course, alienation from the traditional classroom teaching model, detection of unfair means used in online examinations, detection of abnormal values in students' result sheets, prediction of students' performance, and so on. This knowledge is hidden within the educational data set and is extractable through data mining techniques. The present paper is designed to justify the capabilities of data mining techniques in the context of higher education by offering a data mining model for the higher education system in a university. In this research, the classification task is used to evaluate students' performance, and since there are many approaches to data classification, the decision tree method is used here. Through this task we extract knowledge that describes students' performance in the end-semester examination. It helps to identify dropouts and students who need special attention early, allowing the teacher to provide appropriate advising/counseling. Keywords: Educational Data Mining (EDM); Classification; Knowledge Discovery in Databases (KDD); ID3 Algorithm.

492 citations
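
As a concrete illustration of the classification task described above, the sketch below fits a decision tree to a toy student data set. All attribute names, encodings, and outcomes are invented for illustration, and scikit-learn's DecisionTreeClassifier is CART-based, so its "entropy" criterion only approximates ID3's information-gain splitting:

from sklearn.tree import DecisionTreeClassifier

# Invented, integer-encoded student records:
# columns are [attendance, internal marks, assignments],
# each coded 0 = poor, 1 = average, 2 = good.
X = [[2, 2, 2], [1, 1, 2], [0, 0, 1], [0, 1, 0], [2, 1, 1], [1, 0, 0]]
# End-semester outcome per student: 1 = pass, 0 = fail.
y = [1, 1, 0, 0, 1, 0]

# criterion="entropy" selects splits by information gain, echoing ID3.
clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)

# Flag a new student with poor attendance for early counseling.
print(clf.predict([[0, 1, 1]]))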

Journal Article
TL;DR: A new attribute selection measure for ID3-like inductive algorithms is proposed, based on a distance between partitions such that the attribute selected at a node induces the partition closest to the correct partition of the subset of training examples corresponding to that node.
Abstract: This note introduces a new attribute selection measure for ID3-like inductive algorithms. This measure is based on a distance between partitions such that the selected attribute in a node induces the partition which is closest to the correct partition of the subset of training examples corresponding to this node. The relationship of this measure with Quinlan's information gain is also established. It is also formally proved that our distance is not biased towards attributes with large numbers of values. Experimental studies with this distance confirm previously reported results showing that the predictive accuracy of induced decision trees is not sensitive to the goodness of the attribute selection measure. However, this distance produces smaller trees than the gain ratio measure of Quinlan, especially in the case of data whose attributes have significantly different numbers of values.

458 citations
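
Assuming the measure is the López de Mántaras (1991) normalized distance between the partition induced by a candidate attribute and the class partition, a minimal sketch looks like this (function names are illustrative):

from collections import Counter
from math import log2

def _entropy(counts, n):
    return -sum(c / n * log2(c / n) for c in counts if c)

def mantaras_distance(attr_values, class_labels):
    # Normalized distance between the partition induced by an attribute
    # and the class partition:
    #   d = (H(A|C) + H(C|A)) / H(A, C) = 2 - (H(A) + H(C)) / H(A, C).
    # It lies in [0, 1]; an ID3-like learner would pick the attribute
    # that minimizes it rather than maximizing information gain.
    n = len(attr_values)
    h_a = _entropy(Counter(attr_values).values(), n)
    h_c = _entropy(Counter(class_labels).values(), n)
    h_ac = _entropy(Counter(zip(attr_values, class_labels)).values(), n)
    return 2.0 - (h_a + h_c) / h_ac if h_ac else 0.0

Because the distance is normalized by the joint entropy H(A, C), it does not systematically favor attributes with many values, which is the bias the paper proves it avoids.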

Journal Article
TL;DR: The tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency.
Abstract: The task of extracting knowledge from databases is quite often performed by machine learning algorithms. The majority of these algorithms can be applied only to data described by discrete numerical or nominal attributes (features). In the case of continuous attributes, there is a need for a discretization algorithm that transforms continuous attributes into discrete ones. We describe such an algorithm, called CAIM (class-attribute interdependence maximization), which is designed to work with supervised data. The goal of the CAIM algorithm is to maximize the class-attribute interdependence and to generate a (possibly) minimal number of discrete intervals. The algorithm does not require the user to predefine the number of intervals, as opposed to some other discretization algorithms. The tests performed using CAIM and six other state-of-the-art discretization algorithms show that discrete attributes generated by the CAIM algorithm almost always have the lowest number of intervals and the highest class-attribute interdependency. Two machine learning algorithms, the CLIP4 rule algorithm and the decision tree algorithm, are used to generate classification rules from data discretized by CAIM. For both the CLIP4 and decision tree algorithms, the accuracy of the generated rules is higher and the number of the rules is lower for data discretized using the CAIM algorithm when compared to data discretized using six other discretization algorithms. The highest classification accuracy was achieved for data sets discretized with the CAIM algorithm, as compared with the other six algorithms.

436 citations
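
Assuming the published CAIM criterion (the mean over intervals of the squared dominant-class count divided by the interval total), a minimal sketch of the score the greedy discretizer maximizes might look like this (function names are illustrative):

from bisect import bisect_right
from collections import Counter

def caim_score(boundaries, values, labels):
    # CAIM score of a candidate discretization scheme. `boundaries` are the
    # sorted inner cut points; with the ends of the data range they define
    # n intervals. Score = (1/n) * sum over intervals of max_r**2 / M_r,
    # where max_r is the largest class count in interval r and M_r is the
    # interval's total count. Higher is better.
    n = len(boundaries) + 1
    per_interval = [Counter() for _ in range(n)]  # quanta matrix by interval
    for value, label in zip(values, labels):
        per_interval[bisect_right(boundaries, value)][label] += 1
    return sum(max(q.values()) ** 2 / sum(q.values())
               for q in per_interval if q) / n

# The greedy CAIM loop starts with no inner boundaries and repeatedly adds
# the candidate midpoint that maximizes caim_score(), stopping once there
# are at least as many intervals as classes and the score stops improving,
# which is why no user-defined interval count is needed.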

Journal Article
TL;DR: This paper introduces Bayesian techniques for splitting, smoothing, and tree averaging; the splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning.
Abstract: Algorithms for learning classification trees have had successes in artificial intelligence and statistics over many years. This paper outlines how a tree learning algorithm can be derived using Bayesian statistics. This derivation introduces Bayesian techniques for splitting, smoothing, and tree averaging. The splitting rule is similar to Quinlan's information gain, while smoothing and averaging replace pruning. Comparative experiments with reimplementations of a minimum encoding approach, C4 (Quinlan et al., 1987) and CART (Breiman et al., 1984), show that the full Bayesian algorithm can produce more accurate predictions than versions of these other approaches, though it pays a computational price.

418 citations
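
Buntine's smoothing and averaging are fully Bayesian computations over tree posteriors; as a rough sketch of the flavor only, and not the paper's exact formulation, here is a generic Dirichlet/Laplace-smoothed leaf estimate:

def smoothed_leaf_probs(class_counts, alpha=1.0):
    # Posterior-mean class probabilities at a leaf under a symmetric
    # Dirichlet(alpha) prior: (n_c + alpha) / (n + k * alpha).
    # Instead of pruning a deep, overconfident leaf, its small counts
    # are pulled back toward the uniform distribution.
    n = sum(class_counts.values())
    k = len(class_counts)
    return {c: (m + alpha) / (n + k * alpha) for c, m in class_counts.items()}

# A leaf seeing 3 passes and 1 fail is estimated at (3+1)/(4+2) ~ 0.67
# rather than a hard 0.75.
print(smoothed_leaf_probs({"pass": 3, "fail": 1}))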


Network Information

Related Topics (5)

Topic                        Papers    Citations    Related
Genetic algorithm            67.5K     1.2M         80%
Support vector machine       73.6K     1.7M         79%
Fuzzy logic                  151.2K    2.3M         79%
Feature extraction           111.8K    2.1M         78%
Artificial neural network    207K      4.5M         77%
Performance

Metrics: No. of papers in the topic in previous years

Year    Papers
2023    31
2022    69
2021    22
2020    18
2019    42
2018    20