Open Access Proceedings Article

Lazy decision trees

Abstract
Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART, create a single "best" decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm, LAZYDT, that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented.
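
The idea of building only the path that a given test instance follows can be illustrated with a minimal Python sketch. This is not the LAZYDT algorithm as published: the function names (`entropy`, `lazy_classify`), the dictionary-based data layout, the use of `None` for a missing value, and the split criterion (pick the attribute whose matching branch has the lowest class entropy) are all assumptions made for illustration, and no caching is shown. The sketch also assumes a non-empty training set with discrete attribute values.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def lazy_classify(rows, labels, instance):
    """Classify `instance` by growing only the path it would follow.

    rows     : list of dicts mapping attribute name -> discrete value
    labels   : list of class labels, aligned with rows
    instance : dict for the test instance; None marks a missing value
    """
    rows, labels = list(rows), list(labels)
    # Attributes missing in the test instance are never tested, so no
    # special missing-value handling is needed on the test side.
    unused = {a for a, v in instance.items() if v is not None}

    while unused and len(set(labels)) > 1:
        best = None
        for attr in sorted(unused):  # sorted for deterministic tie-breaking
            # Training rows that fall in the same branch as the instance.
            keep = [i for i, r in enumerate(rows) if r.get(attr) == instance[attr]]
            if not keep:
                continue  # no training support for this branch
            score = entropy([labels[i] for i in keep])
            if best is None or score < best[0]:
                best = (score, attr, keep)
        if best is None:
            break  # no attribute leads to a non-empty branch
        _, attr, keep = best
        rows = [rows[i] for i in keep]
        labels = [labels[i] for i in keep]
        unused.discard(attr)

    # Majority class among the training rows that reached the end of the path.
    return Counter(labels).most_common(1)[0][0]


# Hypothetical toy data, invented for this example.
train_rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
train_labels = ["play", "play", "play", "stay"]

full = lazy_classify(train_rows, train_labels, {"outlook": "rain", "windy": "yes"})
partial = lazy_classify(train_rows, train_labels, {"outlook": "sunny", "windy": None})
```

Because the path is chosen per instance, the second call never tests `windy` at all, which is one way to read the abstract's claim about missing values. A greedy lowest-entropy criterion like the one assumed here can favor very small branches, so a practical implementation would need a more careful split measure.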


References
Book

Elements of information theory

TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Book

C4.5: Programs for Machine Learning

TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Book

Classification and regression trees

Leo Breiman
TL;DR: This monograph focuses on the methodology used to construct tree-structured rules, covering the use of trees as a data analysis method and, in a more mathematical framework, proving some of their fundamental properties.