Open AccessProceedings Article
Lazy decision trees
Jerome H. Friedman
- pp 717-724
Reads0
Chats0
TLDR
This work proposes a lazy decision tree algorithm--LAZYDT--that conceptually constructs the "best" decision tree for each test instance, and is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees.Abstract:
Lazy learning algorithms, exemplified by nearest-neighbor algorithms, do not induce a concise hypothesis from a given training set; the inductive process is delayed until a test instance is given. Algorithms for constructing decision trees, such as C4.5, ID3, and CART create a single "best" decision tree during the training phase, and this tree is then used to classify test instances. The tests at the nodes of the constructed tree are good on average, but there may be better tests for classifying a specific instance. We propose a lazy decision tree algorithm--LAZYDT--that conceptually constructs the "best" decision tree for each test instance. In practice, only a path needs to be constructed, and a caching scheme makes the algorithm fast. The algorithm is robust with respect to missing values without resorting to the complicated methods usually seen in induction of decision trees. Experiments on real and artificial problems are presented.read more
Citations
More filters
Journal ArticleDOI
Top 10 algorithms in data mining
Xindong Wu,Vipin Kumar,J. Ross Quinlan,Joydeep Ghosh,Qiang Yang,Hiroshi Motoda,Geoffrey J. McLachlan,Angus S. K. Ng,Bing Liu,Philip S. Yu,Zhi-Hua Zhou,Michael Steinbach,David J. Hand,Dan Steinberg +13 more
TL;DR: This paper presents the top 10 data mining algorithms identified by the IEEE International Conference on Data Mining (ICDM) in December 2006: C4.5, k-Means, SVM, Apriori, EM, PageRank, AdaBoost, kNN, Naive Bayes, and CART.
Journal ArticleDOI
Mining with rarity: a unifying framework
TL;DR: It is demonstrated that rare classes and rare cases are very similar phenomena---both forms of rarity are shown to cause similar problems during data mining and benefit from the same remediation methods.
Journal ArticleDOI
An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics
TL;DR: This work carries out a thorough discussion on the main issues related to using data intrinsic characteristics in this classification problem, and introduces several approaches and recommendations to address these problems in conjunction with imbalanced data.
Book
Data Mining: The Textbook
TL;DR: This textbook explores the different aspects of data mining from the fundamentals to the complex data types and their applications, capturing the wide diversity of problem domains for data mining issues.
References
More filters
Book
Elements of information theory
Thomas M. Cover,Joy A. Thomas +1 more
TL;DR: The author examines the role of entropy, inequality, and randomness in the design of codes and the construction of codes in the rapidly changing environment.
Journal ArticleDOI
Classification and Regression Trees.
Book
C4.5: Programs for Machine Learning
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and over hitting.
Book
Classification and regression trees
TL;DR: The methodology used to construct tree structured rules is the focus of a monograph as mentioned in this paper, covering the use of trees as a data analysis method, and in a more mathematical framework, proving some of their fundamental properties.