Topic

Decision tree model

About: Decision tree model is a research topic. Over its lifetime, 2,256 publications have been published within this topic, receiving 38,142 citations.


Papers
Journal Article
TL;DR: Efficient algorithms for the construction of optimal decision trees and optimal one-time-only branching programs for symmetric Boolean functions are presented, and an exponential lower bound on the decision tree complexity is shown for a Boolean function that has linear formula size.
Abstract: Combinational complexity and depth are the most important complexity measures for Boolean functions. It has turned out to be very hard to prove good lower bounds on the combinational complexity or the depth of explicitly defined Boolean functions. Research has therefore turned to restricted models in which nontrivial lower bounds are easier to prove. Here decision trees, branching programs, and one-time-only branching programs are considered; in a one-time-only branching program, each variable may be tested at most once on each computation path. Efficient algorithms for the construction of optimal decision trees and optimal one-time-only branching programs for symmetric Boolean functions are presented. Furthermore, the following trade-off results are proved. An exponential lower bound on the decision tree complexity is shown for a Boolean function that has linear formula size and linear one-time-only branching program complexity. In addition, a quadratic lower bound on the one-time-only branching program complexity is shown for a Boolean function that has linear combinational complexity.
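To make the notion of a decision tree for a symmetric Boolean function concrete, here is a minimal Python sketch. It is not the algorithm from the paper. It assumes that tree size is measured as the number of internal (test) nodes, and it relies on the fact that a symmetric function depends only on the number of ones in its input, so every untested variable is interchangeable and the state of a path reduces to (ones seen so far, variables remaining). The name `symmetric_tree_size` and the value-vector encoding are illustrative choices.

```python
from functools import lru_cache


def symmetric_tree_size(value_vector):
    """Count internal nodes of a decision tree for a symmetric function.

    value_vector[w] is the function value on inputs containing exactly
    w ones, so a function of n variables has a vector of length n + 1.
    """
    values = tuple(value_vector)
    n = len(values) - 1

    @lru_cache(maxsize=None)
    def size(ones, remaining):
        # Values still reachable: between `ones` and `ones + remaining` ones.
        window = values[ones:ones + remaining + 1]
        if len(set(window)) == 1:
            return 0  # the function is constant here, so this is a leaf
        # Test any remaining variable: one test node plus both subtrees.
        return 1 + size(ones, remaining - 1) + size(ones + 1, remaining - 1)

    return size(0, n)


# OR and parity on three variables, given by their value vectors.
or_size = symmetric_tree_size((0, 1, 1, 1))
parity_size = symmetric_tree_size((0, 1, 0, 1))
```

Memoizing on the pair (ones, remaining) keeps the computation polynomial in n even when the tree itself is exponentially large, as it is for parity.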

41 citations

Journal Article
01 Oct 2015
TL;DR: The results reveal that the hierarchy and the multiple labels do help to obtain a better single tree model, while this is not preserved for the ensemble models.
Abstract: We address the task of hierarchical multi-label classification (HMC). HMC is a task of structured output prediction where the classes are organized into a hierarchy and an instance may belong to multiple classes. In many problems, such as gene function prediction or prediction of ecological community structure, classes inherently follow these constraints. The potential for application of HMC has been recognized by many researchers, and several such methods have been proposed and shown to achieve good predictive performance. However, there is no clear understanding of when it is favorable to consider such relationships (hierarchical and multi-label) among classes, and when this presents an unnecessary burden for classification methods. To this end, we perform a detailed comparative study over 8 datasets that have HMC properties. We investigate two important influences in HMC: the multiple labels per example and the information about the hierarchy. More specifically, we consider four machine learning tasks: multi-label classification, hierarchical multi-label classification, single-label classification and hierarchical single-label classification. To construct the predictive models, we use predictive clustering trees (a generalized form of decision trees), which are able to tackle each of the modelling tasks listed. Moreover, we investigate whether the influence of the hierarchy and the multiple labels carries over to ensemble models. For each of the tasks, we construct a single tree and two ensembles (random forest and bagging). The results reveal that the hierarchy and the multiple labels do help to obtain a better single tree model, while this is not preserved for the ensemble models.
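As a rough illustration of what an HMC target looks like, the Python sketch below closes a label set under the hierarchy and encodes it as a binary vector. It assumes the common HMC convention that an instance belonging to a class also belongs to all of that class's ancestors, and it assumes a tree-shaped hierarchy stored as a child-to-parent map. The class names and the helpers `with_ancestors` and `to_indicator` are hypothetical and do not come from the paper.

```python
def with_ancestors(labels, parent):
    """Return the label set closed under the hierarchy (tree-shaped)."""
    closed = set()
    for label in labels:
        node = label
        # Walk upward until the root or an already-covered ancestor.
        while node is not None and node not in closed:
            closed.add(node)
            node = parent.get(node)
    return closed


def to_indicator(labels, parent, classes):
    """Encode a label set as a 0/1 vector over a fixed class ordering."""
    closed = with_ancestors(labels, parent)
    return [1 if c in closed else 0 for c in classes]


# Hypothetical two-level hierarchy: child -> parent (None marks a root).
parent = {"1": None, "1.1": "1", "1.2": "1", "2": None, "2.1": "2"}
classes = ["1", "1.1", "1.2", "2", "2.1"]

# An instance annotated with two classes from different branches.
target = to_indicator({"1.1", "2"}, parent, classes)
```

Dropping the hierarchy corresponds to encoding only the given labels, and dropping the multiple labels corresponds to keeping a single class per instance, which is how the four tasks in the study differ at the level of the target.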

40 citations

Journal Article
TL;DR: By trading off comprehensibility and performance, a multi-objective genetic programming optimization algorithm is used to induce polynomial-fuzzy decision trees (PFDT) that are smaller, more compact, and better performing than their linear decision tree (LDT) counterparts.
Abstract: Decision tree induction has been studied extensively in machine learning as a solution for classification problems. The way linear decision trees partition the search space is found to be comprehensible and hence appealing to data modelers. Comprehensibility is an important aspect of models used in medical data mining, as it determines model credibility and even acceptability. In practice, though, inordinately long decision trees compounded by replication problems detract from comprehensibility. This demerit can be partially attributed to their rigid structure, which is unable to handle complex non-linear and/or continuous data. To address this issue we introduce a novel hybrid multivariate decision tree composed of polynomial, fuzzy and decision tree structures. The polynomial nature of these multivariate trees enables them to perform well in non-linear territory, while the fuzzy members are used to squash continuous variables. By trading off comprehensibility and performance using a multi-objective genetic programming optimization algorithm, we can induce polynomial-fuzzy decision trees (PFDT) that are smaller, more compact and of better performance than their linear decision tree (LDT) counterparts. In this paper we discuss the structural differences between PFDT and LDT (C4.5) and compare the size and performance of their models using medical data.
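The trade-off between comprehensibility and performance can be pictured as a search for non-dominated models. The Python sketch below is not the paper's genetic programming algorithm. It only shows the Pareto-dominance filter that multi-objective methods typically rely on, assuming tree size stands in for comprehensibility and error rate for performance. The function name `pareto_front` and all candidate values are made up for illustration.

```python
def pareto_front(candidates):
    """Keep candidates not dominated in (size, error); smaller is better.

    candidates is a list of (name, size, error) tuples.
    """
    front = []
    for name, size, error in candidates:
        dominated = any(
            other_size <= size
            and other_error <= error
            and (other_size < size or other_error < error)
            for _, other_size, other_error in candidates
        )
        if not dominated:
            front.append((name, size, error))
    return front


# Made-up candidate trees: (name, number of nodes, error rate).
candidates = [
    ("tree_a", 10, 0.20),
    ("tree_b", 5, 0.30),
    ("tree_c", 12, 0.25),  # larger and less accurate than tree_a
    ("tree_d", 5, 0.35),   # same size as tree_b but less accurate
]
front = pareto_front(candidates)
```

The filter returns a set of models rather than a single winner, leaving the final choice between a smaller tree and a more accurate one to the modeler.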

40 citations

Journal Article
TL;DR: All existing lower bounds for comparison-based algorithms are valid for general k-bounded decision trees, where k is a constant, and are shown to hold for nondeterministic and probabilistic decision trees as well.
Abstract: Combinatorial techniques for extending lower bound results for decision trees to general types of queries are presented. Problems that are defined by simple inequalities between inputs, called order-invariant problems, are considered. A decision tree is called k-bounded if each query depends on at most k variables. No further assumptions on the type of queries are made. It is proved that one can replace the queries of any k-bounded decision tree that solves an order-invariant problem over a large enough input domain with k-bounded queries whose outcome depends only on the relative order of the inputs. As a consequence, all existing lower bounds for comparison-based algorithms are valid for general k-bounded decision trees, where k is a constant. An Ω(n log n) lower bound for the element uniqueness problem and several other problems for any k-bounded decision tree, such that k = O(n^c) and c

40 citations
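For context on the element uniqueness problem mentioned in this abstract, the following Python sketch solves it using only pairwise comparisons and counts them. Each query depends on two inputs and only on their relative order, so the procedure can be viewed as a 2-bounded, comparison-based decision tree. It uses a standard merge sort followed by an adjacent-pair scan, which is a textbook upper-bound construction and not something taken from the paper. The function name `has_duplicates` is illustrative.

```python
def has_duplicates(items):
    """Return (duplicate_found, comparisons_used), using only <= queries."""
    comparisons = 0

    def less_equal(a, b):
        nonlocal comparisons
        comparisons += 1
        return a <= b

    def merge_sort(seq):
        if len(seq) <= 1:
            return list(seq)
        mid = len(seq) // 2
        left, right = merge_sort(seq[:mid]), merge_sort(seq[mid:])
        merged, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if less_equal(left[i], right[j]):
                merged.append(left[i])
                i += 1
            else:
                merged.append(right[j])
                j += 1
        merged.extend(left[i:])
        merged.extend(right[j:])
        return merged

    ordered = merge_sort(items)
    for a, b in zip(ordered, ordered[1:]):
        # After sorting a <= b holds, so b <= a means the two are equal.
        if less_equal(b, a):
            return True, comparisons
    return False, comparisons


found, used = has_duplicates([7, 6, 5, 4, 3, 2, 1, 0])
```

The comparison count grows on the order of n log n, which is the growth rate that the lower bound for comparison-based algorithms says cannot be improved.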

Journal Article
TL;DR: A new model of computation for VLSI, based on the assumption that time for propagating information is at least linear in the distance, is proposed, which is especially suited for deriving lower bounds and trade-offs.
Abstract: A new model of computation for VLSI, based on the assumption that time for propagating information is at least linear in the distance, is proposed. While accommodating basic laws of physics, the model is designed to be general and technology independent. Thus, from a complexity viewpoint, it is especially suited for deriving lower bounds and trade-offs. New results for a number of problems, including fan-in, transitive functions, matrix multiplication, and sorting, are presented. As regards upper bounds, it must be noted that, because of communication costs, the model clearly favors regular and pipelined architectures (e.g., systolic arrays).

40 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
80% related
Artificial neural network
207K papers, 4.5M citations
78% related
Fuzzy logic
151.2K papers, 2.3M citations
77% related
The Internet
213.2K papers, 3.8M citations
77% related
Deep learning
79.8K papers, 2.1M citations
77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    10
2022    24
2021    101
2020    163
2019    158
2018    121