scispace - formally typeset
Topic

Decision tree model

About: Decision tree model is a research topic. Over its lifetime, 2,256 publications have been published on this topic, receiving 38,142 citations.


Papers
Journal Article
TL;DR: Experimental results show that, in both comprehensibility and generalization capability, SSID and MCID are each significantly superior to the widely used See5 system (the improved version of C4.5).
Abstract: Because of inductive bias when selecting the attribute on which to expand a node, attributes with more values tend to be preferred. This produces large decision trees with poor generalization capability, so the tree must be simplified by pre-pruning or post-pruning. This paper focuses on pre-pruning and proposes a new pre-pruning strategy: during tree growth, two (or more) branches from the same node are merged into one branch, and growth then continues. The paper investigates the impact of merging branches on decision tree induction. The main question is whether the comprehensibility, size, and generalization accuracy of a decision tree can be improved by selecting and applying an appropriate merging strategy. Using information gain, the paper analyzes the complexity of a decision tree before and after merging branches and designs two branch-merging algorithms: SSID (based on the proportion of positive samples) and MCID (based on the most gain compensation). Experimental results show that, in both comprehensibility and generalization capability, SSID and MCID are each significantly superior to the widely used See5 system (the improved version of C4.5).
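The abstract does not give the SSID or MCID criteria, so the following is only a minimal Python sketch, assuming each branch is represented by the list of class labels ("+" or "-") that reach it. It computes the information gain of a node's split before and after two branches are merged. The `closest_pair` rule (merge the two branches whose proportions of positive samples are closest) is a hypothetical stand-in loosely inspired by SSID's description, not the paper's algorithm, and all function names are illustrative.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: attribute value -> class labels reaching that branch.
Branches = Dict[str, List[str]]


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(branches: Branches) -> float:
    """Gain of splitting the parent node (all labels pooled) into these branches."""
    parent = [y for ys in branches.values() for y in ys]
    n = len(parent)
    remainder = sum(len(ys) / n * entropy(ys) for ys in branches.values())
    return entropy(parent) - remainder


def positive_rate(labels: Sequence[str], positive: str = "+") -> float:
    """Proportion of positive samples in one branch."""
    return sum(1 for y in labels if y == positive) / len(labels)


def closest_pair(branches: Branches) -> Tuple[str, str]:
    """Hypothetical heuristic: the two branches with the most similar positive rate.
    After sorting by rate, the closest pair is always adjacent."""
    keys = sorted(branches, key=lambda k: positive_rate(branches[k]))
    return min(
        zip(keys, keys[1:]),
        key=lambda p: abs(positive_rate(branches[p[0]]) - positive_rate(branches[p[1]])),
    )


def merge(branches: Branches, a: str, b: str) -> Branches:
    """Replace branches a and b with a single branch holding both label lists."""
    merged = {k: v for k, v in branches.items() if k not in (a, b)}
    merged[f"{a}|{b}"] = branches[a] + branches[b]
    return merged


branches = {"low": ["+", "+"], "mid": ["+", "+"], "high": ["-", "-"]}
pair = closest_pair(branches)
print(pair, information_gain(branches), information_gain(merge(branches, *pair)))
```

Here the two pure-positive branches `low` and `mid` are merged and the gain stays at about 0.918 bits while the node loses a branch; merging `mid` with `high` instead would drop the gain to about 0.25 bits, which is the kind of trade-off a merging criterion has to weigh.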

12 citations

Patent
07 Dec 2016
TL;DR: This patent proposes a track circuit red light strip fault positioning method, based on fuzzy rough sets and decision trees, for intelligent fault diagnosis of track circuits. The method can rapidly and correctly locate the fault points of red light strip faults in uninsulated frequency-shift track circuits, greatly reduces the blindness and complexity of fault diagnosis, offers relatively good rule interpretability and robustness, and improves the speed and accuracy of fault positioning.
Abstract: The invention discloses a track circuit red light strip fault positioning method based on fuzzy rough sets and decision trees. The method mainly comprises the following steps: 1) establishing an initial decision table; 2) fuzzily discretizing the continuous fault feature attributes to establish a fuzzy decision table; 3) inputting fault sample training data to obtain a reduced decision table; 4) establishing a diagnosis decision tree model; 5) inputting measured data into the diagnosis decision tree model and computing a fault diagnosis result, inputting the measured data into a diagnosis positioning decision tree model to make a preliminary judgement and obtain a fault positioning result, judging faults of specific equipment in combination with expert experience, and giving corresponding fault maintenance suggestions. The method can rapidly and correctly locate the fault points of red light strip faults in uninsulated frequency-shift track circuits, greatly reduces the blindness and complexity of fault diagnosis, offers relatively good rule interpretability and robustness, improves the speed and accuracy of fault positioning, and provides a new technical means of fault positioning for intelligent fault diagnosis of track circuits.
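Step 2 (fuzzy discretization of continuous fault feature attributes) can be illustrated with trapezoidal membership functions. This is a common choice but an assumption here, since the patent abstract does not say which membership functions are used; the linguistic terms `low`/`medium`/`high`, their breakpoints, and the voltage-like scale are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical linguistic terms with trapezoid breakpoints (a, b, c, d) for one
# continuous fault feature, e.g. a measured voltage; not taken from the patent.
TERMS: Dict[str, Tuple[float, float, float, float]] = {
    "low": (0.0, 0.0, 2.0, 4.0),
    "medium": (2.0, 4.0, 6.0, 8.0),
    "high": (6.0, 8.0, 10.0, 10.0),
}


def trapezoid(x: float, a: float, b: float, c: float, d: float) -> float:
    """Membership degree: 0 outside [a, d], 1 on [b, c], linear ramps in between."""
    if x < a or x > d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:  # rising edge; only reachable when a < b
        return (x - a) / (b - a)
    return (d - x) / (d - c)  # falling edge; only reachable when c < d


def fuzzify(x: float, terms: Dict[str, Tuple[float, float, float, float]] = TERMS) -> Dict[str, float]:
    """Map a crisp reading to a membership degree for each linguistic term."""
    return {name: trapezoid(x, *abcd) for name, abcd in terms.items()}


print(fuzzify(3.0))  # {'low': 0.5, 'medium': 0.5, 'high': 0.0}
```

Each crisp reading becomes a vector of membership degrees, one per term, which is the kind of row a fuzzy decision table holds before attribute reduction.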

12 citations

Journal Article
01 Jan 2020
TL;DR: This paper proposes a design pattern detection approach based on tree-based machine learning algorithms and software metrics to study how effectively software metrics distinguish between structurally similar design patterns.
Abstract: Design patterns are general, reusable solutions to recurring problems. When the design patterns in a software system are not documented, the system becomes more complicated, and its maintenance and evolution costs become a challenge. Design pattern detection is used to reduce this complexity and to make the design of the software easier to understand. In this paper, we propose a design pattern detection approach based on tree-based machine learning algorithms and software metrics, to study how effectively software metrics distinguish between structurally similar design patterns. We build our datasets from the P-MARt repository by extracting the roles of design patterns and calculating the metrics for each role. We used parameter optimization based on the grid search algorithm to find the optimal parameters of each algorithm, and two feature selection methods based on a genetic algorithm to find the features that most influence the distinguishing process. Our experimental study shows the effectiveness of machine learning and software metrics in distinguishing structurally similar design patterns. Moreover, we extracted the essential metrics in each dataset that supported the model's decisions, and we present the detection conditions for each role in a design pattern, extracted from the decision tree model.
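Extracting detection conditions from a decision tree amounts to reading each root-to-leaf path as a conjunction of metric tests. The Python sketch below assumes a binary tree with `<=` / `>` threshold splits (as in CART-style trees); the metric names `NOC` and `CBO` are CK metrics used only as placeholders, and the tree shape, thresholds, and role labels are invented, not values from the P-MARt datasets.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    label: str  # predicted design pattern role


@dataclass
class Node:
    feature: str     # software metric tested at this node
    threshold: float
    left: "Tree"     # branch taken when feature <= threshold
    right: "Tree"    # branch taken when feature > threshold


Tree = Union[Leaf, Node]


def extract_rules(tree: Tree, conditions: Tuple[str, ...] = ()) -> List[str]:
    """Turn every root-to-leaf path into an IF ... THEN ... detection condition."""
    if isinstance(tree, Leaf):
        body = " AND ".join(conditions) if conditions else "TRUE"
        return [f"IF {body} THEN {tree.label}"]
    return (
        extract_rules(tree.left, conditions + (f"{tree.feature} <= {tree.threshold}",))
        + extract_rules(tree.right, conditions + (f"{tree.feature} > {tree.threshold}",))
    )


# Hypothetical tree: placeholder metrics, invented thresholds and labels.
TREE = Node("NOC", 0.5, Leaf("Adapter"), Node("CBO", 3.0, Leaf("Bridge"), Leaf("Proxy")))

for rule in extract_rules(TREE):
    print(rule)
```

Keeping the path as an immutable tuple means each recursive call extends its own copy, so sibling branches never see each other's conditions.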

12 citations

Journal Article
TL;DR: It is shown that every regular language L has either constant, logarithmic, or linear two-party communication complexity (in a worst-case partition sense); a similar trichotomy is proved for simultaneous communication complexity, and a quadrichotomy for probabilistic communication complexity.
Abstract: We show that every regular language L has either constant, logarithmic or linear two-party communication complexity (in a worst-case partition sense). We prove a similar trichotomy for simultaneous communication complexity and a quadrichotomy for probabilistic communication complexity.

12 citations

Dissertation
01 Jan 2012
TL;DR: This thesis investigates the power and limits of efficient joint computation in several computational models (query algorithms, circuits, and Turing machines); it significantly improves and extends past results on these limits, identifies barriers to progress towards better circuit lower bounds for multiple-output operators, and begins an original line of inquiry into the complexity of joint computation.
Abstract: Joint computation is the ubiquitous scenario in which a computer is presented with not one, but many computational tasks to perform. A fundamental question arises: when can we cleverly combine computations, to perform them with greater efficiency or reliability than by tackling them separately? This thesis investigates the power and, especially, the limits of efficient joint computation, in several computational models: query algorithms, circuits, and Turing machines. We significantly improve and extend past results on limits to efficient joint computation for multiple independent tasks; identify barriers to progress towards better circuit lower bounds for multiple-output operators; and begin an original line of inquiry into the complexity of joint computation. In more detail, we make contributions in the following areas:
Improved direct product theorems for randomized query complexity: The "direct product problem" seeks to understand how the difficulty of computing a function on each of $k$ independent inputs scales with $k$. We prove the following direct product theorem (DPT) for query complexity: if every $T$-query algorithm has success probability at most $1 - \varepsilon$ in computing the Boolean function $f$ on input distribution $\mu$, then for $\alpha \le 1$, every $\alpha \varepsilon T k$-query algorithm has success probability at most $(2^{\alpha\varepsilon}(1 - \varepsilon))^k$ in computing the $k$-fold direct product $f^{\otimes k}$ correctly on $k$ independent inputs from $\mu$. In light of examples due to Shaltiel, this statement gives an essentially optimal tradeoff between the query bound and the error probability. Using this DPT, we show that for an absolute constant $\alpha > 0$, the worst-case success probability of any $\alpha R_2(f) k$-query randomized algorithm for $f^{\otimes k}$ falls exponentially with $k$. The best previous statement of this type, due to Klauck, Špalek, and de Wolf, required a query bound of $O(\mathrm{bs}(f) k)$. Our proof technique involves defining and analyzing a collection of martingales associated with an algorithm attempting to solve $f^{\otimes k}$.
Our method is quite general and yields a new XOR lemma and threshold DPT for the query model, as well as DPTs for the query complexity of learning tasks, search problems, and tasks involving interaction with dynamic entities. We also give a version of our DPT in which decision tree size is the resource of interest.
Joint complexity in the Decision Tree Model: We study the diversity of possible behaviors of the joint computational complexity of a collection $f_1, \dots, f_k$ of Boolean functions over a shared input. We focus on the deterministic decision tree model, with depth as the complexity measure; in this model, we prove a result to the effect that the "obvious" constraints on joint computational complexity are essentially the only ones. The proof uses an intriguing new type of cryptographic data structure called a "mystery bin," which we construct using a polynomial separation between deterministic and unambiguous query complexity shown by Savický. We also pose a conjecture in the communication model which, if proved, would extend our result to that model. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.) (Abstract shortened by UMI.)
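To make the direct product bound concrete, the short Python sketch below evaluates the success-probability bound $(2^{\alpha\varepsilon}(1 - \varepsilon))^k$ and the matching query budget $\alpha\varepsilon T k$ for a few values of $k$. The parameter values ($\alpha = 1$, $\varepsilon = 0.1$, $T = 50$) and the function names are arbitrary illustrations, not taken from the thesis.

```python
def dpt_success_bound(alpha: float, eps: float, k: int) -> float:
    """Upper bound (2^(alpha*eps) * (1 - eps))^k on the success probability of any
    (alpha*eps*T*k)-query algorithm for the k-fold direct product, per the stated DPT."""
    if not (0.0 < alpha <= 1.0) or not (0.0 < eps < 1.0) or k < 1:
        raise ValueError("need 0 < alpha <= 1, 0 < eps < 1, k >= 1")
    return (2.0 ** (alpha * eps) * (1.0 - eps)) ** k


def query_budget(alpha: float, eps: float, T: int, k: int) -> float:
    """Number of queries alpha*eps*T*k that the bound covers."""
    return alpha * eps * T * k


for k in (1, 10, 100):
    print(k, query_budget(1.0, 0.1, 50, k), dpt_success_bound(1.0, 0.1, k))
```

Since $\alpha\varepsilon \ln 2 + \ln(1 - \varepsilon) < \varepsilon(\ln 2 - 1) < 0$ whenever $0 < \alpha \le 1$ and $0 < \varepsilon < 1$, the per-copy factor is always below 1, so the bound decays exponentially in $k$ while the query budget grows only linearly.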
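For very small $n$, deterministic decision-tree depth can be computed by brute force from its recursive definition: depth 0 if the function is constant on the current subcube, otherwise 1 plus the best choice of variable to query, taking the worse of its two answers. The Python sketch below applies this to a tuple-valued function to compare joint and separate complexity over a shared input. It is exponential-time and purely illustrative; it is not the thesis's "mystery bin" construction, and the example functions are arbitrary.

```python
from functools import lru_cache
from itertools import product
from typing import Callable, Hashable, Optional, Tuple

Bits = Tuple[int, ...]


def decision_tree_depth(f: Callable[[Bits], Hashable], n: int) -> int:
    """Minimum worst-case number of queries a deterministic decision tree needs
    to output f(x) for x in {0,1}^n. Brute force: only feasible for tiny n."""

    @lru_cache(maxsize=None)
    def depth(partial: Tuple[Optional[int], ...]) -> int:
        free = [i for i, v in enumerate(partial) if v is None]
        # Evaluate f on every completion of the partial assignment (a subcube).
        outputs = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(partial)
            for i, b in zip(free, bits):
                x[i] = b
            outputs.add(f(tuple(x)))
        if len(outputs) == 1:
            return 0  # f is constant here, so no further query is needed
        # Query the variable that minimizes the worse of its two subtrees.
        return min(
            1 + max(depth(partial[:i] + (b,) + partial[i + 1:]) for b in (0, 1))
            for i in free
        )

    return depth((None,) * n)


def f1(x: Bits) -> int:
    return x[0]


def f2(x: Bits) -> int:
    return x[1]


def joint(x: Bits) -> Tuple[int, int]:
    return (f1(x), f2(x))


print(decision_tree_depth(f1, 3), decision_tree_depth(f2, 3), decision_tree_depth(joint, 3))  # 1 1 2
```

Computing $x_0$ and $x_1$ separately costs depth 1 each, and computing them jointly costs depth 2, which here meets the obvious upper bound given by the sum of the individual depths.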

12 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Artificial neural network: 207K papers, 4.5M citations, 78% related
Fuzzy logic: 151.2K papers, 2.3M citations, 77% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Deep learning: 79.8K papers, 2.1M citations, 77% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    10
2022    24
2021    101
2020    163
2019    158
2018    121