scispace - formally typeset
Topic

Decision tree model

About: Decision tree model is a research topic. Over the lifetime, 2256 publications have been published within this topic receiving 38142 citations.


Papers
Journal Article
Zhiwei Fu1
TL;DR: GAIT, as discussed by the authors, combines a genetic algorithm, statistical sampling, and decision trees to develop intelligent decision trees that can alleviate the scalability, accuracy, and efficiency concerns that arise when dealing with large and complex data sets.
Abstract: Recently, decision tree algorithms have been widely used in data mining to discover valuable rules and patterns. However, scalability, accuracy, and efficiency are significant concerns when dealing with large and complex data sets. In this paper, we propose an innovative machine learning approach (which we call GAIT), combining a genetic algorithm, statistical sampling, and decision trees, to develop intelligent decision trees that can alleviate some of these problems. We design computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against a standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach substantially outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both the neural network and discriminant classifiers.
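The GAIT idea described above (train trees on small statistical samples, then let a genetic algorithm keep and mutate the fittest ones) can be sketched roughly as follows. Everything here is an illustrative assumption, not the authors' implementation: the learner is a one-split "decision stump" rather than a full tree, and the fitness function, population size, and mutation scheme are made up for the sketch.

```python
import random

random.seed(0)

# Toy data: three features; the label depends only on whether feature 0 > 0.5.
data = [[random.random() for _ in range(3)] for _ in range(200)]
data = [(x, 1 if x[0] > 0.5 else 0) for x in data]

def train_stump(sample):
    """Fit a one-split 'decision stump' on a small sample: pick the
    (feature, threshold) pair with the best orientation-free accuracy."""
    best = None
    for f in range(3):
        for x, _ in sample:
            t = x[f]
            acc = sum((xi[f] > t) == (y == 1) for xi, y in sample) / len(sample)
            acc = max(acc, 1 - acc)  # simplification: allow either polarity
            if best is None or acc > best[0]:
                best = (acc, f, t)
    return best[1], best[2]

def fitness(stump, dataset):
    f, t = stump
    acc = sum((x[f] > t) == (y == 1) for x, y in dataset) / len(dataset)
    return max(acc, 1 - acc)

# The "statistical sampling" part: each individual is trained on a
# small random sample instead of the full data set.
pop = [train_stump(random.sample(data, 20)) for _ in range(10)]

# The "genetic algorithm" part: keep the fittest half each generation
# and refill the population by mutating the survivors' thresholds.
for _ in range(5):
    pop.sort(key=lambda s: fitness(s, data), reverse=True)
    survivors = pop[: len(pop) // 2]
    pop = survivors + [(f, t + random.gauss(0, 0.05)) for f, t in survivors]

best = max(pop, key=lambda s: fitness(s, data))
print(fitness(best, data))  # high on this separable toy data
```

The point of the sketch is the division of labour: sampling keeps each individual cheap to train on large data, while evolution compensates for the quality lost to sampling.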

3 citations

DOI
01 Jan 2012
TL;DR: This dissertation proves lower bounds on the randomized two-party communication complexity of functions that arise from read-once boolean formulae, and explores the applicability of the information-theoretic method in the number-on-the-forehead model.
Abstract: This dissertation is concerned with the application of notions and methods from the field of information theory to the field of communication complexity. It consists of two main parts. In the first part of the dissertation, we prove lower bounds on the randomized two-party communication complexity of functions that arise from read-once boolean formulae. A read-once boolean formula is a formula in propositional logic with the property that every variable appears exactly once. Such a formula can be represented by a tree, where the leaves correspond to variables, and the internal nodes are labeled by binary connectives. Under certain assumptions, this representation is unique. Thus, one can define the depth of a formula as the depth of the tree that represents it. The complexity of the evaluation of general read-once formulae has attracted interest mainly in the decision tree model. In the communication complexity model many interesting results deal with specific read-once formulae, such as disjointness and tribes. In this dissertation we use information theory methods to prove lower bounds that hold for any read-once formula. Our lower bounds are of the form n(f)/c^{d(f)}, where n(f) is the number of variables and d(f) is the depth of the formula, and they are optimal up to the constant c in the base of the denominator. In the second part of the dissertation, we explore the applicability of the information-theoretic method in the number-on-the-forehead model. The work of Bar-Yossef, Jayram, Kumar & Sivakumar [BYJKS04] revealed a beautiful connection between Hellinger distance and two-party randomized communication protocols. Inspired by their work and motivated by the open questions in the number-on-the-forehead model, we introduce the notion of Hellinger volume. We show that it lower bounds the information cost of multi-party protocols. We provide a small toolbox that allows one to manipulate several Hellinger volume terms and also to lower bound a Hellinger volume when the distributions involved satisfy certain conditions. In doing so, we prove a new upper bound on the difference between the arithmetic mean and the geometric mean in terms of relative entropy. Finally, we show how to apply the new tools to obtain a lower bound on the informational complexity of the AND_k function.
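As background (a standard definition, not spelled out in the abstract above), the squared Hellinger distance between two distributions P = (p_i) and Q = (q_i) on a finite set is

```latex
h^2(P, Q) \;=\; 1 - \sum_i \sqrt{p_i \, q_i} \;=\; \frac{1}{2} \sum_i \left( \sqrt{p_i} - \sqrt{q_i} \right)^2 .
```

The "Hellinger volume" introduced in the dissertation is, presumably, a multi-distribution analogue of this two-distribution quantity, which is what makes it suitable for the k-party number-on-the-forehead setting.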

3 citations

Journal Article
TL;DR: In this paper, a random-model-based decision tree algorithm is applied, and experiments verify that the algorithm is markedly effective for intrusion detection systems (IDS).
Abstract: Traditional decision tree classification methods (such as ID3 and C4.5) are effective on small data sets. However, when these methods are applied to the massive data of intrusion detection systems (IDS), their effectiveness proves insufficient. In this paper, a random-model-based decision tree algorithm is applied, and experiments verify that this algorithm is markedly effective for IDS.
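The contrast with ID3/C4.5 drawn above comes down to how splits are chosen: information-gain-based methods scan every feature at every node, whereas a random-model tree picks split features and thresholds at random, which makes construction cheap on massive data. The following toy sketch illustrates that idea under those assumptions; it is not the paper's algorithm, and the "connection records" are synthetic.

```python
import random

random.seed(1)

def build_random_tree(rows, labels, depth=3):
    """Grow a tree by splitting on a *randomly chosen* feature and
    threshold at each node (no information-gain computation at all)."""
    if depth == 0 or len(set(labels)) == 1:
        return max(set(labels), key=labels.count)  # leaf: majority label
    f = random.randrange(len(rows[0]))           # random split feature
    t = random.choice([r[f] for r in rows])      # random split threshold
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    if not left or not right:
        return max(set(labels), key=labels.count)
    return (f, t,
            build_random_tree([r for r, _ in left], [y for _, y in left], depth - 1),
            build_random_tree([r for r, _ in right], [y for _, y in right], depth - 1))

def predict(node, row):
    while isinstance(node, tuple):
        f, t, lo, hi = node
        node = lo if row[f] <= t else hi
    return node

# Synthetic "connection records": label 1 iff the first field exceeds 0.5.
rows = [[random.random() for _ in range(4)] for _ in range(300)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]
tree = build_random_tree(rows, labels, depth=6)
acc = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
print(acc)
```

A single random tree is weak; in practice several such trees are grown and their votes combined, trading a little per-tree accuracy for much faster construction.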

3 citations

Journal ArticleDOI
TL;DR: Logistic regression proves more appropriate than a decision tree classification model on the preterm birth data; the study illustrates the importance of machine learning classification models and identifies the significant environmental factors behind preterm birth.
Abstract: Background and aim: Preterm birth is one of the major causes of neonatal death in developing countries, and environmental factors play a vital role in it. Machine learning techniques are now very useful for finding hidden factors and for classification. The purpose of this study is to illustrate the importance of machine learning classification models and to identify the significant environmental factors behind preterm birth. Method: We studied 90 pregnant mothers, of whom 40 had preterm and 50 full-term births. We assessed model accuracy on the dataset using a logistic regression model and a decision tree classifier. Results: The comparison reveals that logistic regression is stronger on all metrics (precision = 0.92, F1-score = 0.96, AUROC = 0.97), while the decision tree performs worse (precision = 0.75, F1-score = 0.86, AUROC = 0.87). Conclusions: Logistic regression is more appropriate than a decision tree classification model on the preterm birth data. The most influential factors for preterm birth are α-HCH, total HCH, and MDA (malondialdehyde).
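The precision and F1 figures quoted above are standard confusion-matrix quantities. As a reminder of how they relate, here is the computation; the counts below are hypothetical values chosen for illustration, not the study's data:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 46 true positives, 4 false positives, 0 false negatives.
p, r, f1 = precision_recall_f1(46, 4, 0)
print(round(p, 2), round(r, 2), round(f1, 2))  # → 0.92 1.0 0.96
```

Note that F1 is the harmonic mean of precision and recall, so a high F1 alongside a moderate precision (as in the study's logistic model) implies recall was very high.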

3 citations

Proceedings ArticleDOI
24 Oct 2016
TL;DR: The proposed approach is inspired by collaborative filtering; its main challenge is to find the set of similar queries efficiently, which it addresses by combining a similarity graph with a locality-sensitive hashing scheme.
Abstract: This paper describes a novel approach to re-ranking search engine result pages (SERP): Its fundamental principle is to re-rank results to a given query, based on exploiting evidence gathered from past similar search queries. Our approach is inspired by collaborative filtering, with the main challenge being to find the set of similar queries, while also taking efficiency into account. In particular, our approach aims to address this challenge by proposing a combination of a similarity graph and a locality sensitive hashing scheme. We construct a set of features from our similarity graph and build a prediction model using the Hoeffding decision tree algorithm. We have evaluated the effectiveness of our model in terms of P@1, MAP@10, and nDCG@10, using the Yandex Data Challenge data set. We have compared the performance of our model against two baselines, namely, the Yandex initial ranking and the decision tree model learnt on the same set of features when extracted based on query repetition (i.e. excluding the evidence of similar queries in our approach). Our results reveal that the proposed approach consistently and (statistically) significantly outperforms both baselines.
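Locality-sensitive hashing for grouping similar queries, as used above, can be illustrated with random-hyperplane bit signatures: similar vectors fall on the same side of most hyperplanes, so their signatures mostly agree. This sketch assumes queries are already embedded as fixed-length vectors; the dimensions, bit count, and vectors are illustrative, not the paper's implementation.

```python
import random

random.seed(42)
DIM, BITS = 8, 16

# One random hyperplane (Gaussian normal vector) per signature bit.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def signature(vec):
    """One bit per hyperplane: which side of the plane the vector is on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) > 0)
                 for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

q1 = [1.0, 0.9, 0.1, 0.0, 0.2, 0.8, 0.1, 0.0]
q2 = [0.9, 1.0, 0.2, 0.1, 0.1, 0.9, 0.0, 0.1]   # near-duplicate of q1
q3 = [0.0, 0.1, 1.0, 0.9, 0.8, 0.0, 0.9, 1.0]   # very different from q1

# Hamming distance between signatures approximates angular distance,
# so near-duplicate queries typically land in the same hash buckets.
print(hamming(signature(q1), signature(q2)))
print(hamming(signature(q1), signature(q3)))
```

Bucketing queries by (prefixes of) these signatures is what makes similar-query lookup cheap enough to combine with a similarity graph at search-engine scale.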

3 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
80% related
Artificial neural network
207K papers, 4.5M citations
78% related
Fuzzy logic
151.2K papers, 2.3M citations
77% related
The Internet
213.2K papers, 3.8M citations
77% related
Deep learning
79.8K papers, 2.1M citations
77% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    10
2022    24
2021    101
2020    163
2019    158
2018    121