scispace - formally typeset
Topic

Decision tree model

About: Decision tree model is a research topic. Over the lifetime, 2256 publications have been published within this topic receiving 38142 citations.


Papers
Journal Article
Zhiwei Fu1
TL;DR: GAIT, as discussed by the authors, combines a genetic algorithm, statistical sampling, and decision trees to develop intelligent decision trees that can alleviate the scalability, accuracy, and efficiency concerns that arise when dealing with large and complex data sets.
Abstract: Recently, decision tree algorithms have been widely used in data mining to discover valuable rules and patterns. However, scalability, accuracy, and efficiency are significant concerns when dealing with large and complex data sets. In this paper, we propose an innovative machine learning approach (which we call GAIT), combining a genetic algorithm, statistical sampling, and decision trees, to develop intelligent decision trees that can alleviate some of these problems. We design computational experiments and run GAIT on three different data sets (namely Socio-Olympic data, Westinghouse data, and FAA data) to test its performance against a standard decision tree algorithm, a neural network classifier, and a statistical discriminant technique, respectively. The computational results show that our approach substantially outperforms the standard decision tree algorithm at lower sampling levels, and achieves significantly better results with less effort than both the neural network and discriminant classifiers.
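The GAIT idea described above (train trees on small statistical samples, then let a genetic algorithm keep and mutate the fittest ones) can be sketched roughly as follows. Everything here is an illustrative assumption, not the authors' implementation: the learner is a one-split "decision stump" rather than a full tree, and the fitness function, population size, and mutation scheme are made up for the sketch.

```python
import random

random.seed(0)

# Toy data: three features; the label depends only on whether feature 0 > 0.5.
data = [[random.random() for _ in range(3)] for _ in range(200)]
data = [(x, 1 if x[0] > 0.5 else 0) for x in data]

def train_stump(sample):
    """Fit a one-split 'decision stump' on a small sample: pick the
    (feature, threshold) pair with the best orientation-free accuracy."""
    best = None
    for f in range(3):
        for x, _ in sample:
            t = x[f]
            acc = sum((xi[f] > t) == (y == 1) for xi, y in sample) / len(sample)
            acc = max(acc, 1 - acc)  # simplification: allow either polarity
            if best is None or acc > best[0]:
                best = (acc, f, t)
    return best[1], best[2]

def fitness(stump, dataset):
    f, t = stump
    acc = sum((x[f] > t) == (y == 1) for x, y in dataset) / len(dataset)
    return max(acc, 1 - acc)

# The "statistical sampling" part: each individual is trained on a
# small random sample instead of the full data set.
pop = [train_stump(random.sample(data, 20)) for _ in range(10)]

# The "genetic algorithm" part: keep the fittest half each generation
# and refill the population by mutating the survivors' thresholds.
for _ in range(5):
    pop.sort(key=lambda s: fitness(s, data), reverse=True)
    survivors = pop[: len(pop) // 2]
    pop = survivors + [(f, t + random.gauss(0, 0.05)) for f, t in survivors]

best = max(pop, key=lambda s: fitness(s, data))
print(fitness(best, data))  # high on this separable toy data
```

The point of the sketch is the division of labour: sampling keeps each individual cheap to train on large data, while evolution compensates for the quality lost to sampling.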

3 citations

DOI
01 Jan 2012
TL;DR: This dissertation proves lower bounds on the randomized two-party communication complexity of functions that arise from read-once boolean formulae, and explores the applicability of the information-theoretic method in the number-on-the-forehead model.
Abstract: This dissertation is concerned with the application of notions and methods from the field of information theory to the field of communication complexity. It consists of two main parts. In the first part of the dissertation, we prove lower bounds on the randomized two-party communication complexity of functions that arise from read-once boolean formulae. A read-once boolean formula is a formula in propositional logic with the property that every variable appears exactly once. Such a formula can be represented by a tree, where the leaves correspond to variables, and the internal nodes are labeled by binary connectives. Under certain assumptions, this representation is unique. Thus, one can define the depth of a formula as the depth of the tree that represents it. The complexity of the evaluation of general read-once formulae has attracted interest mainly in the decision tree model. In the communication complexity model many interesting results deal with specific read-once formulae, such as disjointness and tribes. In this dissertation we use information theory methods to prove lower bounds that hold for any read-once formula. Our lower bounds are of the form n(f)/c^{d(f)}, where n(f) is the number of variables and d(f) is the depth of the formula, and they are optimal up to the constant c in the base of the denominator. In the second part of the dissertation, we explore the applicability of the information-theoretic method in the number-on-the-forehead model. The work of Bar-Yossef, Jayram, Kumar & Sivakumar [BYJKS04] revealed a beautiful connection between Hellinger distance and two-party randomized communication protocols. Inspired by their work and motivated by the open questions in the number-on-the-forehead model, we introduce the notion of Hellinger volume. We show that it lower bounds the information cost of multi-party protocols. We provide a small toolbox that allows one to manipulate several Hellinger volume terms and also to lower bound a Hellinger volume when the distributions involved satisfy certain conditions. In doing so, we prove a new upper bound on the difference between the arithmetic mean and the geometric mean in terms of relative entropy. Finally, we show how to apply the new tools to obtain a lower bound on the informational complexity of the AND_k function.
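As background (a standard definition, not spelled out in the abstract above), the squared Hellinger distance between two distributions P = (p_i) and Q = (q_i) on a finite set is

```latex
h^2(P, Q) \;=\; 1 - \sum_i \sqrt{p_i \, q_i} \;=\; \frac{1}{2} \sum_i \left( \sqrt{p_i} - \sqrt{q_i} \right)^2 .
```

The "Hellinger volume" introduced in the dissertation is, presumably, a multi-distribution analogue of this two-distribution quantity, which is what makes it suitable for the k-party number-on-the-forehead setting.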

3 citations

Journal Article
TL;DR: In this paper, a random-model-based decision tree algorithm is applied, and experiments verify that the algorithm is markedly effective for intrusion detection systems (IDS).
Abstract: Traditional decision tree classification methods (such as ID3 and C4.5) are effective on small data sets. However, when these methods are applied to the massive data of intrusion detection systems (IDS), their effectiveness proves insufficient. In this paper, a random-model-based decision tree algorithm is applied, and experiments verify that this algorithm is markedly effective for IDS.
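The contrast with ID3/C4.5 drawn above comes down to how splits are chosen: information-gain-based methods scan every feature at every node, whereas a random-model tree picks split features and thresholds at random, which makes construction cheap on massive data. The following toy sketch illustrates that idea under those assumptions; it is not the paper's algorithm, and the "connection records" are synthetic.

```python
import random

random.seed(1)

def build_random_tree(rows, labels, depth=3):
    """Grow a tree by splitting on a *randomly chosen* feature and
    threshold at each node (no information-gain computation at all)."""
    if depth == 0 or len(set(labels)) == 1:
        return max(set(labels), key=labels.count)  # leaf: majority label
    f = random.randrange(len(rows[0]))           # random split feature
    t = random.choice([r[f] for r in rows])      # random split threshold
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    if not left or not right:
        return max(set(labels), key=labels.count)
    return (f, t,
            build_random_tree([r for r, _ in left], [y for _, y in left], depth - 1),
            build_random_tree([r for r, _ in right], [y for _, y in right], depth - 1))

def predict(node, row):
    while isinstance(node, tuple):
        f, t, lo, hi = node
        node = lo if row[f] <= t else hi
    return node

# Synthetic "connection records": label 1 iff the first field exceeds 0.5.
rows = [[random.random() for _ in range(4)] for _ in range(300)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]
tree = build_random_tree(rows, labels, depth=6)
acc = sum(predict(tree, r) == y for r, y in zip(rows, labels)) / len(rows)
print(acc)
```

A single random tree is weak; in practice several such trees are grown and their votes combined, trading a little per-tree accuracy for much faster construction.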

3 citations

Journal ArticleDOI
TL;DR: Logistic regression proves more appropriate than a decision tree classification model on the preterm birth data; the study illustrates the importance of machine learning classification models and identifies the significant environmental factors behind preterm birth.
Abstract: Background and aim: Preterm birth is one of the major causes of neonatal death in developing countries, and environmental factors play a vital role in it. Machine learning techniques are now very useful for finding hidden factors and for classification. The purpose of this study is to illustrate the importance of machine learning classification models and to identify the significant environmental factors behind preterm birth. Method: We studied 90 pregnant mothers, of whom 40 had preterm and 50 full-term births. We assessed model accuracy on the dataset using a logistic regression model and a decision tree classifier. Results: The comparison reveals that logistic regression is stronger on all metrics (precision = 0.92, F1-score = 0.96, AUROC = 0.97), while the decision tree performs worse (precision = 0.75, F1-score = 0.86, AUROC = 0.87). Conclusions: Logistic regression is more appropriate than a decision tree classification model on the preterm birth data. The most influential factors for preterm birth are α-HCH, total HCH, and MDA (malondialdehyde).
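The precision and F1 figures quoted above are standard confusion-matrix quantities. As a reminder of how they relate, here is the computation; the counts below are hypothetical values chosen for illustration, not the study's data:

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 46 true positives, 4 false positives, 0 false negatives.
p, r, f1 = precision_recall_f1(46, 4, 0)
print(round(p, 2), round(r, 2), round(f1, 2))  # → 0.92 1.0 0.96
```

Note that F1 is the harmonic mean of precision and recall, so a high F1 alongside a moderate precision (as in the study's logistic model) implies recall was very high.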

3 citations

Proceedings ArticleDOI
24 Oct 2016
TL;DR: The proposed approach is inspired by collaborative filtering; its main challenge is to find the set of similar queries efficiently, which it addresses by combining a similarity graph with a locality-sensitive hashing scheme.
Abstract: This paper describes a novel approach to re-ranking search engine result pages (SERP): Its fundamental principle is to re-rank results to a given query, based on exploiting evidence gathered from past similar search queries. Our approach is inspired by collaborative filtering, with the main challenge being to find the set of similar queries, while also taking efficiency into account. In particular, our approach aims to address this challenge by proposing a combination of a similarity graph and a locality sensitive hashing scheme. We construct a set of features from our similarity graph and build a prediction model using the Hoeffding decision tree algorithm. We have evaluated the effectiveness of our model in terms of P@1, MAP@10, and nDCG@10, using the Yandex Data Challenge data set. We have compared the performance of our model against two baselines, namely, the Yandex initial ranking and the decision tree model learnt on the same set of features when extracted based on query repetition (i.e. excluding the evidence of similar queries in our approach). Our results reveal that the proposed approach consistently and (statistically) significantly outperforms both baselines.
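Locality-sensitive hashing for grouping similar queries, as used above, can be illustrated with random-hyperplane bit signatures: similar vectors fall on the same side of most hyperplanes, so their signatures mostly agree. This sketch assumes queries are already embedded as fixed-length vectors; the dimensions, bit count, and vectors are illustrative, not the paper's implementation.

```python
import random

random.seed(42)
DIM, BITS = 8, 16

# One random hyperplane (Gaussian normal vector) per signature bit.
planes = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(BITS)]

def signature(vec):
    """One bit per hyperplane: which side of the plane the vector is on."""
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) > 0)
                 for plane in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

q1 = [1.0, 0.9, 0.1, 0.0, 0.2, 0.8, 0.1, 0.0]
q2 = [0.9, 1.0, 0.2, 0.1, 0.1, 0.9, 0.0, 0.1]   # near-duplicate of q1
q3 = [0.0, 0.1, 1.0, 0.9, 0.8, 0.0, 0.9, 1.0]   # very different from q1

# Hamming distance between signatures approximates angular distance,
# so near-duplicate queries typically land in the same hash buckets.
print(hamming(signature(q1), signature(q2)))
print(hamming(signature(q1), signature(q3)))
```

Bucketing queries by (prefixes of) these signatures is what makes similar-query lookup cheap enough to combine with a similarity graph at search-engine scale.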

3 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
80% related
Artificial neural network
207K papers, 4.5M citations
78% related
Fuzzy logic
151.2K papers, 2.3M citations
77% related
The Internet
213.2K papers, 3.8M citations
77% related
Deep learning
79.8K papers, 2.1M citations
77% related
Performance Metrics
No. of papers in the topic in previous years

Year    Papers
2023    10
2022    24
2021    101
2020    163
2019    158
2018    121