scispace - formally typeset
Topic

Decision tree model

About: Decision tree model is a research topic. Over its lifetime, 2,256 publications have been published on this topic, receiving 38,142 citations.


Papers
Posted Content
TL;DR: This paper develops novel proposal mechanisms for efficient sampling in Bayesian regression tree models, implements the resulting sampler in the Bayesian Additive Regression Tree (BART) model, and demonstrates its effectiveness on a prediction problem from computer experiments and on a test function where structural tree variability is needed to fully explore the posterior.
Abstract: Bayesian regression trees are flexible non-parametric models that are well suited to many modern statistical regression problems. Many such tree models have been proposed, from the simple single-tree model to more complex tree ensembles. Their non-parametric formulation allows for effective and efficient modeling of datasets exhibiting complex non-linear relationships between the model predictors and observations. However, the mixing behavior of the Markov Chain Monte Carlo (MCMC) sampler is sometimes poor. This is because the proposals in the sampler are typically local alterations of the tree structure, such as the birth/death of leaf nodes, which do not allow for efficient traversal of the model space. This poor mixing can lead to inferential problems, such as under-representing uncertainty. In this paper, we develop novel proposal mechanisms for efficient sampling. The first is a rule perturbation proposal, while the second we call tree rotation. The perturbation proposal can be seen as an efficient variation of the change proposal found in existing literature. The novel tree rotation proposal is simple to implement as it only requires local changes to the regression tree structure, yet it efficiently traverses disparate regions of the model space along contours of equal probability. When combined with the classical birth/death proposal, the resulting MCMC sampler exhibits good acceptance rates and properly represents model uncertainty in the posterior samples. We implement this sampling algorithm in the Bayesian Additive Regression Tree (BART) model and demonstrate its effectiveness on a prediction problem from computer experiments and on a test function where structural tree variability is needed to fully explore the posterior.

17 citations
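The abstract above describes birth/death proposals as local alterations of the tree structure. The following is a minimal sketch of those two structural moves only, not the paper's sampler: it omits the Metropolis-Hastings acceptance step, the tree prior, and the rule perturbation and tree rotation proposals. The `Node` class, the helper names, and the choice to draw thresholds uniformly from [0, 1) (as if predictors were rescaled to the unit interval) are assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    # Internal nodes carry a split rule (feature index, threshold);
    # leaves carry neither and have no children.
    feature: Optional[int] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None


def leaves(node: Node) -> List[Node]:
    """Collect all leaf nodes of the tree rooted at `node`."""
    if node.is_leaf():
        return [node]
    return leaves(node.left) + leaves(node.right)


def prunable(node: Node) -> List[Node]:
    """Collect internal nodes whose two children are both leaves."""
    if node.is_leaf():
        return []
    found = prunable(node.left) + prunable(node.right)
    if node.left.is_leaf() and node.right.is_leaf():
        found.append(node)
    return found


def birth(root: Node, n_features: int, rng: random.Random) -> None:
    """Birth move: turn a randomly chosen leaf into a split with two new leaves."""
    leaf = rng.choice(leaves(root))
    leaf.feature = rng.randrange(n_features)
    leaf.threshold = rng.random()  # assumption: predictors scaled to [0, 1)
    leaf.left, leaf.right = Node(), Node()


def death(root: Node, rng: random.Random) -> bool:
    """Death move: collapse a randomly chosen split whose children are leaves.

    Returns False when the tree is a single leaf and nothing can be collapsed.
    """
    candidates = prunable(root)
    if not candidates:
        return False
    node = rng.choice(candidates)
    node.feature = node.threshold = None
    node.left = node.right = None
    return True
```

Each move changes the number of leaves by exactly one, which is the locality the abstract identifies as the reason such samplers can traverse the model space slowly.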

Journal Article
TL;DR: Using legacy soil data from a recent, traditionally conducted soil survey in Hong Kong, China, a digital soil mapping method was applied to produce soil order information for the mountain areas of Hong Kong.

17 citations

Posted Content
30 Nov 2018 - bioRxiv
TL;DR: The unpruned CDT tree shows the highest accuracy, precision, recall and F-measure, the second-highest AUROC and the lowest RMSE among the models compared. Plasma glucose, plasma glucose 2hr after glucose and HDL-cholesterol are found to be the most significant features for predicting the severity of Diabetes Mellitus.
Abstract: Diabetes is a chronic condition associated with an abnormally high level of sugar in the blood. It is a lifelong disease that causes harmful effects in human life. The goal of this research is to predict the severity of diabetes and identify its significant features. In this work, we gathered diabetes patient records from Noakhali Diabetes Association, Noakhali, Bangladesh. We preprocessed the raw dataset by replacing missing records and removing wrong ones. We then used the CDT, J48, NBTree and REPtree decision-tree-based classification techniques to analyze this dataset. After this analysis, we evaluated the classification outcomes of these decision tree classifiers and identified the best decision tree model among them. The unpruned CDT tree shows the highest accuracy, precision, recall and F-measure, the second-highest AUROC and the lowest RMSE among the models. We then extracted possible rules and significant features from this model; plasma glucose, plasma glucose 2hr after glucose and HDL-cholesterol were found to be the most significant features for predicting the severity of Diabetes Mellitus. We hope this work will help in building a predictive system and a complementary tool for diabetes treatment in the future.

17 citations
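The abstract above compares classifiers by accuracy, precision, recall, F-measure and RMSE. As a rough illustration of how such figures are commonly computed, here is a minimal sketch for a binary task. The binary setting, the function names, and the convention of computing RMSE on predicted positive-class probabilities are assumptions; the study's actual tooling and label scheme are not given in the abstract.

```python
import math
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F-measure for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
    }


def rmse(y_true: Sequence[int], prob_positive: Sequence[float]) -> float:
    """Root mean squared error between 0/1 labels and predicted probabilities."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, prob_positive)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up labels purely to exercise the functions; not data from the study.
    truth = [1, 0, 1, 1, 0, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    print(classification_metrics(truth, predicted))
    print(rmse(truth, [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]))
```

Higher is better for the first four metrics and lower is better for RMSE, which is why the abstract reports the best model as having the highest of the former and the lowest of the latter.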

Book Chapter
15 Sep 2008
TL;DR: Using a trusted-hardware-based model, the computation complexity of the scheme, including offline computation, is linear in the number of queries and is bounded by ${\mathrm{O}}(\sqrt{n})$ after optimization.
Abstract: For a private information retrieval (PIR) scheme to be deployed in practice, low communication complexity and low computation complexity are two fundamental requirements it must meet. Most existing PIR schemes focus only on the communication complexity. Reducing the computational complexity has not received due attention, mainly because of its O(n) lower bound. By using a trusted-hardware-based model, we design a novel scheme which breaks this barrier. With constant storage, the computation complexity of our scheme, including offline computation, is linear in the number of queries and is bounded by ${\mathrm{O}}(\sqrt{n})$ after optimization.

17 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Artificial neural network: 207K papers, 4.5M citations, 78% related
Fuzzy logic: 151.2K papers, 2.3M citations, 77% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Deep learning: 79.8K papers, 2.1M citations, 77% related
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    10
2022    24
2021    101
2020    163
2019    158
2018    121