scispace - formally typeset

Decision tree model

About: Decision tree model is a research topic. Over its lifetime, 2,256 publications have been published within this topic, receiving 38,142 citations.
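For context, a decision tree model routes an example from the root to a leaf by testing one feature at each internal node; the leaf reached gives the prediction. A minimal illustrative sketch (the tree structure, features, and thresholds below are hypothetical, not taken from any of the papers on this page):

```python
# Minimal decision tree: internal nodes test a feature against a threshold,
# leaves hold a predicted class label. Data and splits are illustrative.
class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None, label=None):
        self.feature, self.threshold = feature, threshold
        self.left, self.right, self.label = left, right, label

def predict(node, x):
    """Route example x down the tree until a leaf (a node with a label) is reached."""
    while node.label is None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.label

# A hand-built tree: test feature 0 at the root, then feature 1 on the right branch.
tree = Node(feature=0, threshold=2.5,
            left=Node(label=0),
            right=Node(feature=1, threshold=1.7,
                       left=Node(label=1),
                       right=Node(label=2)))

print(predict(tree, [1.4, 0.2]))  # → 0 (1.4 <= 2.5, so the left leaf is reached)
```

Learning such a tree from data means choosing, at each node, the feature and threshold that best separate the classes, which is where the split criteria discussed in the papers below come in.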


Papers
01 Jan 2008
TL;DR: A probabilistic generative approach for constructing topographic maps of tree-structured data, in which local noise models are induced by a smooth mapping from a low-dimensional latent space; the smoothness of the mapping allows calculation of magnification factors, a useful tool for the detection of data clusters.
Abstract: In this paper, we present a probabilistic generative approach for constructing topographic maps of tree-structured data. Our model defines a low-dimensional manifold of local noise models, namely, (hidden) Markov tree models, induced by a smooth mapping from low-dimensional latent space. We contrast our approach with that of topographic map formation using recursive neural-based techniques, namely, the self-organizing map for structured data (SOMSD) (Hagenbuchner et al., 2003). The probabilistic nature of our model brings a number of benefits: 1) a naturally defined cost function that drives the model optimization; 2) principled model comparison and testing for overfitting; 3) a potential for transparent interpretation of the map by inspecting the underlying local noise models; 4) natural accommodation of alternative local noise models implicitly expressing different notions of structured data similarity. Furthermore, in contrast with the recursive neural-based approaches, the smooth nature of the mapping from the latent space to the local model space allows for calculation of magnification factors, a useful tool for the detection of data clusters. We demonstrate our approach on three data sets: a toy data set, an artificially generated data set, and a data set of images represented as quadtrees. Index Terms: Hidden Markov tree model (HMTM), structured data, topographic mapping.
Dissertation
01 Feb 2020
TL;DR: This research studies learning NAT model-based BNs from data by applying the Minimum Description Length principle and heuristic search, and advances BN structure learning with local models by focusing on inequality constraints.
Abstract: Learning Non-Impeding Noisy-AND Tree Model Based Bayesian Networks from Data. Qian Wang, University of Guelph, 2020. Advisor: Dr. Yang Xiang. Bayesian Networks (BNs) are a widely utilized formalism for representing knowledge in intelligent agents operating in partially observable and stochastic application environments. When conditional probability tables are used in BNs to quantify the strength of dependency between each variable and its parents, the space complexity is exponential in the number m of parents per variable. The time complexity of inference is also lower-bounded exponentially by m. Non-impeding noisy-AND Tree (NAT) model-based BNs can significantly improve both space and time complexity, rendering both measures linear in m, for a wide range of sparse BN structures. This research studies learning NAT model-based BNs from data by applying the Minimum Description Length principle and heuristic search. It advances BN structure learning with local models by focusing on inequality constraints. Practitioners can make tractable inferences using such BNs learned from data, especially when data admits high treewidth and low-density structures.
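The thesis' exact scoring function is not given in the abstract, but the Minimum Description Length principle it applies has a standard two-part form: total bits = bits to encode the model's parameters plus bits to encode the data under the model, so a more complex model must earn its extra parameters through a better data fit. A minimal, hypothetical Python sketch of that trade-off (the (k/2)·log2 n parameter cost is the usual asymptotic choice, not necessarily the thesis' coding scheme, and the coin data is invented for illustration):

```python
import math

def mdl_score(log2_likelihood, num_params, n):
    """Two-part MDL: (k/2)*log2(n) bits to encode k parameters, minus the
    data's log2-likelihood (= bits to encode the data under the model).
    Lower is better."""
    return (num_params / 2) * math.log2(n) - log2_likelihood

# Toy example: 100 coin flips, 55 heads. Compare a 0-parameter fair-coin
# model against a 1-parameter model using the MLE p = 0.55.
heads, n = 55, 100
ll_fair = n * math.log2(0.5)
p = heads / n
ll_mle = heads * math.log2(p) + (n - heads) * math.log2(1 - p)

# On near-fair data the extra parameter costs more bits than it saves,
# so MDL selects the simpler model.
print(mdl_score(ll_fair, 0, n), mdl_score(ll_mle, 1, n))
```

The same selection pressure, applied to candidate NAT models and BN structures rather than coin models, is what lets an MDL-guided heuristic search avoid overfitting the local models to the data.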
Proceedings ArticleDOI
01 Oct 2020
TL;DR: In this paper, a decision tree algorithm was used to classify relevant graduate employment data, and a balance coefficient was introduced to improve the C4.5 algorithm, giving the decision tree higher accuracy.
Abstract: In this paper, the concept of data mining, its algorithms, and the actual mining process are discussed in detail. Aiming at the large amount of data accumulated in the university employment information management system, and taking an actual employment case analysis as an example, the decision tree algorithm in data mining is used to classify the relevant graduate employment data. To improve C4.5, a balance coefficient is introduced into the algorithm so that the decision tree has higher accuracy. Then, a decision tree model of the employment information system of college graduates is established by mining the sample data. Finally, the model is used to analyze the data of graduates and predict their employment success rate. This paper also summarizes the advantages of the improved algorithm in mining accuracy, number of rules, and other respects, and illustrates the effectiveness of the improved algorithm.
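The paper's balance coefficient is not specified in the abstract, but the C4.5 criterion it modifies is the gain ratio: information gain normalized by the split information of the candidate attribute. An illustrative pure-Python sketch of that baseline criterion (the attribute names and toy employment-style records are hypothetical, not the paper's data):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """C4.5 split criterion: information gain divided by split information."""
    n = len(labels)
    subsets = {}
    for row, y in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(y)
    cond = sum(len(ys) / n * entropy(ys) for ys in subsets.values())
    gain = entropy(labels) - cond
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

# Toy graduate-employment-style records (hypothetical attributes).
rows = [{"major": "CS",  "gpa": "high"},
        {"major": "CS",  "gpa": "high"},
        {"major": "Art", "gpa": "high"},
        {"major": "Art", "gpa": "low"}]
labels = ["employed", "employed", "employed", "unemployed"]

# "gpa" separates the classes perfectly here, so it gets the higher ratio
# and would be chosen for the split.
print(gain_ratio(rows, labels, "gpa"), gain_ratio(rows, labels, "major"))
```

A modification such as the paper's balance coefficient would typically reweight this criterion, e.g. to counteract class imbalance in the training data, though the exact formulation is not given in the abstract.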
Posted ContentDOI
10 Jul 2020 - bioRxiv
TL;DR: Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students’ understanding of a simple decision tree model before diving into more complex tree-based machine learning methods.
Abstract: Summary: treeheatr is an R package for creating interpretable decision tree visualizations with the data represented as a heatmap at the tree's leaf nodes. The integrated presentation of the tree structure along with an overview of the data efficiently illustrates how the tree nodes split up the feature space and how well the tree model performs. This visualization can also be examined in depth to uncover the correlation structure in the data and the importance of each feature in predicting the outcome. Implemented in an easily installed package with a detailed vignette, treeheatr can be a useful teaching tool to enhance students' understanding of a simple decision tree model before diving into more complex tree-based machine learning methods. Availability: The treeheatr package is freely available under the permissive MIT license at https://trang1618.github.io/treeheatr and https://cran.r-project.org/package=treeheatr. It comes with a detailed vignette that is automatically built with GitHub Actions continuous integration. Contact: ttle{at}pennmedicine.upenn.edu. Competing Interest Statement: The authors have declared no competing interest.

Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (80% related)
Artificial neural network: 207K papers, 4.5M citations (78% related)
Fuzzy logic: 151.2K papers, 2.3M citations (77% related)
The Internet: 213.2K papers, 3.8M citations (77% related)
Deep learning: 79.8K papers, 2.1M citations (77% related)
Performance
Metrics
No. of papers in the topic in previous years:

Year  Papers
2023  10
2022  24
2021  101
2020  163
2019  158
2018  121