
Decision tree model

About: Decision tree model is a research topic. Over its lifetime, 2,256 publications have appeared within this topic, receiving 38,142 citations.


Papers
Journal Article
TL;DR: A nonparametric ensemble tree model called gradient boosting survival tree (GBST) is proposed that extends the survival tree models with a gradient boosting algorithm and outperforms the existing survival models measured by the concordance index, Kolmogorov–Smirnov index, and the area under the receiver operating characteristic curve of each time period.
Abstract: Credit scoring plays a vital role in the field of consumer finance. Survival analysis provides an advanced solution to the credit-scoring problem by quantifying the probability of survival time. In order to deal with highly heterogeneous industrial data collected in the Chinese consumer finance market, we propose a nonparametric ensemble tree model called gradient boosting survival tree (GBST) that extends the survival tree models with a gradient boosting algorithm. The survival tree ensemble is learned by minimizing the negative log-likelihood in an additive manner. The proposed model optimizes the survival probability simultaneously for each time period, which can reduce the overall error significantly. Finally, as a test of its applicability, we apply the GBST model to quantify credit risk with large-scale real market datasets. The results show that the GBST model outperforms the existing survival models as measured by the concordance index (C-index), the Kolmogorov-Smirnov (KS) index, and the area under the receiver operating characteristic curve (AUC) of each time period.

7 citations
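
The abstract above mentions per-period survival probabilities and a negative log-likelihood objective without giving formulas. As a rough illustration only, the sketch below uses a standard discrete-time hazard formulation, which is an assumption and not necessarily the likelihood used in the GBST paper. The function names `survival_curve` and `negative_log_likelihood` are hypothetical, and the hazards are plain numbers here rather than outputs of a boosted tree ensemble.

```python
import math


def survival_curve(hazards):
    """Return S(t) = prod_{j <= t} (1 - h_j) for a list of per-period hazards.

    Assumption: each h_j is the probability of default in period j,
    given survival up to period j.
    """
    curve = []
    survival = 1.0
    for h in hazards:
        survival *= 1.0 - h
        curve.append(survival)
    return curve


def negative_log_likelihood(hazards, period, event):
    """Negative log-likelihood of one observation (discrete-time model).

    period: 0-based index of the last observed period.
    event:  True if default happened in that period,
            False if the observation was censored after surviving it.
    """
    nll = 0.0
    # The borrower survived every period before the last observed one.
    for j in range(period):
        nll -= math.log(1.0 - hazards[j])
    if event:
        nll -= math.log(hazards[period])        # defaulted in `period`
    else:
        nll -= math.log(1.0 - hazards[period])  # survived `period` too
    return nll
```

For example, `survival_curve([0.1, 0.2])` yields values close to 0.9 and 0.72. A boosting procedure would lower the summed loss over all observations by adding trees that adjust the hazards; that step is omitted here.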

Journal Article
TL;DR: The experimental results on the complex biomedical datasets show that the performance of the proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.
Abstract: Due to the exponential growth of biomedical repositories such as PubMed and Medline, an accurate predictive model is essential for knowledge discovery in a Hadoop environment. Traditional decision tree models such as the multivariate Bernoulli model, random forest and the multinomial naive Bayesian tree use attribute selection measures to decide the best split at each node of the decision tree. Also, the efficiency of document analysis in the Hadoop framework is limited mainly by the class imbalance problem and large candidate sets. In this paper, we propose a two-phase map-reduce framework with a text preprocessor and a classification model. In the first phase, a mapper-based preprocessing method is designed to eliminate irrelevant features, missing values and outliers from the biomedical data. In the second phase, a map-reduce-based multi-class ensemble decision tree model is designed and implemented on the preprocessed mapper data to improve the true positive rate and computational time. The experimental results on the complex biomedical datasets show that the performance of our proposed Hadoop-based multi-class ensemble model significantly outperforms state-of-the-art baselines.

7 citations

Proceedings Article
Lei Wang, Jiefeng Jin, Ruochen Huang, Xin Wei, Jianxin Chen
04 May 2016
TL;DR: This paper discusses the relationship between the status of the IPTV set-top box and the user's QoE, and proposes the unbiased decision tree model to deal with the imbalanced dataset.
Abstract: Nowadays, Internet Protocol Television (IPTV) is gradually replacing traditional TV, and IPTV users require a better experience. Therefore, media providers are interested in finding the key factors that influence the Quality of Experience (QoE), and a model is needed to predict the QoE. In this paper, we discuss the relationship between the status of the IPTV set-top box and the user's QoE. There is no uniform standard to measure or improve the user's QoE in IPTV, so we combine the status data from IPTV set-top boxes with users' complaints, select an appropriate model, and use it to predict the user's QoE. Because the data from IPTV set-top boxes is imbalanced, the traditional algorithm does not perform well in predicting the user's QoE. To solve this problem, we propose the unbiased decision tree model to deal with the imbalanced dataset. First, we clean the dataset. Then, we select the important features influencing QoE using feature selection. Finally, we compare the CART model and the unbiased decision tree model. We demonstrate that the unbiased decision tree model performs well on the imbalanced dataset and achieves high accuracy.

7 citations

Proceedings Article
19 Jun 1995
TL;DR: A tight lower bound of Θ(k log(n/k)) is proved for the required depth of a decision tree for the threshold-k function, and a tight lower bound for the "direct sum" problem of computing simultaneously k copies of threshold-2 is proved.
Abstract: We investigate decision trees in which one is allowed to query threshold functions of subsets of variables. We are mainly interested in the case where only queries of AND and OR are allowed. This model is a generalization of the classical decision tree model. Its complexity (depth) is related to the parallel time that is required to compute Boolean functions in certain CRCW PRAM machines with only one cell of constant size. It is also related to computation using the Ethernet channel. We prove a tight lower bound of Θ(k log(n/k)) for the required depth of a decision tree for the threshold-k function. As a corollary of the method we also prove a tight lower bound for the "direct sum" problem of computing simultaneously k copies of threshold-2 in this model. Next, the size complexity is considered. A relation to depth-three circuits is established and a lower bound is proven. Finally, the relation between randomization, nondeterminism and determinism is also investigated; we show separation results between these models.

7 citations
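
To make the query model in the abstract above concrete, the sketch below evaluates the threshold-k function (are at least k of the n input bits set?) using only OR queries on subsets of variables. It is a simple binary-search strategy that uses roughly k log n queries, chosen here for illustration; it is not the construction or the proof technique from the paper, and the function name `threshold_k_with_or_queries` is hypothetical.

```python
def threshold_k_with_or_queries(bits, k):
    """Decide whether at least k entries of `bits` are 1 using OR queries only.

    Returns (answer, number_of_or_queries_used).
    """
    queries = 0

    def or_query(indices):
        # One query: the OR of the variables at the given positions.
        nonlocal queries
        queries += 1
        return any(bits[i] for i in indices)

    remaining = list(range(len(bits)))
    found = 0
    while found < k:
        # If no 1 is left among the unexamined variables, the threshold fails.
        if not remaining or not or_query(remaining):
            return False, queries
        # Binary search for the position of one 1 inside `remaining`.
        candidates = remaining
        while len(candidates) > 1:
            mid = len(candidates) // 2
            left = candidates[:mid]
            if or_query(left):
                candidates = left
            else:
                candidates = candidates[mid:]
        remaining.remove(candidates[0])
        found += 1
    return True, queries
```

Each located 1 costs one query on the whole remaining set plus about log2(n) halving queries, which is where the k log n estimate comes from.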

Patent
01 Feb 2006
TL;DR: A method, system, and computer program product for counting predictor-target pairs for a decision tree model provides the capability to generate count tables more quickly and efficiently than previous techniques.
Abstract: A method, system, and computer program product for counting predictor-target pairs for a decision tree model provides the capability to generate count tables more quickly and efficiently than previous techniques. A method of counting predictor-target pairs for a decision tree model, the decision tree model based on data stored in a database, the data comprising a plurality of rows of data, at least one predictor and at least one target, comprises generating a bitmap for each split node of data stored in a database system by intersecting a parent node bitmap and a bitmap of a predictor that satisfies a condition of the node, intersecting each split node bitmap with each predictor bitmap and with each target bitmap to form intersected bitmaps, and counting bits of each intersected bitmap to generate a count of predictor-target pairs.

7 citations
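
The three steps named in the patent abstract above (build a split-node bitmap, intersect it with predictor and target bitmaps, count bits) can be sketched with Python integers used as bitmaps. This is a minimal illustration under stated assumptions: the toy rows, the column names `age`, `income` and `buys`, and the helpers `make_bitmap` and `count_pairs` are all hypothetical, and the patent's actual database-side implementation is not described in the abstract.

```python
def make_bitmap(rows, column, value):
    """Return an int whose bit i is set when rows[i][column] == value."""
    bitmap = 0
    for i, row in enumerate(rows):
        if row[column] == value:
            bitmap |= 1 << i
    return bitmap


def count_pairs(node_bitmap, predictor_bitmaps, target_bitmaps):
    """Count rows in the node for every (predictor value, target value) pair."""
    counts = {}
    for p_key, p_bits in predictor_bitmaps.items():
        for t_key, t_bits in target_bitmaps.items():
            intersected = node_bitmap & p_bits & t_bits
            counts[(p_key, t_key)] = bin(intersected).count("1")  # count set bits
    return counts


# Hypothetical toy data: two predictors (age, income) and one target (buys).
ROWS = [
    {"age": "young", "income": "low", "buys": "no"},
    {"age": "young", "income": "high", "buys": "yes"},
    {"age": "old", "income": "low", "buys": "no"},
    {"age": "young", "income": "low", "buys": "yes"},
    {"age": "old", "income": "high", "buys": "yes"},
    {"age": "young", "income": "high", "buys": "yes"},
]

# The root node contains every row, so all bits are set.
parent_bitmap = (1 << len(ROWS)) - 1
# Split node for the condition age == "young": parent AND predictor bitmap.
node_bitmap = parent_bitmap & make_bitmap(ROWS, "age", "young")

predictor_bitmaps = {v: make_bitmap(ROWS, "income", v) for v in ("low", "high")}
target_bitmaps = {v: make_bitmap(ROWS, "buys", v) for v in ("yes", "no")}
pair_counts = count_pairs(node_bitmap, predictor_bitmaps, target_bitmaps)
```

Because each count is a bitwise AND followed by a bit count, no row needs to be re-read once the bitmaps exist, which is presumably the source of the efficiency the abstract claims.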


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Artificial neural network: 207K papers, 4.5M citations, 78% related
Fuzzy logic: 151.2K papers, 2.3M citations, 77% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Deep learning: 79.8K papers, 2.1M citations, 77% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    10
2022    24
2021    101
2020    163
2019    158
2018    121