Topic

Decision tree model

About: Decision tree model is a research topic. Over its lifetime, 2,256 publications have been published within this topic, receiving 38,142 citations.


Papers
Proceedings Article
19 Mar 2018
TL;DR: Experimental results showed that bagged ensemble outperforms other modeling algorithms and the prediction accuracies of these models are benchmarked against the Artificial Neural Network in terms of statistical accuracy, specificity, sensitivity, precision, true positive rate, true negative rate and F-score.
Abstract: Erythemato-squamous diseases (ESDs) are common skin diseases. They consist of six different categories: psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis and pityriasis rubra pilaris. They all share the clinical features of erythema and scaling, with very few differences. Their automatic detection is a challenging problem because they have overlapping signs and symptoms. This study evaluates the performance of CHAID decision trees (DTs) for the analysis and diagnosis of ESDs. DTs are nonparametric methods that make no a priori assumptions about the space distribution and can generate understandable classification rules. This property makes them very efficient tools for physicians and medical specialists to understand the data and inspect the knowledge behind it. The Chi-Squared Automatic Interaction Detection (CHAID) decision tree model is a very fast model with the ability to build wider decision trees and to handle all kinds of input variables (features). The CHAID model has many successful achievements, especially when used as an interpreter rather than a classifier. Due to the small number of samples, this study uses the chi-square test with the Likelihood Ratio (LR) to get robust results. Ensembles of bagged and boosted CHAIDs were introduced to improve the stability and the accuracy of the model, but at the expense of interpretability. This paper presents the experimental results of the application of CHAID decision trees and their bagged and boosted ensembles for the differential diagnosis of ESDs using both clinical and histopathological features. The prediction accuracies of these models are benchmarked against the Artificial Neural Network (ANN) in terms of statistical accuracy, specificity, sensitivity, precision, true positive rate, true negative rate and F-score. Experimental results showed that the bagged ensemble outperforms the other modeling algorithms.
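The abstract above mentions a chi-square test with the likelihood ratio as the split criterion. As a rough illustration only, the standard likelihood-ratio chi-square statistic (the G statistic, G = 2 Σ O ln(O/E)) for a contingency table of candidate-split categories against class labels can be sketched as follows. The function name and the toy counts are assumptions made for this example and are not taken from the paper.

```python
import math

def likelihood_ratio_chi2(table):
    """Return (G, degrees_of_freedom) for a contingency table.

    table is a list of rows of observed counts, e.g. one row per
    category of a candidate split variable and one column per class.
    G = 2 * sum(O * ln(O / E)), skipping cells with O == 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                # Expected count under independence of rows and columns.
                expected = row_totals[i] * col_totals[j] / n
                g += observed * math.log(observed / expected)
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return 2.0 * g, dof

# Hypothetical toy counts: a split that does not separate the classes
# at all, and one that separates them perfectly.
g_uninformative, dof = likelihood_ratio_chi2([[10, 10], [10, 10]])
g_perfect, _ = likelihood_ratio_chi2([[20, 0], [0, 20]])
```

A larger G for the same degrees of freedom indicates a stronger association between the split variable and the class, which is the kind of evidence a CHAID-style procedure would use when choosing a split.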

17 citations

Book Chapter
TL;DR: This study indicated that the genetic algorithm model yielded better results than other data mining models for the analysis of the data of breast cancer patients in terms of the overall accuracy of the patient classification, the expression and complexity of the classification rule.
Abstract: Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. For instance, a clinical pattern might indicate that a female patient who has diabetes or hypertension is more likely to suffer a stroke within the next five years. A physician can then learn valuable knowledge from the data mining process. Here, we present a study focused on the application of artificial intelligence and data mining techniques to prediction models of breast cancer. The artificial neural network, decision tree, logistic regression, and genetic algorithm were used for the comparative studies, and the accuracy and positive predictive value of each algorithm were used as the evaluation indicators. The data analysis used 699 records acquired from breast cancer patients at the University of Wisconsin, with nine predictor variables and one outcome variable, followed by tenfold cross-validation. The results revealed that the accuracy of the logistic regression model was 0.9434 (sensitivity 0.9716 and specificity 0.9482), the decision tree model 0.9434 (sensitivity 0.9615, specificity 0.9105), the neural network model 0.9502 (sensitivity 0.9628, specificity 0.9273), and the genetic algorithm model 0.9878 (sensitivity 1, specificity 0.9802). The accuracy of the genetic algorithm was significantly higher than the average predicted accuracy of 0.9612. The predicted outcome of the logistic regression model was higher than that of the neural network model, but no significant difference was observed. The average predicted accuracy of the decision tree model was 0.9435, which was the lowest of all four predictive models. The standard deviation of the tenfold cross-validation was rather unreliable.
This study indicated that the genetic algorithm model yielded better results than other data mining models for the analysis of the data of breast cancer patients in terms of the overall accuracy of the patient classification, the expression and complexity of the classification rule. The results showed that the genetic algorithm described in the present study was able to produce accurate results in the classification of breast cancer data and the classification rule identified was more acceptable and comprehensible.
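The evaluation indicators named in this abstract (accuracy, sensitivity, specificity, positive predictive value) all derive from the four cells of a binary confusion matrix. A minimal sketch of the usual definitions follows. The function name is hypothetical, and the counts are invented for illustration, not the study's data.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard metrics from binary confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # true positive rate
        "specificity": tn / (tn + fp),     # true negative rate
        "ppv": tp / (tp + fp),             # positive predictive value (precision)
    }

# Hypothetical counts, chosen only to make the arithmetic easy to follow.
m = binary_metrics(tp=90, fp=20, tn=80, fn=10)
```

With tenfold cross-validation, these quantities would typically be computed on each held-out fold and then averaged, and the spread across folds gives the standard deviation that the abstract refers to.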

17 citations

Journal Article
TL;DR: A machine learning method to identify marketing intentions from large-scale We-Media data is proposed. The proposed Latent Semantic Analysis (LSI)-Word2vec model can reflect the semantic features, and the decision tree model is simplified by decision tree pruning to save computing resources and reduce the time complexity.
Abstract: Social network services for self-media, such as Weibo, Blog, and WeChat Public, constitute a powerful medium that allows users to publish posts every day. Due to insufficient information transparency, malicious Internet marketing in self-media posts can harm society. Therefore, it is necessary to identify news with marketing intentions. We follow the idea of text classification to identify marketing intentions. Although there are some current methods that address intention detection, the challenge is how to make the feature extraction of text reflect semantic information and how to improve the time complexity and space complexity of the recognition model. To this end, this paper proposes a machine learning method to identify marketing intentions from large-scale We-Media data. First, the proposed Latent Semantic Analysis (LSI)-Word2vec model can reflect the semantic features. Second, the decision tree model is simplified by decision tree pruning to save computing resources and reduce the time complexity. Third, this paper examines the effects of classifier associations and uses the optimal configuration to help people efficiently identify marketing intention. Finally, the detailed experimental evaluation on several metrics shows that our approaches are effective and efficient. The F1 value can be increased by about 5%, and the running time is increased by 20%, which shows that the newly proposed method can effectively improve the accuracy of marketing news recognition.

17 citations

Book Chapter
09 Jul 2007
TL;DR: It follows from the existential result that any function that is complete for the class of functions with polylogarithmic nondeterministic k-party communication complexity does not have polylogarithmic deterministic complexity.
Abstract: We solve some fundamental problems in the number-on-forehead (NOF) k-party communication model. We show that there exists a function which has at most logarithmic communication complexity for randomized protocols with a one-sided error probability of 1/3 but which has linear communication complexity for deterministic protocols. The result is true for k = n^{O(1)} players, where n is the number of bits on each player's forehead. This separates the analogues of RP and P in the NOF communication model. We also show that there exists a function which has constant randomized complexity for public coin protocols but at least logarithmic complexity for private coin protocols. No larger gap between private and public coin protocols is possible. Our lower bounds are existential and we do not know of any explicit function which allows such separations. However, for the 3-player case we exhibit an explicit function which has Ω(log log n) randomized complexity for private coins but only constant complexity for public coins. It follows from our existential result that any function that is complete for the class of functions with polylogarithmic nondeterministic k-party communication complexity does not have polylogarithmic deterministic complexity. We show that the set intersection function, which is complete in the number-in-hand model, is not complete in the NOF model under cylindrical reductions.

17 citations

Proceedings Article
01 Nov 2009
TL;DR: A financial statement analysis using a decision tree, where fifty financial ratios are selected to predict the direction of one-year-ahead earnings changes and the Bagging technique is introduced to improve the classification accuracy of the decision tree.
Abstract: There is a vast amount of financial information on companies' financial performance. This information is of great interest to different stakeholders, i.e., stockholders, creditors, auditors, financial analysts, and managers. For stakeholders it is important to extract relevant performance information about the companies they are interested in. As a common method for classification and prediction, the decision tree has merits: it is intelligible, rapid, and simple. In this paper, we design a financial statement analysis using a decision tree. Fifty financial ratios are selected to predict the direction of one-year-ahead earnings changes. A Bagging technique is introduced to improve the classification accuracy of the decision tree. Other methods are also examined for comparison. The results show that, compared with the standard decision tree model and the Boosting decision tree model, the Bagging decision tree model works better in stock return prediction.
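To make the Bagging idea concrete, here is a minimal sketch under stated assumptions: each base model is a one-feature decision stump (a stand-in for a full decision tree), each stump is trained on a bootstrap resample of the data, and predictions are combined by majority vote. All function names and the toy data are invented for this example and do not reproduce the paper's setup with fifty financial ratios.

```python
import random
from collections import Counter

def majority(labels):
    """Most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def fit_stump(points):
    """Fit a one-split tree on (x, y) pairs.

    Returns (threshold, left_label, right_label): predict left_label
    when x < threshold, otherwise right_label.
    """
    overall = majority([y for _, y in points])
    best = None
    for t in sorted({x for x, _ in points}):
        left = [y for x, y in points if x < t]
        right = [y for x, y in points if x >= t]  # never empty: t is an observed x
        left_label = majority(left) if left else overall
        right_label = majority(right)
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    return best[1:]

def predict_stump(stump, x):
    threshold, left_label, right_label = stump
    return left_label if x < threshold else right_label

def bagged_fit(points, n_models, seed):
    """Train n_models stumps, each on a bootstrap resample of points."""
    rng = random.Random(seed)
    return [fit_stump([rng.choice(points) for _ in points])
            for _ in range(n_models)]

def bagged_predict(models, x):
    """Combine the base models by majority vote."""
    return majority([predict_stump(m, x) for m in models])

# Hypothetical toy data: label 0 for x in 0..4, label 1 for x in 5..9.
points = [(x, 0) for x in range(5)] + [(x, 1) for x in range(5, 10)]
models = bagged_fit(points, n_models=25, seed=0)
low_vote = bagged_predict(models, 0)
high_vote = bagged_predict(models, 9)
```

An odd number of base models avoids tied votes for a two-class problem. Averaging over resamples is what reduces the variance of an unstable learner such as a decision tree.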

17 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Artificial neural network: 207K papers, 4.5M citations, 78% related
Fuzzy logic: 151.2K papers, 2.3M citations, 77% related
The Internet: 213.2K papers, 3.8M citations, 77% related
Deep learning: 79.8K papers, 2.1M citations, 77% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  10
2022  24
2021  101
2020  163
2019  158
2018  121