Topic

Decision tree model

About: Decision tree model is a research topic. Over its lifetime, 2256 publications have appeared within this topic, receiving 38142 citations.

Papers

Proceedings Article

Analysis and Diagnosis of Erythemato-Squamous Diseases Using CHAID Decision Trees

Alaa M. Elsayad¹, Mujahed Al-Dhaifallah², Ahmed M. Nassef¹

Salman bin Abdulaziz University¹, King Fahd University of Petroleum and Minerals²

19 Mar 2018

TL;DR: Experimental results showed that bagged ensemble outperforms other modeling algorithms and the prediction accuracies of these models are benchmarked against the Artificial Neural Network in terms of statistical accuracy, specificity, sensitivity, precision, true positive rate, true negative rate and F-score.

Abstract: Erythemato-squamous diseases (ESDs) are common skin diseases. They consist of six different categories: psoriasis, seborrheic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis and pityriasis rubra pilaris. They all share the clinical features of erythema and scaling, with very few differences. Their automatic detection is a challenging problem because they have overlapping signs and symptoms. This study evaluates the performance of CHAID decision trees (DTs) for the analysis and diagnosis of ESDs. DTs are nonparametric methods that make no a priori assumptions about the data distribution and can generate understandable classification rules. This property makes them very efficient tools for physicians and medical specialists to understand the data and inspect the knowledge behind it. The Chi-Squared Automatic Interaction Detection (CHAID) decision tree model is a very fast model that can build wider decision trees and handle all kinds of input variables (features). The CHAID model has had many successes, especially when used as an interpreter rather than a classifier. Due to the small number of samples, this study uses the chi-square test with the likelihood ratio (LR) to get robust results. Ensembles of bagged and boosted CHAIDs were introduced to improve the stability and accuracy of the model, but at the expense of interpretability. This paper presents the experimental results of applying CHAID decision trees and their bagged and boosted ensembles to the differential diagnosis of ESDs using both clinical and histopathological features. The prediction accuracies of these models are benchmarked against the Artificial Neural Network (ANN) in terms of statistical accuracy, specificity, sensitivity, precision, true positive rate, true negative rate and F-score. Experimental results showed that the bagged ensemble outperforms the other modeling algorithms.
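
The abstract above mentions a chi-square test with the likelihood ratio as the split criterion. As a rough illustration only, and not the authors' implementation, the following sketch computes the standard likelihood-ratio chi-square statistic G = 2 Σ O ln(O / E) for a contingency table of predictor category versus class label. The function name `likelihood_ratio_chi2` and the example counts are hypothetical.

```python
import math


def likelihood_ratio_chi2(table):
    """Return (G, degrees_of_freedom) for a contingency table.

    `table` is a list of rows of observed counts: one row per predictor
    category, one column per class label.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed == 0:
                continue  # the term 0 * ln(0) is taken as 0
            expected = row_totals[i] * col_totals[j] / n
            g += observed * math.log(observed / expected)

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return 2.0 * g, dof


# Hypothetical counts: two predictor categories versus two classes.
g_stat, dof = likelihood_ratio_chi2([[30, 0], [0, 30]])
print(g_stat, dof)
```

A larger G for the same degrees of freedom indicates a stronger association between the predictor and the class, which is what a CHAID-style procedure looks for when ranking candidate splits.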

17 citations

Book Chapter

Applying data mining for the analysis of breast cancer data.

Der Ming Liou, Wei Pin Chang

01 Jan 2015-Methods of Molecular Biology

TL;DR: This study indicated that the genetic algorithm model yielded better results than other data mining models for the analysis of the data of breast cancer patients in terms of the overall accuracy of the patient classification, the expression and complexity of the classification rule.

Abstract: Data mining, also known as Knowledge Discovery in Databases (KDD), is the process of automatically searching large volumes of data for patterns. For instance, a clinical pattern might indicate that a woman who has diabetes or hypertension is more likely to suffer a stroke within the next 5 years. A physician can then learn valuable knowledge from the data mining process. Here, we present a study on the application of artificial intelligence and data mining techniques to prediction models of breast cancer. The artificial neural network, decision tree, logistic regression, and genetic algorithm were compared, with the accuracy and positive predictive value of each algorithm used as the evaluation indicators. The data analysis incorporated 699 records acquired from breast cancer patients at the University of Wisconsin, nine predictor variables, and one outcome variable, and was followed by tenfold cross-validation. The results revealed that the accuracy of the logistic regression model was 0.9434 (sensitivity 0.9716, specificity 0.9482), of the decision tree model 0.9434 (sensitivity 0.9615, specificity 0.9105), of the neural network model 0.9502 (sensitivity 0.9628, specificity 0.9273), and of the genetic algorithm model 0.9878 (sensitivity 1, specificity 0.9802). The accuracy of the genetic algorithm was significantly higher than the average predicted accuracy of 0.9612. The predicted outcome of the logistic regression model was higher than that of the neural network model, but no significant difference was observed. The average predicted accuracy of the decision tree model was 0.9435, which was the lowest of all four predictive models. The standard deviation of the tenfold cross-validation was rather unreliable.
This study indicated that the genetic algorithm model yielded better results than the other data mining models for the analysis of breast cancer patient data in terms of the overall accuracy of patient classification and the expression and complexity of the classification rule. The results showed that the genetic algorithm described in the present study was able to produce accurate results in the classification of breast cancer data, and that the classification rule it identified was more acceptable and comprehensible.
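
The abstract above compares models by accuracy, sensitivity, specificity, and positive predictive value. The sketch below shows how these four indicators follow from a binary confusion matrix. The function name `binary_metrics` and the counts in the usage lines are made up for illustration and are not taken from the study.

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute common indicators from binary confusion-matrix counts.

    tp/fn: positive cases predicted positive/negative.
    tn/fp: negative cases predicted negative/positive.
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
    }


# Hypothetical counts, chosen only to exercise the function.
metrics = binary_metrics(tp=90, fp=20, tn=180, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

In a tenfold cross-validation, one reasonable approach is to compute these indicators on each held-out fold and then report their mean and standard deviation across the ten folds.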

17 citations

Journal Article

Stacking-Based Ensemble Learning of Self-Media Data for Marketing Intention Detection

Yufeng Wang, Shuangrong Liu, Songqian Li, Jidong Duan, Zhihao Hou, Jia Yu, Kun Ma

10 Jul 2019-Future Internet

TL;DR: A machine learning method to identify marketing intentions from large-scale We-Media data is proposed; the proposed Latent Semantic Analysis (LSI)-Word2vec model can reflect the semantic features, and the decision tree model is simplified by decision tree pruning to save computing resources and reduce the time complexity.

Abstract: Social network services for self-media, such as Weibo, Blog, and WeChat Public, constitute a powerful medium that allows users to publish posts every day. Due to insufficient information transparency, malicious Internet marketing in self-media posts poses potential harm to society. It is therefore necessary to identify news with marketing intentions in daily life. We follow the idea of text classification to identify marketing intentions. Although some current methods address intention detection, the challenge is how to make the feature extraction of text reflect semantic information and how to improve the time complexity and space complexity of the recognition model. To this end, this paper proposes a machine learning method to identify marketing intentions from large-scale We-Media data. First, the proposed Latent Semantic Analysis (LSI)-Word2vec model can reflect the semantic features. Second, the decision tree model is simplified by decision tree pruning to save computing resources and reduce the time complexity. Finally, this paper examines the effects of classifier associations and uses the optimal configuration to help people efficiently identify marketing intention. The detailed experimental evaluation on several metrics shows that our approaches are effective and efficient. The F1 value can be increased by about 5%, and the running time is increased by 20%, which shows that the newly proposed method can effectively improve the accuracy of marketing news recognition.

17 citations

Book Chapter

Separating deterministic from nondeterministic NOF multiparty communication complexity

Paul Beame¹, Matei David², Toniann Pitassi², Philipp Woelfel²

University of Washington¹, University of Toronto²

09 Jul 2007

TL;DR: It follows from the existential result that any function that is complete for the class of functions with polylogarithmic nondeterministic k-party communication complexity does not have polylogarithmic deterministic complexity.

Abstract: We solve some fundamental problems in the number-on-forehead (NOF) k-party communication model. We show that there exists a function which has at most logarithmic communication complexity for randomized protocols with a one-sided error probability of 1/3 but which has linear communication complexity for deterministic protocols. The result is true for k = n^{O(1)} players, where n is the number of bits on each player's forehead. This separates the analogues of RP and P in the NOF communication model. We also show that there exists a function which has constant randomized complexity for public coin protocols but at least logarithmic complexity for private coin protocols. No larger gap between private and public coin protocols is possible. Our lower bounds are existential and we do not know of any explicit function which allows such separations. However, for the 3-player case we exhibit an explicit function which has Ω(log log n) randomized complexity for private coins but only constant complexity for public coins. It follows from our existential result that any function that is complete for the class of functions with polylogarithmic nondeterministic k-party communication complexity does not have polylogarithmic deterministic complexity. We show that the set intersection function, which is complete in the number-in-hand model, is not complete in the NOF model under cylindrical reductions.

17 citations

Proceedings Article

Stock return prediction based on Bagging-decision tree

Huacheng Wang¹, Yanxia Jiang¹, Hui Wang¹

Renmin University of China¹

01 Nov 2009

TL;DR: A financial statement analysis using a decision tree, in which fifty financial ratios are selected to predict the direction of one-year-ahead earnings changes and the Bagging technique is introduced to improve the classification accuracy of the decision tree.

Abstract: There is a vast amount of financial information on companies' financial performance. This information is of great interest to different stakeholders, i.e., stockholders, creditors, auditors, financial analysts, and managers. For stakeholders it is important to extract relevant performance information about the companies they are interested in. As a common method for classification and prediction, the decision tree has merits such as being intelligible, fast, and simple. In this paper, we design a financial statement analysis using a decision tree. Fifty financial ratios are selected to predict the direction of one-year-ahead earnings changes. A Bagging technique is introduced to improve the classification accuracy of the decision tree. Other methods are also examined for comparison. The results show that, compared with the standard decision tree model and the Boosting-decision tree model, the Bagging-decision tree model works better in stock return prediction.
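
To make the Bagging idea in the abstract above concrete, here is a minimal sketch under stated assumptions: each base learner is a one-split decision tree (a stump) trained on a bootstrap resample, and predictions are combined by majority vote. This is not the paper's model; `fit_stump`, `fit_bagged`, and the toy data are hypothetical, and a real study would use full trees over many financial ratios.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-split tree: choose the (feature, threshold) pair
    that makes the fewest errors on the training data."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left)
            errors += sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_bagged(X, y, n_estimators=25, seed=0):
    """Train stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))

    def predict(row):
        votes = Counter(model(row) for model in models)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (hypothetical): one ratio-like feature, label 1 = earnings increase.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
predict = fit_bagged(X, y)
print(predict([0.0]), predict([1.0]))
```

Because each stump sees a different resample, individual stumps vary, and the vote averages out that variance. This is the usual argument for why bagging can stabilize an unstable learner such as a decision tree.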

17 citations

Metrics

Papers: 2,288

Citations: 43,502

No. of papers in the topic in previous years
Year	Papers
2023	10
2022	24
2021	101
2020	163
2019	158
2018	121