
Showing papers on "Decision tree model published in 2020"


Journal ArticleDOI
28 Aug 2020
TL;DR: In this article, convolutional neural networks are used for binary pneumonia classification by fine-tuning VGG-19, Inception_V2 and a decision tree model on an X-ray and CT scan image dataset containing 360 images.
Abstract: The novel coronavirus infection (COVID-19) that was first identified in China in December 2019 has spread across the globe rapidly, infecting over ten million people. The World Health Organization (WHO) declared it a pandemic on March 11, 2020. What makes it even more critical is the lack of vaccines available to control the disease, although many pharmaceutical companies and research institutions all over the world are working toward developing effective solutions to battle this life-threatening disease. X-ray and computed tomography (CT) image scanning is one of the most promising research areas; it can help in finding and providing early diagnoses of diseases and gives both quick and precise outcomes. In this study, convolutional neural networks are used for binary pneumonia classification by fine-tuning VGG-19, Inception_V2 and a decision tree model on an X-ray and CT scan image dataset containing 360 images. The results infer that the fine-tuned VGG-19 model shows highly satisfactory performance, with increasing training and validation accuracy (91%), outperforming the Inception_V2 (78%) and decision tree (60%) models.

98 citations


Journal ArticleDOI
TL;DR: The susceptibility mapping procedure is performed by testing three extensions of a decision tree model namely, Alternating Decision Tree (ADTree), Naive-Bayes tree (NBTree), and Logistic Model Tree (LMT) by dichotomizing the gully information over space into gully presence/absence conditions, which are further explored in their calibration and validation stages.
Abstract: Gully erosion is a disruptive phenomenon which extensively affects the Iranian territory, especially in the Northern provinces. A number of studies have recently been undertaken to study this process, to predict it over space and, ultimately, in a broader national effort, to limit its negative effects on local communities. We focused on the Bastam watershed, where 9.3% of the surface is currently affected by gullying. Machine learning algorithms are currently under the magnifying glass across the geomorphological community for their high predictive ability. However, unlike bivariate statistical models, their structure does not provide intuitive and quantifiable measures of environmental preconditioning factors. To cope with this weakness, we interpret preconditioning causes on the basis of a bivariate approach, namely the Index of Entropy, and we performed the susceptibility mapping procedure by testing three extensions of a decision tree model, namely the Alternating Decision Tree (ADTree), Naive-Bayes Tree (NBTree), and Logistic Model Tree (LMT). We dichotomized the gully information over space into gully presence/absence conditions, which we further explored in their calibration and validation stages. Since the presence/absence information and associated factors are identical, the resulting differences are due only to the algorithmic structures of the three models we chose. These differences are not significant in terms of performance; in fact, the three models produce outstanding predictive AUC measures (ADTree = 0.922; NBTree = 0.939; LMT = 0.944). However, the associated mapping results depict very different patterns, and only the LMT is associated with reasonable susceptibility patterns. This is a strong indication of which model combines the best performance and mapping for any natural hazard-oriented application.
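The AUC figures quoted above can be computed without a threshold sweep: AUC is the probability that a randomly chosen presence point is scored above a randomly chosen absence point. A minimal sketch with made-up scores (not the paper's data):

```python
def auc(scores, labels):
    # Rank-based AUC: fraction of (presence, absence) pairs where the
    # presence point gets the higher score; ties count half.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical gully presence (1) / absence (0) labels and model scores
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auc(scores, labels), 3))  # 0.889
```

An AUC of 1.0 would mean every presence cell outranks every absence cell, which is the sense in which values above 0.9 are called outstanding.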

92 citations


Journal ArticleDOI
TL;DR: This paper proposes Pivot, a novel solution for privacy preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than those the clients have agreed to release (i.e., the final tree model and the prediction output).
Abstract: Federated learning (FL) is an emerging paradigm that enables multiple organizations to jointly train a model without revealing their private data to each other. This paper studies vertical federated learning, which tackles the scenarios where (i) collaborating organizations own data of the same set of users but with disjoint features, and (ii) only one organization holds the labels. We propose Pivot, a novel solution for privacy-preserving vertical decision tree training and prediction, ensuring that no intermediate information is disclosed other than that which the clients have agreed to release (i.e., the final tree model and the prediction output). Pivot does not rely on any trusted third party and provides protection against a semi-honest adversary that may compromise $m-1$ out of $m$ clients. We further identify two privacy leakages when the trained decision tree model is released in plaintext and propose an enhanced protocol to mitigate them. The proposed solution can also be extended to tree ensemble models, e.g., random forest (RF) and gradient boosting decision tree (GBDT), by treating single decision trees as building blocks. Theoretical and experimental analyses suggest that Pivot is efficient for the privacy achieved.
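Pivot's threat model, privacy against any m-1 of m colluding clients, is the guarantee that m-out-of-m additive secret sharing provides. The paper's actual protocol combines threshold homomorphic encryption with secure multiparty computation; the sketch below shows only the additive-sharing idea, with all names invented for illustration:

```python
import random

PRIME = 2**61 - 1  # field modulus (illustrative choice)

def share(secret, m):
    # Split `secret` into m additive shares mod PRIME.
    # Any m-1 shares are uniformly random and reveal nothing;
    # all m shares together reconstruct the secret.
    shares = [random.randrange(PRIME) for _ in range(m - 1)]
    shares.append((secret - sum(shares)) % PRIME)
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

shares = share(1234, 4)        # distribute among 4 clients
assert reconstruct(shares) == 1234
```

In Pivot the values being protected are intermediate statistics of tree training (e.g., split gains), not the raw secret shown here.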

87 citations


Journal ArticleDOI
TL;DR: This study proposed a series of methods to select the optimal feature domain to improve land cover classification in a complex urbanized coastal area and found that compared to the traditional band-only model, the variable selection process can significantly improve the model parsimony and computational efficiency.

79 citations


Journal ArticleDOI
TL;DR: The experimental results show that the proposed BehavDT context-aware model is more effective than traditional machine learning approaches in predicting diverse user behaviors across multi-dimensional contexts.
Abstract: This paper formulates the problem of building a context-aware predictive model based on users' diverse behavioral activities with smartphones. In the area of machine learning and data science, a tree-like model such as the decision tree is considered one of the most popular classification techniques, which can be used to build a data-driven predictive model. The traditional decision tree model typically creates a number of leaf nodes as decision nodes that represent context-specific rigid decisions, and consequently may cause an overfitting problem in behavior modeling. However, in many practical scenarios within the context-aware environment, generalized outcomes could play an important role in effectively capturing user behavior. In this paper, we propose a behavioral decision tree, the “BehavDT” context-aware model, that takes into account user behavior-oriented generalization according to individual preference level. The BehavDT model outputs not only the generalized decisions but also the context-specific decisions in relevant exceptional cases. The effectiveness of our BehavDT model is studied by conducting experiments on real smartphone datasets of individual users. Our experimental results show that the proposed BehavDT context-aware model is more effective than traditional machine learning approaches in predicting diverse user behaviors across multi-dimensional contexts.

75 citations



Journal ArticleDOI
TL;DR: This work demonstrated that HSI coupled with intelligent algorithms could be successfully applied as a rapid and effective strategy to accurately identify the rank quality of black tea.

43 citations


Journal ArticleDOI
TL;DR: One of the main messages of this paper is that far fewer samples are needed than for recovering the underlying tree, which means that accurate predictions are possible using the wrong tree.
Abstract: We study the problem of learning a tree Ising model from samples such that subsequent predictions made using the model are accurate. The prediction task considered in this paper is that of predicting the values of a subset of variables given values of some other subset of variables. Virtually all previous work on graphical model learning has focused on recovering the true underlying graph. We define a distance (“small set TV” or ssTV) between distributions $P$ and $Q$ by taking the maximum, over all subsets $\mathcal{S}$ of a given size, of the total variation between the marginals of $P$ and $Q$ on $\mathcal{S}$; this distance captures the accuracy of the prediction task of interest. We derive nonasymptotic bounds on the number of samples needed to get a distribution (from the same class) with small ssTV relative to the one generating the samples. One of the main messages of this paper is that far fewer samples are needed than for recovering the underlying tree, which means that accurate predictions are possible using the wrong tree.
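The ssTV distance is concrete enough to compute directly for tiny distributions. A sketch (with illustrative distributions, not the paper's data): take the maximum, over all size-k subsets of variables, of the total variation between the marginals. It also shows how two distributions can agree on every small marginal while differing jointly, which is the sense in which "the wrong tree" can still predict well:

```python
from itertools import combinations

def marginal(p, subset):
    # Marginal of distribution p (dict: value tuple -> probability)
    # on the given index subset.
    m = {}
    for x, px in p.items():
        key = tuple(x[i] for i in subset)
        m[key] = m.get(key, 0.0) + px
    return m

def sstv(p, q, n, k):
    # Small-set TV: max over size-k subsets S of TV(p_S, q_S).
    best = 0.0
    for s in combinations(range(n), k):
        mp, mq = marginal(p, s), marginal(q, s)
        keys = set(mp) | set(mq)
        tv = 0.5 * sum(abs(mp.get(x, 0.0) - mq.get(x, 0.0)) for x in keys)
        best = max(best, tv)
    return best

# Two toy distributions on {0,1}^2
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
q = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
print(round(sstv(p, q, n=2, k=1), 3))  # 0.0: all single-variable marginals agree
print(round(sstv(p, q, n=2, k=2), 3))  # 0.3: the joint distributions differ
```

Here every single-variable prediction made under q matches p exactly, even though p and q are different distributions.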

42 citations


Journal ArticleDOI
01 Apr 2020-Symmetry
TL;DR: The experimental results on smartphone apps usage datasets show that “ContextPCA” model effectively predicts context-aware smartphone apps in terms of precision, recall, f-score and ROC values in various test cases.
Abstract: This paper mainly formulates the problem of predicting context-aware smartphone app usage based on machine learning techniques. In the real world, people use various kinds of smartphone apps differently in different contexts, including both the user-centric context and the device-centric context. In the area of artificial intelligence and machine learning, the decision tree model is one of the most popular approaches for predicting context-aware smartphone usage. However, real-life smartphone app usage data may contain higher-dimensional contexts, which may cause several issues such as increased model complexity, overfitting, and consequently decreased prediction accuracy of the context-aware model. In order to address these issues, in this paper we present an effective principal component analysis (PCA) based context-aware smartphone app prediction model, “ContextPCA”, using the decision tree machine learning classification technique. PCA is an unsupervised machine learning technique that can be used to separate symmetric and asymmetric components, and it has been adopted in our “ContextPCA” model in order to reduce the context dimensions of the original data set. The experimental results on smartphone app usage datasets show that the “ContextPCA” model effectively predicts context-aware smartphone apps in terms of precision, recall, F-score and ROC values in various test cases.
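PCA's role here is to project correlated context features onto a few directions of maximal variance before the decision tree is trained. A dependency-free sketch of extracting the first principal component via power iteration (toy data, not the paper's):

```python
def pca_first_component(X, iters=200):
    # First principal component via power iteration on the
    # sample covariance matrix (pure-Python illustration).
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    C = [[sum(Xc[i][a] * Xc[i][b] for i in range(n)) / (n - 1)
          for b in range(d)] for a in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Toy "context" data where the two features are strongly correlated,
# so a single component captures most of the variance.
X = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.1]]
v = pca_first_component(X)  # roughly (0.707, 0.707)
```

In practice one would use a library implementation (e.g. scikit-learn's `PCA`) rather than hand-rolled power iteration; the point is only that highly correlated context dimensions collapse onto few components.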

34 citations


Journal ArticleDOI
TL;DR: In this paper, an end-to-end trainable unified model is presented that leverages the appealing properties of autoencoders and random forests to evaluate the quality of products by distinguishing spam reviews.

30 citations


Proceedings ArticleDOI
30 Oct 2020
TL;DR: The study of zero knowledge machine learning is initiated and protocols for zero knowledge decision tree predictions and accuracy tests are proposed, which allow the owner of a decision tree model to convince others that the model computes a prediction on a data sample, or achieves a certain accuracy on a public dataset without leaking any information about the model itself.
Abstract: Machine learning has become increasingly prominent and is widely used in various applications in practice. Despite its great success, the integrity of machine learning predictions and accuracy is a rising concern. The reproducibility of machine learning models that are claimed to achieve high accuracy remains challenging, and the correctness and consistency of machine learning predictions in real products lack any security guarantees. In this paper, we initiate the study of zero knowledge machine learning and propose protocols for zero knowledge decision tree predictions and accuracy tests. The protocols allow the owner of a decision tree model to convince others that the model computes a prediction on a data sample, or achieves a certain accuracy on a public dataset, without leaking any information about the model itself. We develop approaches to efficiently turn decision tree predictions and accuracy into statements of zero knowledge proofs. We implement our protocols and demonstrate their efficiency in practice. For a decision tree model with 23 levels and 1,029 nodes, it only takes 250 seconds to generate a zero knowledge proof proving that the model achieves high accuracy on a dataset of 5,000 samples and 54 attributes, and the proof size is around 287 kilobytes.
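The protocols rest on the model owner first committing to the tree and then proving statements about the committed model. The full construction uses succinct zero-knowledge proofs; the fragment below shows only the commit-and-open step with a simple hash commitment (all names and the tree encoding are illustrative):

```python
import hashlib
import json
import os

def commit(model):
    # Hash commitment to a serialized decision tree: the digest can be
    # published without revealing the model; the nonce keeps it hiding.
    nonce = os.urandom(16).hex()
    payload = (nonce + json.dumps(model, sort_keys=True)).encode()
    return hashlib.sha256(payload).hexdigest(), nonce

def verify(digest, nonce, model):
    # Opening the commitment: recompute and compare.
    payload = (nonce + json.dumps(model, sort_keys=True)).encode()
    return digest == hashlib.sha256(payload).hexdigest()

tree = {"feature": 0, "threshold": 2.5,
        "left": {"label": 0}, "right": {"label": 1}}
digest, nonce = commit(tree)
assert verify(digest, nonce, tree)
```

A hash commitment alone is binding and hiding, but proving that a prediction or an accuracy figure is consistent with the committed tree without ever opening it is exactly what the paper's zero-knowledge machinery adds on top.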

Journal ArticleDOI
TL;DR: The study concludes that the two models make modelling of uncertainty in the credit scoring process possible; fuzzy logic is more accurate for modelling the uncertainty, but the decision tree model is more favourable for the presentation of the problem.
Abstract: Among the numerous alternatives used in the world of risk balancing, the provision of guarantees in the formalization of credit agreements stands out. The objective of this paper is to compare the performance of fuzzy sets with that of artificial neural network-based decision trees on credit scoring to predict the recovered value, using a sample of 1890 borrowers. Compared with fuzzy logic, the decision-analytic approach can more easily present the outcomes of the analysis. On the other hand, fuzzy logic makes some implicit assumptions that may make it even harder for credit grantors to follow the logical decision-making process. This paper presents an initial study of collateral as a variable in the calculation of credit scores. The study concludes that the two models make modelling of uncertainty in the credit scoring process possible. Although more difficult to implement, fuzzy logic is more accurate for modelling the uncertainty. However, the decision tree model is more favourable for the presentation of the problem.

Journal ArticleDOI
TL;DR: A new model for nondestructive estimation of tree volume, above-ground biomass (AGB) or carbon stock based on LiDAR data is provided, and its estimates are in good agreement with reference values based on field survey data.
Abstract: Tree-level information can be estimated based on light detection and ranging (LiDAR) point clouds. We propose to develop a quantitative structural model based on terrestrial laser scanning (TLS) point clouds to automatically and accurately estimate tree attributes and to detect real trees for the first time. This model is suitable for forest research where branches are involved in the calculation. First, the AdTree method was used to approximate the geometry of the tree stem and branches by fitting a series of cylinders, so that trees were represented as a broad set of cylinders. Then, the ends of the stem and all branches were closed, turning the model from a set of cylinders into a closed convex-hull polyhedron and thereby reconstructing a 3D model of the tree. Finally, to extract effective tree attributes from the reconstructed 3D model, a convex-hull polyhedron calculation method based on the tree model was defined. This calculation method can be used to extract wood (including tree stem and branch) volume, diameter at breast height (DBH) and tree height. To verify the accuracy of the tree attributes extracted from the model, the tree models of 153 Chinese scholar trees from TLS data were reconstructed and the tree volume, DBH and tree height were extracted from the model. The experimental results show that the DBH and tree height extracted based on this model are in good agreement with the reference values based on field survey data. The bias, root mean square error (RMSE) and determination coefficient (R2) of DBH were 0.38 cm, 1.28 cm and 0.92, respectively. The bias, RMSE and R2 of tree height were −0.76 m, 1.21 m and 0.93, respectively. The tree volume extracted from the model is also in good agreement with the reference values: the bias, RMSE and R2 of tree volume were −0.01236 m3, 0.03498 m3 and 0.96, respectively. This study provides a new model for nondestructive estimation of tree volume, above-ground biomass (AGB) or carbon stock based on LiDAR data.

Journal ArticleDOI
TL;DR: A deep fuzzy tree model is proposed which learns a better tree structure and classifiers for hierarchical classification with theoretical guarantees; experimental results show the effectiveness and efficiency of the proposed model on various visual classification datasets.
Abstract: Deep learning models often use a flat softmax layer to classify samples after feature extraction in visual classification tasks. However, it is hard to make a single decision to find the true label among massive classes. In this scenario, hierarchical classification has proved to be an effective solution and can be used to replace the softmax layer. A key issue in hierarchical classification is constructing a good label structure, which is very significant for classification performance. Several works have been proposed to address this issue, but they have some limitations and are almost all designed heuristically. In this article, inspired by fuzzy rough set theory, we propose a deep fuzzy tree model which learns a better tree structure and classifiers for hierarchical classification with theoretical guarantees. Experimental results show the effectiveness and efficiency of the proposed model on various visual classification datasets.

Journal ArticleDOI
TL;DR: Combining connected-vehicle (CV) data (V2V and V2I) with deep learning networks is promising for determining crash risks at intersections with high time efficiency and at low CV penetration rates, which helps to deploy countermeasures that reduce crash rates and resolve traffic safety problems.

Journal ArticleDOI
TL;DR: The decision tree algorithm can be successfully applied as an alternative for the determination of the potential pathogenicity of VUS, producing consistently relevant forecasts for the sample tests with an accuracy close to the best achieved by supervised ML algorithms.
Abstract: A variant of unknown significance (VUS) is a variant form of a gene that has been identified through genetic testing but whose significance to organism function is not known. A current challenge in precision medicine is to precisely identify which detected mutations from a sequencing process have a suitable role in the treatment or diagnosis of a disease. The average accuracy of pathogenicity predictors is 85%. However, there is significant discordance among them about the identification of mutational impact and pathogenicity. Therefore, manual verification is necessary to confirm the real effect of a mutation in each case. In this work, we use variable categorization and selection to build a decision tree model, and later we measure and compare its accuracy with four known mutation predictors and seventeen supervised machine-learning (ML) algorithms. The results showed that the proposed tree reached the highest precision among all tested alternatives: 91% for true neutrals, 8% for false neutrals, 9% for false pathogenics, and 92% for true pathogenics. The decision tree demonstrated exceptionally high classification precision with cancer data, producing consistently relevant forecasts for the sample tests with an accuracy close to the best achieved by the supervised ML algorithms. Besides, the decision tree algorithm is easier for non-IT experts to apply in clinical practice. From the cancer research community's perspective, this approach can be successfully applied as an alternative for the determination of the potential pathogenicity of VUS.
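The reported rates (92% true pathogenic, 91% true neutral, with 8%/9% errors) translate directly into standard confusion-matrix metrics. A small sketch using those percentages as illustrative counts per 100 variants of each class:

```python
def metrics(tp, fp, fn, tn):
    # Precision, recall and accuracy from a binary confusion matrix.
    precision = tp / (tp + fp)          # of predicted-pathogenic, how many truly are
    recall = tp / (tp + fn)             # of truly pathogenic, how many were caught
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

# Illustrative counts per 100 pathogenic / 100 neutral variants,
# mirroring the ~92% true-pathogenic and ~91% true-neutral rates above
p, r, a = metrics(tp=92, fp=9, fn=8, tn=91)
print(round(p, 3), round(r, 3), round(a, 3))  # 0.911 0.92 0.915
```

This also shows why both error columns matter clinically: false neutrals (missed pathogenic variants) lower recall, while false pathogenics lower precision.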

Journal ArticleDOI
30 Nov 2020
TL;DR: This research proposes using two different machine learning algorithms (random forest and decision tree (J48)) to detect fake news, evaluated on the full dataset and a held-out test sample.
Abstract: Fake news is one of the most popular phenomena that has considerable effects on our social life, especially in the political domain. Nowadays, creating fake news has become very easy because of users' widespread use of the internet and social media. Therefore, the detection of elusive news is a crucial problem that needs to be considered, mainly because of challenges like the limited number of benchmark datasets and the amount of news published every second. This research proposes using two different machine learning algorithms (random forest and decision tree (J48)) to detect fake news. In this paper, the full dataset comprises 20,761 samples, while the test sample comprises 4,345 samples. The preprocessing steps start with cleaning the data by removing unnecessary special characters, numbers, English letters and white spaces, and finally stop-word removal is applied. After that, the most popular feature extraction method (TF-IDF) is used before applying the two suggested classification algorithms. The results show that the best accuracy achieved equals 89.11% using the decision tree model, while using the random forest the accuracy achieved equals 84.97%.
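TF-IDF, the feature-extraction step used here before classification, weights a term by how often it appears in a document and how rare it is across the corpus. A minimal textbook-form sketch (library implementations such as scikit-learn's `TfidfVectorizer` add smoothing and normalization; the toy documents are invented):

```python
import math
from collections import Counter

def tfidf(docs):
    # docs: list of token lists. Returns one {term: weight} dict per document,
    # with weight = term frequency * log(inverse document frequency).
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))  # document frequency
    out = []
    for doc in docs:
        tf = Counter(doc)
        out.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return out

docs = [["fake", "news", "story"],
        ["real", "news", "report"],
        ["fake", "claim"]]
weights = tfidf(docs)  # "story" outweighs the corpus-wide term "news"
```

Terms that appear in every document get weight log(1) = 0, which is precisely why TF-IDF suppresses uninformative common words before the classifiers are trained.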

Journal ArticleDOI
22 Oct 2020-Sensors
TL;DR: By a combination of spectral analysis and the application of decision trees to a set of spectral features, this paper is able to take advantage of the multidimensionality of diagnostic data and classify/recognize the gearbox condition almost faultlessly even in non-stationary operating conditions.
Abstract: Monitoring the condition of rotating machinery, especially planetary gearboxes, is a challenging problem. In most of the available approaches, diagnostic procedures are related to advanced signal pre-processing/feature extraction methods or advanced data (features) analysis by using artificial intelligence. In this paper, the second approach is explored, so an application of decision trees for the classification of spectral-based 15D vectors of diagnostic data is proposed. The novelty of this paper is that by a combination of spectral analysis and the application of decision trees to a set of spectral features, we are able to take advantage of the multidimensionality of diagnostic data and classify/recognize the gearbox condition almost faultlessly even in non-stationary operating conditions. The diagnostics of time-varying systems are a complicated issue due to time-varying probability densities estimated for features. Using multidimensional data instead of an aggregated 1D feature, it is possible to improve the efficiency of diagnostics. It can be underlined that in comparison to previous work related to the same data, where the aggregated 1D variable was used, the efficiency of the proposed approach is around 99% (ca. 19% better). We tested several algorithms: classification and regression trees with the Gini index and entropy, as well as the random tree. We compare the obtained results with the K-nearest neighbors classification algorithm and meta-classifiers, namely: random forest and AdaBoost. As a result, we created the decision tree model with 99.74% classification accuracy on the test dataset.

Journal ArticleDOI
03 Dec 2020-PLOS ONE
TL;DR: It is shown that the selected algorithm can effectively filter the features, which simplifies the complexity of the model to a certain extent and improves the classification accuracy of machine learning.
Abstract: In recent years, China's e-commerce industry has developed at high speed, and the scale of various industries has continued to expand. Service-oriented enterprises for e-commerce transactions and information technology came into being. This paper analyzes the shortcomings and challenges of traditional online shopping behavior prediction methods and proposes an online shopping behavior analysis and prediction system. The paper chooses the linear logistic regression model and the decision-tree-based XGBoost model. After optimizing the models, it is found that the nonlinear model can make better use of the features and obtain better prediction results. In this paper, we first build the single models and then use a model fusion algorithm to fuse their prediction results, the purpose being to avoid the under-fitting of the linear model and the over-fitting of the decision tree model. The results show that the fused model improves further on any single model. Finally, through two sets of contrast experiments, it is shown that the selected algorithm can effectively filter the features, which simplifies the complexity of the model to a certain extent and improves the classification accuracy of machine learning. The XGBoost hybrid model based on p/n samples is simpler than a single model, and such models are not easily over-fitted and are therefore more robust.

Journal ArticleDOI
Jiawei Li1, Yiming Li1, Xingchun Xiang1, Shu-Tao Xia1, Siyi Dong, Yun Cai 
24 Oct 2020-Entropy
TL;DR: A Tree-Network-Tree (TNT) learning framework for explainable decision-making, where the knowledge is alternately transferred between the tree model and DNNs is proposed, and extensive experiments demonstrated the effectiveness of the proposed method.
Abstract: Deep Neural Networks (DNNs) usually work in an end-to-end manner. This makes the trained DNNs easy to use, but the decision process remains ambiguous for every test case. Unfortunately, the interpretability of decisions is crucial in some scenarios, such as medical or financial data mining and decision-making. In this paper, we propose a Tree-Network-Tree (TNT) learning framework for explainable decision-making, where the knowledge is alternately transferred between the tree model and DNNs. Specifically, the proposed TNT learning framework exerts the advantages of different models at different stages: (1) a novel James–Stein Decision Tree (JSDT) is proposed to generate better knowledge representations for DNNs, especially when the input data are low-frequency or low-quality; (2) the DNNs output high-performing prediction results from the knowledge-embedding inputs and behave as a teacher model for the following tree model; and (3) a novel distillable Gradient Boosted Decision Tree (dGBDT) is proposed to learn interpretable trees from the soft labels and make predictions comparable to those of the DNNs. Extensive experiments on various machine learning tasks demonstrated the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: An application is presented that uses academic information provided by the university and generates classification models from three different algorithms (artificial neural networks, ID3 and C4.5); it is concluded that the ratio of the credits a student has approved to the credits they should have taken is the most significant variable.
Abstract: Academic performance is a topic studied not only to identify students who could drop out of their studies, but also to classify them according to the type of academic risk in which they could find themselves. An application has been implemented that uses academic information provided by the university and generates classification models from three different algorithms: artificial neural networks, ID3 and C4.5. The models created use a set of variables and criteria for their construction and can be used to classify student desertion and, more specifically, to predict the type of academic risk. The performance of these models was compared to determine which provided the best results and would serve to classify students. The decision tree algorithms, C4.5 and ID3, presented better measures than the artificial neural network. The tree generated using the C4.5 algorithm presented the best performance metrics, with correctness, accuracy and sensitivity equal to 0.83, 0.87 and 0.90, respectively. As a result of the classification to determine student desertion, it was concluded, according to the model generated using the C4.5 algorithm, that the ratio of the credits approved by a student to the credits that they should have taken is the most significant variable. The classification by type of academic risk generated a tree model indicating that the number of abandoned subjects is the most significant variable. The admission modality through which the student entered the university did not turn out to be significant, as it does not appear in the generated decision tree.
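C4.5 chooses splits by gain ratio, i.e., information gain normalized by the entropy of the split itself, which is how a strongly predictive variable like the approved-to-required credit ratio rises to the root of the tree. A sketch with hypothetical dropout labels (not the study's data):

```python
import math
from collections import Counter

def entropy(values):
    # Shannon entropy of a label sequence, in bits.
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(labels, split):
    # C4.5 criterion: information gain divided by the split's own entropy,
    # which penalizes attributes that fragment the data into many branches.
    n = len(labels)
    parts = {}
    for y, s in zip(labels, split):
        parts.setdefault(s, []).append(y)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = entropy(split)
    return gain / split_info if split_info else 0.0

# Hypothetical data: dropout (1) vs. retained (0), split by credit-ratio band
labels = [1, 1, 1, 0, 0, 0]
split  = ["low", "low", "low", "high", "high", "high"]
print(gain_ratio(labels, split))  # 1.0: a perfectly predictive split
```

ID3 uses raw information gain instead; the split-entropy denominator is the practical difference that makes C4.5 less biased toward many-valued attributes.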

Book ChapterDOI
01 Jan 2020
TL;DR: A sensitive pruning-based decision tree is proposed to tackle the privacy issues in this domain; the pruning algorithm is a modification of the C4.8 decision tree (better known as J48 in the Weka package).
Abstract: Machine learning techniques have been extensively adopted in the domain of Network-based Intrusion Detection Systems (NIDS), especially for the task of network traffic classification. A decision tree model, with its parent/child (kinship) node terminology, is very suitable for this application: the merit of its straightforward and simple “if-else” rules makes the interpretation of network traffic easier. Despite its powerful classification and interpretation capacities, the visibility of its tree rules introduces a new privacy risk to NIDS, as it reveals the network posture of the owner. In this paper, we propose a sensitive pruning-based decision tree to tackle the privacy issues in this domain. The proposed pruning algorithm is a modification of the C4.8 decision tree (better known as J48 in the Weka package). The proposed model is tested on the 6 percent version of the GureKDDCup NIDS dataset.

Journal ArticleDOI
01 Jan 2020
TL;DR: This paper proposes a design pattern detection approach based on tree-based machine learning algorithms and software metrics to study the effectiveness of software metrics in distinguishing between similar structural design patterns.
Abstract: Design patterns are general reusable solutions for recurrent problems. Software systems become more complicated when design patterns are not documented, and maintenance and evolution costs become a challenge. Design pattern detection is used to reduce complexity and to increase the understandability of the software design. In this paper, we propose a design pattern detection approach based on tree-based machine learning algorithms and software metrics to study the effectiveness of software metrics in distinguishing between structurally similar design patterns. We built our datasets using the P-MARt repository by extracting the roles of design patterns and calculating the metrics for each role. We used parameter optimization techniques based on the grid search algorithm to define the optimal parameters of each algorithm, and two feature selection methods based on a genetic algorithm to find the features that contribute most to the distinguishing process. Through our experimental study, we showed the effectiveness of machine learning and software metrics in distinguishing structurally similar design patterns. Moreover, we extracted the essential metrics in each dataset that supported the machine learning model in making its decision, and we presented the detection conditions for each role in the design pattern by extracting them from the decision tree model.

Journal ArticleDOI
TL;DR: In this research, a cost-sensitive C5.0 decision tree was used to solve multiclass imbalanced data problems and performed better than the C4.5 and ID3 algorithms.
Abstract: The multiclass imbalanced data problem is currently an interesting topic of study in data mining. The problem affects the classification process in machine learning: in some cases the minority class in a dataset carries more important information than the majority class, and when the minority class is misclassified, the accuracy value and classifier performance suffer. In this research, a cost-sensitive C5.0 decision tree was used to solve multiclass imbalanced data problems. In the first stage, the decision tree model is built using the C5.0 algorithm; cost-sensitive learning then uses the MetaCost method to obtain the minimum-cost model. Test results showed that the C5.0 algorithm performed better than the C4.5 and ID3 algorithms, with performance percentages of 40.91%, 40.24% and 19.23%, respectively.
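The core of MetaCost is a relabeling step: each training example is reassigned to the class that minimizes its expected misclassification cost, given estimated class probabilities and a cost matrix, before the final tree is retrained. A minimal sketch of that step, with an invented cost matrix and probabilities:

```python
# MetaCost-style relabeling: pick argmin_i sum_j P(j|x) * cost[i][j]
# for each example, where i is the predicted class and j the true class.

def metacost_relabel(class_probs, cost):
    """Return the minimum-expected-cost class for each example."""
    relabeled = []
    for probs in class_probs:
        risks = [sum(p * cost[i][j] for j, p in enumerate(probs))
                 for i in range(len(cost))]
        relabeled.append(min(range(len(risks)), key=risks.__getitem__))
    return relabeled

# Cost of predicting the row class when the column class is true:
# missing the minority class (2) is ten times worse than other errors.
cost = [[0, 1, 10],
        [1, 0, 10],
        [1, 1,  0]]
probs = [[0.6, 0.3, 0.1],   # leans toward class 0, but class-2 risk dominates
         [0.1, 0.2, 0.7]]
labels = metacost_relabel(probs, cost)   # both examples relabeled to class 2
```

Even the first example, whose most probable class is 0, is relabeled to the minority class because its expected cost (0.9) is below that of predicting class 0 (1.3); training on the relabeled data biases the tree toward the expensive-to-miss class.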

Journal ArticleDOI
TL;DR: The research results showed that the highest accuracy is obtained using tree-based classifiers, and that the best algorithm of this type for prediction is gradient boosted trees.
Abstract: In this paper, the flight time deviation of Lithuanian airports has been analyzed. A supervised machine learning model has been implemented to predict the time delay deviation interval of new flights. The analysis has been made using seven algorithms: probabilistic neural network, multilayer perceptron, decision trees, random forest, tree ensemble, gradient boosted trees, and support vector machines. To find the parameters that give the highest accuracy for each algorithm, grid search has been used. To evaluate the quality of each algorithm, five measures have been calculated: sensitivity/recall, precision, specificity, F-measure, and accuracy. All experiments were carried out using a newly collected dataset from Lithuanian airports together with weather information at departure/landing time. Departure flights and arrival flights were investigated separately. To balance the dataset, the SMOTE technique was used. The research results showed that the highest accuracy is obtained using tree-based classifiers, and that the best algorithm of this type for prediction is gradient boosted trees.
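The five quality measures named above all follow from binary confusion-matrix counts. A self-contained sketch, with invented counts:

```python
# Sensitivity/recall, precision, specificity, F-measure, and accuracy
# from true/false positive/negative counts.

def measures(tp, fp, tn, fn):
    recall = tp / (tp + fn)                 # sensitivity
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    f_measure = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return recall, precision, specificity, f_measure, accuracy

# e.g. a delayed-flight classifier evaluated on 100 flights
rec, prec, spec, f1, acc = measures(tp=40, fp=10, tn=45, fn=5)
```

Reporting all five matters here because, with SMOTE-balanced training but naturally imbalanced test traffic, accuracy alone can hide poor recall on the delayed class.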

Journal ArticleDOI
01 Mar 2020
TL;DR: Several dimensionality reduction algorithms are used with a Decision Tree as classifier, and all models that implement dimensionality reduction significantly improve the performance of the Decision Tree model.
Abstract: The complexity of software can increase the possibility of defects, and software containing defects can cause large losses. Most software developers do not document their work properly, making it difficult to analyse software development history data. Cross-project software defect prediction uses several datasets from different projects, combined for training and testing. A dataset with high dimensionality can cause bias, contain irrelevant data, and require large resources to process. In this study, several dimensionality reduction algorithms were combined with a Decision Tree classifier. Based on analysis using ANOVA, all models that implement dimensionality reduction significantly improve the performance of the Decision Tree model.
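The significance claim above rests on ANOVA. As a reminder of what that test computes, here is a minimal one-way ANOVA F-statistic over per-model performance samples; the scores below are invented, not the study's results.

```python
# One-way ANOVA: F = between-group mean square / within-group mean square.

def one_way_anova_f(groups):
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# e.g. Decision Tree scores without vs. with dimensionality reduction
f = one_way_anova_f([[0.70, 0.72, 0.71],
                     [0.80, 0.82, 0.81]])
```

A large F (here 150) means the between-model variation dwarfs the within-model variation, which is then compared against the F-distribution's critical value for the chosen significance level.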

Journal ArticleDOI
TL;DR: In this article, the authors presented a methodology to establish incident duration estimation models by utilizing decision tree models of CHAID, CART, C4.5 and LMT.
Abstract: Unexpected events such as crashes, disabled vehicles, flat tires and spilled loads cause traffic congestion or extend its duration on roadways. It is possible to reduce the effects of such incidents by implementing intelligent transportation systems solutions, which require estimation of the incident duration to identify well-fitted strategies. This paper presents a methodology to establish incident duration estimation models by utilizing the decision tree models CHAID, CART, C4.5 and LMT. For this study, data on traffic incidents that occurred on the Istanbul Trans-European Motorway were obtained and separated into three groups according to duration, drawing on previous studies on the classification of traffic incidents. Using the classified data, decision tree models of CHAID, CART, C4.5 and LMT were established and validated to estimate the incident duration. According to the results, although the models used different variables, the CHAID, CART and C4.5 decision tree models have nearly the same prediction accuracy, approximately 74%. The prediction accuracy of the LMT decision tree model is 75.4%, somewhat better than the others, while the C4.5 model required fewer parameters than the others at the same accuracy.
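At the heart of CART (and, with entropy instead of Gini, of C4.5) is a search for the split threshold with the lowest weighted impurity. A minimal sketch over an invented incident table, where lanes blocked predicts a duration class:

```python
# CART-style split search: choose the threshold t on a numeric feature
# that minimizes the weighted Gini impurity of {x <= t} vs. {x > t}.

def gini(labels):
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(xs, ys):
    """Return (threshold, weighted impurity) minimizing Gini."""
    best = (None, float("inf"))
    for t in sorted(set(xs))[:-1]:          # candidate thresholds
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

lanes = [0, 1, 1, 2, 3, 3]                              # lanes blocked
label = ["short", "short", "short", "long", "long", "long"]
threshold, impurity = best_split(lanes, label)          # perfect split at 1
```

Repeating this search recursively on each resulting partition, over all candidate features, yields the full tree.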

Journal ArticleDOI
TL;DR: By applying simple and cost-effective classification rules, the decision tree model estimates the development of diabetes in a high-risk adult Chinese population with strong potential for implementation of diabetes management.
Abstract: Background Predicting and making an early diagnosis of diabetes is a critical approach in a population at high risk of diabetes, one of the most devastating diseases globally. Traditional and conventional blood tests are recommended for screening suspected patients; however, these tests can have health side effects and are expensive. The goal of this study was to establish a simple and reliable predictive model based on the risk factors associated with diabetes using a decision tree algorithm. Methods This was a retrospective cross-sectional study. A total of 10,436 participants who had a health check-up from January 2017 to July 2017 were recruited. With appropriate data mining approaches, 3454 participants remained in the final dataset for further analysis. Seventy percent of these participants (2420 cases) were then randomly allocated to the training dataset for construction of the decision tree, and the remainder (30%, 1034 cases) to the testing dataset for evaluation of its performance. For this purpose, the cost-sensitive J48 algorithm was used to develop the decision tree model. Results Utilizing all the key features of the dataset, consisting of 14 input variables and two output variables, the constructed decision tree model identified several key factors that are closely linked to the development of diabetes and are also modifiable. Furthermore, our model achieved a classification accuracy of 90.3% with a precision of 89.7% and a recall of 90.3%. Conclusion By applying simple and cost-effective classification rules, our decision tree model estimates the development of diabetes in a high-risk adult Chinese population with strong potential for implementation in diabetes management.
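The 70/30 random allocation described above can be sketched with a seeded shuffle; the record fields are placeholders, and a generic cut like this will not reproduce the study's exact 2420/1034 counts, which depend on their own allocation procedure.

```python
# Reproducible 70/30 train/test allocation of a participant list.
import random

def split_70_30(records, seed=42):
    rng = random.Random(seed)       # fixed seed so the split is repeatable
    shuffled = records[:]
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]

# 3454 placeholder participant records
records = [{"id": i, "bmi": 20 + i % 15} for i in range(3454)]
train, test = split_70_30(records)
```

Shuffling before cutting matters: check-up records are often ordered by date, and a date-ordered cut would leak seasonal structure into the evaluation.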

Journal ArticleDOI
17 Jan 2020-PLOS ONE
TL;DR: An alternative EEG signal characterization using graph metrics and, based on such features, a classification analysis using a decision tree model is introduced to identify group differences in brain connectivity networks with respect to mathematical skills in elementary school children.
Abstract: Recent studies aiming to facilitate mathematical skill development in primary school children have explored the electrophysiological characteristics associated with different levels of arithmetic achievement. The present work introduces an alternative EEG signal characterization using graph metrics and, based on such features, a classification analysis using a decision tree model. This proposal aims to identify group differences in brain connectivity networks with respect to mathematical skills in elementary school children. The analysis comprised signal processing (EEG artifact removal, Laplacian filtering, and magnitude-squared coherence measurement), followed by characterization (graph metrics) and classification (decision tree) of EEG signals recorded during performance of a numerical comparison task. Our results suggest that the analysis of quantitative EEG frequency-band parameters can be used successfully to discriminate several levels of arithmetic achievement. Specifically, the most significant results showed an accuracy of 80.00% (α band), 78.33% (δ band), and 76.67% (θ band) in differentiating high-skilled participants from low-skilled ones, average-skilled subjects from all others, and average-skilled participants from low-skilled ones, respectively. The use of a decision tree tool during the classification stage allows the identification of several brain areas that seem to be more specialized in numerical processing.
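The graph-metric step above turns a magnitude-squared coherence matrix into a network. A minimal sketch: threshold coherence into an adjacency matrix, then compute node degree and graph density; the 4-electrode coherence values below are invented.

```python
# Coherence matrix -> thresholded adjacency -> simple graph metrics.

def adjacency(coh, thr):
    n = len(coh)
    return [[1 if i != j and coh[i][j] >= thr else 0
             for j in range(n)] for i in range(n)]

def degrees(adj):
    return [sum(row) for row in adj]

def density(adj):
    n = len(adj)
    edges = sum(sum(row) for row in adj) / 2        # undirected graph
    return edges / (n * (n - 1) / 2)

coh = [[1.0, 0.8, 0.2, 0.7],     # symmetric coherence between 4 electrodes
       [0.8, 1.0, 0.6, 0.1],
       [0.2, 0.6, 1.0, 0.3],
       [0.7, 0.1, 0.3, 1.0]]
adj = adjacency(coh, thr=0.5)
deg = degrees(adj)
dens = density(adj)
```

Per-band vectors of such metrics (one per electrode or per network) are what the decision tree then classifies.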

Journal ArticleDOI
TL;DR: This work applies tools from constraint satisfaction to learn optimal decision trees in the form of sparse k-CNF (Conjunctive Normal Form) rules, which are significantly more accurate than those learned by existing heuristic approaches.
Abstract: Decision trees are a popular choice for providing explainable machine learning, since they make explicit how different features contribute towards the prediction. We apply tools from constraint satisfaction to learn optimal decision trees in the form of sparse k-CNF (Conjunctive Normal Form) rules. We develop two methods offering different trade-offs between accuracy and computational complexity: one offline method that learns decision trees using the entire training dataset and one online method that learns decision trees over a local subset of the training dataset. This subset is obtained from training examples near a query point. The developed methods are applied on a number of datasets both in an online and an offline setting. We found that our methods learn decision trees which are significantly more accurate than those learned by existing heuristic approaches. However, the global decision tree model tends to be computationally more expensive compared to heuristic approaches. The online method is faster to train and finds smaller decision trees with an accuracy comparable to that of the k-nearest-neighbour method.
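A sparse k-CNF rule classifies an example as positive iff every clause (a disjunction of at most k literals) is satisfied. Learning the clauses is the constraint-satisfaction part; evaluating a learned rule is straightforward, as this sketch with an invented 2-CNF shows:

```python
# Evaluate a k-CNF rule over boolean features.
# A literal (i, v) asserts that feature i has value v (v=0 encodes negation).

def eval_kcnf(clauses, x):
    """True iff every clause has at least one satisfied literal."""
    return all(any(x[i] == v for i, v in clause) for clause in clauses)

# (x0 or not x1) and (x1 or x2)
rule = [[(0, 1), (1, 0)],
        [(1, 1), (2, 1)]]
pred1 = eval_kcnf(rule, [1, 0, 1])   # satisfies both clauses
pred2 = eval_kcnf(rule, [0, 1, 0])   # first clause fails
```

Keeping k small bounds clause length, which is what makes the learned rules both sparse and readable as explanations.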