
Showing papers on "Decision tree model" published in 2018


Book ChapterDOI
01 Jan 2018
TL;DR: The aim of this paper is to carry out a detailed analysis of decision trees and their variants in order to determine the most appropriate decision, and to analyze and compare decision tree algorithms such as ID3, C4.5, CART, and CHAID.
Abstract: Decision trees are outstanding tools that help anyone select the best course of action. They generate a highly valuable arrangement in which one can lay out options and study the possible outcomes of those options. They also help users form a fair idea of the pros and cons associated with each possible action. A decision tree graphically represents the decisions, the events, and the outcomes related to those decisions and events. Events are probabilistic, and probabilities are determined for each outcome. The aim of this paper is to carry out a detailed analysis of decision trees and their variants in order to determine the most appropriate decision. For this, we analyze and compare various decision tree algorithms such as ID3, C4.5, CART, and CHAID.
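
As a hedged illustration of the kind of comparison the paper describes (not the authors' own code or data), the sketch below fits scikit-learn decision trees with an entropy criterion (as in ID3/C4.5) and a Gini criterion (as in CART) on a toy dataset; CHAID has no scikit-learn implementation and is omitted.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for criterion in ("entropy", "gini"):
    # entropy ~ ID3/C4.5-style splits, gini ~ CART-style splits
    tree = DecisionTreeClassifier(criterion=criterion, max_depth=4, random_state=0)
    scores = cross_val_score(tree, X, y, cv=10)
    print(f"{criterion:8s} mean 10-fold accuracy: {scores.mean():.3f}")
```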

51 citations


Journal ArticleDOI
TL;DR: This article proves that the decision tree complexity of 3SUM is O(n^{3/2} √(log n)), refuting the conjecture that its complexity is Ω(n^2) in the linear decision tree model.
Abstract: The 3SUM problem is to decide, given a set of n real numbers, whether any three sum to zero. It is widely conjectured that a trivial O(n^2)-time algorithm is optimal on the Real RAM, and optimal even in the nonuniform linear decision tree model. Over the years the consequences of this conjecture have been revealed. This 3SUM conjecture implies Ω(n^2) lower bounds on numerous problems in computational geometry, and a variant of the conjecture for integer inputs implies strong lower bounds on triangle enumeration, dynamic graph algorithms, and string matching data structures. In this article, we refute the conjecture that 3SUM requires Ω(n^2) in the Real RAM and refute more forcefully the conjecture that its complexity is Ω(n^2) in the linear decision tree model. In particular, we prove that the decision tree complexity of 3SUM is O(n^{3/2} √(log n)) and give two subquadratic 3SUM algorithms, a deterministic one running in O(n^2 / (log n / log log n)^{2/3}) time and a randomized one running in O(n^2 (log log n)^2 / log n) time with high probability. Our results lead directly to improved bounds on the decision tree complexity of k-variate linear degeneracy testing for all odd k ≥ 3. Finally, we give a subcubic algorithm for a generalization of the (min, +)-product over real-valued matrices and apply it to the problem of finding zero-weight triangles in edge-weighted graphs. We give a depth-O(n^{5/2} √(log n)) decision tree for this problem, as well as a deterministic algorithm running in time O(n^3 (log log n)^2 / log n).
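
For context, the baseline against which the paper's subquadratic results are measured is the classic quadratic 3SUM algorithm. The sketch below is that textbook O(n^2) sort-and-two-pointer routine, not the paper's subquadratic or decision-tree constructions.

```python
def three_sum_exists(nums):
    """Return True if some three elements of nums sum to zero (O(n^2) two-pointer scan)."""
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1   # sum too small: move the left pointer right
            else:
                hi -= 1   # sum too large: move the right pointer left
    return False

print(three_sum_exists([3.5, -1.25, -2.25, 7.0]))  # True, since 3.5 - 1.25 - 2.25 == 0
```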

44 citations


Journal Article
TL;DR: This paper defines “Historical Glottometry”, a new method capable of identifying and representing genealogical subgroups even when they intersect, and applies this glottometric method to a specific linkage, consisting of 17 Oceanic languages spoken in northern Vanuatu.
Abstract: Since the beginnings of historical linguistics, the family tree has been the most widely accepted model for representing historical relations between languages. While this sort of representation is easy to grasp, and allows for a simple, attractive account of the development of a language family, the assumptions made by the tree model are applicable in only a small number of cases: namely, when a speaker population undergoes successive splits followed by complete loss of contact. A tree structure is unsuited for dealing with dialect continua, and language families that develop out of dialect continua (“linkages”, as Ross 1988 calls them); in these situations, the scopes of innovations (their isoglosses) are not nested, but rather they constantly intersect, so that any proposed tree representation is met with abundant counterexamples. In this paper, we define “Historical Glottometry”, a new method capable of identifying and representing genealogical subgroups even when they intersect. We apply this glottometric method to a specific linkage, consisting of 17 Oceanic languages spoken in northern Vanuatu.

34 citations


Journal ArticleDOI
TL;DR: The results suggest that NCA is a better feature selection strategy than PCA and SFS for the data used in this study, and that the boosting tree model with NCA features outperforms all other combinations of feature selection and classification methods.

32 citations


Journal ArticleDOI
TL;DR: It is suggested that the proposed method, which combines decision tree and DWT techniques with sound signals, can be recommended for fault diagnosis of the face milling tool.

32 citations


Journal ArticleDOI
TL;DR: This paper is aimed at identifying the best machine learning model among the Naive Bayes, Decision Tree and k-Nearest Neighbors algorithms for classifying the B40 population in Malaysia, and demonstrates that the overall performance of the Decision Tree model outperformed the other models.
Abstract: Malaysian citizens are categorised into three different income groups: the Top 20 Percent (T20), Middle 40 Percent (M40), and Bottom 40 Percent (B40). One of the focus areas in the Eleventh Malaysia Plan (11MP) is to elevate the B40 household group towards the middle-income society. Based on recent studies by the World Bank, Malaysia is expected to enter high-income economy status no later than the year 2024. Thus, it is essential to identify the B40 population through predictive classification as a prerequisite for developing a comprehensive action plan by the government. This paper is aimed at identifying the best machine learning model among the Naive Bayes, Decision Tree and k-Nearest Neighbors algorithms for classifying the B40 population. Several data pre-processing tasks, such as data cleaning, feature engineering, normalisation, feature selection (Correlation Attribute, Information Gain Attribute and Symmetrical Uncertainty Attribute) and sampling using SMOTE, were applied to the raw dataset to ensure the quality of the training data. Each classifier was then optimized using different tuning parameters with 10-Fold Cross-Validation to achieve the optimal values before the performance of the three classifiers was compared. For the experiments, a dataset from the National Poverty Data Bank called eKasih, obtained from the Society Wellbeing Department, Implementation Coordination Unit of the Prime Minister's Department (ICU JPM) and consisting of 99,546 households from 3 different states (Johor, Terengganu and Pahang), was used to train each machine learning model. The experimental results using the 10-Fold Cross-Validation method demonstrate that the overall performance of the Decision Tree model outperformed the other models, and the significance test indicated that the result is statistically significant.
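
A minimal sketch of the evaluation setup described above, assuming scikit-learn and imbalanced-learn; the eKasih data are not public, so a synthetic imbalanced dataset and the tree depth used here are placeholders only.

```python
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline          # keeps SMOTE inside each CV fold
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for an imbalanced household dataset.
X, y = make_classification(n_samples=5000, n_features=20, weights=[0.9, 0.1],
                           random_state=0)

pipe = Pipeline([
    ("smote", SMOTE(random_state=0)),
    ("tree", DecisionTreeClassifier(max_depth=8, random_state=0)),
])

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(pipe, X, y, cv=cv, scoring="f1")
print(f"10-fold F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```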

27 citations


Journal ArticleDOI
TL;DR: An algorithm for generating novel 3D tree model variations from existing ones via geometric and structural blending and the application of the framework in reflection symmetry analysis and symmetrization of botanical trees is demonstrated.
Abstract: We propose an algorithm for generating novel 3D tree model variations from existing ones via geometric and structural blending. Our approach is to treat botanical trees as elements of a tree-shape space equipped with a proper metric that quantifies geometric and structural deformations. Geodesics, or shortest paths under the metric, between two points in the tree-shape space correspond to optimal deformations that align one tree onto another, including the possibility of expanding, adding, or removing branches and parts. Central to our approach is a mechanism for computing correspondences between trees that have different structures and a different number of branches. The ability to compute geodesics and their lengths enables us to compute continuous blending between botanical trees, which, in turn, facilitates statistical analysis, such as the computation of averages of tree structures. We show a variety of 3D tree models generated with our approach from 3D trees exhibiting complex geometric and structural differences. We also demonstrate the application of the framework in reflection symmetry analysis and symmetrization of botanical trees.

27 citations


Journal ArticleDOI
TL;DR: The experimental results show that the Adaboost algorithm produces better classification results than the decision tree model on the test set, and that the prediction results of these classification models are sufficient.
Abstract: The focus of this study is the use of machine learning methods that combine feature selection and imbalance handling (the SMOTE algorithm) to classify and predict diabetes follow-up control satisfaction data. After feature selection and resampling, diabetes follow-up data from the New Urban Area of Urumqi, Xinjiang, were used as input variables for support vector machine (SVM), decision tree, and ensemble learning models (Adaboost and Bagging) for modeling and prediction. The experimental results show that the Adaboost algorithm produces better classification results. For the test set, the G-mean was 94.65%, the area under the ROC curve (AUC) was 0.9817, and the important variables in the classification process (fasting blood glucose, age, and BMI) were identified. The performance of the decision tree model on the test set is lower than that of the support vector machine and the ensemble learning models. The prediction results of these classification models are sufficient. Compared with a single classifier, the ensemble learning algorithms show different degrees of increase in classification accuracy. The Adaboost algorithm can be used for the prediction of diabetes follow-up and control satisfaction data.
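
The comparison and metrics above can be sketched as follows, assuming scikit-learn and synthetic data in place of the Urumqi follow-up dataset; the G-mean is the geometric mean of sensitivity and specificity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import recall_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.8, 0.2], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

for name, model in [("decision tree", DecisionTreeClassifier(random_state=1)),
                    ("AdaBoost", AdaBoostClassifier(random_state=1))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    sensitivity = recall_score(y_te, pred)                # recall on the positive class
    specificity = recall_score(y_te, pred, pos_label=0)   # recall on the negative class
    g_mean = np.sqrt(sensitivity * specificity)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{name:13s} G-mean={g_mean:.3f} AUC={auc:.3f}")
```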

26 citations


Journal ArticleDOI
TL;DR: This paper employs neural networks (NN) and a boosted C5.0 decision tree model to predict CAD on the well-known Cleveland Heart Disease dataset, tunes the optimal size and configuration of the neural networks, identifies the insensitive features in both models, and assesses the effect of eliminating such features on the results.
Abstract: Clinical decision support systems have always assisted physicians in diagnosing diseases. Coronary artery disease (CAD) is currently responsible for a large percentage of deaths, which has motivated researchers to propose more accurate prediction models. This paper employs neural networks (NN) and a boosted C5.0 decision tree model to predict CAD on the well-known Cleveland Heart Disease dataset. We attempt to tune the optimal size and configuration of the neural networks, identify the insensitive features in both models, and then assess the effect of eliminating such features on the results. Both models are evaluated through ten experiments, each with different training and testing datasets of the same size. The most and the least important input features in each model are determined. The performance on the reduced dataset, i.e., with the features that contribute insignificantly to the models removed, has been evaluated through statistical tests. Our results show that there is no significant difference between running the NN and C5.0 algorithms on the initial dataset in terms of three performance criteria: positive prediction value (PPV), negative prediction value (NPV) and total accuracy value (TAV). Regarding the TAV criterion, the NN applied to the reduced dataset outperforms the C5.0 model with a 95% confidence interval. Finally, further discussion shows the trade-off between the NPV and PPV.
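
A hedged sketch of the reported performance criteria: scikit-learn has no C5.0, so AdaBoosted trees and a small MLP stand in for the paper's boosted C5.0 and neural network, and synthetic data replaces the Cleveland Heart Disease dataset.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=13, random_state=7)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=7)

def report(name, model):
    model.fit(X_tr, y_tr)
    tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
    ppv = tp / (tp + fp)                    # positive prediction value
    npv = tn / (tn + fn)                    # negative prediction value
    tav = (tp + tn) / (tp + tn + fp + fn)   # total accuracy value
    print(f"{name}: PPV={ppv:.3f} NPV={npv:.3f} TAV={tav:.3f}")

report("boosted trees", AdaBoostClassifier(n_estimators=100, random_state=7))
report("neural net", MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                                   random_state=7))
```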

24 citations


Journal ArticleDOI
TL;DR: The proposed CTFGES algorithm has been successfully applied to the feature segmentation of large-scale neonatal cerebral cortex MRI with varying noise ratios and intensity non-uniformity levels, and the results indicate that it adapts well to the cortical folding surfaces and achieves satisfying consistency with medical experts.
Abstract: A wide variety of feature selection methods have been developed as promising solutions for finding the classification pattern inside a growing range of applications. But exploring an efficient, flexible and robust feature selection method to handle rising volumes of big data is still an exciting challenge. This paper presents a novel hierarchical co-evolutionary clustering tree-based rough feature game equilibrium selection algorithm (CTFGES). It aims to select high-quality feature subsets, which can enrich research on feature selection and classification in heterogeneous big data. Firstly, we construct a flexible hierarchical co-evolutionary clustering tree model to speed up the process of feature selection, which can effectively extract the features from the parent and child branches of a four-layer co-evolutionary clustering tree. Secondly, we design a mixed co-evolutionary game equilibrium scheme with adaptive dynamics to guide parent and child branch subtrees to approach the optimal equilibrium regions and to enable their feature sets to converge stably to the Nash equilibrium, so that both noisy heterogeneous features and unidentified redundant ones can be further eliminated. Finally, extensive experiments on various big datasets are conducted to demonstrate the superior performance of CTFGES, in terms of accuracy, efficiency and robustness, compared with representative feature selection algorithms. In addition, the proposed CTFGES algorithm has been successfully applied to the feature segmentation of large-scale neonatal cerebral cortex MRI with varying noise ratios and intensity non-uniformity levels. The results indicate that it adapts well to the cortical folding surfaces and achieves satisfying consistency with medical experts, which is of potential significance for assessing the impact of aberrant brain growth on the neurodevelopment of the neonatal cerebrum.

18 citations


Posted ContentDOI
30 Nov 2018-bioRxiv
TL;DR: The CDT unpruned tree shows the highest accuracy, precision, recall and F-measure, the second highest AUROC and the lowest RMSE among the models, and plasma glucose, plasma glucose 2hr after glucose and HDL-cholesterol were found to be the most significant features for predicting the severity of Diabetes Mellitus.
Abstract: Diabetes is a chronic condition associated with an abnormally high level of sugar in the blood. It is a lifelong disease that causes harmful effects in human life. The goal of this research is to predict the severity of diabetes and find its significant features. In this work, we gathered diabetes patient records from the Noakhali Diabetes Association, Noakhali, Bangladesh. We preprocessed our raw dataset by replacing missing records and removing wrong ones. Then, CDT, J48, NBTree and REPtree decision tree based classification techniques were used to analyze this dataset. After this analysis, we evaluated the classification outcomes of these decision tree classifiers and found the best decision tree model among them. In this work, the CDT unpruned tree shows the highest accuracy, precision, recall and F-measure, the second highest AUROC and the lowest RMSE among the models. We then extracted possible rules and significant features from this model, and plasma glucose, plasma glucose 2hr after glucose and HDL-cholesterol were found to be the most significant features for predicting the severity of Diabetes Mellitus. We hope this work will be beneficial for building a predictive system and complementary tool for diabetes treatment in the future.

Proceedings ArticleDOI
19 Mar 2018
TL;DR: Experimental results showed that the bagged ensemble outperforms the other modeling algorithms; the prediction accuracies of these models are benchmarked against an Artificial Neural Network in terms of statistical accuracy, specificity, sensitivity, precision, true positive rate, true negative rate and F-score.
Abstract: Erythemato-squamous diseases (ESDs) are common skin diseases. They consist of six different categories: psoriasis, seboreic dermatitis, lichen planus, pityriasis rosea, chronic dermatitis and pityriasis rubra pilaris. They all share the clinical features of erythema and scaling, with very little difference between them. Their automatic detection is a challenging problem as they have overlapping signs and symptoms. This study evaluates the performance of CHAID decision trees (DTs) for the analysis and diagnosis of ESDs. DTs are nonparametric methods with no a priori assumptions about the space distribution and with the ability to generate understandable classification rules. This property makes them very efficient tools for physicians and medical specialists to understand the data and inspect the knowledge behind them. The Chi-Squared Automatic Interaction Detection (CHAID) decision tree model is a very fast model with the ability to build wider decision trees and to handle all kinds of input variables (features). The CHAID model has many successful achievements, especially when used as an interpreter rather than a classifier. Due to the small number of samples, this study uses the Chi-square test with the Likelihood Ratio (LR) to obtain robust results. Ensembles of bagged and boosted CHAIDs were introduced to improve the stability and the accuracy of the model, but at the expense of interpretability. This paper presents the experimental results of applying CHAID decision trees and their bagged and boosted ensembles to the differential diagnosis of ESD using both clinical and histopathological features. The prediction accuracies of these models are benchmarked against an Artificial Neural Network (ANN) in terms of statistical accuracy, specificity, sensitivity, precision, true positive rate, true negative rate and F-score. Experimental results showed that the bagged ensemble outperforms the other modeling algorithms.
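
Since CHAID is not available in scikit-learn, the sketch below uses ordinary decision trees as base learners to illustrate the bagged and boosted ensembles described above; the synthetic 6-class data merely mimic the shape of the ESD problem.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Placeholder 6-class dataset standing in for the clinical/histopathological features.
X, y = make_classification(n_samples=800, n_features=34, n_informative=12,
                           n_classes=6, random_state=3)

models = {
    "single tree": DecisionTreeClassifier(random_state=3),
    "bagged trees": BaggingClassifier(DecisionTreeClassifier(random_state=3),
                                      n_estimators=50, random_state=3),
    "boosted trees": AdaBoostClassifier(DecisionTreeClassifier(max_depth=3),
                                        n_estimators=50, random_state=3),
}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name:14s} 5-fold accuracy: {acc:.3f}")
```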

Journal ArticleDOI
TL;DR: Forest models of two Larix decidua forest plots are reconstructed by making use of terrestrial LiDAR data and digital hemispherical photographs (DHP), and it is shown that typical sources of error in the tree reconstruction process are minimized by the proposed approach.

Journal ArticleDOI
TL;DR: This study investigates the use of predictive ML to counter the risk of BHP flooding attacks experienced in OBS networks, proposing a decision tree-based architecture as an appropriate solution.

Patent
23 Mar 2018
TL;DR: In this article, a decision tree model is used for predicting the user loss state after the sampling moment, and a verification sample is input into the trained decision tree model to obtain a predicted loss state.
Abstract: The invention discloses a state prediction method and device. The method comprises the following steps: sampling target users; respectively generating a negative sample, a positive sample and a verification sample according to the account information of lost and non-lost users at a recognized sampling moment; training a decision tree model that is used for predicting the user loss state after the sampling moment; inputting the verification sample into the trained decision tree model to obtain a predicted loss state; and, if the correct recall rate calculated from the predicted loss state and the actual loss state is not smaller than a threshold value, determining that the training of the decision tree model is complete and carrying out user loss state prediction. According to the state prediction method and device, the decision tree model is trained on a training sample generated by sampling the target users, and user loss state prediction is carried out with the trained decision tree model, so that the technical problems of relatively low recognition efficiency and difficult reuse, caused in the prior art by predicting user loss state through human experience or hand-established rules, are solved.
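
An illustrative sketch of the patent's workflow (not the patented implementation): train a decision tree on labelled loss samples, then accept the model only if the recall on a held-out verification sample reaches a chosen threshold. The data, features and the 0.8 threshold are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Placeholder churn-like data: class 1 = "lost" users.
X, y = make_classification(n_samples=3000, weights=[0.85, 0.15], random_state=5)
X_train, X_verify, y_train, y_verify = train_test_split(X, y, test_size=0.25,
                                                        stratify=y, random_state=5)

tree = DecisionTreeClassifier(max_depth=6, class_weight="balanced", random_state=5)
tree.fit(X_train, y_train)

recall = recall_score(y_verify, tree.predict(X_verify))   # "correct recall rate"
THRESHOLD = 0.8                                           # assumed acceptance threshold
if recall >= THRESHOLD:
    print(f"recall {recall:.2f} >= {THRESHOLD}: model accepted for loss-state prediction")
else:
    print(f"recall {recall:.2f} < {THRESHOLD}: retrain with new samples")
```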

Proceedings ArticleDOI
01 Oct 2018
TL;DR: The decision tree rules were implemented in the online condition monitoring and diagnostics of a power transformer integrated into a SCADA system, and can predict transformer faults from gas values online better than conventional DGA methods.
Abstract: This paper presents the possibility of using one machine learning model, a decision tree with the C4.5 algorithm, for gas interpretation in the online condition monitoring and diagnostics of power transformers. The decision tree was selected based on the best learning outcomes in machine learning software (WEKA and Orange) compared to naive Bayes, neural network, nearest neighbour and support vector machine models. The decision tree was built from 715 records with 7 gas attributes and 9 fault types, which were cleaned by the interquartile range method, leaving 471 records. Evaluation results based on correct predictions are 95.54% using the training data, 88.32% using cross validation and 87.23% using 10% random data drawn from the training data. The decision tree rules were implemented in the online condition monitoring and diagnostics of a power transformer, integrated into a SCADA system. This implementation can predict transformer faults from gas values online better than conventional DGA methods.
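
A hedged sketch of the interquartile-range cleaning step followed by an entropy-based (C4.5-style) decision tree, using placeholder gas values and fault labels rather than the paper's 715 records.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
gases = ["H2", "CH4", "C2H6", "C2H4", "C2H2", "CO", "CO2"]
df = pd.DataFrame(rng.lognormal(3, 1, size=(715, 7)), columns=gases)
df["fault"] = rng.integers(0, 9, size=715)            # 9 placeholder fault types

# Interquartile-range filter applied per gas attribute.
q1, q3 = df[gases].quantile(0.25), df[gases].quantile(0.75)
iqr = q3 - q1
mask = ((df[gases] >= q1 - 1.5 * iqr) & (df[gases] <= q3 + 1.5 * iqr)).all(axis=1)
clean = df[mask]
print(f"{len(clean)} records kept after IQR cleaning (from {len(df)})")

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)  # C4.5-like splits
tree.fit(clean[gases], clean["fault"])
```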

Journal ArticleDOI
TL;DR: A new fuzzy XML tree model is proposed, and an effective algorithm based on the tree edit distance is presented to identify the structural and semantic similarities between fuzzy documents represented in the proposed fuzzy XML tree model.

Journal ArticleDOI
TL;DR: The proposed Intuitionistic Fuzzy Based Decision Tree is able to provide plenty of information to stakeholders regarding the hidden facts of established rules and utilize linguistic terms to accommodate unclearness, ambiguity, and hesitation in human perception.
Abstract: One of the challenges in diagnosing stroke disease is the lack of useful analysis tools for identifying critical stroke data that contain hidden relationships and trends within a vast amount of data. In order to address this problem, we propose an Intuitionistic Fuzzy Based Decision Tree to diagnose the different types of stroke disease. The approach is implemented by mapping observation data into an Intuitionistic Fuzzy Set, which yields a membership function, a non-membership function, and a hesitation degree for each record. The intuitionistic fuzzy values are processed using the Hamming Distance, the main requirement for the Intuitionistic Fuzzy Entropy; the Hamming Distance calculates the difference between values of the same variable. The main advantage of this approach is that we can find the variables affecting stroke disease using the information gain derived from the Intuitionistic Entropy. Furthermore, the Intuitionistic Fuzzy Based Decision Tree is able to provide plenty of information to stakeholders regarding the hidden facts behind the established rules, and utilizes linguistic terms to accommodate unclearness, ambiguity, and hesitation in human perception. The results of the Intuitionistic Fuzzy Entropy determine the root and nodes in the formation of the decision tree model, based on the information gain of the variables in the data. In this study, simulation results show that the approach successfully determines 20 variables that directly influence stroke; these variables are used to classify the types of stroke. Furthermore, the results show that the approach achieves 90.59% accuracy in classifying stroke disease. The results of the study also demonstrate that the approach produces the best diagnosis performance compared to the other two models, according to the classification accuracy for the type of stroke disease.
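
A small worked example of the intuitionistic fuzzy quantities mentioned above. The normalized Hamming distance is the standard one for intuitionistic fuzzy sets; the entropy shown is one common definition from the literature and is not necessarily the exact form used in the paper.

```python
# Each element is a (membership, non-membership) pair; hesitation pi = 1 - mu - nu.
def hesitation(mu, nu):
    return 1.0 - mu - nu

def ifs_hamming(A, B):
    """Normalized Hamming distance between two intuitionistic fuzzy sets."""
    total = 0.0
    for (mu_a, nu_a), (mu_b, nu_b) in zip(A, B):
        total += (abs(mu_a - mu_b) + abs(nu_a - nu_b)
                  + abs(hesitation(mu_a, nu_a) - hesitation(mu_b, nu_b)))
    return total / (2 * len(A))

def ifs_entropy(A):
    """One common entropy measure: per-element ratio of min to max 'belief'."""
    total = 0.0
    for mu, nu in A:
        pi = hesitation(mu, nu)
        total += (min(mu, nu) + pi) / (max(mu, nu) + pi)
    return total / len(A)

A = [(0.7, 0.2), (0.5, 0.4), (0.1, 0.8)]
B = [(0.6, 0.3), (0.5, 0.3), (0.2, 0.7)]
print(ifs_hamming(A, B), ifs_entropy(A))
```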

Book ChapterDOI
TL;DR: The role of tree metrics is emphasised in the structural description of this model class, in designing learning algorithms, and in understanding fundamental limits of what can be learned and when.
Abstract: Latent tree models are graphical models defined on trees, in which only a subset of variables is observed. They were first discussed by Judea Pearl as tree-decomposable distributions to generalise star-decomposable distributions such as the latent class model. Latent tree models, or their submodels, are widely used in phylogenetic analysis, network tomography, computer vision, causal modeling, and data clustering. They also contain other well-known classes of models such as hidden Markov models, the Brownian motion tree model, the Ising model on a tree, and many popular models used in phylogenetics. This article offers a concise introduction to the theory of latent tree models. We emphasise the role of tree metrics in the structural description of this model class, in designing learning algorithms, and in understanding fundamental limits of what can be learned and when.
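
A small worked example of the tree-metric idea emphasised above: distances realized by edge lengths on a tree satisfy the four-point condition, i.e. for every quadruple of leaves the two largest of the three pairwise-sum combinations coincide. The toy tree below is an assumption for illustration.

```python
from itertools import combinations

# Unrooted tree:  a --1-- u --2-- v --1-- c,  with b hanging off u (weight 3)
# and d hanging off v (weight 1).
edges = {("a", "u"): 1, ("b", "u"): 3, ("u", "v"): 2, ("c", "v"): 1, ("d", "v"): 1}
adj = {}
for (x, y), w in edges.items():
    adj.setdefault(x, []).append((y, w))
    adj.setdefault(y, []).append((x, w))

def dist(src, dst):
    """Path length between two nodes of the tree (depth-first search)."""
    stack, seen = [(src, 0)], {src}
    while stack:
        node, d = stack.pop()
        if node == dst:
            return d
        for nxt, w in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append((nxt, d + w))

for x, y, z, w in combinations("abcd", 4):
    sums = sorted([dist(x, y) + dist(z, w),
                   dist(x, z) + dist(y, w),
                   dist(x, w) + dist(y, z)])
    print(sums, "four-point condition holds:", sums[1] == sums[2])
```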

Proceedings ArticleDOI
01 Feb 2018
TL;DR: In this paper, transient stability assessment is performed on a power system using a classification approach and data mining algorithms using offline training data collected by conducting load flow studies under normal operating conditions and faulty operating conditions at buses, at three different locations at lines and at different load levels.
Abstract: In this paper, transient stability assessment is performed on a power system using a classification approach and data mining algorithms. As a first step, offline training data were collected by conducting load flow studies under normal operating conditions and under faulty operating conditions at buses, at three different locations along lines, and at different load levels. Twenty-three features were chosen to represent the training data for each load flow simulation. A support vector machine model was built and trained on the training data, along with a Naive Bayes model and a Decision Tree model. An online testing model was then developed, and real-time data were used to test the validity of the developed model. The results indicate higher accuracy and less time consumed by the core vector machine model compared to previous models available in the literature. The IEEE 14 bus system was used for the training data and for verifying the speed and accuracy of the proposed data mining algorithm.

Journal ArticleDOI
TL;DR: The major contribution of the DPT-BN model is to demonstrate how the modelling of non-independent and identically distributed delay profiles is more realistic for the observed delay propagation mechanism, and how robust airline scheduling methodologies can benefit from this probability-based delay model.
Abstract: An enhanced Delay Propagation Tree model with Bayesian Network (DPT-BN) is developed to model multi-flight delay propagation and delay interdependencies. Using a set of real airline data, results s...

Journal ArticleDOI
TL;DR: The proposed nIBP reduces the error rate of nCRP and nhDP by 18% and 8%, respectively, on the Reuters document classification task, and improves the variety of topic representation for heterogeneous documents.
Abstract: This paper presents the Bayesian nonparametric (BNP) learning for hierarchical and sparse topics from natural language. Traditionally, the Indian buffet process provides the BNP prior on a binary matrix for an infinite latent feature model consisting of a flat layer of topics. The nested model paves an avenue to construct a tree model instead of a flat-layer model. This paper presents the nested Indian buffet process (nIBP) to achieve the sparsity and flexibility in topic model where the model complexity and topic hierarchy are learned from the groups of words. The mixed membership modeling is conducted by representing a document using the tree nodes or dishes that a document or a customer chooses according to the nIBP scenario. A tree stick-breaking process is implemented to select topic weights from a subtree for flexible topic modeling. Such an nIBP relaxes the constraint of adopting a single tree path in the nested Chinese restaurant process (nCRP) and, therefore, improves the variety of topic representation for heterogeneous documents. A Gibbs sampling procedure is developed to infer the nIBP topic model. Compared to the nested hierarchical Dirichlet process (nhDP), the compactness of the estimated topics in a tree using nIBP is improved. Experimental results show that the proposed nIBP reduces the error rate of nCRP and nhDP by 18% and 8% on Reuters task for document classification, respectively.

Journal ArticleDOI
TL;DR: In this paper, Choi et al. used decision tree analysis to evaluate the risk of building damage in natural disasters and found that the number of regions at risk of rain damage increased by more than 30% on average.
Abstract: The purpose of this study is to identify the relationship between weather variables and buildings damaged in natural disasters. We used four datasets on building damage history and 33 weather datasets from 230 regions in South Korea in a decision tree analysis to evaluate the risk of building damage. We generated the decision tree model to determine the risk of rain, gale, and typhoon (excluding gale with less damage). Using the weight and limit values of the weather variables derived using the decision tree model, the risk of building damage was assessed for 230 regions in South Korea until 2100. The number of regions at risk of rain damage increased by more than 30% on average. Conversely, regions at risk of damage from snowfall decreased by more than 90%. The regions at risk of typhoons decreased by 57.5% on average, while those at high risk of the same increased by up to 62.5% under RCP 8.5. The results of this study are highly fluid since they are based on the uncertainty of future climate change. However, the study is meaningful because it suggests a new method for assessing disaster risk using weather indices.

Proceedings ArticleDOI
01 Feb 2018
TL;DR: This work proposes a real-time intrusion detection system for high-speed environments using a decision tree-based classification model, i.e., C4.5, with a small number of flow features, to address the challenges faced by existing machine learning-based Intrusion Detection Systems.
Abstract: Due to the rise in the usage and speed of the internet, the amount of data generated over the internet is increasing enormously. This growth also increases the security threats to enterprise networks and the Internet. Detecting such intrusions in a high-speed network in real time is a challenging task. Existing machine learning-based Intrusion Detection Systems (IDSs) are not able to detect recent unknown attacks while working on high-speed networks. Therefore, to address these challenges, we propose a real-time intrusion detection system for high-speed environments using a decision tree-based classification model, i.e., C4.5, with a small number of flow features. The nine best features are selected from the forty-one in the KDD99 intrusion dataset using FSR and BER techniques. The accuracy of the proposed IDS is evaluated in terms of true positives (TP, more than 99%) and false positives (FP, less than 0.001%), and its efficiency in terms of processing time. The high accuracy and efficiency enable the system to work in a real-time, high-speed environment.
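
A hedged sketch of the pipeline described above: rank flow features, keep the nine best, and train an entropy-based (C4.5-style) decision tree. The paper's FSR/BER selectors and the KDD99 data are not reproduced; mutual-information ranking on synthetic data stands in for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.tree import DecisionTreeClassifier

# 41 synthetic "flow features", of which only a few carry signal.
X, y = make_classification(n_samples=4000, n_features=41, n_informative=9,
                           random_state=2)

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=9),                 # keep the 9 best features
    DecisionTreeClassifier(criterion="entropy", random_state=2),
)
print("5-fold accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```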

Journal ArticleDOI
TL;DR: In this paper, a tree structure method is presented to characterize the structure of a compliant mechanism, based on which, two generalized models are proposed for planar compliant mechanisms, and experiments with a compliant amplifier have been performed to verify the effectiveness of the proposed models.
Abstract: Compliant mechanisms depend on elastic deformations to provide smooth and precise motions. It is essential to develop an accurate model to quantify the deformation for kinetic analysis and parameter optimization. This paper presents a tree structure method to characterize the structure of the mechanism, based on which, two generalized models are proposed for planar compliant mechanisms. Linear Tree Model aims at constructing the flexibility matrix of the whole mechanism. Not only the deformations of the mechanism can be calculated, but also the unknown forces/moments can be obtained through inverse operation. Beam Constraint Tree Model focuses on the precise load condition and deformed shape of each beam, and a nonlinear model, which considers load-stiffening, kinematic and elastokinematic effects, is adopted for the beam governing equation. Beam Constraint Tree Model is computed on the basis of Linear Tree Model, and can achieve higher accuracy. Simulations have been done to show the accuracy of the two models on different load conditions, and experiments with a compliant amplifier have been performed to verify the effectiveness of the proposed models. Furthermore, some examples are presented to show the applications of the proposed model. Both models are parametric, generalized and easy for computation, so they are practical for compliant mechanism design.

Posted Content
TL;DR: This paper describes a tree decoder that leverages knowledge of a language's grammar rules to exclusively generate syntactically correct programs and finds that it outperforms the state of the art tree-to-tree model in translating between two programming languages on a previously used synthetic task.
Abstract: The task of translating between programming languages differs from the challenge of translating natural languages in that programming languages are designed with a far more rigid set of structural and grammatical rules. Previous work has used a tree-to-tree encoder/decoder model to take advantage of the inherent tree structure of programs during translation. Neural decoders, however, by default do not exploit known grammar rules of the target language. In this paper, we describe a tree decoder that leverages knowledge of a language's grammar rules to exclusively generate syntactically correct programs. We find that this grammar-based tree-to-tree model outperforms the state of the art tree-to-tree model in translating between two programming languages on a previously used synthetic task.

Journal ArticleDOI
TL;DR: The results of the Bayesian decision tree model reveal that a high percentage of construction projects were implemented with very high delay and a high level of quality, and that the enhancement in quality performance is greater than that in time performance under the legislative change.
Abstract: Delay and quality defects are significant problems in Iraqi construction projects. During the period from 2003 to 2014, legislation was changed to enhance the performance of construction projects. This change was made by modifying some clauses of the legislation and adding or deleting others. The aim of this study is to evaluate the adequacy of these changes by using a questionnaire and a Bayesian decision tree model. 30 projects were examined for the period from 2003 to 2014. The performance of construction projects was assessed on one hand by conducting a questionnaire based on the impact of legislation clauses on time and quality performance, while on the other hand a Bayesian decision tree model giving a qualitative estimate of time and quality performance was developed using the KNIME program. The results of the questionnaire estimate the delay from very low to very high and the quality from very low to high in the Iraqi construction industry. The results of the Bayesian decision tree model reveal that a high percentage of construction projects were implemented with very high delay and a high level of quality. The model gives good accuracy in predicting time and quality performance, about 86.7%. These results show that the enhancement in quality performance is greater than that in time performance under the legislative change. The model can assist the Iraqi legislator in evaluating the impact of legislation on the time and quality performance of construction projects.

Proceedings ArticleDOI
01 Nov 2018
TL;DR: A model decomposition architecture is proposed, which advances on previous attempts to learn an approximated forward model for unknown games and adapts very well to previously unseen levels or situations.
Abstract: In this paper we propose a model decomposition architecture, which advances on our previous attempts of learning an approximated forward model for unknown games [1]. The developed model architecture is based on design constraints of the General Video Game Artificial Intelligence Competition and the Video Game Definition Language. Our agent first builds up a database of interactions with the game environment for each distinct component of a game. We further train a decision tree model for each of those independent components. For predicting a future state we query each model individually and aggregate the result. The developed model ensemble does not just predict known states with a high accuracy, but also adapts very well to previously unseen levels or situations. Future work will show how well the increased accuracy helps in playing an unknown game using simulation-based search algorithms such as Monte Carlo Tree Search.

Journal ArticleDOI
TL;DR: A decision tree model is proposed that takes the teaching quality data and the statistical analysis results of the learner's personalized behaviour as inputs; it is based on an improved C4.5 decision tree algorithm that uses the FAYYAD boundary point decision theorem.
Abstract: Vast amounts of data in the higher education system are used to analyse and evaluate teaching quality, so that the key factors that affect the quality of teaching can be predicted. In addition, the learner's personalized behaviour can also become a data source for predicting teaching results. This paper proposes a decision tree model that takes the teaching quality data and the statistical analysis results of the learner's personalized behaviour as inputs. The model is based on an improved C4.5 decision tree algorithm, which uses the FAYYAD boundary point decision theorem to effectively reduce the computation time required for threshold selection. In this algorithm, an iterative analysis mechanism is introduced in combination with changes in the data on the learner's personalized behaviour, so as to dynamically adjust the final teaching evaluation result. Finally, based on the actual statistical data of one academic year, the teaching quality evaluation was effectively completed and a direction for future teaching prediction was proposed.
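
A minimal sketch of the Fayyad boundary-point idea behind the improved C4.5 above: when splitting on a continuous attribute, only thresholds between adjacent sorted examples with different class labels need to be evaluated, which shrinks the set of candidate cut points. The toy scores and labels are illustrative only.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_boundary_split(values, labels):
    """Best information-gain threshold, evaluating boundary points only."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_cut = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][1] == pairs[i][1]:
            continue                        # not a boundary point: skip (Fayyad)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_cut, best_gain

scores = [45, 52, 58, 61, 70, 74, 83, 90]
passed = [0, 0, 0, 1, 1, 1, 1, 1]
print(best_boundary_split(scores, passed))   # cut near 59.5
```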

Journal ArticleDOI
TL;DR: To address the fuzziness and uncertainty of traffic states at signalized intersections, a method was proposed for estimating traffic conditions based on a stacked de-noising auto-encoder model, which obtains an accuracy of 91.5% on simulation data and 88% on empirical data, 7.1% better than using a decision tree model.
Abstract: To address the fuzziness and uncertainty of traffic states at signalized intersections, a method was proposed for estimating traffic conditions based on a stacked de-noising auto-encoder model. Simulation data and empirical data were used to train the model, and the K-means clustering method was used to determine the traffic state thresholds; the data were divided into three categories based on the threshold values. Relevant features based on the reconstruction theory of the de-noising auto-encoder were automatically extracted, and unsupervised greedy layer-wise pre-training and supervised fine-tuning were utilized to train the deep auto-encoder network, so that it had good robustness in capturing traffic state characteristics from low-quality data in a complex environment. In the experimental results, the proposed method obtains an accuracy of 91.5% on simulation data and 88% on empirical data, which is 7.1% better than using a decision tree model.
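
A minimal sketch of a de-noising auto-encoder pipeline of the kind described above, assuming TensorFlow/Keras and scikit-learn; the placeholder traffic features, layer sizes and three-state K-means step are assumptions, not the paper's configuration, and only a single noisy layer is shown rather than a full stack.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.cluster import KMeans

X = np.random.rand(1000, 20).astype("float32")    # placeholder traffic features

inp = keras.Input(shape=(20,))
noisy = layers.GaussianNoise(0.1)(inp)             # corrupt the input during training
h = layers.Dense(8, activation="relu")(noisy)      # encoder (one layer for brevity)
out = layers.Dense(20, activation="sigmoid")(h)    # decoder reconstructs the input

dae = keras.Model(inp, out)
dae.compile(optimizer="adam", loss="mse")
dae.fit(X, X, epochs=5, batch_size=32, verbose=0)  # unsupervised pre-training step

encoder = keras.Model(inp, h)                      # learned features
codes = encoder.predict(X, verbose=0)
states = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(codes)
print(np.bincount(states))                         # three traffic-state categories
```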