Showing papers on "Decision tree model published in 2014"

PDF

Open Access

Journal Article•DOI•

A MapReduce Implementation of C4.5 Decision Tree Algorithm

[...]

28 Feb 2014-International journal of database theory and application

TL;DR: This work proposes to implement a typical decision tree algorithm, C4.5, using MapReduce programming model, and transforms the traditional algorithm into a series of Map and Reduce procedures, showing both time efficiency and scalability.

...read moreread less

Abstract: Recent years have witness the development of cloud computing and the big data era, which brings up challenges to traditional decision tree algorithms. First, as the size of dataset becomes extremely big, the process of building a decision tree can be quite time consuming. Second, because the data cannot fit in memory any more, some computation must be moved to the external storage and therefore increases the I/O cost. To this end, we propose to implement a typical decision tree algorithm, C4.5, using MapReduce programming model. Specifically, we transform the traditional algorithm into a series of Map and Reduce procedures. Besides, we design some data structures to minimize the communication cost. We also conduct extensive experiments on a massive dataset. The results indicate that our algorithm exhibits both time efficiency and scalability.

...read moreread less

145 citations

Journal Article•DOI•

Gene selection for cancer identification: a decision tree model empowered by particle swarm optimization algorithm

[...]

Kun-Huang Chen¹, Kung-Jeng Wang¹, Min Lung Tsai², Kung Min Wang³, Angelia Melani Adrian¹, Wei Chung Cheng⁴, Tzu Sen Yang⁵, Nai Chia Teng⁵, Kuo Pin Tan¹, Ku Shang Chang² - Show less +6 more•Institutions (5)

National Taiwan University of Science and Technology¹, Yuanpei University², Memorial Hospital of South Bend³, National Yang-Ming University⁴, Taipei Medical University⁵

20 Feb 2014-BMC Bioinformatics

TL;DR: This study aims at developing a novel method utilizing particle swarm optimization combined with a decision tree as the classifier that outperforms other popular classifiers for all test datasets, and is compatible to SVM for certain specific datasets.

...read moreread less

Abstract: In the application of microarray data, how to select a small number of informative genes from thousands of genes that may contribute to the occurrence of cancers is an important issue. Many researchers use various computational intelligence methods to analyzed gene expression data. To achieve efficient gene selection from thousands of candidate genes that can contribute in identifying cancers, this study aims at developing a novel method utilizing particle swarm optimization combined with a decision tree as the classifier. This study also compares the performance of our proposed method with other well-known benchmark classification methods (support vector machine, self-organizing map, back propagation neural network, C4.5 decision tree, Naive Bayes, CART decision tree, and artificial immune recognition system) and conducts experiments on 11 gene expression cancer datasets. Based on statistical analysis, our proposed method outperforms other popular classifiers for all test datasets, and is compatible to SVM for certain specific datasets. Further, the housekeeping genes with various expression patterns and tissue-specific genes are identified. These genes provide a high discrimination power on cancer classification.

...read moreread less

123 citations

Journal Article•DOI•

ForesTexter: An efficient random forest algorithm for imbalanced text categorization

[...]

Qingyao Wu¹, Yunming Ye¹, Haijun Zhang¹, Michael K. Ng², Shen-Shyang Ho³ - Show less +1 more•Institutions (3)

Harbin Institute of Technology¹, Hong Kong Baptist University², Nanyang Technological University³

01 Sep 2014-Knowledge Based Systems

TL;DR: A new random forest (RF) based ensemble method, ForesTexter, that selects splits, both feature subspace selection and splitting criterion, for RF on imbalanced text data and is competitive against the standard random forest and different variants of SVM algorithms.

...read moreread less

Abstract: In this paper, we propose a new random forest (RF) based ensemble method, ForesTexter, to solve the imbalanced text categorization problems. RF has shown great success in many real-world applications. However, the problem of learning from text data with class imbalance is a relatively new challenge that needs to be addressed. A RF algorithm tends to use a simple random sampling of features in building their decision trees. As a result, it selects many subspaces that contain few, if any, informative features for the minority class. Furthermore, the Gini measure for data splitting is considered to be skew sensitive and bias towards the majority class. Due to the inherent complex characteristics of imbalanced text datasets, learning RF from such data requires new approaches to overcome challenges related to feature subspace selection and cut-point choice while performing node splitting. To this end, we propose a new tree induction method that selects splits, both feature subspace selection and splitting criterion, for RF on imbalanced text data. The key idea is to stratify features into two groups and to generate effective term weighting for the features. One group contains positive features for the minority class and the other one contains the negative features for the majority class. Then, for feature subspace selection, we effectively select features from each group based on the term weights. The advantage of our approach is that each subspace contains adequate informative features for both minority and majority classes. One difference between our proposed tree induction method and the classical RF method is that our method uses Support Vector Machines (SVM) classifier to split the training data into smaller and more balance subsets at each tree node, and then successively retrains the SVM classifiers on the data partitions to refine the model while moving down the tree. In this way, we force the classifiers to learn from refined feature subspaces and data subsets to fit the imbalanced data better. Hence, the tree model becomes more robust for text categorization task with imbalanced dataset. Experimental results on various benchmark imbalanced text datasets (Reuters-21578, Ohsumed, and imbalanced 20 newsgroup) consistently demonstrate the effectiveness of our proposed ForesTexter method. The performance of our proposed approach is competitive against the standard random forest and different variants of SVM algorithms.

...read moreread less

113 citations

Journal Article•DOI•

Modeling and Testing Landslide Hazard Using Decision Tree

[...]

Mutasem Sh. Alkhasawneh, Umi Kalthum Ngah, Lea Tien Tay, Nor Ashidi Mat Isa, Mohammad Subhi Al-Batah - Show less +1 more

04 Feb 2014-Journal of Applied Mathematics

TL;DR: A decision tree model is proposed for specifying the importance of 21 factors causing the landslides in a wide area of Penang Island, Malaysia which identified slope angle, distance from drainage, surface area, slope aspect, and cross curvature as most important factors.

...read moreread less

Abstract: This paper proposes a decision tree model for specifying the importance of 21 factors causing the landslides in a wide area of Penang Island, Malaysia. These factors are vegetation cover, distance from the fault line, slope angle, cross curvature, slope aspect, distance from road, geology, diagonal length, longitude curvature, rugosity, plan curvature, elevation, rain perception, soil texture, surface area, distance from drainage, roughness, land cover, general curvature, tangent curvature, and profile curvature. Decision tree models are used for prediction, classification, and factors importance and are usually represented by an easy to interpret tree like structure. Four models were created using Chi-square Automatic Interaction Detector (CHAID), Exhaustive CHAID, Classification and Regression Tree (CRT), and Quick-Unbiased-Efficient Statistical Tree (QUEST). Twenty-one factors were extracted using digital elevation models (DEMs) and then used as input variables for the models. A data set of 137570 samples was selected for each variable in the analysis, where 68786 samples represent landslides and 68786 samples represent no landslides. 10-fold cross-validation was employed for testing the models. The highest accuracy was achieved using Exhaustive CHAID (82.0%) compared to CHAID (81.9%), CRT (75.6%), and QUEST (74.0%) model. Across the four models, five factors were identified as most important factors which are slope angle, distance from drainage, surface area, slope aspect, and cross curvature.

...read moreread less

73 citations

Proceedings Article•DOI•

Exponential Separation of Information and Communication

[...]

Anat Ganor¹, Gillat Kol², Ran Raz¹•Institutions (2)

Weizmann Institute of Science¹, Institute for Advanced Study²

18 Oct 2014

TL;DR: In this article, the authors show an exponential gap between communication complexity and information complexity, by giving an explicit example for a communication task (relation), with information complexity ≤ O(k) and distributional communication complexity ≥ 2k.

...read moreread less

Abstract: We show an exponential gap between communication complexity and information complexity, by giving an explicit example for a communication task (relation), with information complexity ≤ O(k), and distributional communication complexity ≥ 2k. This shows that a communication protocol cannot always be compressed to its internal information. By a result of Braverman [1], our gap is the largest possible. By a result of Braverman and Rao [2], our example shows a gap between communication complexity and amortized communication complexity, implying that a tight direct sum result for distributional communication complexity cannot hold.

...read moreread less

59 citations

Book Chapter•DOI•

A Comparative Assessment Between the Application of Fuzzy Unordered Rules Induction Algorithm and J48 Decision Tree Models in Spatial Prediction of Shallow Landslides at Lang Son City, Vietnam

[...]

Dieu Tien Bui¹, Biswajeet Pradhan², Inge Revhaug¹, Chuyen Trung Tran³•Institutions (3)

Norwegian University of Life Sciences¹, Universiti Putra Malaysia², Hanoi University of Mining and Geology³

01 Jan 2014

TL;DR: The main objective of this study is to investigate potential application of the Fuzzy Unordered Rules Induction Algorithm and the Bagging in comparison with Decision Tree model for spatial prediction of shallow landslides in Lang Son city area (Vietnam).

...read moreread less

Abstract: The main objective of this study is to investigate potential application of the Fuzzy Unordered Rules Induction Algorithm (FURIA) and the Bagging (an ensemble technique) in comparison with Decision Tree model for spatial prediction of shallow landslides in the Lang Son city area (Vietnam). First, a landslide inventory map was constructed from various sources. Then, the landslide inventory was randomly partitioned into 70 % for training the models and 30 % for the model validation. Second, six landslide conditioning factors (slope, aspect, lithology, land use, soil type, and distance to faults) were prepared. Using these factors and the training dataset, landslide susceptibility indexes were calculated using the FURIA, the FURIA with Bagging, the Decision Tree, and the Decision Tree with Bagging. Finally, prediction performances of these susceptibility maps were carried out using the Receiver Operating Characteristic (ROC) technique. The results show that area under the ROC curve (AUC) using training dataset has the largest for the Decision Tree with Bagging (0.925) and the FURIA with Bagging (0.913), followed by the Decision Tree (0.908) and the FURIA (0.878). The prediction capability of these models was estimated using the validation dataset. The highest prediction was achieved using the FURIA with Bagging (AUC = 0.802), followed by the Decision Tree (AUC = 0.783), the Decision Tree with Bagging (AUC = 0.777), and the FURIA (AUC = 0.773). We conclude that the FURIA with Bagging is the best model in this study.

...read moreread less

55 citations

Journal Article•DOI•

A Novel Low Computational Complexity Power Assignment Method for Non-orthogonal Multiple Access Systems

[...]

Anxin Li¹, Atsushi Harada¹, Hidetoshi Kayama¹•Institutions (1)

NTT DoCoMo¹

01 Jan 2014-IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences

53 citations

Journal Article•DOI•

A computational intelligence approach for a better diagnosis of diabetic patients

[...]

K. V. S. R. P. Varma¹, Allam Appa Rao, T. Sita Maha Lakshmi¹, P. V. Nageswara Rao¹•Institutions (1)

Gandhi Institute of Technology and Management¹

01 Jul 2014-Computers & Electrical Engineering

TL;DR: The modified Gini index-Gaussian fuzzy decision tree algorithm is proposed and is tested with Pima Indian Diabetes (PID) clinical data set for accuracy and this algorithm outperforms other decision tree algorithms.

...read moreread less

52 citations

Journal Article•DOI•

Spatial prediction of landslide susceptibility using a decision tree approach: a case study of the Pyeongchang area, Korea

[...]

Inhye Park¹, Saro Lee•Institutions (1)

Seoul National University¹

01 Aug 2014-Journal of remote sensing

TL;DR: Choi et al. as discussed by the authors applied a decision tree approach and validated it for analysis of landslide susceptibility using a geographic information system (GIS) using remote sensing and GIS.

...read moreread less

Abstract: A decision tree approach was applied and validated for analysis of landslide susceptibility using a geographic information system (GIS). The study area was the Pyeongchang area in Gangwon Province, Korea, where many landslides occurred in 2006 and where the 2018 Winter Olympics are to be held. Spatial data, such as landslides, topography, and geology, were detected, collected, and compiled in a database using remote sensing and GIS. The 3994 recorded landslide locations were randomly split 50/50 for training and validation of the models. A decision tree model, which is a type of data-mining classification model, was applied and decision trees were constructed using the chi-squared (χ2) automatic interaction detector (CHAID) and the quick, unbiased, and efficient statistical tree (QUEST) algorithms. Also, as a reference, a frequency-ratio model was applied using the same database. The relationships between the detected landslide locations and their factors were identified and quantified by frequency-ratio ...

...read moreread less

51 citations

Journal Article•DOI•

A simple tree swaying model for forest motion in windstorm conditions

[...]

David Pivato¹, Sylvain Dupont¹, Yves Brunet¹•Institutions (1)

Institut national de la recherche agronomique¹

01 Feb 2014-Trees-structure and Function

TL;DR: In this paper, a simple tree swaying model was developed for the purpose of simulating the effect of strong wind on the vulnerability of heterogeneous forest canopies, where the tree was represented as a flexible cantilever beam whose motion, induced by turbulent winds, was solved through a modal analysis.

...read moreread less

Abstract: A simple tree swaying model, valid for windstorm conditions, has been developed for the purpose of simulating the effect of strong wind on the vulnerability of heterogeneous forest canopies. In this model the tree is represented as a flexible cantilever beam whose motion, induced by turbulent winds, is solved through a modal analysis. The geometric nonlinearities related to the tree curvature are accounted for through the formulation of the wind drag force. Furthermore, a breakage condition is considered at very large deflections. A variety of case studies is used to evaluate the present model. As compared to field data collected on three different tree species, and to the outputs of mechanistic models of wind damage, it appears to be able to predict accurately large tree deflections as well as tree breakage, using wind velocity at tree top as a forcing function. The instantaneous response of the modelled tree to a turbulent wind load shows very good agreement with a more complex tree model. The simplicity of the present model and its low computational time make it well adapted to future use in large-eddy simulation airflow models, aimed at simulating the complete interaction between turbulent wind fields and tree motion in fragmented forests.

...read moreread less

34 citations

Journal Article•DOI•

CBC: An associative classifier with a small number of rules

[...]

Houtao Deng¹, George C. Runger², Eugene Tuv³, Wade Bannister•Institutions (3)

Intuit¹, Arizona State University², Intel³

01 Mar 2014

TL;DR: It is shown that associative classifiers consisting of an ordered rule set can be represented as a tree model, i.e., condition-based tree (CBT), which has competitive accuracy performance, and has a significantly smaller number of rules than well-known associated classifiers such as CBA and GARC.

...read moreread less

Abstract: Associative classifiers have been proposed to achieve an accurate model with each individual rule being interpretable. However, existing associative classifiers often consist of a large number of rules and, thus, can be difficult to interpret. We show that associative classifiers consisting of an ordered rule set can be represented as a tree model. From this view, it is clear that these classifiers are restricted in that at least one child node of a non-leaf node is never split. We propose a new tree model, i.e., condition-based tree (CBT), to relax the restriction. Furthermore, we also propose an algorithm to transform a CBT to an ordered rule set with concise rule conditions. This ordered rule set is referred to as a condition-based classifier (CBC). Thus, the interpretability of an associative classifier is maintained, but more expressive models are possible. The rule transformation algorithm can be also applied to regular binary decision trees to extract an ordered set of rules with simple rule conditions. Feature selection is applied to a binary representation of conditions to simplify/improve the models further. Experimental studies show that CBC has competitive accuracy performance, and has a significantly smaller number of rules (median of 10 rules per data set) than well-known associative classifiers such as CBA (median of 47) and GARC (median of 21). CBC with feature selection has even a smaller number of rules.

...read moreread less

Journal Article•DOI•

PcHD: Personalized classification of heartbeat types using a decision tree

[...]

Juyoung Park¹, Kyungtae Kang¹•Institutions (1)

Hanyang University¹

01 Nov 2014-Computers in Biology and Medicine

TL;DR: A novel method for automatic classification of an individual ECG beats for Holter monitoring is proposed, using the Pan-Tompkins algorithm to accurately extract features such as the QRS complex and P wave, and employing a decision tree to classify each beat in terms of these features.

...read moreread less

Journal Article•DOI•

Using Phylogenetic Networks to Model Chinese Dialect History

[...]

Johann-Mattis List¹, Nelson-Sathi Shijulal², William Martin², Hans Geisler²•Institutions (2)

University of Marburg¹, University of Düsseldorf²

01 Jan 2014

TL;DR: A network approach is used to study the lexical history of 40 Chinese dialects and the majority of characters in the data (about 54%) cannot be readily explained with the help of a given tree model.

...read moreread less

Abstract: The idea that language history is best visualized by a branching tree has been controversially discussed in the linguistic world and many alternative theories have been proposed. The reluctance of many scholars to accept the tree as the natural metaphor for language history was due to conflicting signals in linguistic data: many resemblances would simply not point to a unique tree. Despite these observations, the majority of automatic approaches applied to language data has been based on the tree model, while network approaches have rarely been applied. Due to the specific sociolinguistic situation in China, where very divergent varieties have been developing under the roof of a common culture and writing system, the history of the Chinese dialects is complex and intertwined. They are therefore a good test case for methods which no longer take the family tree as their primary model. Here we use a network approach to study the lexical history of 40 Chinese dialects. In contrast to previous approaches, our method is character-based and captures both vertical and horizontal aspects of language history. According to our results, the majority of characters in our data (about 54%) cannot be readily explained with the help of a given tree model. The borrowing events inferred by our method do not only reflect general uncertainties of Chinese dialect classification, they also reveal the strong influence of the standard language on Chinese dialect history.

...read moreread less

Journal Article•DOI•

Linking individual-tree and whole-stand models for forest growth and yield prediction

[...]

Quang V. Cao¹•Institutions (1)

Louisiana State University Agricultural Center¹

14 Oct 2014-Forest Ecosystems

TL;DR: In this article, the disaggregation approach provided a link between individual-tree models and whole-stand models, and should be considered as a better alternative to the unadjusted tree model.

...read moreread less

Proceedings Article•DOI•

Improving activity recognition via automatic decision tree pruning

[...]

Thomas Phan¹•Institutions (1)

Samsung¹

13 Sep 2014

TL;DR: This system mitigates the problem of misclassification by first identifying spurious classifications and then automatically pruning a decision tree model to remove labels that tend to produce wrong inferences, resulting in a 10% classification improvement based on the data set.

...read moreread less

Abstract: Activity recognition enables many user-facing smartphone applications, but it may suffer from misclassifications when trained models attempt to classify previously-unseen real-world behavior. Our system mitigates this problem by first identifying spurious classifications and then automatically pruning a decision tree model to remove labels that tend to produce wrong inferences, resulting in a 10% classification improvement based on our data set.

...read moreread less

Journal Article•DOI•

Near-ML MIMO detection algorithm with LR-aided fixed-complexity tree searching

[...]

Hyunsub Kim¹, Jangyoung Park¹, Hyukyeon Lee¹, Jaeseok Kim¹•Institutions (1)

Yonsei University¹

21 Oct 2014-IEEE Communications Letters

TL;DR: A low-complexity multipleinput multiple-output (MIMO) detection algorithm with lattice-reduction-aided fixed-complexITY tree searching which is motivated by the fixed- Complexity sphere decoder (FSD).

...read moreread less

Abstract: In this paper, we propose a low-complexity multipleinput multiple-output (MIMO) detection algorithm with lattice-reduction-aided fixed-complexity tree searching which is motivated by the fixed-complexity sphere decoder (FSD). As the proposed scheme generates a fixed tree whose size is much smaller than that of the full expansion in the FSD, the computational complexity is reduced considerably. Nevertheless, the proposed scheme achieves a near-maximum-likelihood (ML) performance with a large number of transmit antennas and a high-order modulation. The experimental results demonstrate that the performance degradation of the proposed scheme is less than 0.5 dB at the bit error rate (BER) of 10-5 for a 8 × 8 MIMO system with 256 QAM. Also, the proposed method reduces the complexity to about 1.23% of the corresponding FSD complexity.

...read moreread less

Proceedings Article•DOI•

Towards generating random forests via extremely randomized trees

[...]

Le Zhang¹, Ye Ren¹, Ponnuthurai Nagaratnam Suganthan¹•Institutions (1)

Nanyang Technological University¹

06 Jul 2014

TL;DR: The results on several public datasets show that random partition without exhaustive search at each node of a decision tree can yield better performance with less computational complexity.

...read moreread less

Abstract: The classification error of a specified classifier can be decomposed into bias and variance. Decision tree based classifier has very low bias and extremely high variance. Ensemble methods such as bagging can significantly reduce the variance of such unstable classifiers and thus return an ensemble classifier with promising generalized performance. In this paper, we compare different tree-induction strategies within a uniform ensemble framework. The results on several public datasets show that random partition (cut-point for univariate decision tree or both coefficients and cut-point for multivariate decision tree) without exhaustive search at each node of a decision tree can yield better performance with less computational complexity.

...read moreread less

Journal Article•DOI•

Vehicle classification with single multi-functional magnetic sensor and optimal MNS-based CART

[...]

Haijian Li¹, Honghui Dong¹, Limin Jia¹, Moyu Ren¹•Institutions (1)

Beijing Jiaotong University¹

01 Sep 2014-Measurement

TL;DR: A novel method with single multi-functional magnetic sensor and optimal Minimum Number of Split-sample (MNS)-based Classification and Regression Tree (CART) algorithm was proposed in this paper to classify on-road vehicles and achieved on-line vehicle classification in the sensor node.

...read moreread less

Patent•

Point cloud data based single tree three-dimensional modeling and morphological parameter extracting method

[...]

Huang Hongyu, Chen Chongcheng, Tang Liyu, Wang Xiaohui

23 Jul 2014

TL;DR: In this article, a point cloud data based single tree 3D modeling and morphological parameter extraction method is proposed, which can rapidly and semi-automatically extract tree important geometrical parameters and topological information to form into the high vivid single tree geometric model.

...read moreread less

Abstract: The invention relates to a point cloud data based single tree three-dimensional modeling and morphological parameter extracting method. The point cloud data based single tree three-dimensional modeling and morphological parameter extracting method comprises obtaining three-dimensional surface point cloud data of high density standing trees through a three-dimensional scanner or other live-action measuring modes, calculating the shortest distance from points to root nodes through a k-nearest neighbor graph, performing hierarchical clustering on the data according to distance, enabling centers of clustering hierarchies to be served as framework points of a limb system and meanwhile extracting corresponding semi-diameter of the framework points; connecting the framework points to establish a topological structure of branches and grading the branches; performing three-dimensional geometrical reconstruction on branches through a generalized cylinder body; adding leaf models to the limb system to form into a vivid three-dimensional single tree model; extracting height of trees, diameter of breast height and crown breadth of the standing trees in the point cloud. The point cloud data based single tree three-dimensional modeling and morphological parameter extracting method can rapidly and semi-automatically extract tree important geometrical parameters and topological information to form into the high vivid single tree geometric model and has wide application prospects and values in fields such as agriculture and forestry survey, ecological research and landscape planning.

...read moreread less

Journal Article•DOI•

Linguistic Decision Making for Robot Route Learning

[...]

Hongmei He¹, TM McGinnity², Sonya Coleman², Bryan Gardiner²•Institutions (2)

University of Kent¹, Ulster University²

01 Jan 2014-IEEE Transactions on Neural Networks

TL;DR: This paper develops a novel application of a linguistic decision tree for a robot route learning problem by dynamically deciding the robot's behavior, which is decomposed into atomic actions in the context of a specified task.

...read moreread less

Abstract: Machine learning enables the creation of a nonlinear mapping that describes robot-environment interaction, whereas computing linguistics make the interaction transparent. In this paper, we develop a novel application of a linguistic decision tree for a robot route learning problem by dynamically deciding the robot's behavior, which is decomposed into atomic actions in the context of a specified task. We examine the real-time performance of training and control of a linguistic decision tree, and explore the possibility of training a machine learning model in an adaptive system without dual CPUs for parallelization of training and control. A quantified evaluation approach is proposed, and a score is defined for the evaluation of a model's robustness regarding the quality of training data. Compared with the nonlinear system identification nonlinear auto-regressive moving average with eXogeneous inputs model structure with offline parameter estimation, the linguistic decision tree model with online linguistic ID3 learning achieves much better performance, robustness, and reliability.

...read moreread less

Journal Article•

Cover tree Bayesian reinforcement learning

[...]

Nikolaos Tziortziotis¹, Christos Dimitrakakis², Konstantinos Blekas¹•Institutions (2)

University of Ioannina¹, Chalmers University of Technology²

01 Jan 2014-Journal of Machine Learning Research

TL;DR: The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces, and it is demonstrated in an experimental comparison with a Gaussian process model, a linear model and simple least squares policy iteration.

...read moreread less

Abstract: This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with a Gaussian process model, a linear model and simple least squares policy iteration.

...read moreread less

Journal Article•DOI•

The Concept of Problem Complexity

[...]

Alejandro Salado¹, Roshanak Nilchiani¹•Institutions (1)

Stevens Institute of Technology¹

01 Jan 2014-Procedia Computer Science

TL;DR: The concept of Problem Complexity is presented as the complexity level that a set of requirements can impose to any system fulfilling them and mathematically demonstrated using the concept of joint entropy how problem complexity defines the minimum level of complexity a system can achieve for a givenSet of requirements.

...read moreread less

Proceedings Article•DOI•

Budding Trees

[...]

Ozan Irsoy¹, Olcay Taner Yildiz², Ethem Alpaydin³•Institutions (3)

Cornell University¹, Işık University², Boğaziçi University³

24 Aug 2014

TL;DR: In this article, the authors proposed a new decision tree model, named the budding tree, where a node can be both a leaf and an internal decision node, each bud node starts as a leaf node, can then grow children, but then later on, if necessary, its children can be pruned.

...read moreread less

Abstract: We propose a new decision tree model, named the budding tree, where a node can be both a leaf and an internal decision node. Each bud node starts as a leaf node, can then grow children, but then later on, if necessary, its children can be pruned. This contrasts with traditional tree construction algorithms that only grows the tree during the training phase, and prunes it in a separate pruning phase. We use a soft tree architecture and show that the tree and its parameters can be trained using gradient-descent. Our experimental results on regression, binary classification, and multi-class classification data sets indicate that our newly proposed model has better performance than traditional trees in terms of accuracy while inducing trees of comparable size.

...read moreread less

Book Chapter•DOI•

Data Mining Approach for Developing Various Models Based on Types of Attack and Feature Selection as Intrusion Detection Systems (IDS)

[...]

H. S. Hota¹, Akhilesh Kumar Shrivas¹•Institutions (1)

Guru Ghasidas University¹

01 Jan 2014

TL;DR: Empirical result shows that random forest technique outperforms in case of two class problem as well as five class problem on NSL-KDD data set and feature selection techniques on random forest tree model which is best model as binary classifier aswell as multiclass classifier.

...read moreread less

Abstract: Information security is one of the important issues to protect data or information from unauthorized access. Classification techniques play very important role in information security to classify data as legitimate or normal data. Nowadays, network traffic includes large amount of irrelevant information that increases complexity of classifier and affect the classification result, so we need to develop robust model that can classify the data with high accuracy. In this paper, various types of classification techniques are applied on NSL-KDD data with Tenfold cross-validation technique in two different viewpoints. First, the classification techniques are applied for two class problem as binary classification (normal and attack), and second, it is applied for five class problem as multiclass classification. Empirical result shows that random forest technique outperforms in case of two class problem as well as five class problem on NSL-KDD data set. Due to large amount of redundant data, we have also applied feature selection techniques on random forest tree model which is best model as binary classifier as well as multiclass classifier. Model produces highest accuracy with 15 features in case of binary classification. Performance of the various models are also evaluated using other performance measures like true-positive rate (TPR), false-positive rate (FPR), precision, F-measure and receiver operating characteristic (ROC) curve and the results are found to be satisfactory.

...read moreread less

Journal Article•

Comparison of two data mining techniques in labeling diagnosis to Iranian pharmacy claim dataset: artificial neural network (ANN) versus decision tree model.

[...]

Ehsan Rezaei-Darzi¹, Farshad Farzadfar¹, Amir Hashemi-Meshkini¹, Iman Navidi¹, Mahmood Mahmoudi¹, Mehdi Varmaghani¹, Parinaz Mehdipour¹, Soudi Alamdari M¹, Batool Tayefi¹, Shohreh Naderimagham¹, Fatemeh Soleymani¹, Alireza Mesdaghinia¹, Alireza Delavari¹, Kazem Mohammad¹ - Show less +10 more•Institutions (1)

Tehran University of Medical Sciences¹

01 Dec 2014-Archives of Iranian Medicine

TL;DR: Evaluating and comparing the prediction accuracy of two data mining techniques, including decision tree and neural network models in labeling diagnosis to gastrointestinal prescriptions in Iran found that artificial neural network and decision tree model represent similar accuracy in labeled diagnosis to GI prescription.

...read moreread less

Abstract: BACKGROUND: This study aimed to evaluate and compare the prediction accuracy of two data mining techniques, including decision tree and neural network models in labeling diagnosis to gastrointestinal prescriptions in Iran. METHODS: This study was conducted in three phases: data preparation, training phase, and testing phase. A sample from a database consisting of 23 million pharmacy insurance claim records, from 2004 to 2011 was used, in which a total of 330 prescriptions were assessed and used to train and test the models simultaneously. In the training phase, the selected prescriptions were assessed by both a physician and a pharmacist separately and assigned a diagnosis. To test the performance of each model, a k-fold stratified cross validation was conducted in addition to measuring their sensitivity and specificity. RESULT: Generally, two methods had very similar accuracies. Considering the weighted average of true positive rate (sensitivity) and true negative rate (specificity), the decision tree had slightly higher accuracy in its ability for correct classification (83.3% and 96% versus 80.3% and 95.1%, respectively). However, when the weighted average of ROC area (AUC between each class and all other classes) was measured, the ANN displayed higher accuracies in predicting the diagnosis (93.8% compared with 90.6%). CONCLUSION: According to the result of this study, artificial neural network and decision tree model represent similar accuracy in labeling diagnosis to GI prescription.

...read moreread less

Journal Article•DOI•

Fast Decision Tree Algorithm

[...]

V. Purdila, Stefan Gheorghe Pentiuc

01 Feb 2014-Advances in Electrical and Computer Engineering

TL;DR: An improved C4.5 algorithm that uses a compression mechanism to store the training and test data in memory and a very fast tree pruning algorithm that can be easily parallelized in order to achieve further speedup.

...read moreread less

Abstract: There is a growing interest nowadays to process large amounts of data using the well-known decision-tree learning algorithms. Building a decision tree as fast as possible against a lar ...

...read moreread less

Patent•

Decision tree model based multispectral remote sensing image river information extraction method

[...]

Ming Zhang, Xiao-Xiang Feng, Li Huaguo, Zheng Jing, Liu Zhe - Show less +1 more

19 Mar 2014

TL;DR: In this article, a decision tree model based multispectral remote sensing image river information extraction method was proposed for the thematic map production, which can be applied to the thematics map production directly.

...read moreread less

Abstract: The invention discloses a decision tree model based multispectral remote sensing image river information extraction method. The decision tree model based multispectral remote sensing image river information extraction method comprises step 1, preprocessing an obtained Landsat TM (Thematic Mapper) remote sensing image and segmenting out a river area to be extracted; step 2, performing ground object classification on the segmented river area, selecting 15 to 20 feature points from every type and extracting out corresponding picture element values from TM1 to TM5; step 3, analyzing spectrum characteristics of the different types of ground objects according to the extracted picture element values, establishing a decision rule and establishing a decision tree model of river information extraction; step 4, processing picture elements of a segmented river area image according to the decision tree model so as to generate a binaryzation image of water body information and non-water-body information; step 5, performing vectorization processing and post-processing on the generated binaryzation image to obtain river information. According to the decision tree model based multispectral remote sensing image river information extraction method, rapid retraction of the river information can be achieved and the decision tree model based multispectral remote sensing image river information extraction method can be applied to the thematic map production directly.

...read moreread less

Patent•

Decision-making tree based prediction method and device

[...]

Chen Huanhua, Cao Guoxiang

22 Oct 2014

TL;DR: In this paper, a decision-making tree based prediction method was proposed to improve the visualization effect of a decision making tree model and the decision-tree model testing process, which includes steps of generating a target training set by feature selection of data in a defective data set of prestored products according to feature attributes.

...read moreread less

Abstract: The invention discloses a decision-making tree based prediction method and device and relates to the field of data processing, and the method and the device can improve the visualization effect of a decision-making tree model and the decision-making tree model testing process. The decision-making tree based prediction method includes steps of generating a target training set by feature selection of data in a defective data set of prestored products according to feature attributes; training a training set by a decision-making tree algorithm to generate a decision-making tree, wherein the training set belongs to the target training set; compressing the decision-making tree to obtain a first decision-making tree; displaying the first decision-making tree by means of the visualization technology; selecting at least one test case from a test set to be sequentially input into the first decision-making tree for testing, and generating classification paths for the test cases; displaying the classification paths of the test cases in the first decision-making tree by means of the visualization technology. The decision-making tree based prediction method and device is applied to defect prediction of the products.

...read moreread less

Proceedings Article•DOI•

Complexity-reduced optimal power allocation in passive distributed radar systems

[...]

Omid Taghizadeh¹, Gholamreza Alirezaei¹, Rudolf Mathar¹•Institutions (1)

RWTH Aachen University¹

23 Oct 2014

TL;DR: This approach provides new insights to the nature of the power allocation problem and extracts some optimality conditions which are in turn used to achieve a new algorithm with reduced complexity for a reliable sensor selection.

...read moreread less

Abstract: In this paper, we provide an alternative derivation of the optimal power allocation for distributed passive radar systems in closed-form. Our approach provides new insights to the nature of the power allocation problem and extracts some optimality conditions which are in turn used to achieve a new algorithm with reduced complexity for a reliable sensor selection. Finally, we show the computational complexity and the run-time of the proposed algorithm against the previously available one by analytic and simulative comparisons.

...read moreread less

Comparing performance of decision tree and neural network in predicting myocardial infarction

[...]

Reza Safdari, M Ghazi saeedi, M Gharooni, M Nasiri, G Argi - Show less +1 more

01 Jan 2014

TL;DR: The results of the data mining showed that the variables of high blood pressure, hyperlipidemia and tobacco smoking were the most critical risk factors of myocardial infarction.

...read moreread less

Abstract: Purpose: Cardiovascular diseases are among the most common diseases in all societies. Using data mining techniques to generate predictive models to identify those at risk for reducing the effects of the disease is very helpful. The main purpose of this study was to predict the risk of myocardial infarction by Decision Tree based on the observed risk factors. Methods: The present work was an analytical study conducted on a database containing 350 records. Data were obtained from patients admitted to Shahid Rajaei specialized cardiovascular hospital, Iran, in 2011. Data were collected using a four-sectioned data collection form. Data analysis was performed using SPSS statistical software version 12 by CRISP methodology. In the modeling section decision tree and Neural Network were used. Results: The results of the data mining showed that the variables of high blood pressure, hyperlipidemia and tobacco smoking were the most critical risk factors of myocardial infarction. The accuracy of the decision tree model on the data was shown to be as 93/4. Conclusion: The best created model was decision tree C5.0. According to the created rules, it can be predicted which patient with new specified features may affected by myocardial infarction.

...read moreread less