
Showing papers on "Decision tree model published in 2004"


Journal ArticleDOI
TL;DR: In this tutorial, traditional decision tree construction and the current state of decision tree modeling are reviewed and emphasis is placed on techniques that make decision trees well suited to handle the complexities of chemical and biochemical applications.
Abstract: In this tutorial, traditional decision tree construction and the current state of decision tree modeling are reviewed. Emphasis is placed on techniques that make decision trees well suited to handle the complexities of chemical and biochemical applications. Copyright © 2004 John Wiley & Sons, Ltd.

643 citations


Proceedings ArticleDOI
04 Jul 2004
TL;DR: This work introduces a framework, called Divide-by-2 (DB2), for extending support vector machines (SVM) to multi-class problems and shows that DB2 is faster than the one-against-one and one-against-rest algorithms in terms of testing time, significantly faster than one-against-rest in terms of training time, and that the cross-validation accuracy of DB2 is comparable to these two methods.
Abstract: We introduce a framework, which we call Divide-by-2 (DB2), for extending support vector machines (SVM) to multi-class problems. DB2 offers an alternative to the standard one-against-one and one-against-rest algorithms. For an N class problem, DB2 produces an N − 1 node binary decision tree where nodes represent decision boundaries formed by N − 1 SVM binary classifiers. This tree structure allows us to present a generalization and a time complexity analysis of DB2. Our analysis and related experiments show that DB2 is faster than one-against-one and one-against-rest algorithms in terms of testing time, significantly faster than one-against-rest in terms of training time, and that the cross-validation accuracy of DB2 is comparable to these two methods.
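As a toy illustration (not the authors' implementation), the shape of a DB2 tree can be sketched by recursively halving the class set; a binary SVM would sit at each internal node, and an N-class problem yields exactly N − 1 such nodes:

```python
def db2_node_count(classes):
    """Recursively halve the class set, as in a Divide-by-2 tree.

    Each internal node would hold one binary classifier (an SVM in DB2)
    separating the two class groups; leaves are single classes.
    Returns the number of internal (classifier) nodes.
    """
    if len(classes) <= 1:
        return 0  # leaf: a single class, nothing left to separate
    mid = len(classes) // 2
    left, right = classes[:mid], classes[mid:]
    return 1 + db2_node_count(left) + db2_node_count(right)

# Any N-class problem produces exactly N - 1 classifier nodes.
print(db2_node_count(list(range(10))))  # -> 9
```

The N − 1 count follows from the tree being binary with one leaf per class, which is why testing a sample visits at most one classifier per level rather than all N(N−1)/2 one-against-one machines.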

157 citations


Proceedings ArticleDOI
27 Jun 2004
TL;DR: A new multimedia data mining framework for the extraction of soccer goal events in soccer videos by using combined multimodal analysis and decision tree logic is proposed.
Abstract: We propose a new multimedia data mining framework for the extraction of soccer goal events in soccer videos by using combined multimodal analysis and decision tree logic. The extracted events can be used to index the soccer videos. We first adopt an advanced video shot detection method to produce shot boundaries and some important visual features. Then, the visual/audio features are extracted for each shot at different granularities. This rich multimodal feature set is filtered by a pre-filtering step to clean the noise as well as to reduce the irrelevant data. A decision tree model is built upon the cleaned data set and is used to classify the goal shots. Finally, the experimental results demonstrate the effectiveness of our framework for soccer goal extraction.

84 citations


Journal ArticleDOI
TL;DR: A method of constructing regression trees within the framework of maximum likelihood that inherits the backward fitting idea of classification and regression trees (CART) but has more rigorous justification.
Abstract: We propose a method of constructing regression trees within the framework of maximum likelihood. It inherits the backward fitting idea of classification and regression trees (CART) but has more rigorous justification. Simulation studies show that it provides more accurate tree model selection compared to CART. The analysis of a baseball dataset is given as an illustration.

72 citations


Journal ArticleDOI
TL;DR: This paper investigates representations and supporting data structures for finite-memory processes, as well as the major impact these structures have on the universal algorithms in which they are used, and defines and investigates the properties of the finite-state machine (FSM) closure of a tree, which is the smallest FSM that generates all the processes generated by the tree.
Abstract: Tree models are efficient parametrizations of finite-memory processes, offering potentially significant model cost savings. The information theory literature has focused mostly on redundancy aspects of the universal estimation and coding of these models. In this paper, we investigate representations and supporting data structures for finite-memory processes, as well as the major impact these structures have on the universal algorithms in which they are used. We first generalize the class of tree models, and then define and investigate the properties of the finite-state machine (FSM) closure of a tree, which is the smallest FSM that generates all the processes generated by the tree. The interaction between FSM closures, generalized context trees (GCTs), and classical data structures such as compact suffix trees brings together the information-theoretic and the computational aspects, leading to the first algorithm for linear time encoding/decoding of a lossless twice-universal code in the class of tree models. The implemented code is a two-pass version of Context. The corresponding optimal context selection rule and context transitions use tools similar to those employed in efficient implementation of the popular Burrows-Wheeler transform (BWT), yielding similar computational complexities. We also present a reversible transform that displays the same "context deinterleaving" feature as the BWT but is naturally based on an optimal context tree. FSM closures are also applied to an investigation of the effect of time reversal on tree models, motivated in part by the following question: When compressing a data sequence using a universal scheme in the class of tree models, can it make a difference whether we read the sequence from left to right or from right to left?
Given a tree model of a process, we show constructively that the number of states in the tree model corresponding to the reversed process might be, in the extreme case, quadratic in the number of states of the original tree. This result answers the above motivating question in the affirmative.

65 citations


Proceedings Article
07 Jun 2004
TL;DR: The paper compares the models for small business credit scoring developed by logistic regression, neural networks, and CART decision trees on a Croatian bank dataset and finds the most successful neural network model was obtained by the probabilistic algorithm.
Abstract: The paper compares the models for small business credit scoring developed by logistic regression, neural networks, and CART decision trees on a Croatian bank dataset. The models obtained by all three methodologies were estimated, then validated on the same hold-out sample, and their performance was compared. There is an evident significant difference among the best neural network model, decision tree model, and logistic regression model. The most successful neural network model was obtained by the probabilistic algorithm. The best model extracted the most important features for small business credit scoring from the observed data.

59 citations


Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of distributed distance labeling on dynamic trees and propose efficient distributed schemes for it, with amortized message complexity O(log² n) per operation, where n is the size of the tree at the time the operation takes place.
Abstract: Distance labeling schemes are composed of a marker algorithm for labeling the vertices of a graph with short labels, coupled with a decoder algorithm allowing one to compute the distance between any two vertices directly from their labels (without using any additional information). As applications for distance labeling schemes concern mainly large and dynamically changing networks, it is of interest to study distributed dynamic labeling schemes. The current paper considers the problem on dynamic trees, and proposes efficient distributed schemes for it. The paper first presents a labeling scheme for distances in the dynamic tree model, with amortized message complexity O(log² n) per operation, where n is the size of the tree at the time the operation takes place. The protocol maintains O(log² n) bit labels. This label size is known to be optimal even in the static scenario. A more general labeling scheme is then introduced for the dynamic tree model, based on extending an existing static tree labeling scheme to the dynamic setting. The approach fits a number of natural tree functions, such as distance, separation level, and flow. The main resulting scheme incurs an overhead of an O(log n) multiplicative factor in both the label size and amortized message complexity in the case of dynamically growing trees (with no vertex deletions). If an upper bound on n is known in advance, this method yields a different tradeoff, with an O(log² n / log log n) multiplicative overhead on the label size but only an O(log n / log log n) overhead on the amortized message complexity. In the fully dynamic model the scheme also incurs an increased additive overhead in amortized communication, of O(log² n) messages per operation.
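A minimal static sketch of the decoder idea (far weaker than the paper's O(log² n)-bit schemes, and hypothetical in its details): label every vertex with its root-to-vertex path, and recover the tree distance from the two labels alone via their longest common prefix (which identifies the lowest common ancestor).

```python
def assign_labels(children, root):
    """Label each vertex with the tuple of vertices on its root path."""
    labels = {root: (root,)}
    stack = [root]
    while stack:
        v = stack.pop()
        for c in children.get(v, []):
            labels[c] = labels[v] + (c,)
            stack.append(c)
    return labels

def distance(label_u, label_v):
    """Decode the tree distance from the two labels alone."""
    k = 0  # length of the common prefix = depth of the LCA + 1
    while k < min(len(label_u), len(label_v)) and label_u[k] == label_v[k]:
        k += 1
    return (len(label_u) - k) + (len(label_v) - k)

tree = {0: [1, 2], 1: [3, 4]}  # 0 is the root of a 5-vertex example tree
labels = assign_labels(tree, 0)
print(distance(labels[3], labels[4]))  # -> 2 (path 3-1-4)
print(distance(labels[3], labels[2]))  # -> 3 (path 3-1-0-2)
```

These labels cost O(depth · log n) bits, which is exactly what the heavy-path and decomposition machinery in schemes like the paper's compresses down to O(log² n) bits.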

44 citations


Journal ArticleDOI
01 Apr 2004
TL;DR: Experimental results demonstrate that the AHNT generalizes better than trees with homogeneous nodes, produces small trees and avoids the use of complex comparative statistical tests and/or a priori selection of large parameter sets.
Abstract: A new neural tree model, called adaptive high-order neural tree (AHNT), is proposed for classifying large sets of multidimensional patterns. The AHNT is built by recursively dividing the training set into subsets and by assigning each subset to a different child node. Each node is composed of a high-order perceptron (HOP) whose order is automatically tuned taking into account the complexity of the pattern set reaching that node. First-order nodes divide the input space with hyperplanes, while HOPs divide the input space arbitrarily, but at the expense of increased complexity. Experimental results demonstrate that the AHNT generalizes better than trees with homogeneous nodes, produces small trees and avoids the use of complex comparative statistical tests and/or a priori selection of large parameter sets.

43 citations


Journal ArticleDOI
TL;DR: By trading-off comprehensibility and performance using a multi-objective genetic programming optimization algorithm, this paper can induce polynomial-fuzzy decision trees (PFDT) that are smaller, more compact and of better performance than their linear decision tree (LDT) counterparts.
Abstract: Decision tree induction has been studied extensively in machine learning as a solution for classification problems. The way the linear decision trees partition the search space is found to be comprehensible and hence appealing to data modelers. Comprehensibility is an important aspect of models used in medical data mining as it determines model credibility and even acceptability. In the practical sense though, inordinately long decision trees compounded by replication problems detract from comprehensibility. This demerit can be partially attributed to their rigid structure that is unable to handle complex non-linear and/or continuous data. To address this issue we introduce a novel hybrid multivariate decision tree composed of polynomial, fuzzy and decision tree structures. The polynomial nature of these multivariate trees enables them to perform well in non-linear territory while the fuzzy members are used to squash continuous variables. By trading off comprehensibility and performance using a multi-objective genetic programming optimization algorithm, we can induce polynomial-fuzzy decision trees (PFDT) that are smaller, more compact and of better performance than their linear decision tree (LDT) counterparts. In this paper we discuss the structural differences between PFDT and LDT (C4.5) and compare the size and performance of their models using medical data.

40 citations


Proceedings ArticleDOI
01 Nov 2004
TL;DR: An algorithm designed to efficiently construct a decision tree over heterogeneously distributed data without centralizing it is presented; experimental results show that by using only 20% of the communication cost necessary to centralize the data, it can achieve trees with accuracy at least 80% of that of the trees produced by the centralized version.
Abstract: We present an algorithm designed to efficiently construct a decision tree over heterogeneously distributed data without centralizing it. We compare our algorithm against a standard centralized decision tree implementation in terms of accuracy as well as communication complexity. Our experimental results show that by using only 20% of the communication cost necessary to centralize the data, we can achieve trees with accuracy at least 80% of that of the trees produced by the centralized version.

38 citations


Journal ArticleDOI
01 Nov 2004
TL;DR: A new method to generate ensembles of classifiers that uses all available data to construct every individual classifier in an iterative manner and exhibits good performance in several standard datasets at low computational cost.
Abstract: This paper develops a new method to generate ensembles of classifiers that uses all available data to construct every individual classifier. The base algorithm builds a decision tree in an iterative manner: The training data are divided into two subsets. In each iteration, one subset is used to grow the decision tree, starting from the decision tree produced by the previous iteration. This fully grown tree is then pruned by using the other subset. The roles of the data subsets are interchanged in every iteration. This process converges to a final tree that is stable with respect to the combined growing and pruning steps. To generate a variety of classifiers for the ensemble, we randomly create the subsets needed by the iterative tree construction algorithm. The method exhibits good performance in several standard datasets at low computational cost.

Proceedings Article
01 Jan 2004
TL;DR: In this article, a new set of entropy-based machine learning attributes is proposed to overcome the problem that compression approximations of Kolmogorov complexity may not work well on short sequences; although the exact Kolmogorov complexity is not algorithmically computable, in practice it can be approximated by computable compression methods.
Abstract: Biological sequences from different species are called orthologs if they evolved from a sequence of a common ancestor species and they have the same biological function. Approximations of Kolmogorov complexity or entropy of biological sequences are already well known to be useful in extracting similarity information between such sequences — in the interest, for example, of ortholog detection. As is well known, the exact Kolmogorov complexity is not algorithmically computable. In practice one can approximate it by computable compression methods. However, such compression methods do not provide a good approximation to Kolmogorov complexity for short sequences. Herein is suggested a new approach to overcome the problem that compression approximations may not work well on short sequences. This approach is inspired by new, conditional computations of Kolmogorov entropy. A main contribution of the empirical work described shows the new set of entropy-based machine learning attributes provides good separation between positive (ortholog) and negative (non-ortholog) data — better than with good, previously known alternatives (which do not employ some means to handle short sequences well). Also empirically compared are the new entropy based attribute set and a number of other, more standard similarity attributes sets commonly used in genomic analysis. The various similarity attributes are evaluated by cross validation, through boosted decision tree induction C5.0, and by Receiver Operating Characteristic (ROC) analysis. The results point to the conclusion: the new, entropy based attribute set by itself is not the one giving the best prediction; however, it is the best attribute set for use in improving the other, standard attribute sets when conjoined with them. Keywords—compression, decision tree, entropy, ortholog, ROC.
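The compression approximation discussed above is commonly realized as the normalized compression distance (NCD); a minimal sketch follows, with zlib standing in for an arbitrary real compressor (the paper's actual attribute set is not reproduced here):

```python
import zlib

def c(x: bytes) -> int:
    """Compressed length: a computable stand-in for Kolmogorov complexity."""
    return len(zlib.compress(x, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for similar sequences,
    near 1 for unrelated ones."""
    return (c(x + y) - min(c(x), c(y))) / max(c(x), c(y))

a = b"ACGTACGTACGT" * 40  # made-up repetitive "sequences" for illustration
b = b"TTAGGCATCGAA" * 40
print(ncd(a, a) < ncd(a, b))  # -> True: a is far more similar to itself
```

On very short sequences the compressor's header overhead dominates the measurement, which is precisely the failure mode the paper's entropy-based attributes are designed to address.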

Journal ArticleDOI
TL;DR: The decision tree model developed and tested can help nurses understand the treatment-seeking behaviors of cancer patients, and hence develop nursing intervention strategies.
Abstract: The purpose of this study was to develop and test a decision tree model of the treatment-seeking behaviors among Korean cancer patients. The study used methodological triangulation, applying the cognitive ethnographic decision tree modeling approach. The model was developed based on qualitative data collected from in-depth interviews with 29 cancer patients. The model was tested using qualitative and quantitative data collected from interviews and a structured questionnaire involving 165 cancer patients. The predictability of the decision tree model was quantified as the proportion of participants who followed the pathway predicted by the model. Two models were developed, the first for decision making about when to visit a doctor after detecting symptoms, and the second for decision making about treatment type following the diagnosis. Decision outcomes for the first model were categorized into immediate visit and delayed visit. The first model was influenced by the perceived seriousness of symptoms, the experiences of visiting a doctor previously with similar symptoms, social-group influences on visiting a doctor, and barriers to visiting a doctor. Decision outcomes for the second model were hospital treatment only, and a mixture of hospital treatment and alternative therapies. The second model was influenced by curability, social-group influences on alternative therapies, and confidence in alternative therapies. The predictabilities of the 2 models were 90.3% and 94.5%, respectively. This study result can help nurses understand the treatment-seeking behaviors of cancer patients, and hence develop nursing intervention strategies.


Proceedings Article
Mikhail Ju. Moshkov, Igor Chikalov
01 Apr 2004
TL;DR: Algorithms are considered that allow decision trees to be optimized sequentially against several different criteria; for decision tables over an arbitrary infinite restricted information system, these algorithms have polynomial time complexity.
Abstract: In this paper, algorithms are considered that allow decision trees to be optimized sequentially against several different criteria. For decision tables over an arbitrary infinite restricted information system [4], these algorithms have polynomial time complexity.

Proceedings ArticleDOI
07 Jun 2004
TL;DR: A method is introduced that replaces the clique model of a net by a tree model in the quadratic placement formulation that enables us to control the length of every tree segment separately.
Abstract: The performance of timing-driven placement methods depends strongly on the choice of the net model. In this paper a more precise net model is presented that does not increase numerical complexity. We introduce a method that replaces the clique model of a net by a tree model in the quadratic placement formulation. This improvement enables us to control the length of every tree segment separately. Furthermore, we present an analysis of the effects of every tree segment on the net delay. The result is in turn used to control the placement engine. Our presented results are based on legal placements. They show significant improvements over state-of-the-art methods.

Journal ArticleDOI
TL;DR: Decision pathway modeling decomposes the multigroup classification problem into simpler binary discrimination tasks, which are then reassembled into a single hierarchical architecture to minimize effects of error propagation through the hierarchical architecture.
Abstract: Pattern recognition is playing an increasingly important role in chemical and biochemical data analysis. Many of these pattern recognition applications call for the discrimination of more than two classes of objects. Decision pathway modeling is proposed as a novel pattern recognition technique for multigroup classification. Decision pathway modeling decomposes the multigroup classification problem into simpler binary discrimination tasks, which are then reassembled into a single hierarchical architecture. To minimize effects of error propagation through the hierarchical architecture, dynamic pathway selection is proposed to adaptively direct the classification of new samples. Decision pathway modeling is compared against generalized multigroup and coupled binary discriminant techniques in terms of classification accuracy. The benefit of decision pathway modeling is shown to arise from the hierarchical decomposition and from the dynamic selection of classification pathways. Copyright © 2004 John Wiley & Sons, Ltd.

Book ChapterDOI
12 Jul 2004
TL;DR: Key findings are that when the data is highly unbalanced, algorithms tend to degenerate by assigning all cases to the most common outcome, and when data is balanced, accuracy rates tend to decline.
Abstract: This paper conducts experiments with three skewed data sets, seeking to demonstrate problems when skewed data is used, and identifying counter problems when data is balanced. The basic data mining algorithms of decision tree, regression-based, and neural network models are considered, using both categorical and continuous data. Two of the data sets have binary outcomes, while the third has a set of four possible outcomes. Key findings are that when the data is highly unbalanced, algorithms tend to degenerate by assigning all cases to the most common outcome. When data is balanced, accuracy rates tend to decline. If data is balanced, that reduces the training set size, and can lead to the degeneracy of model failure through omission of cases encountered in the test set. Decision tree algorithms were found to be the most robust with respect to the degree of balancing applied.
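The degeneracy described above is easy to reproduce with made-up numbers: on a hypothetical 95/5 split, a model that collapses to the majority class already scores 95% accuracy while detecting nothing in the rare class.

```python
# Hypothetical 95/5 binary outcome distribution (illustrative only).
labels = [0] * 95 + [1] * 5

# A degenerate "model" that assigns every case to the most common outcome.
majority = max(set(labels), key=labels.count)
accuracy = sum(y == majority for y in labels) / len(labels)
recall_minority = 0.0  # the rare class is never predicted at all

print(majority, accuracy)  # -> 0 0.95
```

This is why accuracy alone is a misleading score on skewed data, and why balancing (which trades this degeneracy for a smaller training set, as the paper notes) changes the observed accuracy rates.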

Proceedings ArticleDOI
16 Aug 2004
TL;DR: The preliminary results show that the T-complexity was the least effective in identifying previously established known associations between the sequences in the test set, with Shannon's entropy having an upper hand.
Abstract: In this work, we perform an empirical study of different published measures of complexity for general sequences, to determine their effectiveness in dealing with biological sequences. By effectiveness, we refer to how closely the given complexity measure is able to identify known biologically relevant relationships, such as closeness on a phylogenetic tree. In particular, we study three complexity measures, namely, the traditional Shannon's entropy, linguistic complexity, and T-complexity. For each complexity measure, we construct the complexity profile for each sequence in our test set, and based on the profiles we compare the sequences using different performance measures based on: (i) the information theoretic divergence measure of relative entropy; (ii) apparent periodicity in the complexity profile; and (iii) correct phylogeny. The preliminary results show that the T-complexity was the least effective in identifying previously established known associations between the sequences in our test set. Shannon's entropy and linguistic complexity provided better results, with Shannon's entropy having an upper hand.
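A sliding-window Shannon entropy profile of the kind compared above can be sketched in a few lines (the window size here is an arbitrary illustrative choice):

```python
from collections import Counter
from math import log2

def shannon_entropy(s: str) -> float:
    """Shannon entropy of the symbol distribution in s, in bits/symbol."""
    n = len(s)
    return sum((k / n) * log2(n / k) for k in Counter(s).values())

def complexity_profile(seq: str, window: int = 8):
    """Entropy of each sliding window across the sequence."""
    return [shannon_entropy(seq[i:i + window])
            for i in range(len(seq) - window + 1)]

print(shannon_entropy("AAAAAAAA"))  # -> 0.0 (a single symbol, no uncertainty)
print(shannon_entropy("ACGTACGT"))  # -> 2.0 (four equiprobable symbols)
```

Comparing two sequences then reduces to comparing their profiles, e.g. via the relative-entropy divergence the paper uses as performance measure (i).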

Proceedings Article
01 May 2004
TL;DR: The new algorithm is shown to extract Decision Trees that have a higher predictive accuracy than those induced using C4.5 directly, and does not make assumptions about the ANN’s architecture or training algorithm; therefore, it can be applied to any type of ANN.
Abstract: Artificial Neural Networks (ANNs) have proved both a popular and powerful technique for pattern recognition tasks in a number of problem domains. However, the adoption of ANNs in many areas has been impeded, due to their inability to explain how they came to their conclusion, or show in a readily comprehendible form the knowledge they have obtained. This paper presents an algorithm that addresses these problems. The algorithm achieves this by extracting a Decision Tree, a graphical and easily understood symbolic representation of a decision process, from a trained ANN. The algorithm does not make assumptions about the ANN’s architecture or training algorithm; therefore, it can be applied to any type of ANN. The algorithm is empirically compared with Quinlan’s C4.5 (a common Decision Tree induction algorithm) using standard benchmark datasets. For most of the datasets used in the evaluation, the new algorithm is shown to extract Decision Trees that have a higher predictive accuracy than those induced using C4.5 directly.

Book ChapterDOI
12 Jul 2004
TL;DR: A study of tradeoffs between communication and computation in well-known communication models and in other related models finds that there is a computational task that exhibits a strong tradeoff behavior between the amount of communication and the time needed for local computation.
Abstract: We initiate a study of tradeoffs between communication and computation in well-known communication models and in other related models. The fundamental question we investigate is the following: Is there a computational task that exhibits a strong tradeoff behavior between the amount of communication and the amount of time needed for local computation?

Journal ArticleDOI
TL;DR: A computational procedure based on a decision-tree model for the identification and construction of all non-reducible descriptors in a supervised pattern recognition problem in which pattern descriptions consist of Boolean features is presented.

Journal ArticleDOI
TL;DR: A new general image segmentation system is presented, based on the calculation of a tree representation of the original image in which image regions are assigned to tree nodes, followed by a correspondence process with a model tree, which embeds the a priori knowledge about the images.

01 Jul 2004
TL;DR: An approach to decision tree induction based on this framework is studied and its performance when applied to real-world datasets is compared with C4.5 and other machine learning algorithms.
Abstract: Label Semantics is a random set based framework for Computing with Words. Imprecise concepts are modeled by the degrees of appropriateness of a linguistic expression as defined by a fuzzy set. An approach to decision tree induction based on this framework is studied and its performance when applied to real-world datasets is compared with C4.5 and other machine learning algorithms. A method of classification under linguistic constraints was proposed and studied with experiments.

01 Jan 2004
TL;DR: A novel multiscale tree representation model of web sites is proposed that contains an HMT-based two-phase classification algorithm, a context-based interscale fusion algorithm, a two-stage text-based denoising procedure, and an entropy-based pruning strategy.
Abstract: With the exponential growth of both the amount and the diversity of the web information, web site mining is highly desirable for automatically discovering and classifying topic-specific web resources from the World Wide Web. Nevertheless, existing web site mining methods have not yet handled adequately how to make use of all the correlative contextual semantic clues and how to denoise the content of web sites effectually so as to obtain a better classification accuracy. This paper circumstantiates three issues to be solved for designing an effective and efficient web site mining algorithm, i.e., the sampling size, the analysis granularity, and the representation structure of web sites. On this basis, this paper proposes a novel multiscale tree representation model of web sites, and presents a multiscale web site mining approach that contains an HMT-based two-phase classification algorithm, a context-based interscale fusion algorithm, a two-stage text-based denoising procedure, and an entropy-based pruning strategy. The proposed model and algorithms may be used as a starting point for further investigating some related issues of web sites, such as query optimization of multiple sites and web usage mining. Experiments also show that the approach achieves on average a 16% improvement in classification accuracy and a 34.5% reduction in processing time over the baseline system.

Journal ArticleDOI
TL;DR: An exponential lower bound on the size of a decision tree for this function is obtained, and an asymptotic formula, having a linear main term, is derived for its average sensitivity.
Abstract: We study various combinatorial complexity measures of Boolean functions related to some natural arithmetic problems about binary polynomials, that is, polynomials over F2. In particular, we consider the Boolean function deciding whether a given polynomial over F2 is squarefree. We obtain an exponential lower bound on the size of a decision tree for this function, and derive an asymptotic formula, having a linear main term, for its average sensitivity. This allows us to estimate other complexity characteristics such as the formula size, the average decision tree depth and the degrees of exact and approximative polynomial representations of this function. Finally, using a different method, we show that testing squarefreeness and irreducibility of polynomials over F2 cannot be done in AC0[p] for any odd prime p. Similar results are obtained for deciding coprimality of two polynomials over F2 as well.
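For contrast with the decision-tree lower bound above, the squarefreeness predicate itself has a classical algebraic test: f ∈ F2[x] is squarefree iff f′ ≠ 0 and gcd(f, f′) = 1 (if f′ = 0 and deg f ≥ 1, then f is a polynomial in x² and hence a perfect square over F2). A sketch, encoding polynomials as Python integer bitmasks (bit i is the coefficient of x^i):

```python
def gf2_mod(a: int, b: int) -> int:
    """Remainder of a divided by b in F2[x]."""
    while a and a.bit_length() >= b.bit_length():
        a ^= b << (a.bit_length() - b.bit_length())
    return a

def gf2_gcd(a: int, b: int) -> int:
    """Euclid's algorithm in F2[x]."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a

def gf2_derivative(f: int) -> int:
    """Formal derivative over F2: only odd-degree terms survive."""
    mask = sum(1 << i for i in range(0, max(f.bit_length(), 1), 2))
    return (f >> 1) & mask

def is_squarefree(f: int) -> bool:
    d = gf2_derivative(f)
    if d == 0:
        return False  # f = g(x^2) = g(x)^2 over F2, a perfect square
    return gf2_gcd(f, d) == 1

print(is_squarefree(0b110))   # x^2 + x = x(x+1)   -> True
print(is_squarefree(0b101))   # x^2 + 1 = (x+1)^2  -> False
print(is_squarefree(0b1010))  # x^3 + x = x(x+1)^2 -> False
```

The paper's point is that no *decision tree over the coefficient bits* can compute this predicate cheaply, even though the algebraic algorithm above runs in polynomial time.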

Journal ArticleDOI
TL;DR: The Style Tree Model is employed to detect and eliminate noises in any Web page of the site, and an information-based measure to determine which element node is noisy is constructed.
Abstract: A Web page typically contains many information blocks. Apart from the main content blocks, it usually has such blocks as navigation panels, copyright and privacy notices, and advertisements. We call these blocks the noisy blocks. The noises in Web pages can seriously harm Web data mining. To eliminate these noises, we introduce a new tree structure, called a Style Tree, and present an algorithm for constructing a site style tree. The Style Tree Model is employed to detect and eliminate noises in any Web page of the site. An information-based measure to determine which element node is noisy is also constructed. In addition, the applications of this method are discussed in detail. Experimental results show that our noise elimination technique is able to improve the mining results significantly.

Book ChapterDOI
23 Mar 2004
TL;DR: The complexity index captures the "richness of the language" used in a sequence and is used to characterize the sequence statistically and has a long history of applications in several fields, such as data compression, computational biology, data mining, computational linguistics, among others.
Abstract: This paper discusses the measure of complexity of a sequence called the complexity index. The complexity index captures the "richness of the language" used in a sequence. The measure is simple but quite intuitive. Sequences with low complexity index contain a large number of repeated substrings and they eventually become periodic (e.g., tandem repeats in a DNA sequence). The complexity index is used to characterize the sequence statistically and has a long history of applications in several fields, such as data compression, computational biology, data mining, computational linguistics, among others.
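In one common formulation (an illustrative sketch, not necessarily the chapter's exact definition), the complexity index counts the distinct substrings of a sequence, so that periodic strings such as tandem repeats score low:

```python
def complexity_index(s: str) -> int:
    """Number of distinct non-empty substrings of s: a simple proxy for
    the 'richness of the language' used in the sequence."""
    return len({s[i:j] for i in range(len(s))
                       for j in range(i + 1, len(s) + 1)})

periodic = "ABABABAB"  # tandem-repeat-like: low richness
varied   = "ABCDEFGH"  # all symbols distinct: maximal richness
print(complexity_index(periodic))  # -> 15
print(complexity_index(varied))    # -> 36
```

The set-of-substrings construction above is quadratic in sequence length; in practice, suffix trees compute the same count in linear time, which is one reason the measure has seen wide use in data compression and computational biology.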

Journal ArticleDOI
TL;DR: It is proved that as tree size becomes large, the asymptotic performance ratio of such a randomized dilation-1 tree embedding is N/(N-1) in linear arrays and is optimal in rings.

Book ChapterDOI
01 Jan 2004
TL;DR: In this article, a decision tree model is used to predict the bankruptcy of Japanese companies, in contrast to prior research based on discriminant analysis, and is shown to perform well.
Abstract: Much prior research has used discriminant analysis; this paper, however, uses a decision tree model to predict the bankruptcy of Japanese companies.