Showing papers on "Decision tree model published in 2008"
••
12 Oct 2008
TL;DR: This model can alleviate the limitations of a single tree-structured model by combining information provided across different tree models, and combines multiple deformable trees for capturing spatial constraints between non-connected body parts.
Abstract: Tree-structured models have been widely used for human pose estimation, in either 2D or 3D. While such models allow efficient learning and inference, they fail to capture additional dependencies between body parts, other than kinematic constraints between connected parts. In this paper, we consider the use of multiple tree models, rather than a single tree model for human pose estimation. Our model can alleviate the limitations of a single tree-structured model by combining information provided across different tree models. The parameters of each individual tree model are trained via standard learning algorithms in a single tree-structured model. Different tree models can be combined in a discriminative fashion by a boosting procedure. We present experimental results showing the improvement of our approaches on two different datasets. On the first dataset, we use our multiple tree framework for occlusion reasoning. On the second dataset, we combine multiple deformable trees for capturing spatial constraints between non-connected body parts.
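As a toy illustration of the combination idea, the abstract's boosted mixture of tree models can be reduced to a weighted vote over per-model scoring functions; the model and candidate names below are invented for the sketch and are not from the paper:

```python
# Hypothetical sketch: combining several tree-structured models in a
# discriminative (boosting-style) fashion. Each "tree model" is reduced
# here to a scoring function over candidate poses.

def combine_tree_models(models, weights, candidates):
    """Score each candidate pose as a weighted sum of per-tree scores
    and return the best-scoring candidate."""
    best, best_score = None, float("-inf")
    for cand in candidates:
        score = sum(w * m(cand) for m, w in zip(models, weights))
        if score > best_score:
            best, best_score = cand, score
    return best

# Two toy "tree models": one prefers candidate 1, one prefers candidate 2.
m1 = lambda c: 1.0 if c == "pose_1" else 0.0
m2 = lambda c: 1.0 if c == "pose_2" else 0.0

# With a larger boosting weight on the first model, its preference wins.
print(combine_tree_models([m1, m2], [0.7, 0.3], ["pose_1", "pose_2"]))
```

In the actual paper the per-model weights are learned by a boosting procedure rather than fixed by hand as here.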
167 citations
••
TL;DR: A data mining method combining attribute-oriented induction, information gain, and decision trees is put forward; it is suitable for preprocessing financial data and constructing a decision tree model for financial distress prediction.
Abstract: Data mining techniques can extract valuable knowledge from large and changeable databases. This paper puts forward a data mining method combining attribute-oriented induction, information gain, and decision trees, which is suitable for preprocessing financial data and constructing a decision tree model for financial distress prediction. Based on financial ratio attributes and one class attribute, and adopting an entropy-based discretization method, a data mining model for predicting the financial distress of listed companies is designed. An empirical experiment with 35 financial ratios and 135 pairs of listed companies as initial samples gave satisfactory results, which testify to the feasibility and validity of the proposed data mining method for predicting the financial distress of listed companies.
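The entropy-based discretization step described above rests on information gain; a minimal sketch (with invented toy data, not the paper's 35 ratios) might look like:

```python
import math

def entropy(labels):
    """Shannon entropy of a label list."""
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def info_gain(values, labels, threshold):
    """Information gain of splitting a continuous attribute at `threshold`,
    as used in entropy-based discretization."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (entropy(labels)
            - (len(left) / n) * entropy(left)
            - (len(right) / n) * entropy(right))

# Toy data: a financial ratio and a distress label (1 = distressed).
ratio = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
label = [1, 1, 1, 0, 0, 0]
print(info_gain(ratio, label, 0.5))  # a perfect split yields gain 1.0
```

Discretization then amounts to picking the cut point with the highest gain.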
159 citations
••
01 Dec 2008
TL;DR: A simple sketching method that generates a realistic 3D tree model, including branches and leaves, from a single image in which the user draws at least two strokes.
Abstract: In this paper, we introduce a simple sketching method to generate a realistic 3D tree model from a single image. The user draws at least two strokes in the tree image: the first crown stroke around the tree crown to mark up the leaf region, the second branch stroke from the tree root to mark up the main trunk, and possibly a few other branch strokes for refinement. The method automatically generates a 3D tree model including branches and leaves. Branches are synthesized by a growth engine from a small library of elementary subtrees that are pre-defined or built on the fly from the recovered visible branches. The visible branches are automatically traced from the drawn branch strokes according to image statistics on the strokes. Leaves are generated from the region bounded by the first crown stroke to complete the tree. We demonstrate our method on a variety of examples.
134 citations
••
TL;DR: An evolutionary method is presented which allows decision tree flexibility through co-evolving competition between the decision tree and the training data set; it gives results comparable with or superior to other classification methods.
Abstract: Decision tree classification provides a rapid and effective method of categorising datasets. Many algorithmic methods exist for optimising decision tree structure, although these can be vulnerable to changes in the training dataset. An evolutionary method is presented which allows decision tree flexibility through the use of co-evolving competition between the decision tree and the training data set. This method is tested using two different datasets and gives results comparable with or superior to other classification methods. A final discussion argues for the utility of decision trees over algorithmic or other alternative methods such as neural networks, particularly in situations where a large number of variables are being considered.
125 citations
••
TL;DR: This article considers the construction of classification trees using tree analysis with randomly generated and evolved trees (TARGET), a genetic algorithm approach that performs a better search of the tree model space and largely resolves the problems with current tree modeling techniques.
94 citations
••
01 Oct 2008
TL;DR: The proposed fuzzy supervised learning in Quest (SLIQ) decision tree (FS-DT) algorithm is aimed at constructing a fuzzy decision boundary instead of a crisp decision boundary, which results in more than a 70% reduction in the size of the decision tree compared to SLIQ.
Abstract: Traditional decision tree algorithms produce sharp decision boundaries, which are hardly found in any real-life classification problem. A fuzzy supervised learning in Quest (SLIQ) decision tree (FS-DT) algorithm is proposed in this paper, aimed at constructing a fuzzy decision boundary instead of a crisp one. The size of the constructed tree is another very important parameter in decision tree algorithms: large, deep decision trees result in incomprehensible induction rules. The proposed FS-DT algorithm modifies the SLIQ decision tree algorithm to construct a fuzzy binary decision tree of significantly reduced size. The performance of the FS-DT algorithm is compared with SLIQ using several real-life datasets taken from the UCI Machine Learning Repository. FS-DT outperforms its crisp counterpart in terms of classification accuracy, and also yields more than a 70% reduction in tree size compared to SLIQ.
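The difference between a crisp and a fuzzy decision boundary can be sketched with a simple membership function; the linear ramp and its width below are illustrative assumptions, not the FS-DT formulation:

```python
def crisp_split(x, t):
    """Crisp boundary: full membership to exactly one branch."""
    return (1.0, 0.0) if x <= t else (0.0, 1.0)

def fuzzy_split(x, t, width):
    """Fuzzy boundary: near the threshold an example belongs partly to
    both branches; far from it, this degenerates to the crisp case."""
    if x <= t - width:
        return (1.0, 0.0)
    if x >= t + width:
        return (0.0, 1.0)
    right = (x - (t - width)) / (2 * width)  # linear ramp across the boundary
    return (1.0 - right, right)

print(fuzzy_split(5.0, 5.0, 1.0))  # exactly on the boundary: (0.5, 0.5)
```

A fuzzy tree aggregates the memberships along each root-to-leaf path instead of routing an example down a single branch.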
70 citations
••
TL;DR: A unique embedding list representation of the tree structure, which enables efficient implementation of the Tree Model Guided (TMG) candidate generation, and shows through a mathematical model and experiments that TMG has better complexity compared to the commonly used join approach.
Abstract: Due to the inherent flexibility in both structure and semantics, XML association rule mining faces several challenges, such as a more complicated hierarchical data structure and an ordered data context. Mining frequent patterns from XML documents can be recast as mining frequent tree structures from a database of XML documents. In this study, we model a database of XML documents as a database of rooted labeled ordered subtrees. In particular, we are mainly concerned with mining frequent induced and embedded ordered subtrees. Our main contributions are as follows. We describe our unique embedding list representation of the tree structure, which enables efficient implementation of our Tree Model Guided (TMG) candidate generation. TMG is an optimal, nonredundant enumeration strategy that enumerates all the valid candidates that conform to the structural aspects of the data. We show through a mathematical model and experiments that TMG has better complexity than the commonly used join approach. In this article, we propose two algorithms, MB3-Miner and iMB3-Miner. MB3-Miner mines embedded subtrees. iMB3-Miner mines induced and/or embedded subtrees by using a maximum-level-of-embedding constraint. Our experiments with both synthetic and real datasets, against two well-known algorithms for mining induced and embedded subtrees, demonstrate the effectiveness and efficiency of the proposed techniques.
47 citations
••
TL;DR: It is proposed to define the complexity of an ecological model as the statistical complexity of the output it produces, and it is suggested that model complexity so defined better captures the difficulty faced by a user in managing and understanding the behaviour of an ecological model than measures based on a model's ‘size’.
37 citations
••
TL;DR: Three data mining classification techniques are used to predict the auditor choice, and two of the models reveal that the level of debt is a factor that influences the auditor choice decision.
Abstract: The selection of a proper auditor is driven by several factors. Here, we use three data mining classification techniques to predict the auditor choice. The methods used are Decision Trees, Neural Networks and Support Vector Machines. The developed models are compared in terms of their performance. The wrapper feature selection technique is used for the Decision Tree model. Two of the models reveal that the level of debt is a factor that influences the auditor choice decision. This study has implications for auditors, investors, company decision makers and researchers.
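The wrapper feature selection mentioned for the decision tree model can be sketched as greedy forward selection; the evaluation function below is a stand-in for training and cross-validating the classifier, and the feature names are invented:

```python
def wrapper_forward_selection(features, evaluate):
    """Greedy forward wrapper: repeatedly add the feature that most
    improves the classifier's evaluated accuracy; stop when none helps.
    In a real wrapper, evaluate(subset) trains and cross-validates the
    target model on that feature subset."""
    selected, best = [], evaluate([])
    remaining = list(features)
    while remaining:
        scored = [(evaluate(selected + [f]), f) for f in remaining]
        score, f = max(scored)
        if score <= best:
            break
        selected.append(f)
        remaining.remove(f)
        best = score
    return selected

# Toy evaluation: accuracy depends only on whether "debt_level" is included.
evaluate = lambda s: 0.8 if "debt_level" in s else 0.5
print(wrapper_forward_selection(["size", "debt_level", "roa"], evaluate))
```

With this toy evaluator, only the debt-level feature is retained, mirroring the abstract's finding that the level of debt drives the prediction.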
26 citations
••
TL;DR: This work presents a novel approach for optimizing H.264/AVC video compression by dynamically allocating computational complexity and bits for encoding each coding element within a video sequence, according to its predicted MAD.
••
TL;DR: This paper attempts to model the process of activity scheduling conflict resolution with actual scheduling process data, and evaluates a number of conflict resolution models, including a decision tree model and two discrete choice models.
Abstract: This paper attempts to model the process of activity scheduling conflict resolution with actual scheduling process data. The resolution of activity scheduling conflicts is a critical component of rule-based activity scheduling models. Many current scheduling models use an assumed priority for each activity type to estimate how activity conflicts will be resolved, but research has shown that these activity type-based priority assumptions often do not hold in actuality. Therefore, the conflict resolution data captured in the scheduling process survey were used to estimate and evaluate a number of conflict resolution models, including a decision tree model and two discrete choice models. Both the conflict resolution decision tree model and the discrete choice models showed a promising ability to predict the resolution strategies chosen almost entirely on the basis of the attributes of the activities in conflict and characteristics of the surrounding schedule. These models present a useful advance in increasing the realism and accuracy of rule-based activity scheduling models.
••
15 Sep 2008
TL;DR: Using the trusted hardware based model, the computation complexity of the scheme, including offline computation, is linear in the number of queries and is bounded by $O(\sqrt{n})$ after optimization.
Abstract: For a private information retrieval (PIR) scheme to be deployed in practice, low communication complexity and low computation complexity are two fundamental requirements it must meet. Most existing PIR schemes focus only on the communication complexity; the reduction of the computational complexity has not received due treatment, mainly because of its O(n) lower bound. By using a trusted hardware based model, we design a novel scheme which breaks this barrier. With constant storage, the computation complexity of our scheme, including offline computation, is linear in the number of queries and is bounded by $O(\sqrt{n})$ after optimization.
••
18 Oct 2008
TL;DR: The presented approach is aimed at handling uncertain information during the induction of decision trees; it generalizes the rough set based approach to decision tree construction by allowing some degree of misclassification when classifying objects.
Abstract: This paper presents a new approach for constructing decision trees based on the variable precision rough set model. The approach is aimed at handling uncertain information during the induction of decision trees, and generalizes the rough set based approach to decision tree construction by allowing some degree of misclassification when classifying objects. In the paper, variable precision and weighted mean precision are introduced. The new algorithm effectively overcomes the influence of noisy data on decision tree construction, reduces the complexity of the decision tree and strengthens its generalization ability.
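The variable precision idea of tolerating a bounded misclassification rate can be illustrated on equivalence classes; the threshold and block data below are invented for the sketch:

```python
def vprs_positive(blocks, beta):
    """Variable precision rough set sketch: an equivalence class joins
    the (relaxed) positive region of the target concept when the
    fraction of its objects belonging to the concept reaches the
    precision beta, so a small amount of noise is tolerated."""
    positive = []
    for name, labels in blocks.items():
        frac = sum(labels) / len(labels)
        if frac >= beta:
            positive.append(name)
    return positive

# Three equivalence classes; 1 = object belongs to the target concept.
blocks = {"B1": [1, 1, 1, 1], "B2": [1, 1, 1, 0], "B3": [0, 0, 1, 0]}
print(vprs_positive(blocks, 1.0))   # classical rough set: only B1
print(vprs_positive(blocks, 0.75))  # variable precision: B1 and B2
```

Lowering beta below 1 is exactly what lets the induced tree ignore a few noisy objects instead of splitting further to isolate them.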
•
01 Jan 2008
TL;DR: A linear structured prediction model is used to solve the problem of machine translation using a novel representation called an aligned extended projection, or AEP, a parse-tree like structure that models clause-level phenomena such as verbal argument structure and lexical word-order.
Abstract: In this thesis, we take a statistical tree-to-tree approach to solving the problem of machine translation (MT). In a statistical tree-to-tree approach, first the source-language input is parsed into a syntactic tree structure; then the source-language tree is mapped to a target-language tree. This kind of approach has several advantages. For one, parsing the input generates valuable information about its meaning. In addition, the mapping from a source-language tree to a target-language tree offers a mechanism for preserving the meaning of the input. Finally, producing a target-language tree helps to ensure the grammaticality of the output.
A main focus of this thesis is to develop a statistical tree-to-tree mapping algorithm. Our solution involves a novel representation called an aligned extended projection, or AEP. The AEP, inspired by ideas in linguistic theory related to tree-adjoining grammars, is a parse-tree like structure that models clause-level phenomena such as verbal argument structure and lexical word-order. The AEP also contains alignment information that links the source-language input to the target-language output. Instead of learning a mapping from a source-language tree to a target-language tree, the AEP-based approach learns a mapping from a source-language tree to a target-language AEP.
The AEP is a complex structure, and learning a mapping from parse trees to AEPs presents a challenging machine learning problem. In this thesis, we use a linear structured prediction model to solve this learning problem. A human evaluation of the AEP-based translation approach in a German-to-English task shows significant improvements in the grammaticality of translations. This thesis also presents a statistical parser for Spanish that could be used as part of a Spanish/English translation system.
••
01 Jan 2008
TL;DR: A tree-based methodology that grows an optimal tree structure with posterior prediction modelling, to be used as a decision rule for new objects, is presented, and a special case is described in detail.
Abstract: The framework of this paper is classification and regression trees, also known as tree-based methods, binary segmentation, tree partitioning or decision trees. Trees can be fruitfully used either to explore and understand the dependence relationship between the response variable and a set of predictors, or to assign the response class or value for new objects on which only the measurements of the predictors are known. Since the introduction of the two-stage splitting procedure in 1992, the research unit in Naples has made several contributions to this field; one of the main issues is combining tree partitioning with statistical models. This paper provides a new idea of knowledge extraction using trees and models. It deals with the trade-off between the interpretability of the tree structure (i.e., exploratory trees) and the accuracy of the decision tree model (i.e., decision tree-based rules). Prospective and retrospective views of using models and trees are discussed. In particular, we introduce a tree-based methodology that grows an optimal tree structure with posterior prediction modelling, to be used as a decision rule for new objects. The general methodology is presented and a special case is described in detail. An application on a real-world data set is finally shown.
••
TL;DR: A sharp lower bound of 2n on the communication complexity of recognizing the 2n-dimensional orthant is established; this bound also holds for the EMPTINESS and KNAPSACK problems.
Abstract: Deterministic and probabilistic communication protocols are introduced in which parties can exchange the values of polynomials (rather than bits, as in the usual setting). A sharp lower bound of 2n on the communication complexity of recognizing the 2n-dimensional orthant is established; on the other hand, the probabilistic communication complexity of recognizing it does not exceed 4. A polyhedron and a union of hyperplanes are constructed in $\mathbb{R}^{2n}$ for which a lower bound of n on the probabilistic communication complexity of recognizing each is proved. As a consequence, this bound also holds for the EMPTINESS and KNAPSACK problems.
••
12 Mar 2008
TL;DR: This paper proposes two main techniques for reducing computational complexity: artificial neural networks using a piecewise linear activation function, and support vector machines built on a probability based binary tree.
Abstract: This paper proposes two main techniques for reducing computational complexity: artificial neural networks using a piecewise linear activation function, and support vector machines built on a probability based binary tree. These methods are compared with well-known classifiers in terms of computational complexity, correct-classification rate and the time taken to process the required information. The results show that the probability based binary tree SVM has an equivalent recognition rate and is faster than the ANNs.
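A piecewise linear activation of the kind referred to above can be sketched as a "hard" sigmoid; the breakpoints used here are one common choice, not necessarily the paper's:

```python
def piecewise_linear_sigmoid(x):
    """Piecewise linear approximation of the logistic sigmoid, a common
    way to cut the cost of evaluating exp() in a neural network.
    The breakpoints (+/-2.5) are illustrative."""
    if x <= -2.5:
        return 0.0
    if x >= 2.5:
        return 1.0
    return 0.2 * x + 0.5  # linear segment through (0, 0.5)

print(piecewise_linear_sigmoid(0.0))   # 0.5
print(piecewise_linear_sigmoid(-3.0))  # 0.0
```

The saturation regions and the single linear segment replace the exponential with a comparison and a multiply-add, which is where the complexity reduction comes from.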
••
12 Dec 2008
TL;DR: This essay uses the XML DOM object to study the subjective item scoring problem, analyses the DOM tree's structure and the tree model, and designs a scoring algorithm that can handle short-answer and discussion-question types of subjective items.
Abstract: The test system in this essay runs under the B/S (Browser/Server) mode. In such test systems, subjective item grading has always been a problem that limits the development of computer scoring technology. Subjective items generally require answering questions through language description; since different people have different ways of thinking, levels of understanding and ways of describing, the answers cannot be unanimously the same. Here we use the XML DOM object to study the subjective item scoring problem, addressing the types of subjective items that have no single fixed answer, such as short-answer and discussion questions. Two factors affect subjective item scoring: the knowledge points and the nearness level. The unidirectional nearness algorithm in fuzzy mathematics focuses only on keyword matching, but ignores the scoring of knowledge points and the nearness level of the whole answer. We first analyse the DOM tree's structure and the tree model, and study the use of the navigation document tree, the DOM tree object, attribute data reading and DOM tree DFS (depth-first search) traversal. We then discuss and build the workflow of an automated scoring system based on the XML DOM tree, and design a scoring algorithm that can handle short-answer and discussion-question types of subjective items.
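The DFS traversal of the answer-key tree for knowledge-point scoring can be sketched with Python's xml.etree (used here in place of the browser DOM API); the answer-key structure, tags and weights are invented for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical answer key stored as an XML tree: each knowledge point
# carries a weight and the keywords that evidence it.
KEY = ET.fromstring("""
<answer>
  <point weight="3"><kw>inheritance</kw><kw>subclass</kw></point>
  <point weight="2"><kw>polymorphism</kw></point>
</answer>
""")

def score(answer_text, key):
    """Depth-first traversal of the tree: a knowledge point earns its
    full weight when any of its keywords appears in the student answer."""
    total = 0
    for point in key.iter("point"):            # DFS over <point> nodes
        kws = [kw.text for kw in point.iter("kw")]
        if any(k in answer_text for k in kws):
            total += int(point.get("weight"))
    return total

print(score("A subclass reuses code via inheritance.", KEY))  # 3
```

The paper additionally weighs a nearness level between answer and key; this sketch shows only the keyword/knowledge-point half of the scheme.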
••
11 Dec 2008
TL;DR: This paper extracts automotive marketing information, constructs a data warehouse, adopts an improved ID3 decision tree model and an association rule model, and then obtains predictions of automotive customers' behavior.
Abstract: This paper extracts automotive marketing information, constructs a data warehouse, adopts an improved ID3 decision tree model and an association rule model to do data mining, and then obtains predictions of automotive customers' behavior. Experimental and comparative results verify the validity and accuracy of the predictions.
••
18 Nov 2008
TL;DR: A data mining algorithm based on the classification mode can provide intelligent decision support for enterprise customer management by efficiently identifying customers, evaluating customer value, segmenting customers, improving sales effectiveness, retaining customers, and increasing customer satisfaction and loyalty.
Abstract: This paper studies data mining algorithms based on the classification mode in detail, especially classification rule extraction based on rough sets and on decision tree construction. An improved algorithm for the decision tree model based on rough sets is given. The rough set based decision tree technique is applied to customer value management: after the rough set based decision tree model was built from a feasible index system, customer value measurement and customer segmentation were carried out. Based on data mining techniques, we can provide intelligent decision support for enterprise customer management by efficiently identifying customers, evaluating customer value, segmenting customers, improving sales effectiveness, retaining customers, and increasing customer satisfaction and loyalty.
••
07 Jan 2008
TL;DR: In this article, a text-based language identification approach based on decision trees and an adaptive resonance theory (ART) neural network is proposed to identify the language associated with each grammar item from written text.
Abstract: Automatic language identification (LID) is a topic of great significance in intelligence and security, where the language identity of any related material needs to be determined before its information can be processed. When the recognition elements of the content are dynamic and obtained directly from written text, the language associated with each grammar item has to be identified using that text. Many methods proposed in the literature focus on Roman and Asian languages. This paper describes text-based language identification approaches for Arabic script, comparing two different approaches. The decision tree method, commonly used in many application domains, is first reviewed. We also applied a simple language identification method based on an adaptive resonance theory (ART) neural network. The experimental results show that the decision tree model achieved higher accuracy than the ARTMAP model. However, the decision tree model may not be reliable if the task is extended to other Arabic-script languages, unlike the ARTMAP model. It is assumed that a hybrid of both models will perform better and merits further development.
•
TL;DR: A new algorithm for generating decision trees based on the rough set model is proposed; it introduces a hold-down factor as an additional terminal condition for expanding nodes, beyond the traditional one, so that the size of the generated decision tree will not be too large for the user to understand.
Abstract: Among decision tree generation algorithms based on the rough set model, existing algorithms usually partition examples too finely, in pursuit of classification accuracy, to avoid the negative impact of a few special examples on the decision tree. As a result, the generated decision tree is too large to be understood, which also weakens its classification and predictive ability on data to be classified or predicted. To solve these problems, a new algorithm for generating decision trees based on the rough set model is proposed. It introduces a hold-down factor, an additional terminal condition for expanding nodes beyond the traditional one. When generating a node, if the hold-down factor of the samples exceeds the given threshold, the node is not expanded any further. Thus, over-detailed partitioning is avoided, and the size of the decision tree generated by the proposed algorithm will not be too large for the user to understand.
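The hold-down factor as an extra terminal condition can be sketched as follows; equating it with the node's majority-class fraction is an assumption made for illustration, not the paper's exact definition:

```python
def grow(node_labels, depth, hold_down=0.9, max_depth=5):
    """Sketch of a tree-growing step with an extra terminal condition:
    if the majority-class fraction (standing in for the paper's
    hold-down factor) exceeds a threshold, stop expanding the node even
    though it is not pure, avoiding over-detailed partitions."""
    n = len(node_labels)
    majority = max(node_labels.count(c) for c in set(node_labels)) / n
    if majority >= hold_down or depth >= max_depth:
        return "leaf"   # node is held down: no further split
    return "split"      # otherwise continue expanding

print(grow([1, 1, 1, 1, 1, 1, 1, 1, 1, 0], depth=2))  # 0.9 majority -> leaf
print(grow([1, 1, 0, 0], depth=2))                    # 0.5 majority -> split
```

Stopping at 90% purity is what keeps a handful of special examples from forcing extra splits and bloating the tree.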
•
TL;DR: This paper presents a representation method for decision trees and an optimized decision tree learning algorithm based on ID3 and C4.5, and explains in particular how the decision attribute selection rule is chosen.
Abstract: This paper presents a representation method for decision trees and an optimized decision tree learning algorithm based on ID3 and C4.5, and explains in particular how the decision attribute selection rule is chosen. Compared with C4.5, ID3 has many shortcomings; C4.5 improves decision tree classification efficiency and presentation clarity.
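The attribute selection rule that C4.5 adds over ID3 is the gain ratio, which normalizes information gain by the split information; a minimal sketch with toy numbers:

```python
import math

def entropy(probs):
    """Entropy of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gain_ratio(gain, branch_fractions):
    """C4.5 normalizes information gain by the split information (the
    entropy of the branch sizes), penalizing attributes that fragment
    the data into many small branches - the main bias ID3 suffers from."""
    split_info = entropy(branch_fractions)
    return gain / split_info if split_info > 0 else 0.0

# Same raw gain, different fragmentation: the many-valued split is penalized.
print(gain_ratio(0.5, [0.5, 0.5]))  # split info 1.0 -> ratio 0.5
print(gain_ratio(0.5, [0.25] * 4))  # split info 2.0 -> ratio 0.25
```

ID3 would rank the two splits equally (same gain), whereas C4.5's ratio prefers the binary split.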
•
TL;DR: Experimental analysis of the data shows that the new algorithm can produce more reasonable and more effective rules.
Abstract: The decision tree is an important method in inductive learning as well as in data mining, and can be used to form classification and predictive models. The ID3 algorithm is the most widely used decision tree algorithm. Based on decision tree ideas in data mining, and to address ID3's bias toward choosing attributes with many values, this paper improves ID3 by introducing irrelevance degrees. Experimental analysis of the data shows that the new algorithm can produce more reasonable and more effective rules.
••
12 Dec 2008
TL;DR: Experiments performed on a programming-optimized source code show that the computational complexity associated with each frame is well controlled below a given limit, with very little R-D performance degradation under a reasonable constraint compared to the unconstrained case.
Abstract: The allowable computational complexity of video encoding is limited in a power-constrained system. Different video frames involve different motions and contexts, and so incur different computational complexities if no complexity control is utilized. Variation in computational complexity leads to encoding delay jitter. Motion estimation (ME) typically consumes far more computation than other encoding tools. This work proposes a practical complexity control method based on a complexity analysis of an H.264 video encoder that determines the coding gain of each encoding tool. Experiments performed on a programming-optimized source code show that the computational complexity associated with each frame is well controlled below a given limit, with very little R-D performance degradation under a reasonable constraint compared to the unconstrained case.
•
TL;DR: Wang et al. used the classification and regression tree (CART) algorithm to mine classification rules from training samples, integrating spectral, texture and ancillary geographical characteristics.
Abstract: Wetlands are considered an integral part of the global ecosystem. Enhancing their scientific management with quantitative, accurate and repeatable observations of wetland landscapes would obviously be significant. Taking the northeast of the Sanjiang Plain as a case study, we use the classification and regression tree (CART) algorithm to mine classification rules from training samples. A classification tree model for wetland information extraction was built from these samples through the CART algorithm, integrating spectral, texture and ancillary geographical characteristics. The classification results based on the CART algorithm were checked by statistical confusion matrix accuracy assessment using field GPS samples. Validation shows that the total classification accuracy is 82.65% and the Kappa coefficient is 0.7935. The results suggest that the accuracy of classification based on the CART algorithm is higher than that of the MLC supervised classification method. The developed method is portable, relatively easy to implement, and should be applicable in other settings and over larger extents.
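The Kappa coefficient quoted in the validation can be computed from the confusion matrix as follows (toy matrix, not the paper's data):

```python
def kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = reference,
    columns = classified): observed agreement corrected for the
    agreement expected by chance."""
    n = sum(sum(row) for row in confusion)
    po = sum(confusion[i][i] for i in range(len(confusion))) / n
    pe = sum(
        (sum(confusion[i]) / n) * (sum(row[i] for row in confusion) / n)
        for i in range(len(confusion))
    )
    return (po - pe) / (1 - pe)

# Toy 2-class confusion matrix.
m = [[45, 5],
     [10, 40]]
print(round(kappa(m), 4))
```

Because kappa discounts chance agreement, it is reported alongside the raw accuracy (82.65% vs. kappa 0.7935 in the abstract).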
••
15 Dec 2008
TL;DR: A set of techniques to simplify tree models for faster rendering while retaining their visual resemblance to the original model is presented: a budget constraint on how many leaves shall be rendered is set, and an iterative algorithm is proposed to allocate leaf objects at different viewing angles.
Abstract: We present a set of techniques to simplify tree models for faster rendering while retaining their visual resemblance to the original model. This goal is achieved by setting a budget constraint on how many leaves shall be rendered, and we select the leaves most likely to be visible to generate a simplified model for rendering. We first examine how leaf objects can be prioritized to provide a tree model suitable for a single viewing angle. The camera's projected screen space is partitioned into several small regions, each guaranteed to be filled with a certain number of leaves. Each region selects higher-priority leaves by using the leaf objects' spatial relationship to the camera. An iterative algorithm is then proposed to allocate leaf objects at different viewing angles. The algorithm chooses different viewing positions for leaf allocation at each iteration, with the goal of maximizing the resemblance to the original tree model at all possible angles. Three criteria considered in viewing angle selection are distance, leaf normal, and unfilled pixels. Experimental results demonstrate the time efficiency of the proposed techniques at the cost of little visual degradation.
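The budgeted leaf selection can be reduced to ranking leaves by a visibility priority and keeping the top ones; the priority values below are invented, and the real priority combines distance, leaf normal and unfilled pixels:

```python
def select_leaves(leaves, budget):
    """Budgeted leaf selection sketch: keep only the `budget` leaves
    with the highest visibility priority. How the priority is computed
    (screen-space coverage, distance, normal) is abstracted away here."""
    ranked = sorted(leaves, key=lambda leaf: leaf[1], reverse=True)
    return [name for name, _priority in ranked[:budget]]

# Toy leaves as (name, priority) pairs.
leaves = [("l1", 0.9), ("l2", 0.1), ("l3", 0.7), ("l4", 0.4)]
print(select_leaves(leaves, 2))  # ['l1', 'l3']
```

In the paper this selection is done per screen-space region so that every region keeps a guaranteed number of leaves, rather than globally as in this sketch.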
•
TL;DR: Experimental results demonstrate that the improved ID3 algorithm (AVID3) is more efficient than the traditional ID3 algorithm on many data sets.
Abstract: ID3 is a classical decision tree induction algorithm in data mining. It has a preference bias toward attributes with multiple values and is sensitive to the number of training examples. A new approach to overcoming these drawbacks is given. First, a threshold on the number of attribute values is applied to optimize the decision tree when calculating the entropy. Meanwhile, a tree pruning method is implemented, adopting another threshold to reduce the error rate of the fully expanded tree. Experimental results demonstrate that the improved ID3 algorithm (AVID3) is more efficient than the traditional ID3 algorithm on many data sets.