
Showing papers on "Decision tree model published in 2007"


Journal Article
TL;DR: In this paper, a new time-series forecasting model based on the flexible neural tree (FNT) is introduced. Because it is often difficult to select the proper input variables or time-lags when constructing a time-series model, the FNT is designed to handle this selection automatically.
Abstract: Time-series forecasting is an important research and application area. Much effort has been devoted over the past several decades to develop and improve the time-series forecasting models. This paper introduces a new time-series forecasting model based on the flexible neural tree (FNT). The FNT model is generated initially as a flexible multi-layer feed-forward neural network and evolved using an evolutionary procedure. Very often it is a difficult task to select the proper input variables or time-lags for constructing a time-series model. Our research demonstrates that the FNT model is capable of handling the task automatically. The performance and effectiveness of the proposed method are evaluated using time series prediction problems and compared with those of related methods.
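To make the lag-selection problem concrete, here is a minimal sketch (not from the paper) of constructing lagged input features by hand — the step the FNT is said to automate. The function name and example lags are illustrative only:

```python
def make_lag_features(series, lags):
    """Build (inputs, targets) pairs from a series using the given time-lags.

    Choosing which lags to include is the construction step the FNT
    automates; here the lags are fixed by hand for illustration.
    """
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series)):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t])
    return inputs, targets
```

With `series = [1, 2, 3, 4, 5]` and `lags = [1, 2]`, each target value is paired with the two preceding observations.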

272 citations


Journal ArticleDOI
TL;DR: The result shows that the C4.5- and PCA-based diagnosis method has higher accuracy and needs less training time than BPNN.

224 citations


Proceedings ArticleDOI
07 Sep 2007
TL;DR: The tool, the Trend Profiler (trend-prof), is described, for constructing models of empirical computational complexity that predict how many times each basic block in a program runs as a linear or a powerlaw function of user-specified features of the program's workloads.
Abstract: The standard language for describing the asymptotic behavior of algorithms is theoretical computational complexity. We propose a method for describing the asymptotic behavior of programs in practice by measuring their empirical computational complexity. Our method involves running a program on workloads spanning several orders of magnitude in size, measuring their performance, and fitting these observations to a model that predicts performance as a function of workload size. Comparing these models to the programmer's expectations or to theoretical asymptotic bounds can reveal performance bugs or confirm that a program's performance scales as expected. Grouping and ranking program locations based on these models focuses attention on scalability-critical code. We describe our tool, the Trend Profiler (trend-prof), for constructing models of empirical computational complexity that predict how many times each basic block in a program runs as a linear (y = a + bx) or a powerlaw (y = ax^b) function of user-specified features of the program's workloads. We ran trend-prof on several large programs and report cases where a program scaled as expected, beat its worst-case theoretical complexity bound, or had a performance bug.
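The power-law model fit described above can be sketched with ordinary least squares in log-log space; this is a generic illustration of fitting y = ax^b, not trend-prof's actual fitting code:

```python
import math

def fit_powerlaw(xs, ys):
    """Least-squares fit of y = a * x**b, done in log-log space.

    Taking logs turns the power law into the linear model
    log y = log a + b * log x, which has a closed-form solution.
    """
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(xs)
    mx = sum(lx) / n
    my = sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / \
        sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b
```

On exact power-law data such as y = 3x^2 the fit recovers a = 3, b = 2; on real block counts the residuals indicate how well the model explains the scaling.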

147 citations


Journal ArticleDOI
TL;DR: This article proposes an IDS model based on a general and enhanced flexible neural tree (FNT) model that allows input variables selection, overlayer connections, and different activation functions for the various nodes involved.
Abstract: An intrusion is defined as a violation of the security policy of the system, and, hence, intrusion detection mainly refers to the mechanisms that are developed to detect violations of system security policy. Current intrusion detection systems (IDS) examine all data features to detect intrusion or misuse patterns. Some of the features may be redundant or contribute little (if anything) to the detection process. The purpose of this study is to identify important input features in building an IDS that is computationally efficient and effective. This article proposes an IDS model based on a general and enhanced flexible neural tree (FNT). Based on the predefined instruction/operator sets, a flexible neural tree model can be created and evolved. This framework allows input variables selection, overlayer connections, and different activation functions for the various nodes involved. The FNT structure is developed using an evolutionary algorithm, and the parameters are optimized by a particle swarm optimization algorithm. Empirical results indicate that the proposed method is efficient. © 2007 Wiley Periodicals, Inc.

137 citations


Journal ArticleDOI
TL;DR: This is the first algorithm that can learn arbitrary monotone Boolean functions to high accuracy, using random examples only, in time polynomial in a reasonable measure of the complexity of f, namely its decision tree size.
Abstract: We give an algorithm that learns any monotone Boolean function $f$ to any constant accuracy, under the uniform distribution, in time polynomial in $n$ and in the decision tree size of $f.$ This is the first algorithm that can learn arbitrary monotone Boolean functions to high accuracy, using random examples only, in time polynomial in a reasonable measure of the complexity of $f.$ A key ingredient of the result is a new bound showing that the average sensitivity of any monotone function computed by a decision tree of size $s$ must be at most $\sqrt{\log s}$. This bound has proved to be of independent utility in the study of decision tree complexity [O. Schramm, R. O'Donnell, M. Saks, and R. Servedio, Every decision tree has an influential variable, in Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science, IEEE Computer Society, Los Alamitos, CA, 2005, pp. 31-39]. We generalize the basic inequality and learning result described above in various ways—specifically, to partition size (a stronger complexity measure than decision tree size), $p$-biased measures over the Boolean cube (rather than just the uniform distribution), and real-valued (rather than just Boolean-valued) functions.
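The average-sensitivity quantity in the bound can be computed by brute force for small n. The 3-bit majority function below is a standard monotone example chosen for illustration, not one drawn from the paper:

```python
from itertools import product

def average_sensitivity(f, n):
    """Average over all x in {0,1}^n of the number of coordinates i
    such that flipping x_i changes f(x)."""
    total = 0
    for x in product((0, 1), repeat=n):
        for i in range(n):
            flipped = x[:i] + (1 - x[i],) + x[i + 1:]
            if f(x) != f(flipped):
                total += 1
    return total / 2 ** n

# A small monotone function: majority of 3 bits.
maj3 = lambda x: int(sum(x) >= 2)
```

Each of the six inputs with a strict majority either way is sensitive in exactly two coordinates, giving average sensitivity 12/8 = 1.5 for maj3.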

111 citations


Proceedings ArticleDOI
11 Jun 2007
TL;DR: A mathematical foundation for the probabilistic tree model is developed and a very large class of queries for which simple variations of querying and updating algorithms from [3] compute the correct answer is identified.
Abstract: In [3], we introduced a framework for querying and updating probabilistic information over unordered labeled trees, the probabilistic tree model. The data model is based on trees where nodes are annotated with conjunctions of probabilistic event variables. We briefly described an implementation and scenarios of usage. We develop here a mathematical foundation for this model. In particular, we present complexity results. We identify a very large class of queries for which simple variations of querying and updating algorithms from [3] compute the correct answer. A main contribution is a full complexity analysis of queries and updates. We also exhibit a decision procedure for the equivalence of probabilistic trees and prove it is in co-RP. Furthermore, we study the issue of removing less probable possible worlds, and that of validating a probabilistic tree against a DTD. We show that these two problems are intractable in the most general case.
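A simplified reading of the model (an illustrative assumption, not the paper's actual data structure): if nodes are annotated with conjunctions of independent probabilistic event variables, a node's existence probability is the product over the event literals along its root-to-node path.

```python
def node_probability(path_events, event_prob):
    """Probability that a node exists in a random possible world.

    path_events: list of (event_name, positive) literals collected along
    the root-to-node path; negated literals appear as (name, False).
    Events are assumed independent, a simplification for illustration.
    """
    p = 1.0
    for name, positive in path_events:
        pe = event_prob[name]
        p *= pe if positive else (1.0 - pe)
    return p
```

For example, a node guarded by e1 and not-e2 with P(e1) = 0.5 and P(e2) = 0.8 exists with probability 0.5 x 0.2 = 0.1.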

104 citations


Journal ArticleDOI
02 Oct 2007
TL;DR: A novel approach for the automatic extraction of trees and the delineation of tree crowns from remote sensing data, based on co-registered colour-infrared aerial imagery and a digital surface model; tests with four data sets prove the feasibility of the approach.
Abstract: In this paper, we present a novel approach for the automatic extraction of trees and the delineation of the tree crowns from remote sensing data, and report and evaluate the results obtained with different test data sets. The approach is scale-invariant and is based on co-registered colour-infrared aerial imagery and a digital surface model (DSM). Our primary assumption is that the coarse structure of the crown, if represented at the appropriate level in scale-space, can be approximated with the help of an ellipsoid. The fine structure of the crown is suppressed at this scale level and can be ignored. Our approach is based on a tree model with three geometric parameters (size, circularity and convexity of the tree crown) and one radiometric parameter for the tree vitality. The processing strategy comprises three steps. First, we segment a wide range of scale levels of a pre-processed version of the DSM. In the second step, we select the best hypothesis for a crown from the overlapping segments of all levels based on the tree model. The selection is achieved with the help of fuzzy functions for the tree model parameters. Finally, the crown boundary is refined using active contour models (snakes). The approach was tested with four data sets from different sensors and exhibiting different resolutions. The results are very promising and prove the feasibility of the new approach for automatic tree extraction from remote sensing data.
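The circularity parameter and the fuzzy selection step can be illustrated with standard definitions; the paper does not spell out its exact measures here, so the isoperimetric quotient and the linear ramp below are common stand-ins, not the authors' formulas:

```python
import math

def circularity(area, perimeter):
    """Isoperimetric circularity in (0, 1]; equals 1 for a perfect circle.

    A common shape measure; stands in for the paper's unspecified
    circularity parameter.
    """
    return 4 * math.pi * area / perimeter ** 2

def fuzzy_membership(value, low, high):
    """A simple linear ramp: 0 at or below low, 1 at or above high."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)
```

A circle of radius r has area pi*r^2 and perimeter 2*pi*r, so its circularity is exactly 1; ragged crown segments score lower and receive smaller fuzzy membership.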

77 citations


Journal ArticleDOI
TL;DR: Two decision tree search strategies are proposed, bottom-up and top-down, together with four different measures for selecting the most appropriate set of inputs at every branching node (or decision node) of the tree.

74 citations


Proceedings ArticleDOI
11 Jun 2007
TL;DR: It follows from the results that this bound on the saving in communication is tight almost always, and the approach gives access to several powerful tools from Banach space theory, such as normed spaces duality and Grothendieck's inequality.
Abstract: We introduce a new method to derive lower bounds on randomized and quantum communication complexity. Our method is based on factorization norms, a notion from Banach space theory. This approach gives us access to several powerful tools from this area such as normed spaces duality and Grothendieck's inequality. This extends the arsenal of methods for deriving lower bounds in communication complexity. As we show, our method subsumes most of the previously known general approaches to lower bounds on communication complexity. Moreover, we extend all (but one) of these lower bounds to the realm of quantum communication complexity with entanglement. Our results also shed some light on the question of how much communication can be saved by using entanglement. It is known that entanglement can save one of every two qubits, and examples for which this is tight are also known. It follows from our results that this bound on the saving in communication is tight almost always.
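For reference, the factorization norm at the heart of such methods is commonly defined as follows (this is the standard definition from the matrix-analysis literature, restated here rather than quoted from the paper):

```latex
\gamma_2(M) \;=\; \min_{M = XY}\; \|X\|_{2\to\infty}\,\|Y\|_{1\to 2},
```

where $\|X\|_{2\to\infty}$ is the largest $\ell_2$ norm of a row of $X$ and $\|Y\|_{1\to 2}$ is the largest $\ell_2$ norm of a column of $Y$; lower bounds on communication follow because suitably adjusted variants of $\gamma_2$ bound the communication complexity of the matrix from below.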

57 citations


Journal ArticleDOI
TL;DR: An IVDT for categorical input attributes has been developed and evaluated in an experiment with 20 subjects to test three hypotheses regarding its potential advantages; the results suggested that the IVDT process can improve the effectiveness of modeling by producing trees with relatively high classification accuracies and small sizes.
Abstract: The loosely coupled relationships between visualization and analytical data mining (DM) techniques represent the majority of the current state of the art in visual data mining; DM modeling is typically an automatic process with very limited forms of guidance from users. A conceptual model of the visualization support to the DM modeling process and a novel interactive visual decision tree (IVDT) classification process have been proposed in this paper, with the aim of exploiting humans' pattern recognition ability and domain knowledge to facilitate the knowledge discovery process. An IVDT for categorical input attributes has been developed and evaluated in an experiment with 20 subjects to test three hypotheses regarding its potential advantages. The experimental results suggested that, compared to the automatic modeling process as typically applied in current decision tree modeling tools, the IVDT process can improve the effectiveness of modeling in terms of producing trees with relatively high classification accuracies and small sizes, enhance users' understanding of the algorithm, and give them greater satisfaction with the task.

32 citations


Journal ArticleDOI
TL;DR: This work proposes a new class of semiparametric regression models, termed partially linear tree‐based regression (PLTR) models, which exhibit the advantages of both generalized linear regression and tree models and has broad implications for applying the tree methodology to genetic epidemiology research.
Abstract: The success of genetic dissection of complex diseases may greatly benefit from judicious exploration of joint gene effects, which, in turn, critically depends on the power of statistical tools. Standard regression models are convenient for assessing main effects and low-order gene–gene interactions but not for exploring complex higher-order interactions. Tree-based methodology is an attractive alternative for disentangling possible interactions, but it has difficulty in modeling additive main effects. This work proposes a new class of semiparametric regression models, termed partially linear tree-based regression (PLTR) models, which exhibit the advantages of both generalized linear regression and tree models. A PLTR model quantifies joint effects of genes and other risk factors by a combination of linear main effects and a non-parametric tree structure. We propose an iterative algorithm to fit the PLTR model, and a unified resampling approach for identifying and testing the significance of the optimal “pruned” tree nested within the tree resultant from the fitting algorithm. Simulation studies showed that the resampling procedure maintained the correct type I error rate. We applied the PLTR model to assess the association between biliary stone risk and 53 single nucleotide polymorphisms (SNPs) in the inflammation pathway in a population-based case-control study. The analysis yielded an interesting parsimonious summary of the joint effect of all SNPs. The proposed model is also useful for exploring gene–environment interactions and has broad implications for applying the tree methodology to genetic epidemiology research. Genet. Epidemiol. © 2007 Wiley-Liss, Inc.

Journal ArticleDOI
TL;DR: In this paper, Korman et al. proposed a distributed dynamic labeling scheme for dynamic trees, which is based on extending an existing static tree labeling scheme to the dynamic setting, where the tradeoff is designed to minimize the label size, sometimes at the expense of communication.
Abstract: Let F be a function on pairs of vertices. An F-labeling scheme is composed of a marker algorithm for labeling the vertices of a graph with short labels, coupled with a decoder algorithm allowing one to compute F(u, v) for any two vertices u and v directly from their labels. As applications for labeling schemes concern mainly large and dynamically changing networks, it is of interest to study distributed dynamic labeling schemes. This paper investigates labeling schemes for dynamic trees. We consider two dynamic tree models, namely, the leaf-dynamic tree model in which at each step a leaf can be added to or removed from the tree and the leaf-increasing tree model in which the only topological event that may occur is that a leaf joins the tree. A general method for constructing labeling schemes for dynamic trees (under the above mentioned dynamic tree models) was previously developed in Korman et al. (Theory Comput Syst 37:49–75, 2004). This method is based on extending an existing static tree labeling scheme to the dynamic setting. This approach fits many natural functions on trees, such as distance, separation level, ancestry relation, routing (in both the adversary and the designer port models), nearest common ancestor etc.. Their resulting dynamic schemes incur overheads (over the static scheme) on the label size and on the communication complexity. In particular, all their schemes yield a multiplicative overhead factor of Ω(log n) on the label sizes of the static schemes. Following (Korman et al., Theory Comput Syst 37:49–75, 2004), we develop a different general method for extending static labeling schemes to the dynamic tree settings. Our method fits the same class of tree functions. In contrast to the above paper, our trade-off is designed to minimize the label size, sometimes at the expense of communication. 
Informally, for any function k(n) and any static F-labeling scheme on trees, we present an F-labeling scheme on dynamic trees incurring multiplicative overhead factors (over the static scheme) of $$O(\log_{k(n)} n)$$ on the label size and $$O(k(n)\log_{k(n)} n)$$ on the amortized message complexity. In particular, by setting $$k(n) = n^{\epsilon}$$ for any $$0 < \epsilon < 1$$ , we obtain dynamic labeling schemes with asymptotically optimal label sizes and sublinear amortized message complexity for the ancestry relation, the id-based and label-based nearest common ancestor relation and the routing function.
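The collapse of the label-size overhead for k(n) = n^epsilon can be checked numerically: log base n^epsilon of n equals 1/epsilon, a constant independent of n, which is why the resulting label sizes are asymptotically optimal.

```python
import math

def label_overhead(n, eps):
    """Multiplicative label-size overhead log_{k(n)} n with k(n) = n**eps.

    Since log_{n**eps}(n) = log(n) / (eps * log(n)) = 1/eps, the
    overhead does not grow with n.
    """
    return math.log(n, n ** eps)
```

For instance, eps = 0.5 gives overhead 2 whether n is a million or a billion, while the amortized message overhead k(n) * log_{k(n)} n = n^eps / eps stays sublinear.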

Journal ArticleDOI
TL;DR: This paper first generates an n-ary context tree by constructing a complete tree up to a predefined depth, and then prune out nodes that do not provide compression improvements, and outperforms existing methods for a large set of different color map images.
Abstract: Significant lossless compression results of color map images have been obtained by dividing the color maps into layers and by compressing the binary layers separately using an optimized context tree model that exploits interlayer dependencies. Even though the use of a binary alphabet simplifies the context tree construction and exploits spatial dependencies efficiently, it is expected that an equivalent or better result would be obtained by operating directly on the color image without layer separation. In this paper, we extend the previous context-tree-based method to operate on color values instead of binary layers. We first generate an n-ary context tree by constructing a complete tree up to a predefined depth, and then prune out nodes that do not provide compression improvements. Experiments show that the proposed method outperforms existing methods for a large set of different color map images
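The prune-if-no-improvement rule can be sketched with empirical-entropy code lengths. The per-node model cost and the (symbols, children) layout below are illustrative assumptions, not the paper's actual criterion:

```python
import math
from collections import Counter

def code_length(symbols):
    """Empirical-entropy code length, in bits, of a symbol sequence."""
    n = len(symbols)
    counts = Counter(symbols)
    return sum(-c * math.log2(c / n) for c in counts.values())

MODEL_COST = 1.0  # assumed per-child penalty for keeping a deeper context

def prune(node):
    """Keep a node's children only if they reduce total code length.

    node = (symbols_reaching_node, {context_symbol: child_node}).
    Returns (pruned_node, best_code_length_for_subtree).
    """
    symbols, children = node
    own_cost = code_length(symbols)
    if not children:
        return (symbols, {}), own_cost
    pruned_children, child_cost = {}, 0.0
    for ctx, child in children.items():
        pc, c = prune(child)
        pruned_children[ctx] = pc
        child_cost += c
    total = child_cost + MODEL_COST * len(children)
    if total < own_cost:
        return (symbols, pruned_children), total  # deeper context pays off
    return (symbols, {}), own_cost  # pruning wins
```

When a context splits the symbols into pure subsequences the children are kept; when the split explains nothing, the children are pruned away.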

Journal ArticleDOI
TL;DR: There is considerable interest in agent-based modeling as a tool to understand better the dynamics of complex systems, and a large body of work, particularly in land-change science, has focused on households as the primary agent of study.
Abstract: There is considerable interest in agent-based modeling as a tool to understand better the dynamics of complex systems. Particular attention has been paid by the land-change science community, but there has also been a good deal of effort in fields including epidemiology (Teweldemedhin et al, 2004), finance (LeBaron, 2000), computational sociology (Macy and Willer, 2002), ecology (Grimm et al, 2005), and computational economics (Tesfatsion, 2002). There are several notable prior syntheses and collections of agent-based modeling work. Gimblett (2002) and Janssen (2003) edited two early collections of agent-based model (ABM) research centered on complex human–environment systems. Parker et al (2003) summarized applications of ABMs to land-use and land-cover change, including how the field transitioned from abstract 'toy' models to those more directly tied to real-world applications. Brown and Xie (2006) organized an issue of the International Journal of Geographic Information Science that included a number of agent-based applications with a particular emphasis on the spatial dynamics within the models. And, as evidence of the evolution in the application of agent-based approaches from abstract systems to real-world ones, Janssen and Ostrom (2006) collected a series of works of ABMs in the journal Ecology and Society that were directly supported by different types of empirical data. There are at least two fundamental reasons why ABMs are appealing tools for studying complex systems. First, ABMs explicitly incorporate agent interactions and the properties that emerge at higher levels of observation from these interactions. The importance of these agent interactions varies from system to system, but in many cases these relationships provide key insight into the behavior of complex systems. Second, ABMs enable modelers to represent agents with heterogeneous properties.
Instead of every cell in an urban growth model being governed by the same land-change dynamics, the fitting process of an ABM can enable diverse spatial dynamics to be discovered through modeling. For example, landowners in south-central Indiana have varying responses to the same land-use decision-making context (Evans and Kelley, 2004). Recent innovations in agent-based modeling have produced yet more sophisticated representations of complex systems. A large body of work, particularly in land-change science, has focused on households as the primary agent of study. Now we see more diverse types of agents being portrayed, such as individuals (human and otherwise), villages, viruses, or voters. As we learn more about the kinds of modeling possible with agent-based approaches, we will likely incorporate a greater variety of agents in our models. And with this evolution will hopefully come a greater understanding not only of village-to-village interactions or landowner-to-landowner interactions, but also of interactions that go up and down across scales and representational levels (eg landowner to village, village to landowner). While there has been great progress in the agent-based modeling community (see, for example, the manuscripts here and those in the special issues and edited collections noted above), there remain a number of key challenges that point to reasonable next steps in the research enterprise. Researchers have made greater efforts to validate ABMs than was the case during the inception of ABM applications (Janssen and Ostrom, 2006). However, we should continue to consider the types of validation being applied.
Guest editorial. Environment and Planning B: Planning and Design 2007, volume 34, pages 196–199.

Journal ArticleDOI
TL;DR: A model of configuration complexity is developed that represents systems as a set of nested containers with configuration controls and derives various metrics that indicate configuration complexity, including execution complexity, parameter complexity, and memory complexity.
Abstract: The complexity of configuring computing systems is a major impediment to the adoption of new information technology (IT) products and greatly increases the cost of IT services. This paper develops a model of configuration complexity and demonstrates its value for a change management system. The model represents systems as a set of nested containers with configuration controls. From this representation, we derive various metrics that indicate configuration complexity, including execution complexity, parameter complexity, and memory complexity. We apply this model to a J2EE-based enterprise application and its associated middleware stack to assess the complexity of the manual configuration process for this application. We then show how an automated change management system can greatly reduce configuration complexity.
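The nested-container representation lends itself to a simple recursive tally. The metric names follow the paper, but the counting rules and the dictionary layout below are a plausible reading for illustration, not the paper's exact definitions:

```python
def complexity_metrics(container):
    """Walk a nested-container description and tally simple counts.

    container = {"actions": [...], "params": [...], "children": [...]}
    Execution complexity counts configuration actions; parameter
    complexity counts parameters the operator must supply.
    """
    execution = len(container.get("actions", []))
    parameter = len(container.get("params", []))
    for child in container.get("children", []):
        e, p = complexity_metrics(child)
        execution += e
        parameter += p
    return execution, parameter
```

An automated change management system reduces both counts by taking actions and supplying parameters itself, leaving fewer for the operator.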

Patent
29 Jan 2007
TL;DR: A method evaluates the likelihood of a selected performance condition occurring in a subject set: source data is automatically collected from a sample group of sets and systematically analyzed to form a decision tree model revealing prescribed values for the characteristic input parameters that best relate to the condition, and the subject set's parameter values are then automatically compared against these prescribed values to screen for the likelihood of the condition occurring within a specified timeframe; the screening can be repeated for different conditions and timeframes using different decision tree models.
Abstract: An exemplary method and system for evaluating media-playing sets evaluates the likelihood of a selected performance condition occurring in a subject set based on source data automatically collected from a sample group of the sets, systematic analysis of this data to form a decision tree model revealing prescribed values for characteristic input parameters that are determined to best relate to the condition, and automated comparison of the respective parameter values of the subject set to these prescribed values in order to screen each subject set for the likelihood of the condition occurring within a specified timeframe, which screening can be repeated for different conditions and timeframes using different decision tree models.

Journal ArticleDOI
TL;DR: This paper shows a corresponding upper bound for deterministic information complexity and improves known lower bounds for the public coin Las Vegas communication complexity by a constant factor.

Book ChapterDOI
09 Jul 2007
TL;DR: It follows from the existential result that any function that is complete for the class of functions with polylogarithmic nondeterministic k-party communication complexity does not have polylogARithmic deterministic complexity.
Abstract: We solve some fundamental problems in the number-on-forehead (NOF) k-party communication model. We show that there exists a function which has at most logarithmic communication complexity for randomized protocols with a one-sided error probability of 1/3 but which has linear communication complexity for deterministic protocols. The result is true for k = n^O(1) players, where n is the number of bits on each player's forehead. This separates the analogues of RP and P in the NOF communication model. We also show that there exists a function which has constant randomized complexity for public coin protocols but at least logarithmic complexity for private coin protocols. No larger gap between private and public coin protocols is possible. Our lower bounds are existential and we do not know of any explicit function which allows such separations. However, for the 3-player case we exhibit an explicit function which has Ω(log log n) randomized complexity for private coins but only constant complexity for public coins. It follows from our existential result that any function that is complete for the class of functions with polylogarithmic nondeterministic k-party communication complexity does not have polylogarithmic deterministic complexity. We show that the set intersection function, which is complete in the number-in-hand model, is not complete in the NOF model under cylindrical reductions.

Proceedings ArticleDOI
02 Nov 2007
TL;DR: The degree of dependency of decision attribute on condition attribute, based on rough set theory, is used as a heuristic for selecting the attribute that will best separate the samples into individual classes in a decision tree.
Abstract: One of the keys to constructing a decision tree model is choosing the standard for testing attributes, since the criterion used to select test attributes influences the classification accuracy of the tree. Diverse standards exist for choosing the test attribute, based on entropy, Bayesian methods, and so on. In this paper, the degree of dependency of the decision attribute on the condition attributes, based on rough set theory, is used as a heuristic for selecting the attribute that will best separate the samples into individual classes. The results of an example and experiments show that, compared with the entropy-based approach, our approach is a better way to select nodes for constructing decision trees.
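The rough-set dependency degree used as the split heuristic is gamma(C, D) = |POS_C(D)| / |U|: the fraction of records whose condition-attribute values determine the decision uniquely. A minimal sketch (the record layout and attribute names are illustrative):

```python
from collections import defaultdict

def dependency_degree(records, condition, decision):
    """gamma(C, D) = |POS_C(D)| / |U| for a table of records.

    Records are grouped into indiscernibility blocks by their values on
    the condition attributes; a block is in the positive region when
    every record in it has the same decision value.
    """
    blocks = defaultdict(list)
    for rec in records:
        key = tuple(rec[a] for a in condition)
        blocks[key].append(rec[decision])
    positive = sum(len(vals) for vals in blocks.values()
                   if len(set(vals)) == 1)
    return positive / len(records)
```

The attribute (or attribute set) with the highest dependency degree is the one that best separates the samples, so it is chosen as the test at the current tree node.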

Journal IssueDOI
TL;DR: This work presents a data-driven approach that synthesizes tree animations from a set of pre-computed motion data, and introduces a simple yet effective sampling scheme to generate a rich and reusable motion database for each tree model.
Abstract: We present a data-driven approach that synthesizes tree animations from a set of pre-computed motion data. Our approach improves previous motion synthesis algorithms for character animation in several aspects. We first introduce a simple yet effective sampling scheme to generate a rich and reusable motion database for each tree model. We also propose a novel technique to generate a fine set of transitions that are uniformly distributed in the motion database. The transition lengths are adaptively determined according to the similarity of the transiting frame pairs. At runtime, we employ a greedy searching algorithm to synthesize smooth tree animations under an adjustable wind condition. Experimental results show that our approach achieves comparable quality to physically based methods, while running orders of magnitude faster. Copyright © 2007 John Wiley & Sons, Ltd.


01 Jan 2007
TL;DR: A new decision tree induction technique in which an uncertainty measure is used for best attribute selection is developed, based on the study of priority-based packages of SDFs (Sequence Derived Features).
Abstract: Summary: To overcome the problem of exponentially increasing protein data, drug discoverers need efficient machine learning techniques to predict the functions of proteins that are responsible for various diseases in the human body. The existing decision tree induction methodology C4.5 uses the entropy calculation for best attribute selection. The proposed method develops a new decision tree induction technique in which an uncertainty measure is used for best attribute selection. This is based on the study of priority-based packages of SDFs (Sequence Derived Features). The present research results in the creation of a better decision tree, in terms of depth, than the existing C4.5 technique. A tree with greater depth ensures a greater number of tests before functional class assignment and thus results in more accurate predictions than the existing prediction technique. For the same test data, the percentage accuracy of the new HPF (Human Protein Function) predictor is 72% and that of the existing prediction technique is 44%.

Proceedings ArticleDOI
01 Oct 2007
TL;DR: An improved motion compensation decoding complexity model is proposed and its application to H.264/AVC decoding complexity reduction is examined in this work.
Abstract: An improved motion compensation decoding complexity model is proposed and its application to H.264/AVC decoding complexity reduction is examined in this work. This complexity model considers a rich set of inter prediction modes of H.264 as well as the relationship between motion vectors and frame sizes, which turn out to be highly related to the cache management efficiency. An H.264 encoder equipped with the complexity model can estimate the decoding complexity and choose the best inter prediction mode to meet the decoding complexity constraint of the target receiver platform. The performance of the proposed complexity model and its application to video decoding complexity reduction are demonstrated experimentally.

Journal ArticleDOI
TL;DR: The resource tree model and a new separation logic that extends the Bunched Implications logic with a modality for locations are defined and it is shown how the model and its associated language can be used to manage heap structures and also permission accounting.
Abstract: In this article, we propose a new data structure, called resource tree, that is a node-labelled tree in which nodes contain resources which belong to a partial monoid. We define the resource tree model and a new separation logic (BI-Loc) that extends the Bunched Implications logic (BI) with a modality for locations. In addition, we consider quantifications on locations and paths and then we study decidability by model-checking in these models and logics. Moreover, we define a language to deal with resource trees and also an assertion logic derived from BI-Loc. Then soundness and completeness issues are studied, and we show how the model and its associated language can be used to manage heap structures and also permission accounting.
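The resource-tree data structure can be illustrated with one concrete instance of the partial monoid: fractional permissions in [0, 1], which compose only when their sum stays within 1 (the permission-accounting case the abstract mentions). This is a hedged sketch under that assumption; the paper's definitions are more general, and the class name is ours:

```python
class ResourceTree:
    """Location-labelled tree whose nodes hold a resource (here: a fraction)."""

    def __init__(self, permission=0.0, children=None):
        assert 0.0 <= permission <= 1.0
        self.permission = permission
        self.children = children or {}   # location label -> subtree

    def combine(self, other):
        """Partial monoid operation lifted node-wise; None when undefined."""
        total = self.permission + other.permission
        if total > 1.0:
            return None                  # composition undefined at this node
        merged = {}
        for loc in set(self.children) | set(other.children):
            if loc in self.children and loc in other.children:
                sub = self.children[loc].combine(other.children[loc])
                if sub is None:
                    return None          # undefined anywhere => undefined
                merged[loc] = sub
            else:
                merged[loc] = self.children.get(loc) or other.children[loc]
        return ResourceTree(total, merged)
```

Separation-logic assertions over such trees would then talk about splitting a tree into two subtrees whose `combine` gives back the original.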

01 Jan 2007
TL;DR: The CHAID algorithm is applied to derive a decision tree for car allocation decisions in automobile-deficient households using a large activity diary data set recently collected in the Netherlands, and shows a satisfactory improvement in the goodness of fit of the decision tree model over the null model.
Abstract: Computational process modeling has been introduced as an alternative to the utility-maximizing framework to deal with the complexity of activity-based models of travel demand. ALBATROSS, a rule-based system, used data mining algorithms to derive the choice rules underlying activity-travel patterns. In the context of a project that attempts to incorporate household, as opposed to individual, decision making into the original model, this paper describes the results for car allocation decisions. The CHAID algorithm is applied to derive a decision tree for car allocation decisions in automobile-deficient households, using a large activity diary data set recently collected in the Netherlands. The results show a satisfactory improvement in the goodness of fit of the decision tree model over the null model. The probability of the male getting the car is considerably higher than that of the female in many condition settings; in only 16% of the condition settings does the female have the highest probability of getting the car. Accessibility of the work location by car relative to slow modes appears to be the most influential factor when both the male and the female work. For the covering abstract see ITRD E137145.
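CHAID selects splits via chi-square tests of independence between a predictor and the response. A minimal sketch of that selection step (omitting CHAID's category merging and Bonferroni-adjusted p-values, and using our own variable names):

```python
from collections import Counter

def chi_square(rows, labels, attr):
    """Chi-square statistic of independence between one attribute and the class."""
    n = len(labels)
    cell = Counter((row[attr], y) for row, y in zip(rows, labels))
    row_tot = Counter(row[attr] for row in rows)
    col_tot = Counter(labels)
    stat = 0.0
    for v in row_tot:
        for y in col_tot:
            expected = row_tot[v] * col_tot[y] / n
            stat += (cell[(v, y)] - expected) ** 2 / expected
    return stat

def chaid_split(rows, labels, attrs):
    """CHAID-style choice: split on the attribute most associated with the class."""
    return max(attrs, key=lambda a: chi_square(rows, labels, a))
```

In the paper's setting, `rows` would hold household condition settings (e.g. who works, relative accessibility) and `labels` who gets the car.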

Journal ArticleDOI
TL;DR: A new model of classifier (which is called the complete decision tree) is proposed and compared with other recognition algorithms based on constructing decision trees.
Abstract: Application of decision trees in problems of classification by precedents is considered. A new model of classifier (which is called the complete decision tree) is proposed and compared with other recognition algorithms based on constructing decision trees.

Proceedings ArticleDOI
21 Nov 2007
TL;DR: A simple method for language identification based on an adaptive resonance theory (ART) neural network is applied, and experimental results show that the decision tree model achieved higher accuracy than the ARTMAP model.
Abstract: Automatic language identification (LID) is a topic of great significance in the areas of intelligence and security, where the language identity of any related material needs to be established before its information can be processed. When the recognition elements of content are dynamic and obtained directly from written text, the language associated with each grammar item has to be identified from that text. Most methods proposed in the literature focus on Roman and Asian languages. This paper describes text-based language identification approaches for Arabic script, comparing two different approaches. The decision tree method, commonly used in many application domains, is first reviewed. We also apply a simple language identification method based on an adaptive resonance theory (ART) neural network. Experimental results show that the decision tree model achieved higher accuracy than the ARTMAP model. However, unlike the ARTMAP model, the decision tree model may not remain reliable if the set of languages is extended to other Arabic-script languages. It is assumed that a hybrid of both models will perform better and merits further development.
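The abstract does not describe the paper's features for Arabic-script identification; a common text-based baseline is character n-gram profiling with the Cavnar-Trenkle out-of-place distance, sketched here under our own function names as an illustration of the task:

```python
from collections import Counter

def ngram_profile(text, n=2, top=50):
    """Ranked list of the most frequent character n-grams of a text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    return [g for g, _ in grams.most_common(top)]

def out_of_place(doc_profile, lang_profile):
    """Cavnar-Trenkle distance: sum of rank displacements; absent n-grams
    incur the maximum penalty."""
    penalty = len(lang_profile)
    return sum(
        abs(i - lang_profile.index(g)) if g in lang_profile else penalty
        for i, g in enumerate(doc_profile)
    )

def identify(text, profiles):
    """Pick the language whose training profile is closest to the text's."""
    doc = ngram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))
```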

Journal Article
TL;DR: This paper shows the application of decision trees in production by analyzing and comparing a variety of typical classifiers, providing a basis for selecting or improving algorithms in data mining.
Abstract: The decision tree is an important method in inductive learning as well as in data mining, and can be used to build classification and predictive models. This paper introduces the decision tree and points out its two key techniques: the choice of the testing feature and tree pruning. It summarizes the main features of each algorithm by analyzing and comparing a variety of typical classifiers, providing a basis for selecting or improving algorithms in data mining. Finally, an instance illustrates the application of decision trees in production.
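The two key techniques the abstract names — choice of testing feature and pruning — can both be seen in one toy builder. The split criterion below is weighted misclassification rather than any particular classifier's measure, and a depth limit stands in for pruning; this is an illustrative sketch, not an algorithm from the paper:

```python
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def misclassification(labels):
    """Fraction of samples a majority-class leaf would get wrong."""
    return 1 - Counter(labels).most_common(1)[0][1] / len(labels)

def split_error(rows, labels, attr):
    """Weighted misclassification after splitting on attr (test-feature choice)."""
    n = len(labels)
    parts = {}
    for row, y in zip(rows, labels):
        parts.setdefault(row[attr], []).append(y)
    return sum(len(p) / n * misclassification(p) for p in parts.values())

def grow(rows, labels, attrs, max_depth=3):
    """Recursive tree growth; max_depth acts as a crude pre-pruning knob."""
    if max_depth == 0 or not attrs or misclassification(labels) == 0:
        return majority(labels)                      # leaf
    best = min(attrs, key=lambda a: split_error(rows, labels, a))
    tree = {"attr": best, "branches": {}}
    for v in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == v]
        tree["branches"][v] = grow(
            [rows[i] for i in idx], [labels[i] for i in idx],
            [a for a in attrs if a != best], max_depth - 1)
    return tree
```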

Journal Article
TL;DR: Experimental results show that with respect to comprehensibility and generalization capability, both SSID and MCID are significantly superior to the widely used See5 system (the improved version of C4.5).
Abstract: Because inductive bias exists in the selection of expanded attributes, attributes with more values tend to be selected, which results in a decision tree of large scale and poor generalization capability. The tree must therefore be simplified, by pre-pruning or post-pruning; this paper focuses on pre-pruning. A new pre-pruning strategy is given: during tree growth, two (or more) branches from the same node are merged into one branch, and growth then continues. The paper investigates the impact of merging branches on decision tree induction; the main concerns are whether the comprehensibility, the size, and the generalization accuracy of a decision tree can be improved if an appropriate merging strategy is selected and applied. Based on information gain, the paper analyzes the complexity of a decision tree before and after merging branches, and designs two branch-merging algorithms: SSID (based on the proportion of positive samples) and MCID (based on the most gain compensation). Experimental results show that with respect to comprehensibility and generalization capability, both SSID and MCID are significantly superior to the widely used See5 system (the improved version of C4.5).
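The branch-merging step can be sketched as follows. The criterion below (merge the two branches whose positive-sample proportions are closest) only approximates the flavour of SSID as described in the abstract; MCID's gain-compensation criterion is not reproduced, and all names are ours:

```python
def positive_ratio(labels, positive="+"):
    """Proportion of positive samples reaching a branch."""
    return sum(1 for y in labels if y == positive) / len(labels)

def merge_closest_branches(branches):
    """branches: dict mapping attribute value -> list of labels routed there.
    Returns a new dict with the two most similar branches (by positive-sample
    proportion) merged into one; tree growth would then continue as usual."""
    if len(branches) < 2:
        return dict(branches)
    keys = list(branches)
    pairs = [(abs(positive_ratio(branches[a]) - positive_ratio(branches[b])), a, b)
             for i, a in enumerate(keys) for b in keys[i + 1:]]
    _, a, b = min(pairs)
    merged = {k: list(v) for k, v in branches.items() if k not in (a, b)}
    merged[(a, b)] = branches[a] + branches[b]
    return merged
```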

Book ChapterDOI
01 Jan 2007
TL;DR: Results about Bubblesort, Heapsort, Shellsort, Dobosiewiczsort, Shakersort, and sorting with stacks and queues in sequential or parallel mode using Kolmogorov complexity are surveyed.
Abstract: Recently, many results on the computational complexity of sorting algorithms were obtained using Kolmogorov complexity (the incompressibility method). In particular, the usually hard average-case analysis is amenable to this method. Here we survey such results about Bubblesort, Heapsort, Shellsort, Dobosiewicz-sort, Shakersort, and sorting with stacks and queues in sequential or parallel mode. Especially in the case of Shellsort, the use of Kolmogorov complexity surprisingly easily resolved problems that had stayed open for a long time despite strenuous attacks.
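One of the classical average-case facts this line of work concerns: Bubblesort performs as many swaps as the input has inversions, about n(n-1)/4 on a random permutation. That prediction is easy to check empirically (a self-contained sketch, unrelated to the survey's proofs):

```python
import random

def bubblesort_swaps(a):
    """Sort a copy of the list, counting element swaps (= inversions of the input)."""
    a = list(a)
    swaps = 0
    for i in range(len(a) - 1, 0, -1):
        for j in range(i):
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
                swaps += 1
    return swaps

def average_swaps(n, trials=200, seed=0):
    """Mean swap count over random permutations of size n.
    Average-case theory predicts roughly n * (n - 1) / 4."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += bubblesort_swaps(perm)
    return total / trials
```

For n = 20 the theoretical average is 20·19/4 = 95 swaps, and the empirical mean lands close to it.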