
Showing papers on "Decision tree model published in 1990"


Journal ArticleDOI
TL;DR: Two new methods are presented that adaptively introduce relevant features while learning a decision tree from examples; empirically, they outperform a standard decision tree algorithm at learning small random DNF functions when the examples are drawn at random from the uniform distribution.
Abstract: We investigate the problem of learning Boolean functions with a short DNF representation using decision trees as a concept description language. Unfortunately, Boolean concepts with a short description may not have a small decision tree representation when the tests at the nodes are limited to the primitive attributes. This representational shortcoming may be overcome by using Boolean features at the decision nodes. We present two new methods that adaptively introduce relevant features while learning a decision tree from examples. We show empirically that these methods outperform a standard decision tree algorithm for learning small random DNF functions when the examples are drawn at random from the uniform distribution.
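
The representational gap described here is easy to see concretely. Below is a minimal Python sketch (an illustration of the general idea, not the paper's methods; the 3-term DNF is hypothetical): with conjunctive Boolean features allowed at decision nodes, a k-term DNF is computed by a chain of only k internal nodes.

```python
# Illustrative sketch, not the paper's algorithm. A hypothetical 3-term DNF
# over x0..x5; each term is a set of (attribute index, required value) pairs.
from itertools import product

DNF = [{(0, 1), (2, 1)}, {(1, 0), (3, 1), (4, 1)}, {(2, 0), (5, 1)}]

def term_holds(term, x):
    return all(x[i] == v for i, v in term)

def dnf_value(x):
    return int(any(term_holds(t, x) for t in DNF))

def conjunction_tree_predict(x):
    # Decision tree whose nodes test conjunctive features: one node per
    # term, chained along the "false" branch, so k nodes for k terms.
    for term in DNF:
        if term_holds(term, x):   # "true" branch: positive leaf
            return 1
    return 0                      # final "false" branch: negative leaf

# sanity check: the 3-node tree computes the DNF exactly on all 64 inputs
assert all(conjunction_tree_predict(x) == dnf_value(x)
           for x in product([0, 1], repeat=6))
print("3 conjunctive-feature nodes suffice for this 3-term DNF")
```

A tree restricted to primitive tests on single attributes must instead split on one variable at a time, which is where the blow-up the abstract mentions comes from.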

409 citations


Journal ArticleDOI
TL;DR: Some criteria for obtaining lower bounds for the formula size of Boolean functions are presented, and the bound n^Ω(log n) for the function “MINIMUM COVER” is obtained using methods considerably simpler than all previously known.
Abstract: We present some criteria for obtaining lower bounds for the formula size of Boolean functions. In the monotone case we get the bound n^Ω(log n) for the function “MINIMUM COVER” using methods considerably simpler than all previously known. In the general case we are only able to prove that the criteria yield an exponential lower bound when applied to almost all functions. Some connections with graph complexity and communication complexity are also given.
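
For readers unfamiliar with the measure: formula size here is the number of leaves in a formula over AND, OR, and literals (negations pushed to the leaves). A brute-force Python sketch, practical only for tiny n and purely illustrative, makes the definition concrete by finding the smallest leaf count realizing a given truth table:

```python
# Brute-force formula size (leaf count) over the basis {AND, OR} with
# literals at the leaves; exponential, so only for very small n.
from itertools import product

def formula_size(f, n):
    inputs = list(product([0, 1], repeat=n))
    target = tuple(f(x) for x in inputs)       # truth table of f
    literals = set()
    for i in range(n):
        literals.add(tuple(x[i] for x in inputs))       # x_i
        literals.add(tuple(1 - x[i] for x in inputs))   # NOT x_i
    tables = {1: literals}  # tables[k] = truth tables with exactly k leaves
    k = 1
    while target not in tables[k]:
        k += 1
        new = set()
        for j in range(1, k):
            for a in tables[j]:
                for b in tables[k - j]:
                    new.add(tuple(p & q for p, q in zip(a, b)))  # AND
                    new.add(tuple(p | q for p, q in zip(a, b)))  # OR
        tables[k] = new
    return k

# XOR on two variables needs 4 leaves, e.g. (x AND NOT y) OR (NOT x AND y)
print(formula_size(lambda x: x[0] ^ x[1], 2))  # -> 4
```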

148 citations


Proceedings Article
29 Jul 1990
TL;DR: Results suggest that the number of leaves in a decision tree is the key measure to minimize; these results can also serve as a basis for a methodology for formally proving that one decision tree generation algorithm is better than another.
Abstract: In this paper, we address the issue of evaluating decision trees generated from training examples by a learning algorithm. We give a set of performance measures and show how some of them relate to others. We derive results suggesting that the number of leaves in a decision tree is the important measure to minimize. Minimizing this measure will, in a probabilistic sense, improve performance along the other measures; notably, it is expected to produce trees whose error rates are less likely to exceed some acceptable limit. The motivation for deriving such results is twofold: 1. to better understand what constitutes a good measure of performance, and 2. to provide guidance when deciding which aspects of a decision tree generation algorithm should be changed in order to improve the quality of the decision trees it generates. The results presented in this paper can be used as a basis for a methodology for formally proving that one decision tree generation algorithm is better than another. This would provide a more satisfactory alternative to the current empirical evaluation method for comparing algorithms.
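
These measures are simple to compute once a tree is represented explicitly. The sketch below uses a hypothetical toy tree and illustrative definitions (the paper's exact measures may differ): leaf count, depth, and the expected number of tests along a uniformly random path.

```python
# A tree is a class label (leaf) or a triple (attribute, left, right).
TREE = (0,                      # hypothetical example tree
        (1, "neg", "pos"),
        "pos")

def leaves(t):
    if not isinstance(t, tuple):
        return 1
    return leaves(t[1]) + leaves(t[2])

def depth(t):
    if not isinstance(t, tuple):
        return 0
    return 1 + max(depth(t[1]), depth(t[2]))

def expected_tests(t):
    # expected number of tests when each branch is taken with probability 1/2
    if not isinstance(t, tuple):
        return 0.0
    return 1 + 0.5 * expected_tests(t[1]) + 0.5 * expected_tests(t[2])

print(leaves(TREE), depth(TREE), expected_tests(TREE))  # 3 2 1.5
```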

79 citations



01 Jan 1990
TL;DR: Although the belief function model does not provide a point estimate of the failure rate, it does provide a more honest assessment of the lack of information about the failure of the system, which is slightly more conservative than subjective uncertainty estimates obtained in Spencer, Diegert and Easterling (1985).
Abstract: Problems with large numbers of attributes (variables) are difficult both because of the complexity of the outcome spaces and the need to organize the component information. Graphical Models provide a useful organizational tool in the fields of artificial intelligence and risk analysis. Belief functions, based on upper and lower probabilities, provide a rich collection of models for both the interaction among attributes and information about single attributes. The fusion and propagation algorithm provides an efficient method for computing margins of joint distributions using the structural information of the graphical model. However, the lack of computational tools implementing these techniques has limited their applicability to real-world examples. The BELIEF package is an implementation of the fusion and propagation algorithm for belief functions. It allows specification of component models using PS-sets--an abbreviated notation for structured sets--and it re-organizes the graphical model into a tree model in which the fusion and propagation algorithm is implemented by message passing. In this environment, it is simple to perform sensitivity analyses on the graphical models and to diagnostically trace information (or lack of information) back to its source. A typical fault tree from a Probabilistic Risk Assessment (Spencer, Diegert and Easterling (1985), NUREG CR-2787) provides an example of graphical belief models. Graphical belief functions easily model the fault tree and the failure of both data-available and data-free components. Including dependencies among components of the same type to model common information about failure rates increases the complexity of the problem until exact calculation of the belief of system failure is intractable. In this example, because the system is coherent (by a theorem proved here) beliefs of system failure can be calculated using Monte Carlo integration to break the type dependence. The resulting beliefs and plausibilities of system failure are slightly more conservative than subjective uncertainty estimates obtained in Spencer, Diegert and Easterling (1985). Although the belief function model does not provide a point estimate of the failure rate, it does provide a more honest assessment of the lack of information about the failure of the system.
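
A drastically simplified sketch can convey the bounding step for coherent systems. Below, component failure probabilities are known only as intervals, a crude stand-in for belief functions, and coherence (monotonicity) means the interval endpoints bound the top-event probability. The fault tree, intervals, and independence assumption are all hypothetical, and the paper's Monte Carlo treatment of dependence between components of the same type is omitted.

```python
# Hypothetical coherent fault tree: top = OR(AND(c0, c1), c2), with each
# component's failure probability known only as an interval (lower, upper).
TREE = ("or", ("and", 0, 1), 2)
INTERVALS = [(0.01, 0.05), (0.02, 0.04), (0.001, 0.01)]

def prob(node, p):
    if isinstance(node, int):                 # component leaf
        return p[node]
    gate, *kids = node
    vals = [prob(k, p) for k in kids]
    out = 1.0
    if gate == "and":                         # independent components
        for v in vals:
            out *= v
        return out
    for v in vals:                            # OR via complement product
        out *= 1.0 - v
    return 1.0 - out

# coherence: the top-event probability is monotone in each component
# probability, so the endpoints give the bounds (belief and plausibility
# analogues in this simplified setting)
belief = prob(TREE, [lo for lo, hi in INTERVALS])
plausibility = prob(TREE, [hi for lo, hi in INTERVALS])
print(f"system failure bounded in [{belief:.5f}, {plausibility:.5f}]")
```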

22 citations


Proceedings ArticleDOI
22 Oct 1990
TL;DR: The authors prove general lower bounds on the length of the random input of parties computing a function f, depending on the number of bits communicated and the deterministic communication complexity of f.
Abstract: A quantitative investigation of the power of randomness in the context of communication complexity is initiated. The authors prove general lower bounds on the length of the random input of parties computing a function f, depending on the number of bits communicated and the deterministic communication complexity of f. Four standard models for communication complexity are considered: the random input of the parties may be shared or local, and the communication may be one-way or two-way. The bounds are shown to be tight for all the models, for all values of the deterministic communication complexity, and for all possible quantities of bits exchanged. It is shown that it is possible to reduce the number of random bits required by any protocol, without increasing the number of bits exchanged (up to a limit depending on the advantage achieved by the protocol).
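
A standard textbook example, not the paper's construction, shows the resource being measured: with a shared random string, two parties can test equality of n-bit inputs by exchanging a few inner-product bits rather than n bits, at the cost of several times n shared random bits. It is exactly this random-bit cost that bounds of the kind above quantify.

```python
# Randomized equality with shared (public) randomness: each trial exchanges
# one bit and errs with probability 1/2 on unequal inputs.
import random

def eq_protocol(x, y, trials=20, rng=random):
    n = len(x)
    for _ in range(trials):
        r = [rng.randint(0, 1) for _ in range(n)]      # shared random string
        a = sum(xi * ri for xi, ri in zip(x, r)) % 2   # Alice's 1-bit message
        b = sum(yi * ri for yi, ri in zip(y, r)) % 2   # Bob's local value
        if a != b:
            return False     # inputs are certainly unequal
    return True              # equal, or wrong with probability <= 2**-trials

x = [1, 0, 1, 1, 0, 0, 1, 0]
print(eq_protocol(x, x))          # True
print(eq_protocol(x, [1] * 8))    # False with overwhelming probability
```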

15 citations


Journal ArticleDOI
TL;DR: The problem of determining the asymptotic time complexity of the convex matrix searching problem is equivalent to determining the minimum decision tree height.
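
For context, matrix searching here means computing all row minima while counting only the entries probed, which is exactly what decision tree height measures. The sketch below is an illustration of the problem, not the paper's argument: for a monotone matrix, one whose row-minimum column indices are nondecreasing, divide and conquer finds all row minima with O(n log n) probes.

```python
# Row minima of a monotone matrix by divide and conquer; get(i, j) probes
# entry (i, j), and monotonicity shrinks the column range for each half.
def row_minima(get, n_rows, n_cols):
    ans = [None] * n_rows

    def solve(top, bottom, left, right):
        if top > bottom:
            return
        mid = (top + bottom) // 2
        best, best_j = None, left
        for j in range(left, right + 1):        # scan the allowed columns
            v = get(mid, j)
            if best is None or v < best:
                best, best_j = v, j
        ans[mid] = best_j
        solve(top, mid - 1, left, best_j)       # rows above: columns <= best_j
        solve(mid + 1, bottom, best_j, right)   # rows below: columns >= best_j

    solve(0, n_rows - 1, 0, n_cols - 1)
    return ans

M = [[3, 2, 4, 5],     # toy monotone matrix
     [4, 2, 3, 5],
     [5, 4, 2, 3],
     [6, 5, 4, 3]]
print(row_minima(lambda i, j: M[i][j], 4, 4))   # [1, 1, 2, 3]
```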

14 citations


Book ChapterDOI
01 Oct 1990
TL;DR: This paper introduces minimum complexity and presents quantitative evidence for it in messy genetic evolution; there also appears to be a strong correlation between the theory and what is observed in biological genetics.
Abstract: This paper presents a principle of minimum complexity in evolving systems. Minimum complexity is supported by results and observations from genetic algorithm research and information complexity theory. This paper introduces minimum complexity and presents quantitative evidence for minimum complexity in messy genetic evolution. There also appears to be a strong correlation between our theory and what is observed in biological genetics.

9 citations


Proceedings ArticleDOI
P. Hajnal
08 Jul 1990
TL;DR: Results suggest that there are relations between the decision tree complexity of a Boolean function and its symmetry; consideration is also given to the question of what distinguishes graph properties from other highly symmetric Boolean functions, where randomization can help significantly.
Abstract: Results suggest that there are relations between the decision tree complexity of a Boolean function and its symmetry. A central conjecture is that for any monotone graph property the randomized decision tree complexity does not differ from the deterministic one by more than a constant factor. The authors improve V. King's Ω(n^(5/4)) lower bound on the randomized decision tree complexity of monotone graph properties to Ω(n^(4/3)). The proof follows A. Yao's (1977) approach and improves it in a direction different from King's. At the heart of the proof is a duality argument combined with a new packing lemma for bipartite graphs. Consideration is also given to the question of what distinguishes graph properties from other highly symmetric Boolean functions, where randomization can help significantly. Open questions concerning this problem are discussed.
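
The model in question charges one unit per adjacency-matrix entry probed. The sketch below illustrates the model only, not the lower-bound proof: it evaluates a monotone graph property (connectivity) by probing entries in random order with early stopping, and counts the probes a run makes.

```python
# Randomized decision tree for connectivity: probe vertex pairs in random
# order, track components with union-find, stop once the answer is forced.
import random
from itertools import combinations

def randomized_probes(adj, rng=random):
    n = len(adj)
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    pairs = list(combinations(range(n), 2))
    rng.shuffle(pairs)                      # the randomized probe order
    components, probes = n, 0
    for u, v in pairs:
        probes += 1
        if adj[u][v]:
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                components -= 1
                if components == 1:         # connected: answer is forced
                    return True, probes
    return False, probes

adj = [[0, 1, 0, 0],                        # toy graph: the path 0-1-2-3
       [1, 0, 1, 0],
       [0, 1, 0, 1],
       [0, 0, 1, 0]]
print(randomized_probes(adj))
```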

7 citations


Proceedings ArticleDOI
06 Nov 1990
TL;DR: Both theoretical and empirical results are presented concerning W. Van de Velde's decision tree induction algorithm IDL; in the test domain of exclusive-OR functions with irrelevant attributes, IDL is shown to efficiently remove tests of irrelevant attributes from the trees.
Abstract: Both theoretical and empirical results are presented concerning W. Van de Velde's (1989, 1990) decision tree induction algorithm IDL. Contrary to a conjecture by Van de Velde, the algorithm does not always produce a topologically minimal tree. This is true both of IDL used as an incremental decision tree induction algorithm and of IDL used as a post-processor for trees generated by TDIDT. Experiments were carried out using IDL to post-process trees produced by ID3. The test domains are exclusive-OR functions with irrelevant attributes. The results show that in this domain IDL efficiently removes the tests of irrelevant attributes from the trees. The computational complexity analysis of IDL is reviewed.
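
A much weaker simplification rule than IDL already shows why irrelevant tests are removable in the exclusive-OR domain: if both branches of a test lead to identical subtrees, the test cannot affect the outcome. The Python sketch below applies this rule bottom-up to a hypothetical tree; it is an illustration only, not IDL.

```python
# A tree is a class label (leaf) or a triple (attribute, left, right).
def simplify(t):
    if not isinstance(t, tuple):
        return t
    a, l, r = t
    l, r = simplify(l), simplify(r)
    return l if l == r else (a, l, r)   # equal branches: drop the test

# hypothetical tree for x0 XOR x1 that wastes a root test on irrelevant x2
xor = (0, (1, "0", "1"), (1, "1", "0"))
tree = (2, xor, xor)
print(simplify(tree))  # the x2 test disappears: (0, (1,'0','1'), (1,'1','0'))
```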

5 citations


Journal Article
TL;DR: It is found that the STM is quite adequate as a qualitative knowledge model in fault diagnostic expert systems for chemical processes.
Abstract: The Symptom Tree Model (STM), based on the Fault Tree Model (FTM), is modified to be appropriate for fault diagnosis and applicable to large-scale processes. A knowledge representation hybridizing frames and production rules is presented and used to implement the knowledge base of a fault diagnostic expert system with the STM as a qualitative model for chemical processes. To perform fault diagnosis with symptom trees in real time, a hypothesis-and-test diagnostic strategy is presented. To apply the presented knowledge representation method and diagnostic strategy to a naphtha furnace process, EXFAST (EXpert system for FAult diagnosis using Symptom Tree model) was developed and tested with successful results. It is found that the STM is quite adequate as a qualitative knowledge model in fault diagnostic expert systems for chemical processes.
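
The hypothesis-and-test loop can be sketched in a few lines. In the toy Python below, a symptom-tree-like model is flattened to a fault-to-symptoms map; the diagnoser hypothesizes the fault best covering the observed symptoms, then tests the hypothesis's remaining predicted symptoms. All faults, symptoms, and the test stub are hypothetical, not taken from EXFAST.

```python
# fault -> symptoms it produces (a flattened stand-in for a symptom tree)
MODEL = {
    "tube_coking":  {"high_tube_temp", "low_flow"},
    "burner_fault": {"low_outlet_temp", "flame_unstable"},
    "sensor_drift": {"high_tube_temp"},
}

def diagnose(observed, test):
    # hypothesize faults consistent with observations, best-covering first
    ranked = sorted(MODEL, key=lambda f: -len(MODEL[f] & observed))
    for fault in ranked:
        if not MODEL[fault] & observed:
            continue                        # explains nothing observed
        # test phase: check the hypothesis's other predicted symptoms
        if all(test(s) for s in MODEL[fault] - observed):
            return fault
    return None

# test() would query the plant online; here a stub confirms only low_flow
print(diagnose({"high_tube_temp"}, test=lambda s: s == "low_flow"))
# -> tube_coking
```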

Journal ArticleDOI
TL;DR: Results obtained indicate that the stochastic decision tree approach provides a more useful aid to decision makers than the traditional “best-estimate” approach.
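
The contrast is easy to demonstrate. In the hedged sketch below (all payoff figures hypothetical), the traditional approach folds a chance node back to a single best-estimate value, while the stochastic approach samples the uncertain payoffs and reports a distribution, including a probability of loss that the point estimate hides.

```python
import random

def sample_payoff(rng):
    # uncertain branch: 60% chance of a payoff near 100, else near -40
    if rng.random() < 0.6:
        return rng.gauss(100, 20)
    return rng.gauss(-40, 10)

rng = random.Random(0)
samples = [sample_payoff(rng) for _ in range(100_000)]

best_estimate = 0.6 * 100 + 0.4 * -40    # traditional folded-back value: 44
mean = sum(samples) / len(samples)
p_loss = sum(s < 0 for s in samples) / len(samples)
print(f"best estimate {best_estimate:.0f}, simulated mean {mean:.1f}, "
      f"P(loss) about {p_loss:.2f}")     # a distribution, not a point
```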



Journal ArticleDOI
TL;DR: This is the first time that the optimality of a nondirectional algorithm with respect to average run time has been proved in a nontrivial stochastic recursion tree model.
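
The abstract does not spell out its model here, but a classic example of a nondirectional tree algorithm is randomized short-circuit evaluation of AND/OR trees, where visiting children in random order provably beats every fixed (directional) order on average. The sketch below illustrates only that general setting; it is not the paper's model or proof.

```python
import random

def evaluate(node, rng, counter):
    # node is a bool leaf or (gate, children); counter[0] counts leaves read
    if isinstance(node, bool):
        counter[0] += 1
        return node
    gate, kids = node
    kids = list(kids)
    rng.shuffle(kids)                   # nondirectional: random child order
    for k in kids:
        v = evaluate(k, rng, counter)
        if gate == "and" and not v:     # short-circuit on a falsifier
            return False
        if gate == "or" and v:          # short-circuit on a satisfier
            return True
    return gate == "and"

tree = ("and", [("or", [False, True]), ("or", [True, False])])
counter = [0]
print(evaluate(tree, random.Random(1), counter), "leaves read:", counter[0])
```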