scispace - formally typeset
Search or ask a question

Showing papers in "arXiv: Artificial Intelligence in 1994"


Posted Content
TL;DR: It is demonstrated that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
Abstract: Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k ``classes''). The definition is acquired by studying collections of training examples of the form [x_i, f (x_i)]. Existing approaches to multiclass learning problems include direct application of multiclass algorithms such as the decision-tree algorithms C4.5 and CART, application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and application of binary concept learning algorithms with distributed output representations. This paper compares these three approaches to a new technique in which error-correcting codes are employed as a distributed output representation. We show that these output representations improve the generalization performance of both C4.5 and backpropagation on a wide range of multiclass learning tasks. We also demonstrate that this approach is robust with respect to changes in the size of the training sample, the assignment of distributed representations to particular classes, and the application of overfitting avoidance techniques such as decision-tree pruning. Finally, we show that---like the other methods---the error-correcting code technique can provide reliable class probability estimates. Taken together, these results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.

2,455 citations


Posted Content
TL;DR: Teleo-reactive (T-R) programs whose execution entails the construction of circuitry for the continuous computation of the parameters and conditions on which agent action is based support parameter binding and recursion.
Abstract: A formalism is presented for computing and organizing actions for autonomous agents in dynamic environments. We introduce the notion of teleo-reactive (T-R) programs whose execution entails the construction of circuitry for the continuous computation of the parameters and conditions on which agent action is based. In addition to continuous feedback, T-R programs support parameter binding and recursion. A primary difference between T-R programs and many other circuit-based systems is that the circuitry of T-R programs is more compact; it is constructed at run time and thus does not have to anticipate all the contingencies that might arise over all possible runs. In addition, T-R programs are intuitive and easy to write and are written in a form that is compatible with automatic planning and learning methods. We briefly describe some experimental applications of T-R programs in the control of simulated and actual mobile robots.

349 citations


Posted Content
TL;DR: The authors investigated the relationship between the size of a decision tree consistent with some training data and the accuracy of the tree on test data and found that smaller decision trees are on average less accurate than the average accuracy of slightly larger trees.
Abstract: We report on a series of experiments in which all decision trees consistent with the training data are constructed. These experiments were run to gain an understanding of the properties of the set of consistent decision trees and the factors that affect the accuracy of individual trees. In particular, we investigated the relationship between the size of a decision tree consistent with some training data and the accuracy of the tree on test data. The experiments were performed on a massively parallel Maspar computer. The results of the experiments on several artificial and two real world problems indicate that, for many of the problems investigated, smaller consistent decision trees are on average less accurate than the average accuracy of slightly larger trees.

82 citations


Posted Content
TL;DR: This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree.
Abstract: This article describes a new system for induction of oblique decision trees. This system, OC1, combines deterministic hill-climbing with two forms of randomization to find a good oblique split (in the form of a hyperplane) at each node of a decision tree. Oblique decision tree methods are tuned especially for domains in which the attributes are numeric, although they can be adapted to symbolic or mixed symbolic/numeric attributes. We present extensive empirical studies, using both real and artificial data, that analyze OC1's ability to construct oblique trees that are smaller and more accurate than their axis-parallel counterparts. We also examine the benefits of randomization for the construction of oblique decision trees.

79 citations


Posted Content
TL;DR: In this article, the authors present a rigorous comparative analysis of partial-order and total-order planning by focusing on two specific planners that can be directly compared, and they show that there are some subtle assumptions that underly the wide-spread intuitions regarding the supposed efficiency of partial order planning.
Abstract: For many years, the intuitions underlying partial-order planning were largely taken for granted. Only in the past few years has there been renewed interest in the fundamental principles underlying this paradigm. In this paper, we present a rigorous comparative analysis of partial-order and total-order planning by focusing on two specific planners that can be directly compared. We show that there are some subtle assumptions that underly the wide-spread intuitions regarding the supposed efficiency of partial-order planning. For instance, the superiority of partial-order planning can depend critically upon the search strategy and the structure of the search space. Understanding the underlying assumptions is crucial for constructing efficient planners.

78 citations


Posted Content
TL;DR: This paper describes how to modify GSAT so that it can be applied to non-clausal formulas by using a particular ``score'' function which gives the number of clauses of the CNF conversion of a formula which are false under a given truth assignment.
Abstract: In this paper we describe how to modify GSAT so that it can be applied to non-clausal formulas. The idea is to use a particular ``score'' function which gives the number of clauses of the CNF conversion of a formula which are false under a given truth assignment. Its value is computed in linear time, without constructing the CNF conversion itself. The proposed methodology applies to most of the variants of GSAT proposed so far.

27 citations


Posted Content
TL;DR: SUBDUE as discussed by the authors is a substructure discovery system based on the minimum description length principle, which uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints.
Abstract: The ability to identify interesting and repetitive substructures is an essential component to discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value.

12 citations


Posted Content
TL;DR: The main original contributions here are the decompositiontechniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms.
Abstract: This paper is a multidisciplinary review of empirical, statistical learning from a graphical model perspective. Well-known examples of graphical models include Bayesian networks, directed graphs representing a Markov chain, and undirected networks representing a Markov field. These graphical models are extended to model data analysis and empirical learning using the notation of plates. Graphical operations for simplifying and manipulating a problem are provided including decomposition, differentiation, and the manipulation of probability models from the exponential family. Two standard algorithm schemas for learning are reviewed in a graphical framework: Gibbs sampling and the expectation maximization algorithm. Using these operations and schemas, some popular algorithms can be synthesized from their graphical specification. This includes versions of linear regression, techniques for feed-forward networks, and learning Gaussian and discrete Bayesian networks from data. The paper concludes by sketching some implications for data analysis and summarizing how some popular algorithms fall within the framework presented. The main original contributions here are the decomposition techniques and the demonstration that graphical models provide a framework for understanding and developing complex learning algorithms.

6 citations


Posted Content
TL;DR: In this article, the authors define the degree of belief to be the asymptotic value of this fraction as N grows large and show that when the vocabulary underlying Phi and KB uses constants and unary predicates only, they can naturally associate an entropy with each world.
Abstract: Given a knowledge base KB containing first-order and statistical facts, we consider a principled method, called the random-worlds method, for computing a degree of belief that some formula Phi holds given KB. If we are reasoning about a world or system consisting of N individuals, then we can consider all possible worlds, or first-order models, with domain {1,...,N} that satisfy KB, and compute the fraction of them in which Phi is true. We define the degree of belief to be the asymptotic value of this fraction as N grows large. We show that when the vocabulary underlying Phi and KB uses constants and unary predicates only, we can naturally associate an entropy with each world. As N grows larger, there are many more worlds with higher entropy. Therefore, we can use a maximum-entropy computation to compute the degree of belief. This result is in a similar spirit to previous work in physics and artificial intelligence, but is far more general. Of equal interest to the result itself are the limitations on its scope. Most importantly, the restriction to unary predicates seems necessary. Although the random-worlds method makes sense in general, the connection to maximum entropy seems to disappear in the non-unary case. These observations suggest unexpected limitations to the applicability of maximum-entropy methods.

2 citations


Posted Content
TL;DR: Wrap-Up as mentioned in this paper is a trainable information extraction component that makes intersentential inferences and identifies logical relations among information extracted from the text, and it is fully trainable, and not only automatically decides what classifiers are needed, but even derives the feature set for each classifier automatically.
Abstract: The vast amounts of on-line text now available have led to renewed interest in information extraction (IE) systems that analyze unrestricted text, producing a structured representation of selected information from the text. This paper presents a novel approach that uses machine learning to acquire knowledge for some of the higher level IE processing. Wrap-Up is a trainable IE discourse component that makes intersentential inferences and identifies logical relations among information extracted from the text. Previous corpus-based approaches were limited to lower level processing such as part-of-speech tagging, lexical disambiguation, and dictionary construction. Wrap-Up is fully trainable, and not only automatically decides what classifiers are needed, but even derives the feature set for each classifier automatically. Performance equals that of a partially trainable discourse module requiring manual customization for each domain.

1 citations


Posted Content
TL;DR: The authors presented a general-purpose Symbolic Pattern Associator (SPA) based upon the decision-tree learning algorithm ID3 and conducted extensive head-to-head comparisons on the generalization ability between ANN models and the SPA under different representations.
Abstract: Learning the past tense of English verbs - a seemingly minor aspect of language acquisition - has generated heated debates since 1986, and has become a landmark task for testing the adequacy of cognitive modeling. Several artificial neural networks (ANNs) have been implemented, and a challenge for better symbolic models has been posed. In this paper, we present a general-purpose Symbolic Pattern Associator (SPA) based upon the decision-tree learning algorithm ID3. We conduct extensive head-to-head comparisons on the generalization ability between ANN models and the SPA under different representations. We conclude that the SPA generalizes the past tense of unseen verbs better than ANN models by a wide margin, and we offer insights as to why this should be the case. We also discuss a new default strategy for decision-tree learning algorithms.

Posted Content
TL;DR: In this article, the correctness of the subsumption algorithm used in CLASSIC, a description logic-based knowledge representation system that is being used in practical applications, has been investigated and a variant semantics for descriptions with respect to which the current implementation is complete, and which can be independently motivated.
Abstract: This paper analyzes the correctness of the subsumption algorithm used in CLASSIC, a description logic-based knowledge representation system that is being used in practical applications. In order to deal efficiently with individuals in CLASSIC descriptions, the developers have had to use an algorithm that is incomplete with respect to the standard, model-theoretic semantics for description logics. We provide a variant semantics for descriptions with respect to which the current implementation is complete, and which can be independently motivated. The soundness and completeness of the polynomial-time subsumption algorithm is established using description graphs, which are an abstracted version of the implementation structures used in CLASSIC, and are of independent interest.

Posted Content
TL;DR: The approach described here, called PTR, uses probabilities associated with domain theory elements to numerically track the "flow" of proof through the theory to measure the precise role of a clause or literal in allowing or preventing a (desired or undesired) derivation for a given example.
Abstract: The theory revision problem is the problem of how best to go about revising a deficient domain theory using information contained in examples that expose inaccuracies. In this paper we present our approach to the theory revision problem for propositional domain theories. The approach described here, called PTR, uses probabilities associated with domain theory elements to numerically track the ``flow'' of proof through the theory. This allows us to measure the precise role of a clause or literal in allowing or preventing a (desired or undesired) derivation for a given example. This information is used to efficiently locate and repair flawed elements of the theory. PTR is proved to converge to a theory which correctly classifies all examples, and shown experimentally to be fast and accurate even for deep theories.