
Showing papers in "Machine Learning in 1988"


Journal ArticleDOI
TL;DR: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior – proves their convergence and optimality for special cases, and relates them to supervised-learning methods.
Abstract: This article introduces a class of incremental learning procedures specialized for prediction – that is, for using past experience with an incompletely known system to predict its future behavior. Whereas conventional prediction-learning methods assign credit by means of the difference between predicted and actual outcomes, the new methods assign credit by means of the difference between temporally successive predictions. Although such temporal-difference methods have been used in Samuel's checker player, Holland's bucket brigade, and the author's Adaptive Heuristic Critic, they have remained poorly understood. Here we prove their convergence and optimality for special cases and relate them to supervised-learning methods. For most real-world prediction problems, temporal-difference methods require less memory and less peak computation than conventional methods and they produce more accurate predictions. We argue that most problems to which supervised learning is currently applied are really prediction problems of the sort to which temporal-difference methods can be applied to advantage.

4,803 citations
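The temporal-difference scheme described above can be illustrated with a tabular TD(0) sketch; the function name, episode format, and step size below are illustrative assumptions, not the article's notation.

```python
def td0_predict(episodes, n_states, alpha=0.1, gamma=1.0):
    """Tabular TD(0): nudge each state's prediction toward the observed
    reward plus the *next* prediction, instead of waiting for the final
    outcome as conventional prediction-learning methods would."""
    V = [0.0] * n_states
    for episode in episodes:
        for s, r, s_next in episode:  # steps: (state, reward, next state or None)
            target = r + (gamma * V[s_next] if s_next is not None else 0.0)
            V[s] += alpha * (target - V[s])
    return V
```

Because each update uses only the current and next predictions, no record of the full outcome sequence is needed, which is the source of the memory and peak-computation savings the abstract mentions.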


Journal ArticleDOI
TL;DR: There is no a priori reason why machine learning must borrow from nature, but many machine learning systems now borrow heavily from current thinking in cognitive science, and rekindled interest in neural networks and connectionism is evidence of serious mechanistic and philosophical currents running through the field.
Abstract: There is no a priori reason why machine learning must borrow from nature. A field could exist, complete with well-defined algorithms, data structures, and theories of learning, without once referring to organisms, cognitive or genetic structures, and psychological or evolutionary theories. Yet at the end of the day, with the position papers written, the computers plugged in, and the programs debugged, a learning edifice devoid of natural metaphor would lack something. It would ignore the fact that all these creations have become possible only after three billion years of evolution on this planet. It would miss the point that the very ideas of adaptation and learning are concepts invented by the most recent representatives of the species Homo sapiens from the careful observation of themselves and life around them. It would miss the point that natural examples of learning and adaptation are treasure troves of robust procedures and structures. Fortunately, the field of machine learning does rely upon nature's bounty for both inspiration and mechanism. Many machine learning systems now borrow heavily from current thinking in cognitive science, and rekindled interest in neural networks and connectionism is evidence of serious mechanistic and philosophical currents running through the field. Another area where natural example has been tapped is in work on genetic algorithms (GAs) and genetics-based machine learning. Rooted in the early cybernetics movement (Holland, 1962), this work has progressed in both theory (Holland, 1975; Holland, Holyoak, Nisbett, & Thagard, 1986) and application (Goldberg, 1989; Grefenstette, 1985, 1987) to the point where genetics-based systems are finding their way into everyday commercial use (Davis & Coombs, 1987; Fourman, 1985).

3,019 citations


Journal ArticleDOI
Dana Angluin1
TL;DR: This work considers the problem of using queries to learn an unknown concept, and several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries.
Abstract: We consider the problem of using queries to learn an unknown concept. Several types of queries are described and studied: membership, equivalence, subset, superset, disjointness, and exhaustiveness queries. Examples are given of efficient learning methods using various subsets of these queries for formal domains, including the regular languages, restricted classes of context-free languages, the pattern languages, and restricted types of propositional formulas. Some general lower bound techniques are given. Equivalence queries are compared with Valiant's criterion of probably approximately correct identification under random sampling.

1,797 citations
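As a toy instance of the membership-query setting above (far simpler than the paper's general framework), an unknown monotone conjunction over n variables can be identified exactly with n membership queries; the function names here are illustrative, not Angluin's.

```python
def learn_monotone_conjunction(member, n):
    """Learn which of n variables appear in an unknown monotone
    conjunction, using n membership queries: flip one bit of the
    all-ones assignment and ask whether the result is still positive."""
    relevant = []
    for i in range(n):
        x = [1] * n
        x[i] = 0
        if not member(x):       # dropping variable i kills the example,
            relevant.append(i)  # so i must appear in the conjunction
    return relevant
```

For the hypothetical target x1 AND x3 over five variables, the learner recovers [1, 3] after five queries; each query rules a variable in or out with certainty, which is the kind of power over random sampling that query access provides.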


Journal ArticleDOI
TL;DR: This work presents one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions.
Abstract: Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.

1,669 citations
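The mistake-bounded linear-threshold learner summarized above can be sketched as a multiplicative-update (Winnow-style) rule for monotone disjunctions; the promotion/demotion factors and threshold below are illustrative choices, not the paper's exact parameters.

```python
def winnow(examples, n, threshold=None):
    """Winnow-style learner for monotone disjunctions over n Boolean
    attributes: predict with a linear threshold, and on each mistake
    multiplicatively promote or demote the weights of active attributes.
    Mistakes grow only logarithmically in the number of irrelevant attributes."""
    if threshold is None:
        threshold = n / 2
    w = [1.0] * n
    mistakes = 0
    for x, label in examples:  # x is a 0/1 vector, label is 0 or 1
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) >= threshold else 0
        if pred != label:
            mistakes += 1
            if label == 1:  # promotion: double weights of active attributes
                w = [wi * 2 if xi else wi for wi, xi in zip(w, x)]
            else:           # demotion: halve weights of active attributes
                w = [wi / 2 if xi else wi for wi, xi in zip(w, x)]
    return w, mistakes
```

The multiplicative update, rather than the additive one of perceptron-style rules, is what keeps the mistake bound logarithmic in the irrelevant attributes.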


Journal ArticleDOI
TL;DR: This paper shows that when the teacher may make independent random errors in classifying the example data, the strategy of selecting the most consistent rule for the sample is sufficient, and usually requires a feasibly small number of examples, provided noise affects less than half the examples on average.
Abstract: The basic question addressed in this paper is: how can a learning algorithm cope with incorrect training examples? Specifically, how can algorithms that produce an “approximately correct” identification with “high probability” for reliable data be adapted to handle noisy data? We show that when the teacher may make independent random errors in classifying the example data, the strategy of selecting the most consistent rule for the sample is sufficient, and usually requires a feasibly small number of examples, provided noise affects less than half the examples on average. In this setting we are able to estimate the rate of noise using only the knowledge that the rate is less than one half. The basic ideas extend to other types of random noise as well. We also show that the search problem associated with this strategy is intractable in general. However, for particular classes of rules the target rule may be efficiently identified if we use techniques specific to that class. For an important class of formulas – the k-CNF formulas studied by Valiant – we present a polynomial-time algorithm that identifies concepts in this form when the rate of classification errors is less than one half.

820 citations
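The "select the most consistent rule" strategy can be illustrated over a fixed finite pool of candidate rules, which is a simplification of the paper's setting: simply minimize disagreements with the (possibly mislabeled) sample.

```python
def most_consistent(hypotheses, sample):
    """Return the candidate hypothesis that disagrees with the fewest
    labels in a (possibly noisy) training sample."""
    return min(hypotheses, key=lambda h: sum(h(x) != y for x, y in sample))

h1 = lambda x: x > 0
h2 = lambda x: x > 5
# the point (3, False) is mislabeled, simulating random classification noise
sample = [(1, True), (2, True), (6, True), (-1, False), (3, False)]
best = most_consistent([h1, h2], sample)
```

As long as noise flips fewer than half the labels on average, the true rule still disagrees with the sample less often than any bad alternative, which is why this simple strategy remains sufficient; the intractability result in the paper concerns searching large rule classes, not this fixed-pool toy.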


Journal ArticleDOI
TL;DR: This paper provides a brief overview of how one might use genetic algorithms as a key element in learning systems.
Abstract: Genetic algorithms represent a class of adaptive search techniques that have been intensively studied in recent years. Much of the interest in genetic algorithms is due to the fact that they provide a set of efficient domain-independent search heuristics which are a significant improvement over traditional “weak methods” without the need for incorporating highly domain-specific knowledge. There is now considerable evidence that genetic algorithms are useful for global function optimization and NP-hard problems. Recently, there has been a good deal of interest in using genetic algorithms for machine learning problems. This paper provides a brief overview of how one might use genetic algorithms as a key element in learning systems.

416 citations
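The kind of domain-independent search loop the overview describes can be sketched as a minimal generational genetic algorithm; the operators, parameter values, and the "onemax" bit-counting fitness are illustrative choices, not the paper's.

```python
import random

def genetic_algorithm(fitness, length, pop_size=20, generations=60,
                      p_mut=0.02, seed=0):
    """Minimal generational GA: binary tournament selection,
    one-point crossover, and bit-flip mutation on bitstrings."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, length)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [b ^ 1 if rng.random() < p_mut else b for b in child]
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_algorithm(sum, 20)  # "onemax": fitness is the count of 1 bits
```

Note that nothing in the loop refers to the problem domain: only the fitness function does, which is the sense in which the heuristics are domain-independent.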


Journal ArticleDOI
TL;DR: This paper addresses the credit assignment problem that arises when long sequences of rules fire between successive external rewards; two previously reported approaches to rule learning with genetic algorithms each offer a useful solution to a different level of that problem.
Abstract: In rule discovery systems, learning often proceeds by first assessing the quality of the system's current rules and then modifying rules based on that assessment. This paper addresses the credit assignment problem that arises when long sequences of rules fire between successive external rewards. The focus is on the kinds of rule assessment schemes which have been proposed for rule discovery systems that use genetic algorithms as the primary rule modification strategy. Two distinct approaches to rule learning with genetic algorithms have been previously reported, each approach offering a useful solution to a different level of the credit assignment problem. We describe a system, called RUDI, that exploits both approaches. We present analytic and experimental results that support the hypothesis that multiple levels of credit assignment can improve the performance of rule learning systems based on genetic algorithms.

344 citations


Journal ArticleDOI
TL;DR: This work explores in detail the tradeoffs between the amount of effort spent on evaluating each structure and the number of structures evaluated during a given iteration of the genetic algorithm.
Abstract: Genetic algorithms are adaptive search techniques which have been used to learn high-performance knowledge structures in reactive environments that provide information in the form of payoff. In general, payoff can be viewed as a noisy function of the structure being evaluated, and the learning task can be viewed as an optimization problem in a noisy environment. Previous studies have shown that genetic algorithms can perform effectively in the presence of noise. This work explores in detail the tradeoffs between the amount of effort spent on evaluating each structure and the number of structures evaluated during a given iteration of the genetic algorithm. Theoretical analysis shows that, in some cases, more efficient search results from less accurate evaluations. Further evidence is provided by a case study in which genetic algorithms are used to obtain good registrations of digital images.

321 citations



Journal ArticleDOI
TL;DR: Machine learning is a scientific discipline and, like the fields of AI and computer science, has both theoretical and empirical aspects, making it more akin to physics and chemistry than astronomy or sociology.
Abstract: Machine learning is a scientific discipline and, like the fields of AI and computer science, has both theoretical and empirical aspects. Although recent progress has occurred on the theoretical front (see Machine Learning, volume 2, number 4), most learning algorithms are too complex for formal analysis. Thus, the field promises to have a significant empirical component for the foreseeable future. And unlike some empirical sciences, machine learning is fortunate enough to have experimental control over a wide range of factors, making it more akin to physics and chemistry than astronomy or sociology.

106 citations


Journal ArticleDOI
TL;DR: Gofer is an example of a classifier system that builds an internal model of its environment, using rules to represent objects, goals, and relationships, and learning is triggered whenever the model proves to be an inadequate basis for generating behavior in a given situation.
Abstract: Most classifier systems learn a collection of stimulus-response rules, each of which directly acts on the problem-solving environment and accrues strength proportional to the overt reward expected from the behavioral sequences in which the rule participates. gofer is an example of a classifier system that builds an internal model of its environment, using rules to represent objects, goals, and relationships. The model is used to direct behavior, and learning is triggered whenever the model proves to be an inadequate basis for generating behavior in a given situation. This means that overt external rewards are not necessarily the only or the most useful source of feedback for inductive change. gofer is tested in a simple two-dimensional world where it learns to locate food and avoid noxious stimulation.

Journal ArticleDOI
TL;DR: This paper presents several empirical studies of known issues in classifier systems, including the effects of population size, the actual contribution of genetic algorithms, the use of rule chaining in solving higher-order tasks, and issues of task representation and dynamic population convergence.
Abstract: This paper describes two classifier systems that learn. These are rule-based systems that use genetic algorithms, which are based on an analogy with natural selection and genetics, as their principal learning mechanism, and an economic model as their principal mechanism for apportioning credit. CFS-C is a domain-independent learning system that has been widely tested on serial computers. CFS is a parallel implementation of CFS-C that makes full use of the inherent parallelism of classifier systems and genetic algorithms, and that allows the exploration of large-scale tasks that were formerly impractical. As with other approaches to learning, classifier systems in their current form work well for moderately-sized tasks but break down for larger tasks. In order to shed light on this issue, we present several empirical studies of known issues in classifier systems, including the effects of population size, the actual contribution of genetic algorithms, the use of rule chaining in solving higher-order tasks, and issues of task representation and dynamic population convergence. We conclude with a discussion of some major unresolved issues in learning classifier systems and some possible approaches to making them more effective on complex tasks.

Journal ArticleDOI
TL;DR: This paper describes precedent analysis, partial explanation of a precedent (or rule) to isolate the new technique(s) it embodies, and rule reanalysis, which involves analyzing old rules in terms of new rules to obtain a more general set.
Abstract: Explanation-based learning depends on having an explanation on which to base generalization. Thus, a system with an incomplete or intractable domain theory cannot use this method to learn from every precedent. However, in such cases the system need not resort to purely empirical generalization methods, because it may already know almost everything required to explain the precedent. Learning by failing to explain is a method that uses current knowledge to prune the well-understood portions of complex precedents (and rules) so that what remains may be conjectured as a new rule. This paper describes precedent analysis, partial explanation of a precedent (or rule) to isolate the new technique(s) it embodies, and rule reanalysis, which involves analyzing old rules in terms of new rules to obtain a more general set. The algorithms PA, PA-RR, and PA-RR-GW implement these ideas in the domains of digital circuit design and simplified gear design.

Journal ArticleDOI
TL;DR: The issues surrounding the integration of programmed and learned knowledge in classifier-system representations, including comprehensibility, ease of expression, explanation, predictability, robustness, redundancy, stability, and the use of analogical representations are explored.
Abstract: Both symbolic and subsymbolic models contribute important insights to our understanding of intelligent systems. Classifier systems are low-level learning systems that are also capable of supporting representations at the symbolic level. In this paper, we explore in detail the issues surrounding the integration of programmed and learned knowledge in classifier-system representations, including comprehensibility, ease of expression, explanation, predictability, robustness, redundancy, stability, and the use of analogical representations. We also examine how these issues speak to the debate between symbolic and subsymbolic paradigms. We discuss several dimensions for examining the tradeoffs between programmed and learned representations, and we propose an optimization model for constructing hybrid systems that combine positive aspects of each paradigm.

Journal ArticleDOI
TL;DR: The general properties that similarity metrics, objective functions, and concept description languages must have to guarantee that a (conceptual) clustering problem is polynomial-time solvable by a simple and widely used clustering technique, the agglomerative-hierarchical algorithm are investigated.
Abstract: Research in cluster analysis has resulted in a large number of algorithms and similarity measurements for clustering scientific data. Machine learning researchers have published a number of methods for conceptual clustering, in which observations are grouped into clusters that have “good” descriptions in some language. In this paper we investigate the general properties that similarity metrics, objective functions, and concept description languages must have to guarantee that a (conceptual) clustering problem is polynomial-time solvable by a simple and widely used clustering technique, the agglomerative-hierarchical algorithm. We show that under fairly general conditions, the agglomerative-hierarchical method may be used to find an optimal solution in polynomial time.
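A minimal single-linkage version of the agglomerative-hierarchical algorithm discussed above, for concreteness; the brute-force pairwise search and the choice of single linkage are illustrative, not the paper's prescription.

```python
def agglomerate(points, k, dist):
    """Agglomerative-hierarchical clustering: start with singleton
    clusters and repeatedly merge the closest pair of clusters
    (single linkage) until only k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between closest members
                d = min(dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

The paper's question is when a greedy merge sequence like this one is guaranteed to reach a clustering that is actually optimal under the chosen objective, rather than merely locally good.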

Journal ArticleDOI
TL;DR: The Sixth National Conference on Artificial Intelligence (AAAI-87) involved over 6,700 people and required the full accommodations of the Seattle Center in Seattle, Washington (site of a former world's fair) for its four parallel technical sessions and exhibition show, and it still needed the University of Washington campus for its four parallel tutorial sessions.
Abstract: Some of us can remember the first AAAI conference in 1980: a cozy gathering of 400 AI researchers tucked away in one corner of a university campus. There were only two parallel sessions of papers, and they filled one relatively thin proceedings volume. There were no tutorial sessions and no exhibition hall. For better or worse, the field of artificial intelligence has grown considerably in the subsequent seven years. The recent Sixth National Conference on Artificial Intelligence, AAAI-87, involved over 6,700 people and required the full accommodations of the Seattle Center in Seattle, Washington (site of a former world's fair) for its four parallel technical sessions and exhibition show; and it still needed the University of Washington campus for its four parallel tutorial sessions.