Showing papers in "Machine Learning in 1990"
••
TL;DR: In this paper, a method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy, and it is shown that these two notions of learnability are equivalent.
Abstract: This paper addresses the problem of improving the accuracy of an hypothesis output by a learning algorithm in the distribution-free (PAC) learning model. A concept class is learnable (or strongly learnable) if, given access to a source of examples of the unknown concept, the learner with high probability is able to output an hypothesis that is correct on all but an arbitrarily small fraction of the instances. The concept class is weakly learnable if the learner can produce an hypothesis that performs only slightly better than random guessing. In this paper, it is shown that these two notions of learnability are equivalent.
A method is described for converting a weak learning algorithm into one that achieves arbitrarily high accuracy. This construction may have practical applications as a tool for efficiently converting a mediocre learning algorithm into one that performs extremely well. In addition, the construction has some interesting theoretical consequences, including a set of general upper bounds on the complexity of any strong learning algorithm as a function of the allowed error ε.
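The weak-to-strong conversion can be illustrated with a sketch in the spirit of AdaBoost, a later descendant of this construction (not the paper's own recursive majority scheme): a weak learner is repeatedly retrained on reweighted examples so it focuses on past mistakes, and the resulting hypotheses are combined by weighted vote. The stump learner and the toy data below are illustrative assumptions.

```python
import math

def boost(examples, labels, weak_learn, rounds=10):
    """Toy AdaBoost-style booster: reweight examples toward past
    mistakes, then combine weak hypotheses by weighted majority vote."""
    n = len(examples)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        h = weak_learn(examples, labels, w)
        err = sum(wi for wi, x, y in zip(w, examples, labels) if h(x) != y)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, h))
        # Increase weight on misclassified examples, decrease on correct ones.
        w = [wi * math.exp(-alpha * y * h(x)) for wi, x, y in zip(w, examples, labels)]
        s = sum(w)
        w = [wi / s for wi in w]
    return lambda x: 1 if sum(a * h(x) for a, h in ensemble) >= 0 else -1

def stump_learner(xs, ys, w):
    """Weak learner: the best single threshold on 1-D inputs."""
    best = None
    for t in xs:
        for sign in (1, -1):
            err = sum(wi for wi, x, y in zip(w, xs, ys)
                      if (sign if x >= t else -sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    _, t, sign = best
    return lambda x, t=t, sign=sign: sign if x >= t else -sign

xs = [0, 1, 2, 3, 4, 5]
ys = [-1, -1, -1, 1, 1, 1]
strong = boost(xs, ys, stump_learner, rounds=5)
print([strong(x) for x in xs])  # matches ys on this separable sample
```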
3,678 citations
••
TL;DR: FOIL is a system that learns Horn clauses from data expressed as relations, based on ideas that have proved effective in attribute-value learning systems, but extends them to a first-order formalism.
Abstract: This paper describes FOIL, a system that learns Horn clauses from data expressed as relations. FOIL is based on ideas that have proved effective in attribute-value learning systems, but extends them to a first-order formalism. This new system has been applied successfully to several tasks taken from the machine learning literature.
1,616 citations
••
TL;DR: These relatively simple, gradient-descent learning procedures work well for small tasks, and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
Abstract: A major goal of research on networks of neuron-like processing units is to discover efficient learning procedures that allow these networks to construct complex internal representations of their environment. The learning procedures must be capable of modifying the connection strengths in such a way that internal units which are not part of the input or output come to represent important features of the task domain. Several interesting gradient-descent procedures have recently been discovered. Each connection computes the derivative, with respect to the connection strength, of a global measure of the error in the performance of the network. The strength is then adjusted in the direction that decreases the error. These relatively simple, gradient-descent learning procedures work well for small tasks and the new challenge is to find ways of improving their convergence rate and their generalization abilities so that they can be applied to larger, more realistic tasks.
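The gradient-descent update described above, in which each connection adjusts its strength against the derivative of a global error measure, can be sketched for a single sigmoid unit. This is a minimal illustration under assumed data (logical OR over inputs in {-1, +1}), not any specific system from the paper.

```python
import math

def train_unit(data, lr=0.5, epochs=500):
    """One sigmoid unit trained by gradient descent on squared error:
    each weight moves against the derivative of the error with respect
    to that connection strength."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in data:
            out = 1 / (1 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b)))
            delta = (out - target) * out * (1 - out)   # d(error)/d(net input)
            w = [wi - lr * delta * xi for wi, xi in zip(w, x)]
            b -= lr * delta
    return w, b

# Learn logical OR over inputs in {-1, +1} (linearly separable).
data = [((-1, -1), 0.0), ((-1, 1), 1.0), ((1, -1), 1.0), ((1, 1), 1.0)]
w, b = train_unit(data)
```

After training, thresholding the unit's output at 0.5 reproduces the OR truth table; multi-layer versions of this rule drive the internal units discussed in the abstract.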
514 citations
••
TL;DR: Two new methods that adaptively introduce relevant features while learning a decision tree from examples are presented, showing empirically that these methods outperform a standard decision tree algorithm for learning small random DNF functions when the examples are drawn at random from the uniform distribution.
Abstract: We investigate the problem of learning Boolean functions with a short DNF representation using decision trees as a concept description language. Unfortunately, Boolean concepts with a short description may not have a small decision tree representation when the tests at the nodes are limited to the primitive attributes. This representational shortcoming may be overcome by using Boolean features at the decision nodes. We present two new methods that adaptively introduce relevant features while learning a decision tree from examples. We show empirically that these methods outperform a standard decision tree algorithm for learning small random DNF functions when the examples are drawn at random from the uniform distribution.
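Why Boolean features at decision nodes help can be seen with a hypothetical short DNF target: a tree restricted to primitive attributes must branch on one variable at a time, while a tree whose nodes test conjunctive features expresses each DNF term in a single node. This is a toy illustration, not the paper's adaptive feature-construction methods.

```python
from itertools import product

def dnf(x):
    """Target concept: a short DNF over four Boolean attributes."""
    return (x[0] and x[1]) or (x[2] and x[3])

def tree_with_features(x):
    """A two-node decision tree whose internal nodes test whole
    conjunctive terms rather than single primitive attributes."""
    if x[0] and x[1]:      # node tests the Boolean feature x1 AND x2
        return True
    if x[2] and x[3]:      # node tests the Boolean feature x3 AND x4
        return True
    return False

# The feature-based tree agrees with the DNF on all 16 instances.
assert all(dnf(x) == tree_with_features(x) for x in product([False, True], repeat=4))
```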
409 citations
••
TL;DR: The problem of learning decision rules for sequential tasks is addressed, focusing on learning tactical decision rules from a simple flight simulator; the learning method relies on the notion of competition and employs genetic algorithms to search the space of decision policies.
Abstract: The problem of learning decision rules for sequential tasks is addressed, focusing on the problem of learning tactical decision rules from a simple flight simulator. The learning method relies on the notion of competition and employs genetic algorithms to search the space of decision policies. Several experiments are presented that address issues arising from differences between the simulation model on which learning occurs and the target environment on which the decision rules are ultimately tested.
248 citations
••
TL;DR: There is no polynomial time algorithm using only equivalence queries that exactly identifies deterministic or nondeterministic finite state acceptors, context free grammars, or disjunctive or conjunctive normal form boolean formulas.
Abstract: We consider the problem of exact identification of classes of concepts using only equivalence queries. We define a combinatorial property, approximate fingerprints, of classes of concepts and show that no class with this property can be exactly identified in polynomial time using only equivalence queries. As applications of this general theorem, we show that there is no polynomial time algorithm using only equivalence queries that exactly identifies deterministic or nondeterministic finite state acceptors, context free grammars, or disjunctive or conjunctive normal form Boolean formulas.
226 citations
••
TL;DR: This research attempts to integrate qualitative and quantitative learning by combining newly developed heuristics for formulating equations with the previously developed concept learning method embodied in the inductive learning program AQ11, resulting in ABACUS.
Abstract: Most research on inductive learning has been concerned with qualitative learning that induces conceptual, logic-style descriptions from the given facts. In contrast, quantitative learning deals with discovering numerical laws characterizing empirical data. This research attempts to integrate both types of learning by combining newly developed heuristics for formulating equations with the previously developed concept learning method embodied in the inductive learning program AQ11. The resulting system, ABACUS, formulates equations that bind subsets of observed data, and derives explicit, logic-style descriptions stating the applicability conditions for these equations. In addition, several new techniques for quantitative learning are introduced. Units analysis reduces the search space of equations by examining the compatibility of variables' units. Proportionality graph search addresses the problem of identifying relevant variables that should enter equations. Suspension search focusses the search space through heuristic evaluation. The capabilities of ABACUS are demonstrated by several examples from physics and chemistry.
185 citations
••
TL;DR: This chapter suggests methods of overcoming the deficiencies of categorical decision-tree classification through extensions to the way a case is classified by a decision tree.
Abstract: Decision trees are a widely known formalism for expressing classification knowledge, and yet their straightforward use can be criticized on several grounds. Because results are categorical, they do not convey potential uncertainties in classification. Small changes in the attribute values of a case being classified may result in sudden and inappropriate changes to the assigned class. Missing or imprecise information may apparently prevent a case from being classified at all. This chapter suggests methods of overcoming these deficiencies through extensions to the way a case is classified by a decision tree.
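One such extension can be sketched as soft classification: a case with a missing attribute is passed down every branch, weighted by how often training cases took each branch, and the leaf class distributions are summed. The tree encoding and example values below are assumptions for illustration, not Quinlan's exact procedure.

```python
def classify(tree, case, weight=1.0):
    """Soft classification: missing attributes send the case down all
    branches, weighted by training-set branch frequencies; leaves hold
    class probability distributions, which are summed."""
    if "classes" in tree:                      # leaf: class -> probability
        return {c: p * weight for c, p in tree["classes"].items()}
    value = case.get(tree["attribute"])
    totals = {}
    for branch_value, (freq, subtree) in tree["branches"].items():
        if value is None:                      # missing: follow all branches
            w = weight * freq
        elif value == branch_value:            # known: follow one branch
            w = weight
        else:
            continue
        for c, p in classify(subtree, case, w).items():
            totals[c] = totals.get(c, 0.0) + p
    return totals

# Hypothetical one-attribute tree with leaf class distributions.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": (0.4, {"classes": {"play": 0.2, "stay": 0.8}}),
        "rainy": (0.6, {"classes": {"play": 0.9, "stay": 0.1}}),
    },
}
print(classify(tree, {}))  # outlook missing: a blend of both leaves
```

The result is a class distribution rather than a single categorical answer, which addresses both the missing-value and the uncertainty criticisms above.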
171 citations
••
TL;DR: Protos, as discussed by the authors, is a learning apprentice system for heuristic classification that relegates inductive learning and deductive problem solving to minor roles in support of retaining, indexing, and matching exemplars.
Abstract: Building Protos, a learning apprentice system for heuristic classification, has forced us to scrutinize the usefulness of inductive learning and deductive problem solving. While these inference methods have been widely studied in machine learning, their seductive elegance in artificial domains (e.g., mathematics) does not carry over to natural domains (e.g., medicine). This paper briefly describes our rationale in the Protos system for relegating inductive learning and deductive problem solving to minor roles in support of retaining, indexing, and matching exemplars. The problems that arise from “lazy generalization” are described along with their solutions in Protos. Finally, an example of Protos in the domain of clinical audiology is discussed.
120 citations
••
TL;DR: This paper juxtaposes the probability matching paradox of decision theory and the magnitude of reinforcement problem of animal learning theory to show that simple classifier system bidding structures are unable to match the range of behaviors required in the deterministic and probabilistic problems faced by real cognitive systems.
Abstract: This paper juxtaposes the probability matching paradox of decision theory and the magnitude of reinforcement problem of animal learning theory to show that simple classifier system bidding structures are unable to match the range of behaviors required in the deterministic and probabilistic problems faced by real cognitive systems. The inclusion of a variance-sensitive bidding (VSB) mechanism is suggested, analyzed, and simulated to enable good bidding performance over a wide range of nonstationary probabilistic and deterministic environments.
102 citations
••
TL;DR: A new framework for constructing learning algorithms is introduced, built around master algorithms that use learning algorithms for intersection-closed concept classes as subroutines; these algorithms are shown to be optimal or nearly optimal with respect to several different criteria.
Abstract: This paper introduces a new framework for constructing learning algorithms. Our methods involve master algorithms which use learning algorithms for intersection-closed concept classes as subroutines. For example, we give a master algorithm capable of learning any concept class whose members can be expressed as nested differences (for example, c1 – (c2 – (c3 – (c4 – c5)))) of concepts from an intersection-closed class. We show that our algorithms are optimal or nearly optimal with respect to several different criteria. These criteria include: the number of examples needed to produce a good hypothesis with high confidence, the worst case total number of mistakes made, and the expected number of mistakes made in the first t trials.
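The subroutine role of an intersection-closed class can be sketched with the standard closure learner, which predicts the smallest concept in the class containing the positive examples seen so far. Axis-parallel boxes are used here purely as an illustrative intersection-closed class; the master algorithms for nested differences are beyond this sketch.

```python
def closure(points):
    """For an intersection-closed class (here: axis-parallel boxes),
    the smallest concept containing the positive examples is their
    closure; for boxes this is the coordinate-wise bounding box."""
    lo = tuple(min(p[i] for p in points) for i in range(len(points[0])))
    hi = tuple(max(p[i] for p in points) for i in range(len(points[0])))
    return lo, hi

def contains(box, point):
    """Membership test for the learned box concept."""
    lo, hi = box
    return all(l <= x <= h for l, x, h in zip(lo, point, hi))

positives = [(1, 2), (3, 1), (2, 4)]
box = closure(positives)
print(box)  # ((1, 1), (3, 4))
```

Because the closure is the minimal consistent hypothesis, it never produces false positives outside the target concept, which is the property the master algorithms exploit when peeling off layers of a nested difference.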
••
TL;DR: Experiments have shown that the method may produce significantly more reduced concept representations than the traditional approach, and that these representations may also perform better in recognizing new concept examples; this result calls for further research and new experiments.
Abstract: Most human concepts elude precise definition—they have fluid boundaries and context-dependent meaning. We call such concepts flexible, in contrast to crisp concepts, which are well defined and context independent. As machine learning research has concentrated primarily on learning crisp concepts, learning flexible concepts emerges as a new challenge to the field and an important research direction.
This chapter describes an approach to learning flexible concepts based on a two-tiered concept representation. In such a representation, the concept meaning is defined by two components: the base concept representation (BCR) and the inferential concept interpretation (ICI). The BCR (the first tier) is an explicit description of basic concept properties, while the ICI (the second tier) characterizes allowed modifications of the concept meaning and its possible variations in different contexts. Thus, the ICI defines concept boundaries implicitly, by the results of matching procedures and inference processes. The latter can be deductive, analogical, or inductive.
In the method described, the initial BCR is a complete and consistent concept description, induced from concept examples by a conventional AQ inductive learning program (AQ15). This description is then simplified by the so-called TRUNC procedure, to maximize a description quality measure. The resulting BCR is usually much simpler than the initial description, but in a strict, logical sense is incomplete with regard to the training examples. The ICI is implemented in the form of a procedure for flexible matching, which determines the degree to which an instance matches different candidate concepts and chooses the concept that makes the best match. Owing to this procedure, training examples that have been “uncovered” during the description-reduction process may still be classified correctly.
The method has been implemented in the learning system AQTT-15 and experimentally applied to learning diagnostic rules in a sample of medical domains. Experiments have shown that the method may produce significantly more reduced concept representations than the traditional approach and that these representations may also perform better in recognizing new concept examples. This surprising and potentially significant result calls for further research and new experiments. In particular, the method should be tested on other problems and in different domains. Other interesting topics for future work include the development of a “direct” method for learning two-tiered representations, an extension of the form of such representations, acquiring the second tier of descriptions through examples, and the development of techniques for learning hierarchically organized two-tiered representations.
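The flexible-matching idea can be sketched as a best-partial-match rule interpreter: a truncated rule set that no longer covers every training example can still classify an instance by choosing the concept whose conditions it satisfies to the highest degree. The rules and the patient example are hypothetical; this is not the AQTT-15 implementation.

```python
def match_degree(rule, instance):
    """Flexible matching sketch: the fraction of a rule's conditions
    the instance satisfies, so 'uncovered' examples can still be
    classified by best partial match."""
    satisfied = sum(1 for attr, value in rule.items()
                    if instance.get(attr) == value)
    return satisfied / len(rule)

# Hypothetical diagnostic rules (one rule per concept, for brevity).
rules = {
    "flu":     {"fever": True, "cough": True,  "rash": False},
    "measles": {"fever": True, "cough": False, "rash": True},
}
patient = {"fever": True, "cough": True, "rash": False}
best = max(rules, key=lambda c: match_degree(rules[c], patient))
print(best)  # 'flu' satisfies all three conditions
```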
We have no sound notions either in logic or physics; substance, quality, action, passion, and existence are not clear notions…
Sir Francis Bacon
Novum Organum, First Book, Chapter 15, 1620
••
TL;DR: An algorithm that generalizes explanation structures and reports empirical results that demonstrate the value of acquiring recursive and iterative concepts is presented.
Abstract: In explanation-based learning, a specific problem's solution is generalized into a form that can be later used to solve conceptually similar problems. Most research in explanation-based learning involves relaxing constraints on the variables in the explanation of a specific example, rather than generalizing the graphical structure of the explanation itself. However, this precludes the acquisition of concepts where an iterative or recursive process is implicitly represented in the explanation by a fixed number of applications. This paper presents an algorithm that generalizes explanation structures and reports empirical results that demonstrate the value of acquiring recursive and iterative concepts. The BAGGER2 algorithm learns recursive and iterative concepts, integrates results from multiple examples, and extracts useful subconcepts during generalization. On problems where learning a recursive rule is not appropriate, the system produces the same result as standard explanation-based methods. Applying the learned recursive rules only requires a minor extension to a PROLOG-like problem solver, namely, the ability to explicitly call a specific rule. Empirical studies demonstrate that generalizing the structure of explanations helps avoid the recently reported negative effects of learning.
••
TL;DR: This chapter addresses the issue of learning by experimentation as an integral component of PRODIGY, a flexible planning system that encodes its domain knowledge as declarative operators and applies the operator refinement method to acquire additional preconditions or postconditions when observed consequences diverge from internal expectations.
Abstract: Autonomous systems require the ability to plan effective courses of action under potentially uncertain or unpredictable contingencies. Planning requires knowledge of the environment that is accurate enough to allow reasoning about actions. If the environment is too complex or very dynamic, goal-driven learning with reactive feedback becomes a necessity. This chapter addresses the issue of learning by experimentation as an integral component of PRODIGY. PRODIGY is a flexible planning system that encodes its domain knowledge as declarative operators and applies the operator refinement method to acquire additional preconditions or postconditions when observed consequences diverge from internal expectations. When multiple explanations for the observed divergence are consistent with the existing domain knowledge, experiments to discriminate among these explanations are generated. The experimentation process isolates the deficient operator and inserts the discriminant condition or unforeseen side effect to avoid similar impasses in future planning. Thus, experimentation is demand-driven and exploits both the internal state of the planner and any external feedback received. A detailed example of integrated experiment formulation is presented as the basis for a systematic approach to extending an incomplete domain theory or correcting a potentially inaccurate one.
••
TL;DR: A subarea of machine learning that is actively exploring the use of genetic algorithms as the key element in the design of robust learning strategies is described, and an example of their use in learning entire task programs is given.
Abstract: This chapter describes a subarea of machine learning that is actively exploring the use of genetic algorithms as the key element in the design of robust learning strategies. After characterizing the kinds of learning problems motivating this approach, a brief overview of genetic algorithms is presented. Three major approaches to using genetic algorithms for machine learning are described, and an example of their use in learning entire task programs is given. Finally, an assessment of the strengths and weaknesses of this approach to machine learning is provided.
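A generic genetic algorithm of the kind surveyed here can be sketched in a few lines: tournament selection, one-point crossover, and bit-flip mutation, run on the toy "onemax" task (maximize the number of 1 bits). This is an illustrative sketch with assumed parameter values, not a system from the chapter.

```python
import random

def genetic_search(fitness, length=20, pop_size=30, generations=60, seed=0):
    """Minimal genetic algorithm: tournament selection, one-point
    crossover, and occasional single-bit mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():
            a, b = rng.sample(pop, 2)          # tournament of two
            return a if fitness(a) >= fitness(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, length)     # one-point crossover
            child = p1[:cut] + p2[cut:]
            if rng.random() < 0.1:             # bit-flip mutation
                i = rng.randrange(length)
                child[i] ^= 1
            nxt.append(child)
        pop = nxt
    return max(pop, key=fitness)

best = genetic_search(sum)                     # "onemax": count of 1 bits
print(sum(best))
```

Swapping the fitness function for a simulator-based evaluation of a rule set is, in outline, how such algorithms search spaces of decision policies.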
••
TL;DR: Experiments show that some data characteristics interact in non-intuitive ways; for example, noisy data may degrade accuracy differently depending on the size of the concept. Compared with these effects, the choice of learning algorithm appears less important.
Abstract: Concept learning depends on data character. To discover how, some researchers have used theoretical analysis to relate the behavior of idealized learning algorithms to classes of concepts. Others have developed pragmatic measures that relate the behavior of empirical systems such as ID3 and PLS1 to the kinds of concepts encountered in practice. But before learning behavior can be predicted, concepts and data must be characterized. Data characteristics include their number, error, “size,” and so forth. Although potential characteristics are numerous, they are constrained by the way one views concepts. Viewing concepts as functions over instance space leads to geometric characteristics such as concept size (the proportion of positive instances) and concentration (not too many “peaks”). Experiments show that some of these characteristics drastically affect the accuracy of concept learning. Sometimes data characteristics interact in non-intuitive ways; for example, noisy data may degrade accuracy differently depending on the size of the concept. Compared with effects of some data characteristics, the choice of learning algorithm appears less important: performance accuracy is degraded only slightly when the splitting criterion is replaced with random selection. Analyzing such observations suggests directions for concept learning research.
••
TL;DR: This chapter gives a brief account of the recent progress and prospective research directions in the field, attempts to clarify some basic concepts, proposes a multicriteria classification of learning methods, and concludes with a brief description of each chapter.
Abstract: The last few years have produced a remarkable expansion of research in machine learning. The field has gained an unprecedented popularity, several new areas have developed, and some previously established areas have gained new momentum. While symbolic methods, both empirical and knowledge intensive (in particular, inductive concept learning and explanation-based methods), continued to be exceedingly active (see Parts Two and Three of the book, respectively), subsymbolic approaches, especially neural networks, have experienced tremendous growth (Part Five). Unlike past efforts that concentrated on single learning strategies, the new trends have been to integrate different strategies and to develop cognitive learning architectures (Part Four). There has been an increasing interest in experimental comparisons of various methods, and in theoretical analyses of learning algorithms. Researchers have been sharing the same data sets and have applied their techniques to the same problems in order to understand the relative merits of different methods. Theoretical investigations have brought new insights into the complexity of learning processes (Part Six). This chapter gives a brief account of the recent progress and prospective research directions in the field, attempts to clarify some basic concepts, proposes a multicriteria classification of learning methods, and concludes with a brief description of each chapter.
••
TL;DR: An algorithm is given that, for any simple deterministic language L, outputs a grammar G in 2-standard form, such that L = L(G), using membership queries and extended equivalence queries.
Abstract: This paper is concerned with the problem of learning simple deterministic languages. The algorithm described in this paper is based on the theory of model inference given by Shapiro. In our setting, however, nonterminal membership queries, except for the start symbol, are not permitted. Extended equivalence queries are used instead. Nonterminals that are necessary for a correct grammar and their intended models are introduced automatically. We give an algorithm that, for any simple deterministic language L, outputs a grammar G in 2-standard form, such that L = L(G), using membership queries and extended equivalence queries. We also show that the algorithm runs in time polynomial in the length of the longest counterexample and the number of nonterminals in a minimal grammar for L.
••
TL;DR: The paper compares two algorithms, INFER and MALGEN, examining their performance on actual data collected in two Scottish schools and concluding with a critical discussion of the two methods.
Abstract: By its very nature, artificial intelligence is concerned with investigating topics that are ill-defined and ill-understood. This paper describes two approaches to expanding a good but incomplete theory of a domain. The first uses the domain theory as far as possible and fills in specific gaps in the reasoning process, generalizing the suggested missing steps and adding them to the domain theory. The second takes existing operators of the domain theory and applies perturbations to form new plausible operators for the theory. The specific domain to which these techniques have been applied is high-school algebra problems. The domain theory is represented as operators corresponding to algebraic manipulations, and the problem of expanding the domain theory becomes one of discovering new algebraic operators. The general framework used is one of generate and test—generating new operators for the domain and using tests to filter out unreasonable ones. The paper compares two algorithms, INFER and MALGEN, examining their performance on actual data collected in two Scottish schools and concluding with a critical discussion of the two methods.
••
TL;DR: This chapter presents DISCIPLE, a multistrategy, integrated learning system illustrating a theory and a methodology for learning expert knowledge in the context of an imperfect domain theory, which integrates a learning system and an empty expert system using the same knowledge base.
Abstract: This chapter presents DISCIPLE, a multistrategy, integrated learning system illustrating a theory and a methodology for learning expert knowledge in the context of an imperfect domain theory. DISCIPLE integrates a learning system and an empty expert system, both using the same knowledge base. It is initially provided with an imperfect (nonhomogeneous) domain theory and learns problem-solving rules from the problem-solving steps received from its expert user, during interactive problem-solving sessions. In this way, DISCIPLE evolves from a helpful assistant in problem solving to a genuine expert. The problem-solving method of DISCIPLE combines problem reduction, problem solving by constraints, and problem solving by analogy. The learning method of DISCIPLE depends on its knowledge about the problem-solving step (the example) from which it learns. In the context of a complete theory about the example, DISCIPLE uses explanation-based learning to improve its performance. In the context of a weak theory about the example, it synergistically combines explanation-based learning, learning by analogy, empirical learning, and learning by questioning the user, developing its competence. In the context of an incomplete theory about the example, DISCIPLE learns by combining the above-mentioned methods, improving both its competence and performance.
••
TL;DR: In this paper, the authors present a concept-acquisition methodology that uses data (concept examples and counterexamples), domain knowledge, and tentative concept descriptions in an integrated way.
Abstract: In this chapter we present a concept-acquisition methodology that uses data (concept examples and counterexamples), domain knowledge, and tentative concept descriptions in an integrated way. Domain knowledge can be incomplete and/or incorrect with respect to the given data; moreover, the tentative concept descriptions can be expressed in a form that is not operational. The methodology is aimed at producing discriminant and operational concept descriptions, by integrating inductive and deductive learning. In fact, the domain theory is used in a deductive process that tries to operationalize the tentative concept descriptions, but the results obtained are tested on the whole learning set rather than on a single example. Moreover, deduction is interleaved with the application of data-driven inductive steps. In this way, a search in a constrained space of possible descriptions can help overcome some limitations of the domain theory (e.g., inconsistency). The method has been tested in the framework of the inductive learning system “ML-SMART,” previously developed by the authors, and a simple example is also given.
••
TL;DR: The present approach is related to the discovery of category structure and the use of feature intercorrelations and their interaction with generalization, inheritance, retrieval, and memory organization.
Abstract: Categorization processes are central to many human capabilities; e.g., language, reasoning, problem solving. The concept of categorization is also at the base of many kinds of phenomena which AI researchers have attempted to model; e.g., induction, analogy, and the use of causal models. Most approaches to induction can be characterized on a single dimension, ranging from model-driven (“top-down”) to data-driven (“bottom-up”). At one end, a large amount of preconstructed (knowledge-rich) information is used, while at the other end, the featural similarity of a given set of objects or events is analyzed in the absence of other knowledge structures. These two kinds of approaches, represented recently by explanation-based learning (EBL) and similarity-based learning (SBL), conflict in terms of the proper approach to categorization and construction of causal theories. One view central to the present approach is that featural information is instrumental in the formation of knowledge structures. Knowledge structures can be more general than objects and can possess more complex information than features (e.g., abstract concepts, actions, relations). Such knowledge structures are hypothesized to be both created and further manipulated by the SBL mechanism that learned them in the first place. The present approach is related to the discovery of category structure and the use of feature intercorrelations and their interaction with generalization, inheritance, retrieval, and memory organization.
••
TL;DR: This chapter proposes a new class of knowledge-based systems designed to address this knowledge-acquisition bottleneck by incorporating a learning component to acquire new knowledge through experience, called LEAP.
Abstract: It is by now well recognized that a major impediment to developing knowledge-based systems is the knowledge-acquisition bottleneck: The task of building up a complete enough and correct enough knowledge base to provide high-level performance. This chapter proposes a new class of knowledge-based systems designed to address this knowledge-acquisition bottleneck by incorporating a learning component to acquire new knowledge through experience. In particular, we define learning apprentice systems as the class of interactive, knowledge-based consultants that directly assimilate new knowledge by observing and analyzing the problem-solving steps contributed by their users through their normal use of the system. This chapter describes a specific learning apprentice system, called LEAP, which is currently being developed in the domain of VLSI design. We also discuss design issues for learning apprentice systems more generally, as well as restrictions on the generality of our current approach.
••
TL;DR: The present investigation focuses on knowledge acquisition, learning by analogy, and knowledge retention, and experimental results are presented that exhibit forms of intelligent behavior not yet observed in classifier systems and expert systems.
Abstract: This paper presents a description and an empirical evaluation of a rule-based, cumulative learning system called CSM (classifier system with memory), tested in the robot navigation domain. The significance of this research lies in augmenting the current model of classifier systems with analogical problem-solving capabilities and chunking mechanisms. The present investigation focuses on knowledge acquisition, learning by analogy, and knowledge retention. Experimental results are presented that exhibit forms of intelligent behavior not yet observed in classifier systems and expert systems.
••
TL;DR: The goal of this editorial is to emphasize the importance of exploratory research and to encourage the publication of high quality exploratory results in Machine Learning.
Abstract: Exploratory research contributes to the continued vitality of every discipline. The aim of exploratory research is to identify new tasks--tasks that cannot be solved by existing methods. Once a new task has been found, exploratory research seeks to develop a precise definition of the task and to understand the factors that make the task different from previously solved tasks. Until recently, most research in machine learning was primarily exploratory. However, during the past decade, some areas of the field--particularly inductive learning--have matured to the point that careful, quantitative experiments are now possible and proven theoretical results have been obtained. Although these trends are extremely healthy and long overdue, there is a danger that the increased attention to these products of mature research may discourage researchers from undertaking and publishing research of a more exploratory nature. The goal of this editorial is to emphasize the importance of exploratory research and to encourage the publication of high quality exploratory results in Machine Learning.
••
TL;DR: This chapter presents a system, called OGUST, which learns concepts from sets of examples, and shows that for learning “good” generalizations, one must use all kinds of theorems and not only those expressed by taxonomies.
Abstract: In this chapter, we present a system, called OGUST, which learns concepts from sets of examples. Presently, most such systems use only properties of the domain expressed as taxonomies or use only a few simple theorems. First, we show that for learning “good” generalizations, we must use all kinds of theorems and not only those expressed by taxonomies. Then we explain how in OGUST, we control the use of theorems to apply only those that may improve the generalization, how we avoid the problem of loops, and how the use of theorems enables us to increase the explicability of the system.
••
TL;DR: This editorial contains suggestions to authors of papers in the area of machine learning, although much of it applies to the broader field of artificial intelligence.
Abstract: This editorial contains suggestions to authors of papers in the area of machine learning, although much of it applies to the broader field of artificial intelligence. I have distilled these comments from my five-year experience as an editor of Machine Learning, focusing on problems that tended to recur in different papers. Many comments are slanted toward papers that describe running systems, but others will be useful for different types of papers. Authors should focus on those suggestions relevant to their own research emphasis. I have divided the suggestions into a number of categories, which should be self-explanatory. I expect most readers will agree with many of the points, but undoubtedly some will be more controversial. Despite this, I believe that listing them explicitly in this manner will at least encourage authors to think about the issues before drafting their papers, and thus reduce the need for revisions at later dates.
••
TL;DR: In this article, the authors argue that AI researchers have been intimidated by the mystical aura surrounding creativity, but that creativity is a simpler, more algorithmic process than many have thought and that AI is ready to start designing creative computers.
Abstract: Creativity is obviously a crucial aspect of human intelligence, and yet it has not been explored much by AI researchers. A principal reason for this lack of attention is the mystical aura that the word “creativity” has about it. Creativity is thought to be something so mysterious that AI researchers have been intimidated by it. We claim that creativity is a much simpler, more algorithmic process than many have thought and that AI is ready to start designing creative computers.
••
TL;DR: An overview of some recent theoretical results in the learning framework introduced by Valiant and further developed and a comparison to the work of Mitchell on version spaces are presented.
Abstract: We present an overview of some recent theoretical results in the learning framework introduced by Valiant in [Valiant, 1984] and further developed in [Valiant, 1985; Blumer et al., 1987; 1989; Pitt and Valiant, 1988; Haussler, 1988; Angluin and Laird, 1988; Angluin, 1988; Rivest, 1987; Haussler, 1989; Kearns et al., 1987a; 1987b]. Our focus is on applications to AI problems of learning from examples as given in [Haussler, 1988; 1989] and [Kearns et al., 1987a; 1987b], along with a comparison to the work of Mitchell on version spaces [Mitchell, 1982]. We discuss learning problems for both attribute-based and structural domains. This is a revised and expanded version of [Haussler, 1987].
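A concrete instance of Valiant-style learning from examples is the classic algorithm for axis-aligned rectangles: output the tightest rectangle enclosing the positive examples. The sketch below illustrates the idea; the target rectangle, the uniform sampling distribution, and the sample sizes are illustrative assumptions of ours, not taken from the paper:

```python
import random

# PAC-style learning of an axis-aligned rectangle: the hypothesis is the
# tightest rectangle around the positively labeled sample points. The
# target rectangle and uniform distribution below are illustrative.
TARGET = (0.2, 0.7, 0.3, 0.9)  # (x_lo, x_hi, y_lo, y_hi)

def label(point, rect):
    """True iff the point lies inside the rectangle."""
    x, y = point
    x_lo, x_hi, y_lo, y_hi = rect
    return x_lo <= x <= x_hi and y_lo <= y <= y_hi

def learn(sample):
    """Tightest rectangle enclosing the positively labeled points."""
    pos = [p for p in sample if label(p, TARGET)]
    xs = [x for x, _ in pos]
    ys = [y for _, y in pos]
    return (min(xs), max(xs), min(ys), max(ys))

random.seed(0)
sample = [(random.random(), random.random()) for _ in range(5000)]
hyp = learn(sample)

# The learned rectangle lies inside the target, so it never errs on
# negatives; with enough examples, its error on positives is small
# with high probability -- the PAC guarantee.
test = [(random.random(), random.random()) for _ in range(2000)]
error = sum(label(p, TARGET) != label(p, hyp) for p in test) / len(test)
print(round(error, 3))  # small with high probability
```

The PAC analysis makes this quantitative: for accuracy ε and confidence δ, O((1/ε) log(1/δ)) examples suffice for this concept class.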
••
TL;DR: This chapter argues that: (1) it may not always be practical or even possible to determine a causal explanation; (2) similarity usually implies causality; (3) similarity-based generalizations can be refined over time; and (4) similarity-based and explanation-based methods complement each other in important ways.
Abstract: A large portion of the research in machine learning has involved a paradigm of comparing many examples and analyzing them in terms of similarities and differences. The assumption is made, usually tacitly, that the resulting generalizations will have applicability to new examples. While such research has been very successful, it is by no means obvious why similarity-based generalizations should be useful, since they may simply reflect coincidences. Proponents of explanation-based learning—a knowledge-intensive method of examining single examples to derive generalizations based on underlying causal models—could contend that their methods are more fundamentally grounded, and that there is no need to look for similarities across examples. In this chapter, we present the issues, and then show why similarity-based methods are important. We include a description of the similarity-based system UNIMEM and present four reasons why robust machine learning must involve the integration of similarity-based and explanation-based methods. We argue that: (1) it may not always be practical or even possible to determine a causal explanation; (2) similarity usually implies causality; (3) similarity-based generalizations can be refined over time; (4) similarity-based and explanation-based methods complement each other in important ways.
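In its simplest attribute-value form, the similarity-based generalization discussed above retains exactly the features shared by all examples. The feature dictionaries below are hypothetical illustrations:

```python
# Similarity-based generalization in its simplest attribute-value form:
# keep exactly the attribute/value pairs shared by every example.
# The example feature dictionaries are hypothetical illustrations.
def generalize(examples):
    common = dict(examples[0])
    for ex in examples[1:]:
        common = {k: v for k, v in common.items() if ex.get(k) == v}
    return common

examples = [
    {"shape": "round", "color": "red", "size": "small"},
    {"shape": "round", "color": "green", "size": "small"},
    {"shape": "round", "color": "red", "size": "large"},
]
print(generalize(examples))  # -> {'shape': 'round'}
```

As the abstract notes, nothing in this procedure guarantees that the retained features are causally relevant rather than coincidental, which is exactly why the chapter argues for integrating it with explanation-based methods.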