
Showing papers by "Thomas G. Dietterich published in 1991"


Proceedings Article
14 Jul 1991
TL;DR: It is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features, and it is suggested that training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
Abstract: In many domains, an appropriate inductive bias is the MIN-FEATURES bias, which prefers consistent hypotheses definable over as few features as possible. This paper defines and studies this bias. First, it is shown that any learning algorithm implementing the MIN-FEATURES bias requires Θ((1/ε) ln(1/δ) + (1/ε)[2^p + p ln n]) training examples to guarantee PAC-learning a concept having p relevant features out of n available features. This bound is only logarithmic in the number of irrelevant features. The paper also presents a quasi-polynomial time algorithm, FOCUS, which implements MIN-FEATURES. Experimental studies are presented that compare FOCUS to the ID3 and FRINGE algorithms. These experiments show that, contrary to expectations, these algorithms do not implement good approximations of MIN-FEATURES. The coverage, sample complexity, and generalization performance of FOCUS are substantially better than those of either ID3 or FRINGE on learning problems where the MIN-FEATURES bias is appropriate. This suggests that, in practical applications, training data should be preprocessed to remove irrelevant features before being given to ID3 or FRINGE.
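As an illustration of the MIN-FEATURES bias itself (not the paper's FOCUS implementation, which is quasi-polynomial), the following sketch brute-forces feature subsets in order of increasing size and returns the smallest subset over which the training examples remain label-consistent; all names and the toy data are illustrative.

```python
from itertools import combinations

def consistent(examples, subset):
    """True if no two examples agree on `subset` but carry different labels."""
    seen = {}
    for x, y in examples:
        key = tuple(x[i] for i in subset)
        if key in seen and seen[key] != y:
            return False
        seen[key] = y
    return True

def min_features(examples, n_features):
    """Smallest feature subset consistent with the data (brute-force
    illustration of the MIN-FEATURES bias, not the FOCUS algorithm itself)."""
    for size in range(n_features + 1):
        for subset in combinations(range(n_features), size):
            if consistent(examples, subset):
                return subset
    return tuple(range(n_features))

# Toy data: the label copies feature 2; features 0 and 1 are irrelevant.
examples = [((0, 1, 0), 0), ((0, 0, 1), 1), ((1, 1, 1), 1), ((1, 0, 0), 0)]
print(min_features(examples, 3))   # -> (2,)
```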

716 citations


Book
01 Mar 1991
TL;DR: Readings in Machine Learning collects the best of the published machine learning literature, including papers that address a wide range of learning tasks, and that introduce a variety of techniques for giving machines the ability to learn.
Abstract: From the Publisher: The ability to learn is a fundamental characteristic of intelligent behavior. Consequently, machine learning has been a focus of artificial intelligence since the beginnings of AI in the 1950s. The 1980s saw tremendous growth in the field, and this growth promises to continue with valuable contributions to science, engineering, and business. Readings in Machine Learning collects the best of the published machine learning literature, including papers that address a wide range of learning tasks, and that introduce a variety of techniques for giving machines the ability to learn. The editors, in cooperation with a group of expert referees, have chosen important papers that empirically study, theoretically analyze, or psychologically justify machine learning algorithms. The papers are grouped into a dozen categories, each of which is introduced by the editors.

325 citations


01 Jan 1991
TL;DR: It is demonstrated that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
Abstract: Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying large collections of training examples of the form [xi, f(xi)]. Existing approaches to this problem include (a) direct application of multiclass algorithms such as the decision-tree algorithms ID3 and CART, (b) application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and (c) application of binary concept learning algorithms with distributed output codes such as those employed by Sejnowski and Rosenberg in the NETtalk system. This paper compares these three approaches to a new technique in which BCH error-correcting codes are employed as a distributed output representation. We show that these output representations improve the performance of ID3 on the NETtalk task and of backpropagation on an isolated-letter speech-recognition task. These results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
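To make the error-correcting output code idea concrete, here is a hedged sketch using a hand-made code matrix and scikit-learn decision trees standing in for ID3; the paper itself derives its codewords from BCH codes, so everything below is illustrative rather than the authors' implementation.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical 4-class code matrix (rows = classes, columns = codeword bits);
# every pair of rows differs in 4 positions. The paper derives such matrices
# from BCH codes; this one is hand-made purely for illustration.
CODE = np.array([[0, 0, 0, 0, 0, 0],
                 [0, 1, 1, 1, 1, 0],
                 [1, 0, 1, 1, 0, 1],
                 [1, 1, 0, 0, 1, 1]])

def train_ecoc(X, y):
    """Train one binary learner per codeword bit (y holds class indices 0..3)."""
    learners = []
    for bit in range(CODE.shape[1]):
        binary_labels = CODE[y, bit]      # relabel every example by this bit
        learners.append(DecisionTreeClassifier().fit(X, binary_labels))
    return learners

def predict_ecoc(learners, X):
    """Decode by picking the class whose codeword is nearest in Hamming distance."""
    bits = np.column_stack([clf.predict(X) for clf in learners])
    dists = np.abs(bits[:, None, :] - CODE[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)
```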

212 citations


Book ChapterDOI
14 Jul 1991
TL;DR: In this paper, error-correcting output codes are employed as a distributed output representation to improve the performance of ID3 on the NETtalk task and of backpropagation on an isolated-letter speech-recognition task.
Abstract: Multiclass learning problems involve finding a definition for an unknown function f(x) whose range is a discrete set containing k > 2 values (i.e., k "classes"). The definition is acquired by studying large collections of training examples of the form 〈Xi, f(Xi)〉. Existing approaches to this problem include (a) direct application of multiclass algorithms such as the decision-tree algorithms ID3 and CART, (b) application of binary concept learning algorithms to learn individual binary functions for each of the k classes, and (c) application of binary concept learning algorithms with distributed output codes such as those employed by Sejnowski and Rosenberg in the NETtalk system. This paper compares these three approaches to a new technique in which BCH error-correcting codes are employed as a distributed output representation. We show that these output representations improve the performance of ID3 on the NETtalk task and of backpropagation on an isolated-letter speech-recognition task. These results demonstrate that error-correcting output codes provide a general-purpose method for improving the performance of inductive learning programs on multiclass problems.
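The sense in which the output code is "error-correcting" can be stated with a standard coding-theory fact (not specific to this paper's BCH construction):

```latex
% If every pair of class codewords differs in at least d bit positions
% (minimum Hamming distance d), nearest-codeword decoding still recovers
% the correct class when at most t of the binary classifiers are wrong:
\[
  t \;=\; \left\lfloor \frac{d - 1}{2} \right\rfloor
\]
```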

188 citations


Proceedings Article
02 Dec 1991
TL;DR: It is concluded that supervised learning of center locations can be very important for radial basis function learning.
Abstract: Three methods for improving the performance of (Gaussian) radial basis function (RBF) networks were tested on the NETtalk task. In RBF, a new example is classified by computing its Euclidean distance to a set of centers chosen by unsupervised methods. The application of supervised learning to learn a non-Euclidean distance metric was found to reduce the error rate of RBF networks, while supervised learning of each center's variance resulted in inferior performance. The best improvement in accuracy was achieved by networks called generalized radial basis function (GRBF) networks. In GRBF, the center locations are determined by supervised learning. After training on 1000 words, RBF classifies 56.5% of letters correctly, while GRBF classifies 73.4% correctly (on a separate test set). From these and other experiments, we conclude that supervised learning of center locations can be very important for radial basis function learning.
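A hedged sketch of the contrast, assuming Gaussian basis functions and a simple squared-error loss (both assumptions, not the paper's exact setup): plain RBF keeps the centers fixed after unsupervised placement, while a GRBF-style update also moves the centers by gradient descent on the supervised loss.

```python
import numpy as np

def rbf_features(X, centers, width):
    """Gaussian activations, one per center, from squared Euclidean distances."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)   # (n, k)
    return np.exp(-d2 / (2.0 * width ** 2))

def rbf_predict(X, centers, width, W):
    """Plain RBF: fixed centers, linear output layer."""
    return rbf_features(X, centers, width) @ W

def grbf_step(X, Y, centers, width, W, lr=0.01):
    """One squared-error gradient step that adapts the output weights AND the
    center locations -- the supervised movement of centers is what the
    abstract above identifies as the key ingredient of GRBF."""
    Phi = rbf_features(X, centers, width)            # (n, k)
    err = Phi @ W - Y                                # (n, c); Y is one-hot
    grad_W = Phi.T @ err / len(X)
    dL_dPhi = err @ W.T                              # (n, k)
    diff = X[:, None, :] - centers[None, :, :]       # (n, k, d)
    coef = dL_dPhi * Phi / width ** 2                # (n, k)
    grad_C = (coef[:, :, None] * diff).sum(axis=0) / len(X)
    return centers - lr * grad_C, W - lr * grad_W

# Toy usage: 2-D inputs, 3 centers, 2 classes (one-hot targets).
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))
Y = np.eye(2)[rng.integers(0, 2, size=20)]
centers, W = rng.normal(size=(3, 2)), rng.normal(size=(3, 2))
centers, W = grbf_step(X, Y, centers, width=1.0, W=W)
```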

175 citations



01 Jan 1991
TL;DR: A set of machine learning methods for automatically constructing letter-to-sound rules by analyzing a dictionary of words and their pronunciations is presented, showing that error-correcting output codes provide a domain-independent, algorithm-independent approach to multiclass learning problems.
Abstract: The task of mapping spelled English words into strings of phonemes and stresses ("reading aloud") has many practical applications. Several commercial systems perform this task by applying a knowledge base of expert-supplied letter-to-sound rules. This dissertation presents a set of machine learning methods for automatically constructing letter-to-sound rules by analyzing a dictionary of words and their pronunciations. Taken together, these methods provide a substantial performance improvement over the best commercial system--DECtalk from Digital Equipment Corporation. In a performance test, the learning methods were trained on a dictionary of 19,002 words. Then, human subjects were asked to compare the performance of the resulting letter-to-sound rules against the dictionary for an additional 1,000 words not used during training. In a blind procedure, the subjects rated the pronunciations of both the learned rules and the DECtalk rules according to whether they were noticeably different from the dictionary pronunciation. The error rate for the learned rules was 28.8% (288 words noticeably different), while the error rate for the DECtalk rules was 44.3% (433 words noticeably different). If, instead of using human judges, we required that the pronunciations of the letter-to-sound rules exactly match the dictionary to be counted correct, then the error rate for our learned rules is 35.2% and the error rate for DECtalk is 63.6%. Similar results were observed at the level of individual letters, phonemes, and stresses. To achieve these results, several techniques were combined. The key learning technique represents the output classes by the codewords of an error-correcting code. Boolean concept learning methods, such as the standard ID3 decision-tree algorithm, can be applied to learn the individual bits of these codewords. This converts the multiclass learning problem into a number of Boolean concept learning problems. This method is shown to be superior to several other methods: multiclass ID3, one-tree-per-class ID3, the domain-specific distributed code employed by T. Sejnowski and C. Rosenberg in their NETtalk system, and a method developed by D. Wolpert. Similar results in the domain of isolated-letter speech recognition with the backpropagation algorithm show that error-correcting output codes provide a domain-independent, algorithm-independent approach to multiclass learning problems.
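The key conversion described above (one multiclass problem into several Boolean problems, one per codeword bit) can be sketched in a few lines; each resulting Boolean dataset would then be handed to a learner such as ID3. Names and codewords here are illustrative, not the BCH codewords used in the dissertation.

```python
def boolean_datasets(examples, codewords):
    """Split a multiclass dataset into one Boolean learning problem per
    codeword bit. `codewords` maps each class label (e.g. a phoneme/stress
    symbol) to a bit tuple."""
    n_bits = len(next(iter(codewords.values())))
    return [[(x, codewords[y][bit]) for x, y in examples] for bit in range(n_bits)]

# Toy usage: three phoneme classes mapped to hand-made 5-bit codewords.
codewords = {"AA": (0, 0, 0, 0, 0), "IY": (1, 1, 1, 0, 0), "EH": (1, 0, 0, 1, 1)}
examples = [(("c", "a", "t"), "AA"), (("s", "e", "e"), "IY")]
per_bit = boolean_datasets(examples, codewords)   # 5 Boolean training sets
```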

19 citations


Journal ArticleDOI
TL;DR: Two general approaches to closing the gap between specifications and run-time architectures are described: converting specifications into a form the run-time architecture can interpret directly (the focus of knowledge compilation research), and changing the run-time architecture so that it can interpret the given specifications directly (the focus of work on model-directed reasoning and task-specific architectures).
Abstract: The claim that in knowledge compilation the gap between specifications and run-time architectures is substantial is examined. The forces that create the gap are identified and discussed. Two general approaches to closing this gap are described. One approach, which has been the focus of knowledge compilation research, converts specifications into a form that the run-time architecture can interpret directly. The other approach, which has been the focus of work on model-directed reasoning and task-specific architectures, changes the run-time architecture so that it can interpret the given specifications directly.

9 citations



Book ChapterDOI
01 Jun 1991
TL;DR: A method is described that replaces a single inefficient non-gradient-based optimization by a set of efficient numerical gradient-directed optimizations that can be performed in parallel, and that decreases the dependence of the numerical methods on having a good starting point.
Abstract: Many important application problems can be formalized as constrained non-linear optimization tasks. However, numerical methods for solving such problems are brittle and do not scale well. This paper describes a method to speed up and increase the reliability of numerical optimization by (a) optimizing the computation of the objective function, and (b) splitting the objective function into special cases that possess differentiable closed forms. This allows us to replace a single inefficient non-gradient-based optimization by a set of efficient numerical gradient-directed optimizations that can be performed in parallel. In the domain of 2-dimensional structural design, this procedure yields a 95% speedup over traditional optimization methods and decreases the dependence of the numerical methods on having a good starting point.
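A hedged sketch of the split-into-special-cases idea, using a toy piecewise objective and scipy's L-BFGS-B as a stand-in gradient-directed optimizer; everything here is illustrative rather than the paper's structural-design method.

```python
import numpy as np
from scipy.optimize import minimize

# Toy piecewise objective: one smooth closed form applies when x0 >= 0,
# another when x0 <= 0 (only the shape of the idea, not the paper's objective).
g_pos = lambda x: (x[0] - 1.0) ** 2 + x[1] ** 2
g_neg = lambda x: (x[0] + 2.0) ** 2 + (x[1] - 1.0) ** 2 + 0.5

def optimize_by_cases(cases, x0):
    """Each (objective, bounds) pair is a smooth special case of the original
    problem. Every case is solved with a gradient-directed method (L-BFGS-B);
    the calls are independent, so they could run in parallel. The best case
    result is returned."""
    results = [minimize(g, x0, method="L-BFGS-B", bounds=bounds)
               for g, bounds in cases]
    return min(results, key=lambda r: r.fun)

cases = [(g_pos, [(0, None), (None, None)]),    # case 1: x0 >= 0
         (g_neg, [(None, 0), (None, None)])]    # case 2: x0 <= 0
best = optimize_by_cases(cases, x0=np.zeros(2))
print(best.x, best.fun)
```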

3 citations


Book ChapterDOI
01 Jun 1991
TL;DR: A number of challenges that engineering applications pose to learning systems are described, including noisy data, continuous quantities, mathematical formulas, large problem spaces, incorporating multiple sources and forms of knowledge, and the need for user-system interaction, along with a taxonomy of engineering tasks for applying machine learning technology.
Abstract: Engineers need intelligent tools to assist them with problems such as design, planning, monitoring, control, diagnosis, and analysis. Manual construction of these tools can be costly or impossible due to problems such as large amounts of data, lack of problem understanding, and the expense of knowledge engineering. Machine learning techniques hold promise for assisting in solutions to many of these problems, but engineering domains present significant challenges to learning systems, including: noisy data, continuous quantities, mathematical formulas, large problem spaces, incorporating multiple sources and forms of knowledge, and the need for user-system interaction. This paper describes a number of challenges to learning systems motivated by engineering applications and describes a taxonomy of engineering tasks for application of machine learning technology.

01 Jan 1991
TL;DR: A method is described that replaces a single inefficient non-gradient-based optimization by a set of efficient numerical gradient-directed optimizations that can be performed in parallel; it yields a 95% speedup over traditional optimization methods and decreases the dependence of the numerical methods on having a good starting point.
Abstract: Many important application problems can be formalized as constrained non-linear optimization tasks. However, numerical methods for solving such problems are brittle and do not scale well. Furthermore, for large classes of engineering problems, the objective function cannot be converted into a differentiable closed form. This prevents the application of efficient gradient optimization methods--only slower, non-gradient methods can be applied. This paper describes a method to speed up and increase the reliability of numerical optimization by (a) optimizing the computation of the objective function, and (b) splitting the objective function into special cases that possess differentiable closed forms. This allows us to replace a single inefficient non-gradient-based optimization by a set of efficient numerical gradient-directed optimizations that can be performed in parallel. In the domain of 2-dimensional structural design, this procedure yields a 95% speedup over traditional optimization methods and decreases the dependence of the numerical methods on having a good starting point.