
Showing papers on "Active learning (machine learning)" published in 1993


Journal ArticleDOI
TL;DR: On most datasets studied, the best of very simple rules that classify examples on the basis of a single attribute is as accurate as the rules induced by the majority of machine learning systems.
Abstract: This article reports an empirical investigation of the accuracy of rules that classify examples on the basis of a single attribute. On most datasets studied, the best of these very simple rules is as accurate as the rules induced by the majority of machine learning systems. The article explores the implications of this finding for machine learning research and applications.

1,873 citations
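
To make the finding concrete, here is a minimal Python sketch of a one-attribute ("1R"-style) rule learner of the kind the abstract describes: for each attribute, map each attribute value to its majority class among the training examples, then keep the single attribute whose rule is most accurate. The toy dataset and function names are illustrative, not from the paper.

```python
from collections import Counter

def one_rule(examples, labels, n_attrs):
    """Pick the attribute whose value->majority-class rule is most
    accurate on the training data (a 1R-style learner)."""
    best = None
    for a in range(n_attrs):
        # Tally class counts for each value of attribute a.
        by_value = {}
        for x, y in zip(examples, labels):
            by_value.setdefault(x[a], Counter())[y] += 1
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        correct = sum(rule[x[a]] == y for x, y in zip(examples, labels))
        if best is None or correct > best[0]:
            best = (correct, a, rule)
    return best[1], best[2]  # chosen attribute and its value->class table

# Hypothetical toy data: attribute 0 alone predicts the label well.
X = [(0, 1), (0, 0), (1, 1), (1, 0)]
y = ['no', 'no', 'yes', 'yes']
attr, rule = one_rule(X, y, n_attrs=2)
print(attr, rule)  # -> 0 {0: 'no', 1: 'yes'}
```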


Book ChapterDOI
27 Jun 1993
TL;DR: A general method is presented that allows predictions to use both instance-based and model-based learning; results with three approaches to constructing models and with eight datasets demonstrate improvements due to the composite method.
Abstract: This paper concerns learning tasks that require the prediction of a continuous value rather than a discrete class. A general method is presented that allows predictions to use both instance-based and model-based learning. Results with three approaches to constructing models and with eight datasets demonstrate improvements due to the composite method. Keywords: learning with continuous classes, instance-based learning, model-based learning, empirical evaluation.

705 citations
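
One way to realize the composite prediction the abstract describes, roughly in the spirit of combining the two learning styles, is to adjust a fitted model's prediction at a query point by the model's residuals at the nearest training instances. The sketch below assumes a linear least-squares model and k = 3 neighbors; both choices, and the synthetic data, are illustrative assumptions rather than the paper's exact method.

```python
import numpy as np

def composite_predict(x, X_train, y_train, model_fn, k=3):
    """Model-based prediction corrected by the model's residuals at the
    k nearest training instances (instance-based + model-based)."""
    d = np.linalg.norm(X_train - x, axis=1)
    nn = np.argsort(d)[:k]
    residuals = y_train[nn] - model_fn(X_train[nn])
    return model_fn(x[None, :])[0] + residuals.mean()

# Hypothetical setup: a linear model fit by least squares on noisy data.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(100, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + np.sin(4 * X[:, 0])  # partly nonlinear target
w, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], y, rcond=None)

def model(Z):
    return Z @ w[:2] + w[2]

print(composite_predict(np.array([0.5, -0.5]), X, y, model))
```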


Journal ArticleDOI
TL;DR: It is argued that knowledge transfer is essential if robots are to learn control with moderate learning times in complex scenarios, and two approaches that both capture invariant knowledge about the robot and its environments are presented.

600 citations


Book ChapterDOI
22 Aug 1993
TL;DR: This paper shows how to construct several cryptographic primitives based on certain assumptions on the difficulty of learning by developing further a line of thought introduced by Impagliazzo and Levin.
Abstract: Modern cryptography has had considerable impact on the development of computational learning theory. Virtually every intractability result in Valiant’s model [13] (which is representation-independent in the sense that it does not rely on an artificial syntactic restriction on the learning algorithm’s hypotheses) has at its heart a cryptographic construction [4, 9, 1, 10]. In this paper, we give results in the reverse direction by showing how to construct several cryptographic primitives based on certain assumptions on the difficulty of learning. In doing so, we develop further a line of thought introduced by Impagliazzo and Levin [6].

340 citations


Journal ArticleDOI
01 Apr 1993
TL;DR: A class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency is examined.
Abstract: The Dyna class of reinforcement learning architectures enables the creation of integrated learning, planning and reacting systems. A class of strategies designed to enhance the learning and planning power of Dyna systems by increasing their computational efficiency is examined. The benefit of using these strategies is demonstrated on some simple abstract learning tasks. It is proposed that the backups to be performed in Dyna be prioritized in order to improve its efficiency. It is demonstrated on simple tasks that using some specific prioritizing schemes can lead to significant reductions in computational effort and corresponding improvements in learning performance.

241 citations
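
A hedged sketch of one prioritizing scheme inside a Dyna-Q loop: instead of replaying model-based backups uniformly, pending backups are kept in a priority queue keyed by the magnitude of their expected value change. This omits the predecessor-propagation step of full prioritized sweeping, and the chain task, thresholds, and constants are illustrative assumptions, not the paper's experiments.

```python
import heapq, random

def prioritized_dyna_q(env_step, states, actions, episodes=50,
                       alpha=0.5, gamma=0.95, n_planning=10, theta=1e-4):
    """Dyna-Q in which simulated backups are ordered by a priority
    queue instead of sampled uniformly from the model."""
    Q = {(s, a): 0.0 for s in states for a in actions}
    model, pq, in_pq = {}, [], set()

    def push(s, a, p):
        if p > theta and (s, a) not in in_pq:
            heapq.heappush(pq, (-p, (s, a)))   # max-priority via negation
            in_pq.add((s, a))

    for _ in range(episodes):
        s = random.choice(states)
        for _ in range(100):
            if random.random() > 0.1:          # epsilon-greedy action choice
                a = max(actions, key=lambda b: Q[(s, b)])
            else:
                a = random.choice(actions)
            r, s2 = env_step(s, a)
            model[(s, a)] = (r, s2)            # deterministic one-step model
            target = r + gamma * max(Q[(s2, b)] for b in actions)
            push(s, a, abs(target - Q[(s, a)]))
            # Planning: process the highest-priority pending backups first.
            for _ in range(n_planning):
                if not pq:
                    break
                _, (ps, pa) = heapq.heappop(pq)
                in_pq.discard((ps, pa))
                pr, ps2 = model[(ps, pa)]
                t = pr + gamma * max(Q[(ps2, b)] for b in actions)
                Q[(ps, pa)] += alpha * (t - Q[(ps, pa)])
            s = s2
    return Q

# Hypothetical 1-D chain: actions move left/right, reward 1 at the right end.
S, A = list(range(5)), [-1, +1]
def step(s, a):
    s2 = min(max(s + a, 0), 4)
    return (1.0 if s2 == 4 else 0.0), s2

Q = prioritized_dyna_q(step, S, A)
print(max(Q[(0, a)] for a in A))   # learned value of the start state
```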


Journal ArticleDOI
01 Jan 1993
TL;DR: Experiments show how behavior acquisition can be achieved by means of a learning coordination mechanism using an architecture based on learning classifier systems and on the structural properties of animal behavioral organization as proposed by ethologists.
Abstract: Intelligent robots should be able to use sensor information to learn how to behave in a changing environment. As environmental complexity grows, the learning task becomes more and more difficult. This problem is faced using an architecture based on learning classifier systems and on the structural properties of animal behavioral organization, as proposed by ethologists. After a description of the learning technique used and of the organizational structure proposed, experiments that show how behavior acquisition can be achieved are presented. The simulated robot learns to follow a light and to avoid hot dangerous objects. While these two simple behavioral patterns are independently learned, coordination is attained by means of a learning coordination mechanism.

205 citations


11 Jul 1993
TL;DR: Several meta-learning strategies for integrating classifiers independently learned by the same learner in a parallel and distributed computing environment are outlined; these strategies are particularly suited for massive amounts of data that main-memory-based learning algorithms cannot efficiently handle.
Abstract: Much of the research in inductive learning concentrates on problems with relatively small amounts of data. With the coming age of very large network computing, it is likely that orders of magnitude more data in databases will be available for various learning problems of real world importance. Learning techniques are central to knowledge discovery and the approach proposed in this paper may substantially increase the amount of data a knowledge discovery system can handle effectively. Meta-learning is proposed as a general technique for integrating a number of distinct learning processes. This paper details several meta-learning strategies for integrating independently learned classifiers by the same learner in a parallel and distributed computing environment. Our strategies are particularly suited for massive amounts of data that main-memory-based learning algorithms cannot efficiently handle. The strategies are also independent of the particular learning algorithm used and the underlying parallel and distributed platform. Preliminary experiments using different data sets and algorithms demonstrate encouraging results: parallel learning by meta-learning can achieve comparable prediction accuracy in less space and time than purely serial learning.

185 citations
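
The simplest of the meta-learning strategies described above, in miniature: train one base classifier per disjoint data partition (as if each partition lived on a separate processor) and integrate the learned classifiers by majority vote. The paper's arbiter and combiner schemes instead train a second-level learner on the base predictions; the nearest-centroid base learner and the synthetic data here are illustrative stand-ins, since the strategies are learner-independent.

```python
import numpy as np
from collections import Counter

class NearestCentroid:
    """Tiny stand-in base learner; any classifier would do."""
    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.mu = {c: X[y == c].mean(axis=0) for c in self.classes}
        return self
    def predict(self, X):
        d = np.stack([np.linalg.norm(X - self.mu[c], axis=1)
                      for c in self.classes])
        return [self.classes[i] for i in d.argmin(axis=0)]

# Train one classifier per disjoint partition, then vote.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] + X[:, 1] > 0, 'pos', 'neg')
parts = np.array_split(np.arange(300), 4)
base = [NearestCentroid().fit(X[p], y[p]) for p in parts]

def vote_predict(X_new):
    preds = [m.predict(X_new) for m in base]
    return [Counter(col).most_common(1)[0][0] for col in zip(*preds)]

print(np.mean(np.array(vote_predict(X)) == y))  # ensemble accuracy
```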


Journal ArticleDOI
TL;DR: The use of function identification and adaptive control algorithms in learning controllers for robot manipulators and the similarities and differences between betterment learning schemes, repetitive controllers and adaptive learning schemes based on integral transforms are discussed.
Abstract: Learning control encompasses a class of control algorithms for programmable machines such as robots which attain, through an iterative process, the motor dexterity that enables the machine to execute complex tasks. In this paper we discuss the use of function identification and adaptive control algorithms in learning controllers for robot manipulators. In particular, we discuss the similarities and differences between betterment learning schemes, repetitive controllers and adaptive learning schemes based on integral transforms. The stability and convergence properties of adaptive learning algorithms based on integral transforms are highlighted and experimental results illustrating some of these properties are presented.

183 citations
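
A minimal sketch of a betterment-style (P-type) iterative learning control update on a toy first-order plant, assuming the same task is repeated trial after trial and the input is corrected between trials with the previous trial's tracking error. The plant, gain, and trajectory are illustrative, not the paper's experimental setup.

```python
import numpy as np

# Toy plant y[t+1] = a*y[t] + b*u[t]; learning update between trials:
#     u_{k+1}[t] = u_k[t] + gamma * e_k[t+1]
# Convergence here needs |1 - gamma*b| < 1. All constants are illustrative.
a, b, gamma, T = 0.8, 0.5, 1.2, 50
y_des = np.sin(np.linspace(0, np.pi, T + 1))   # desired trajectory
u = np.zeros(T)

for trial in range(30):
    y = np.zeros(T + 1)                        # same initial state each trial
    for t in range(T):
        y[t + 1] = a * y[t] + b * u[t]
    e = y_des - y
    u = u + gamma * e[1:]                      # betterment-style correction
    if trial % 10 == 0:
        print(trial, np.abs(e).max())          # tracking error shrinks
```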


Proceedings Article
11 Jul 1993
TL;DR: This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous realtime versions of Q-learning and value-iteration, applied to the problem of reaching a goal state in deterministic domains and shows that the algorithms are tractable with only a simple change in the task representation or initialization.
Abstract: This paper analyzes the complexity of on-line reinforcement learning algorithms, namely asynchronous realtime versions of Q-learning and value-iteration, applied to the problem of reaching a goal state in deterministic domains. Previous work had concluded that, in many cases, tabula rasa reinforcement learning was exponential for such problems, or was tractable only if the learning algorithm was augmented. We show that, to the contrary, the algorithms are tractable with only a simple change in the task representation or initialization. We provide tight bounds on the worst-case complexity, and show how the complexity is even smaller if the reinforcement learning algorithms have initial knowledge of the topology of the state space or the domain has certain special properties. We also present a novel bidirectional Q-learning algorithm to find optimal paths from all states to a goal state and show that it is no more complex than the other algorithms.

134 citations
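
A sketch of the kind of representation change the paper analyzes: with an action-penalty reward (a cost of -1 per step) and zero-initialized Q-values, the initialization is optimistic, so even greedy tabula-rasa Q-learning reaches the goal efficiently in a deterministic domain. The 1-D corridor below is an illustrative stand-in for the paper's domains.

```python
# Deterministic corridor: states 0..N-1, goal at the right end.
N, GOAL = 20, 19
actions = [-1, +1]
Q = {(s, a): 0.0 for s in range(N) for a in actions}  # zero = optimistic here

for episode in range(30):
    s, steps = 0, 0
    while s != GOAL:
        a = max(actions, key=lambda b: Q[(s, b)])   # greedy suffices
        s2 = min(max(s + a, 0), N - 1)
        # Action-penalty representation: every step costs -1, goal is absorbing.
        target = -1.0 + (0.0 if s2 == GOAL else max(Q[(s2, b)] for b in actions))
        Q[(s, a)] = target        # deterministic domain: learning rate 1
        s, steps = s2, steps + 1
    print(episode, steps)         # steps to goal drop toward the optimum (19)
```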


Book ChapterDOI
27 Jun 1993
TL;DR: It is shown that simple random-representation methods can perform as well as nearest-neighbor methods (while being more suited to online learning) and significantly better than backpropagation, suggesting that randomness has a useful role to play in online supervised learning and constructive induction.
Abstract: We consider the requirements of online learning: learning which must be done incrementally and in real time, with the results of learning available soon after each new example is acquired. Despite the abundance of methods for learning from examples, there are few that can be used effectively for online learning, e.g., as components of reinforcement learning systems. Most of these few, including radial basis functions, CMACs, Kohonen's self-organizing maps, and those developed in this paper, share the same structure. All expand the original input representation into a higher dimensional representation in an unsupervised way, and then map that representation to the final answer using a relatively simple supervised learner, such as a perceptron or LMS rule. Such structures learn very rapidly and reliably, but have been thought either to scale poorly or to require extensive domain knowledge. To the contrary, some researchers (Rosenblatt, 1962; Gallant & Smith, 1987; Kanerva, 1988; Prager & Fallside, 1988) have argued that the expanded representation can be chosen largely at random with good results. The main contribution of this paper is to develop and test this hypothesis. We show that simple random-representation methods can perform as well as nearest-neighbor methods (while being more suited to online learning), and significantly better than backpropagation. We find that the size of the random representation does increase with the dimensionality of the problem, but not unreasonably so, and that the required size can be reduced substantially using unsupervised learning techniques. Our results suggest that randomness has a useful role to play in online supervised learning and constructive induction.

132 citations
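
A minimal sketch of the random-representation structure described above: a large, fixed, randomly chosen layer of threshold units expands the input, and only a simple LMS rule is trained on top, one example at a time. The expansion size, thresholds, and target function are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 400                     # input dim, size of random expansion
W = rng.normal(size=(m, d))        # fixed random projection (never trained)
b = rng.normal(size=m)

def expand(x):
    """Unsupervised random expansion: binary threshold units."""
    return (W @ x + b > 0).astype(float)

w = np.zeros(m)                    # the only trained weights (LMS rule)
eta = 0.05

def target(x):                     # unknown function to learn (illustrative)
    return np.sin(3 * x[0]) + x[1]

for t in range(5000):              # online: one example at a time
    x = rng.uniform(-1, 1, size=d)
    phi = expand(x)
    err = target(x) - w @ phi
    w += eta * err * phi / max(phi.sum(), 1.0)   # normalized LMS step
    if t % 1000 == 0:
        print(t, err ** 2)         # squared error trends downward
```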


Book ChapterDOI
27 Jun 1993
TL;DR: This research concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning, extending the state of the art and enabling its application to complex robot-learning problems.
Abstract: The aim of this research is to extend the state of the art of reinforcement learning and enable its application to complex robot-learning problems. This paper presents a series of scaling-up extensions to reinforcement learning, including: generalization by neural networks, using action models, teaching, hierarchical learning, and having a short-term memory. These extensions have been tested in a physically-realistic robot simulator, and combined to solve a complex robot-learning problem. Simulation results indicate that each of the extensions could result in either significant learning speedup or new capabilities. This research concludes that it is possible to build artificial agents that can acquire complex control policies effectively by reinforcement learning.

Book ChapterDOI
TL;DR: This work studies on-line learning processes in artificial neural networks from a general point of view, and applies the results to the transitions from "twists" in two-dimensional self-organizing maps to perfectly ordered configurations.
Abstract: We study on-line learning processes in artificial neural networks from a general point of view. On-line learning means that a learning step takes place at each presentation of a randomly drawn training pattern. It can be viewed as a stochastic process governed by a continuous-time master equation. On-line learning is necessary if not all training patterns are available all the time. This occurs in many applications when the training patterns are drawn from a time-dependent environmental distribution. Studying learning in a changing environment, we encounter a conflict between the adaptability and the confidence of the network's representation. Minimization of a criterion incorporating both effects yields an algorithm for on-line adaptation of the learning parameter. The inherent noise of on-line learning makes it possible to escape from undesired local minima of the error potential on which the learning rule performs (stochastic) gradient descent. We try to quantify these often made claims by considering the transition times between various minima. We apply our results to the transitions from "twists" in two-dimensional self-organizing maps to perfectly ordered configurations. Finally, we discuss the capabilities of on-line learning for global optimization.

Proceedings Article
01 Jan 1993
TL;DR: This paper presents a solution in the form of new pruning techniques that dramatically improve the runtime of rule induction methods with no loss in accuracy: formal analysis shows an improvement in asymptotic time complexity, and experiments show an order-of-magnitude speedup.
Abstract: Recent years have seen increased interest in systems that learn sets of rules. The goal of this paper is to study the degree to which "separate and conquer" rule learning induction methods scale up to large, real-world learning problems. In particular, we study the asymptotic complexity of rule induction on large training sets in the presence of noise. We present formal arguments and experimental data supporting the claim that existing methods do not scale up well on noisy data. We then present a solution in the form of new pruning techniques that dramatically improve the runtime of rule induction methods with no loss in accuracy: formal analysis shows an improvement in asymptotic time complexity, and experiments show an order-of-magnitude speedup on a set of benchmark problems while obtaining slightly more accurate hypotheses.

Journal ArticleDOI
TL;DR: A form of distribution-free learning in which the learner knows the distribution being used, so that "distribution-free" refers only to the requirement that a bound on the number of queries can be obtained uniformly over all distributions.
Abstract: The original and most widely studied PAC model for learning assumes a passive learner in the sense that the learner plays no role in obtaining information about the unknown concept. That is, the samples are simply drawn independently from some probability distribution. Some work has been done on studying more powerful oracles and how they affect learnability. To find bounds on the improvement in sample complexity that can be expected from using oracles, we consider active learning in the sense that the learner has complete control over the information received. Specifically, we allow the learner to ask arbitrary yes/no questions. We consider both active learning under a fixed distribution and distribution-free active learning. In the case of active learning, the underlying probability distribution is used only to measure distance between concepts. For learnability with respect to a fixed distribution, active learning does not enlarge the set of learnable concept classes, but can improve the sample complexity. For distribution-free learning, it is shown that a concept class is actively learnable iff it is finite, so that active learning is in fact less powerful than the usual passive learning model. We also consider a form of distribution-free learning in which the learner knows the distribution being used, so that “distribution-free” refers only to the requirement that a bound on the number of queries can be obtained uniformly over all distributions. Even with the side information of the distribution being used, a concept class is actively learnable iff it has finite VC dimension, so that active learning with the side information still does not enlarge the set of learnable concept classes.
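
A toy illustration of why arbitrary yes/no queries can improve sample complexity under a fixed distribution: learning a threshold concept on [0, 1] to accuracy eps takes on the order of 1/eps random examples passively, but only about log2(1/eps) active queries by binary search. The target threshold below is hypothetical.

```python
def learn_threshold(oracle, eps):
    """Actively learn a threshold on [0,1] with yes/no membership queries."""
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if oracle(mid):            # query: "is mid labeled positive?"
            hi = mid
        else:
            lo = mid
        queries += 1
    return (lo + hi) / 2, queries

true_theta = 0.3721                # hypothetical target concept
est, q = learn_threshold(lambda x: x >= true_theta, eps=1e-3)
print(est, q)                      # ~0.372 after only ~10 queries
```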

Book ChapterDOI
27 Jun 1993
TL;DR: A density-adaptive reinforcement learning algorithm and a density-adaptive forgetting algorithm that deletes observations from the learning set depending on whether subsequent evidence is available in a local region of the parameter space.

Abstract: We describe a density-adaptive reinforcement learning algorithm and a density-adaptive forgetting algorithm. This learning algorithm uses hybrid Dκ-D/2κ-trees to allow for a variable-resolution partitioning and labelling of the input space. The density-adaptive forgetting algorithm deletes observations from the learning set depending on whether subsequent evidence is available in a local region of the parameter space. The algorithms are demonstrated in a simulation for learning feasible robotic grasp approach directions and orientations and then adapting to subsequent mechanical failures in the gripper.

Proceedings Article
29 Nov 1993
TL;DR: The goal of the learner is to infer a hypothesis w = (w_1, ..., w_d)' with small (mean-square) generalisation error E(Y − ψ(X)'w)^2 on future random examples (X, Y) generated independently of the training sample from the same underlying distribution.
Abstract: We study the problem of when to stop learning a class of feedforward networks (networks with a linear output neuron and fixed input weights) when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there are in general three distinct phases in the generalization performance of the learning process, and in particular, the network has better generalization performance when learning is stopped at a certain time before the global minimum of the empirical error is reached. A notion of the effective size of a machine is defined and used to explain the trade-off between the complexity of the machine and the training error in the learning process. The study leads naturally to a network size selection criterion, which turns out to be a generalization of Akaike's Information Criterion for the learning process. It is shown that stopping learning before the global minimum of the empirical error has the effect of network size selection.
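
A hedged sketch of stopping before the global minimum of the empirical error: plain gradient descent on a linear-output model, halted when the error on held-out data stops improving. The data, noise level, and patience rule are illustrative and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 40, 20
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.5 * rng.normal(size=n)          # noisy targets
Xtr, ytr, Xva, yva = X[:30], y[:30], X[30:], y[30:]

w = np.zeros(d)
best, best_w, patience = np.inf, w.copy(), 0
for step in range(5000):
    grad = Xtr.T @ (Xtr @ w - ytr) / len(ytr)      # training-error gradient
    w -= 0.05 * grad
    val = np.mean((Xva @ w - yva) ** 2)            # held-out error
    if val < best:
        best, best_w, patience = val, w.copy(), 0
    else:
        patience += 1
        if patience > 50:                          # generalization worsened
            break
print(step, best)   # training halts well before the empirical-error minimum
```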

Journal ArticleDOI
TL;DR: This article focuses on the distribution of work between several learning algorithms on the one hand and the user on the other hand, and the principle of multi-functionality of one representation for the balanced use by learning algorithms and users.
Abstract: Machine learning techniques are often used for supporting a knowledge engineer in constructing a model of part of the world. Different learning algorithms contribute to different tasks within the modeling process. Integrating several learning algorithms into one system allows it to support several modeling tasks within the same framework. In this article, we focus on the distribution of work between several learning algorithms on the one hand and the user on the other hand. The approach followed by the MOBAL system is that of balanced cooperation, i.e., each modeling task can be done by the user or by a learning tool of the system. The MOBAL system is described in detail. We discuss the principle of multi-functionality of one representation for the balanced use by learning algorithms and users.

Proceedings ArticleDOI
28 Mar 1993
TL;DR: It is shown how reinforcement learning can be made practical for complex problems by introducing hierarchical learning and artificial neural networks are used to generalize experiences.
Abstract: It is shown how reinforcement learning can be made practical for complex problems by introducing hierarchical learning. The agent at first learns elementary skills for solving elementary problems. To learn a new skill for solving a complex problem later on, the agent can ignore the low-level details and focus on the problem of coordinating the elementary skills it has developed. A physically-realistic mobile robot simulator is used to demonstrate the success and importance of hierarchical learning. For fast learning, artificial neural networks are used to generalize experiences, and a teaching technique is employed to save many learning trials of the simulated robot.

Journal ArticleDOI
TL;DR: Experimental results are presented to show that oriented dynamic learning is far more efficient than dynamic learning in SOCRATES.
Abstract: An efficient technique for dynamic learning called oriented dynamic learning is proposed. Instead of learning being performed for almost all signals in the circuit, it is shown that it is possible to determine a subset of these signals to which all learning operations can be restricted. It is further shown that learning for this set of signals provides the same knowledge about the nonsolution areas in the decision trees as the dynamic learning of SOCRATES. High efficiency is achieved by limiting learning to certain learning lines that lie within a certain area of the circuit, called the active area. Experimental results are presented to show that oriented dynamic learning is far more efficient than dynamic learning in SOCRATES.

Proceedings Article
28 Aug 1993
TL;DR: The learning system SMART+ is described; it embeds sophisticated knowledge-based heuristics to control the search process and is able to deal with numerical features.
Abstract: Inducing concept descriptions in First Order Logic is inherently a complex task. There are two main reasons: on one hand, the task is usually formulated as a search problem inside a very large space of logical descriptions which needs strong heuristics to be kept to manageable size. On the other hand, most developed algorithms are unable to handle numerical features, typically occurring in real-world data. In this paper, we describe the learning system SMART+, which embeds sophisticated knowledge-based heuristics to control the search process and is able to deal with numerical features. SMART+ can use different learning strategies, such as inductive, deductive and abductive ones, and exploits both background knowledge and statistical evaluation criteria. Furthermore, it can use simple Genetic Algorithms to refine predicate semantics and this aspect will be described in detail. Finally, an evaluation of SMART+'s performance is made on a complex task.

Book ChapterDOI
13 Sep 1993
TL;DR: This work denominates as projective mapping the method most common in feed-forward neural networks, where an input vector is projected on a "weight vector".
Abstract: A response generating system can be seen as a mapping from a set of external states (inputs) to a set of actions (outputs). This mapping can be done in principally different ways. One method is to divide the state space into a set of discrete states and store the optimal response for each state. This is denominated a memory mapping system. Another method is to approximate continuous functions from the input space to the output space. I denominate this method projective mapping, although the function does not have to be linear. The latter method is the most common one in feed forward neural networks, where an input vector is projected on a “weight vector”.
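
The two mapping styles contrasted above, in miniature: a memory mapping discretizes the state space and stores one response per cell, while a projective mapping computes a continuous function of the input, here a projection on a weight vector. All values are illustrative.

```python
import numpy as np

# Memory mapping: discretize [0,1)^2 into cells, store a response per cell.
grid = np.zeros((10, 10))
def memory_response(x):
    i, j = int(x[0] * 10), int(x[1] * 10)
    return grid[i, j]

# Projective mapping: a continuous function of the input, here linear.
w = np.array([0.7, -0.3])
def projective_response(x):
    return float(np.dot(w, x))   # input projected on a "weight vector"

x = np.array([0.42, 0.91])
print(memory_response(x), projective_response(x))
```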

Proceedings Article
28 Aug 1993
TL;DR: A learning method is presented that combines explanation-based learning from a previously learned approximate domain theory with inductive learning from observations; based on a neural network representation of domain knowledge, the method is robust to errors in the domain theory.
Abstract: Many researchers have noted the importance of combining inductive and analytical learning, yet we still lack combined learning methods that are effective in practice. We present here a learning method that combines explanation-based learning from a previously learned approximate domain theory, together with inductive learning from observations. This method, called explanation-based neural network learning (EBNN), is based on a neural network representation of domain knowledge. Explanations are constructed by chaining together inferences from multiple neural networks. In contrast with symbolic approaches to explanation-based learning which extract weakest preconditions from the explanation, EBNN extracts the derivatives of the target concept with respect to the training example features. These derivatives summarize the dependencies within the explanation, and are used to bias the inductive learning of the target concept. Experimental results on a simulated robot control task show that EBNN requires significantly fewer training examples than standard inductive learning. Furthermore, the method is shown to be robust to errors in the domain theory, operating effectively over a broad spectrum from very strong to very weak domain theories.
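
A linear caricature of EBNN's central idea, under stated assumptions: the explanation step is taken to yield derivatives (slopes) of the target with respect to the input features, and the inductive learner is fit to match both the training values and those slopes. The synthetic data and the slightly-wrong "theory" slopes are stand-ins, and `lam` (the weight on the slope constraints) is an assumed hyperparameter; the paper itself uses chained neural networks, not this linear model.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, lam = 5, 4, 10.0
w_true = np.array([2.0, -1.0, 0.5, 0.0, 3.0])
X = rng.normal(size=(n, d))
y = X @ w_true                       # only 4 training values for 5 unknowns
# Slopes "extracted from the explanation": here, a slightly wrong theory.
G = np.tile(w_true + 0.1 * rng.normal(size=d), (n, 1))

# Least squares over values AND slopes: for a linear model f(x) = w.x,
# grad f = w, so each example adds d rows asking w to match its slopes.
A = np.vstack([X] + [np.sqrt(lam) * np.eye(d)] * n)
b = np.concatenate([y] + [np.sqrt(lam) * g for g in G])
w_fit, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.round(w_fit, 2))   # close to w_true despite only 4 examples
```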

Journal ArticleDOI
10 Mar 1993-EPL
TL;DR: It is found that the optimally-trained spherical perceptron may learn a linearly-separable rule as well as any possible network, and simulation results support these conclusions.
Abstract: We introduce optimal learning with a neural network, which we define as minimising the expected generalisation error. We find that the optimally-trained spherical perceptron may learn a linearly-separable rule as well as any possible network. We sketch an algorithm to generate optimal learning, and simulation results support our conclusions. Optimal learning of a well-known, significant unlearnable problem, the mismatched weight problem, gives better asymptotic learning than conventional techniques, and may be simulated enormously more easily. Unlike many other learning schemes, optimal learning extends to more general networks learning more complex rules.

Proceedings ArticleDOI
28 Mar 1993
TL;DR: The UR-ID3 algorithm described combines uncertain reasoning with the rule set produced by ID3 to create a machine learning algorithm which is robust in the presence of uncertain training and testing data.
Abstract: Quinlan's ID3 is a symbolic machine learning algorithm which uses training examples as input and constructs a decision tree as output. One problem with the standard decision tree approach to machine learning is that uncertain data, in training and/or testing, often produces poor classification accuracies. The UR-ID3 algorithm described combines uncertain reasoning with the rule set produced by ID3 to create a machine learning algorithm which is robust in the presence of uncertain training and testing data. Experimental results are presented which compare the new algorithm's performance with that of ID3 and backpropagation neural networks.

Proceedings ArticleDOI
28 Mar 1993
TL;DR: An error potential for self-organizing learning rules is given and the gradient leads to the well-known learning rule of Kohonen, except for the determination of the 'winning' unit.
Abstract: An error potential for self-organizing learning rules is given. The gradient of this error potential leads to the well-known learning rule of Kohonen, except for the determination of the 'winning' unit. The existence of an error potential facilitates a global description of the learning process. A one-dimensional topological map is treated as an example.
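
For reference, the Kohonen learning rule the abstract refers to, in one step: the winning unit and its neighbors on the (here one-dimensional) map move toward the input, weighted by a neighborhood function. The paper's point is that, with a modified winner determination, this update is the exact gradient of an error potential; the sketch below uses the conventional winner and illustrative sizes.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 10                                  # units on a 1-D topological map
W = rng.uniform(size=(K, 2))            # one weight vector per unit

def som_step(x, eta=0.2, sigma=1.5):
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    h = np.exp(-((np.arange(K) - winner) ** 2) / (2 * sigma ** 2))
    W[:] += eta * h[:, None] * (x - W)  # Kohonen update

for _ in range(2000):
    som_step(rng.uniform(size=2))
print(np.round(W, 2))   # neighboring units end up with similar weights
```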

Book ChapterDOI
08 Nov 1993
TL;DR: Three sequential-learning problems are reviewed, some new, and not-so-new, algorithms for learning from sequences are examined, and applications for these methods are given.
Abstract: Whereas basic machine learning research has mostly viewed input data as an unordered random sample from a population, researchers have also studied learning from data whose inputs follow a regular sequence. To do so requires that we regard the input data as a stream and identify regularities in the data values as they occur. In this brief survey I review three sequential-learning problems, examine some new, and not-so-new, algorithms for learning from sequences, and give applications for these methods. The three generic problems I discuss are: predicting sequences of discrete symbols generated by stochastic processes; learning streams by extrapolation from a general rule; and learning to predict time series.

Proceedings ArticleDOI
01 Jan 1993
TL;DR: The proposed models adopt supervised learning without modifying the basic learning algorithm and behave as a supervised learning machine, which can learn input-output functions in addition to the characteristics of the conventional Kohonen feature maps.
Abstract: Kohonen feature maps as a supervised learning machine are proposed and discussed. The proposed models adopt supervised learning without modifying the basic learning algorithm. They behave as a supervised learning machine, which can learn input-output functions in addition to the characteristics of the conventional Kohonen feature maps. In pattern recognition problems, the proposed models can structure the recognition system more simply than the conventional method, i.e., structuring a pattern recognition machine using a supervised learning machine after pre-processing by the Kohonen feature map. The proposed models do not distinguish the input vectors from the desired vectors because they regard them as the same kind of vectors. Several examples are simulated in order to compare with conventional supervised learning machines. The results indicate the effectiveness of the proposed models.
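
A miniature version of the proposed trick: train an ordinary Kohonen map on the concatenation [x; y] of input and desired output, so the map does not distinguish the two, then recall by finding the best-matching unit on the input part alone and reading off its output part. The XOR data, map size, and schedules are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
K, din, dout = 25, 2, 1
W = rng.uniform(size=(K, din + dout))   # units store [input ; output] vectors

def train(X, Y, epochs=300, eta=0.3):
    for e in range(epochs):
        sigma = 2.0 * (0.98 ** e)       # shrink the neighborhood over time
        for x, yv in zip(X, Y):
            v = np.concatenate([x, yv])            # inputs and targets merged
            win = np.argmin(np.linalg.norm(W - v, axis=1))
            h = np.exp(-((np.arange(K) - win) ** 2) / (2 * sigma ** 2))
            W[:] += eta * h[:, None] * (v - W)     # ordinary Kohonen update

def recall(x):
    win = np.argmin(np.linalg.norm(W[:, :din] - x, axis=1))  # input part only
    return W[win, din:]                                      # read output part

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
Y = np.array([[0], [1], [1], [0]], float)
train(X, Y)
print([round(float(recall(x)[0]), 2) for x in X])  # roughly 0, 1, 1, 0 (XOR)
```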

Proceedings ArticleDOI
01 Aug 1993
TL;DR: A survey of recent advances in the application of genetic algorithms to problems in machine learning.
Abstract: One approach to the design of learning systems is to extract heuristics from existing adaptive systems. Genetic algorithms are heuristic learning models based on principles drawn from natural evolution and selective breeding. Some features that distinguish genetic algorithms from other search methods are: a population of structures that can be interpreted as candidate solutions to the given problem; the competitive selection of structures for reproduction, based on each structure's fitness as a solution to the given problem; and idealized genetic operators that alter the selected structures in order to create new structures for further testing. In many applications, these features enable the genetic algorithm to rapidly improve the average fitness of the population and to quickly identify the high performance regions of very complex search spaces. In practice, genetic algorithms may be combined with local search techniques to create a high-performance hybrid search algorithm. This talk provides a survey of recent advances in the application of genetic algorithms to problems in machine learning. Although many genetic algorithm applications have been in the areas of function optimization, parameter tuning, scheduling and other combinatorial problems [1], genetic algorithms have also been applied to many traditional machine learning problems, including concept learning from examples, learning weights for neural nets, and learning rules for sequential decision problems. At NRL, we investigate many aspects of genetic algorithms, ranging from the study of alternative selection policies [6] and crossover operators [3, 12], to performance studies of genetic algorithms for optimization in non-stationary environments [8]. Much of our effort has been devoted to the development of practical learning systems that use genetic algorithms to learn strategies for sequential decision problems [5]. In our Samuel system [7], the "chromosome" of the genetic algorithm represents a set of condition-action rules for controlling an autonomous vehicle or a robot. The fitness of a rule set is measured by evaluating the performance of the resulting control strategy on a simulator. This system has successfully learned highly effective strategies for several tasks, including evading a predator, tracking a prey, seeking a goal while avoiding obstacles, and defending a goal from threatening agents. As these examples show, we have a high level of interest in learning in multi-agent environments in which the behaviors of the external agents are not easily characterized by the learner. We have found that genetic algorithms provide an efficient way to learn strategies that take advantage of subtle regularities in the behavior of opposing agents. We are now beginning to investigate the more general case in which the behavior of the external agents changes over time. In particular, we are interested in learning competitive strategies against an opponent that is itself a learning agent. This is, of course, the usual situation in natural environments in which multiple species compete for survival. Our initial studies lead us to expect that genetic learning systems can successfully adapt to changing environmental conditions. While the range of applications of genetic algorithms continues to grow more rapidly each year, the study of the theoretical foundations is still in an early stage.
Holland's early work [9] showed that a simple form of genetic algorithm implicitly estimates the utility of a vast number of distinct subspaces, and allocates future trials accordingly. Specifically, let H be a hyperplane in the representation space. For example, if the structures are represented by six binary features, then the hyperplane denoted by H = 0#1### consists of all structures in which the first feature is absent and the third feature is present. Holland showed that the expected number of samples (offspring) allocated to a hyperplane H at time t+1 satisfies M(H, t+1) ≥ M(H, t) · f(H, t)/f̄(t), where f(H, t) is the average fitness of the population members in H at time t and f̄(t) is the average fitness of the whole population (ignoring the disruptive effects of the genetic operators).
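
A minimal genetic algorithm of the kind the talk surveys, under illustrative assumptions: bit-string structures, fitness-proportional selection, one-point crossover, bit-flip mutation, and a stand-in fitness that simply counts ones. None of these choices are taken from the talk itself.

```python
import random

L, POP, GENS = 20, 30, 40

def fitness(s):
    return sum(s)                        # stand-in problem: maximize ones

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]

for g in range(GENS):
    weights = [fitness(s) + 1e-9 for s in pop]      # fitness-proportional
    nxt = []
    while len(nxt) < POP:
        p1, p2 = random.choices(pop, weights=weights, k=2)  # selection
        cut = random.randrange(1, L)                        # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [b ^ (random.random() < 0.01) for b in child]  # mutation
        nxt.append(child)
    pop = nxt

print(max(fitness(s) for s in pop))      # near L after a few generations
```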

01 Jan 1993
TL;DR: Geometry is used to formulate and analyze two theoretical machine learning problems and to improve an existing machine learning method, and a form of best-case analysis is presented which can be used to generate bounds on the number of examples needed for learning.
Abstract: Due to advances in the sciences, the ability of mankind to collect data has increased enormously in recent years. In many different areas, the research bottleneck is no longer in data collection, but rather in data interpretation. Often, the underlying processes described by the data are very complex, compounding the problem. In response, much research in computer analysis of large datasets is under way. Machine learning is one of the fields responding to the challenge of this problem. This thesis considers the interpretation of machine learning problems and techniques under a geometric model. Geometry is useful for several reasons. It aids in visualization, allowing for alternative ways of thinking. It provides a rigorous framework for analyzing learning problems and methods. It suggests modifications to methods which can increase their usefulness and efficiency. In this thesis, geometric analysis is used to formulate and analyze two theoretical machine learning problems and to improve an existing machine learning method. A form of best-case analysis is presented which can be used to generate bounds on the number of examples needed for learning. A formal trade-off between available memory and speed of learning is established. In addition, geometry is used to generalize the decision tree method of machine learning. An algorithm for learning based on this generalization is presented, and its usefulness is demonstrated on a variety of artificial and real-world databases.