
Showing papers on "Active learning (machine learning) published in 1996"


Journal ArticleDOI
TL;DR: Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis.
Abstract: Machine Learning is the study of methods for programming computers to learn. Computers are applied to a wide range of tasks, and for most of these it is relatively easy for programmers to design and implement the necessary software. However, there are many tasks for which this is difficult or impossible. These can be divided into four general categories. First, there are problems for which there exist no human experts. For example, in modern automated manufacturing facilities, there is a need to predict machine failures before they occur by analyzing sensor readings. Because the machines are new, there are no human experts who can be interviewed by a programmer to provide the knowledge necessary to build a computer system. A machine learning system can study recorded data and subsequent machine failures and learn prediction rules. Second, there are problems where human experts exist, but where they are unable to explain their expertise. This is the case in many perceptual tasks, such as speech recognition, hand-writing recognition, and natural language understanding. Virtually all humans exhibit expert-level abilities on these tasks, but none of them can describe the detailed steps that they follow as they perform them. Fortunately, humans can provide machines with examples of the inputs and correct outputs for these tasks, so machine learning algorithms can learn to map the inputs to the outputs. Third, there are problems where phenomena are changing rapidly. In finance, for example, people would like to predict the future behavior of the stock market, of consumer purchases, or of exchange rates. These behaviors change frequently, so that even if a programmer could construct a good predictive computer program, it would need to be rewritten frequently. A learning program can relieve the programmer of this burden by constantly modifying and tuning a set of learned prediction rules. 
Fourth, there are applications that need to be customized for each computer user separately. Consider, for example, a program to filter unwanted electronic mail messages. Different users will need different filters. It is unreasonable to expect each user to program his or her own rules, and it is infeasible to provide every user with a software engineer to keep the rules up-to-date. A machine learning system can learn which mail messages the user rejects and maintain the filtering rules automatically. Machine learning addresses many of the same research questions as the fields of statistics, data mining, and psychology, but with differences of emphasis. Statistics focuses on understanding the phenomena that have generated the data, often with the goal of testing different hypotheses about those phenomena. Data mining seeks to find patterns in the data that are understandable by people. Psychological studies of human learning aspire to understand the mechanisms underlying the various learning behaviors exhibited by people (concept learning, skill acquisition, strategy change, etc.).
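The mail-filtering example lends itself to a small illustration. The sketch below (all messages and names hypothetical, not from the article) learns naive-Bayes-style log-odds of rejection from a user's past accept/reject decisions:

```python
import math
from collections import Counter

# Illustrative sketch of the mail-filtering example: learn which words
# predict rejection from a user's past decisions (hypothetical data).
def train(history):
    counts = {True: Counter(), False: Counter()}   # rejected? -> word counts
    totals = {True: 0, False: 0}
    for text, rejected in history:
        for word in text.lower().split():
            counts[rejected][word] += 1
            totals[rejected] += 1
    return counts, totals

def reject_score(text, counts, totals):
    # Naive-Bayes-style log-odds that the user would reject this message.
    score = 0.0
    for word in text.lower().split():
        p_rej = (counts[True][word] + 1) / (totals[True] + 2)
        p_keep = (counts[False][word] + 1) / (totals[False] + 2)
        score += math.log(p_rej / p_keep)
    return score

history = [("buy cheap pills now", True),
           ("meeting agenda for monday", False),
           ("cheap offer buy now", True),
           ("project status and agenda", False)]
counts, totals = train(history)
print(reject_score("cheap pills offer", counts, totals) > 0)   # True
```

As the abstract notes, the filter maintains itself: each new accept/reject decision simply updates the counts.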

13,246 citations


Journal ArticleDOI
TL;DR: This article reviews how optimal data selection techniques have been used with feedforward neural networks, and shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.
Abstract: For many types of machine learning algorithms, one can compute the statistically "optimal" way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.
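The selection criterion can be illustrated in the simplest setting of a plain linear learner, where the predictive variance at a candidate input x is proportional to xᵀ(XᵀX)⁻¹x over the already-labeled design matrix X. This is a hedged sketch of the general idea with made-up data, not the paper's derivation for mixtures of Gaussians or locally weighted regression:

```python
# Sketch of variance-based data selection for a linear learner.
def predictive_variance(X, x):
    # Rows are [1, x_i]; prediction variance at x is proportional to
    # x^T (X^T X)^{-1} x (2x2 inverse written out by hand).
    a = sum(r[0] * r[0] for r in X)
    b = sum(r[0] * r[1] for r in X)
    d = sum(r[1] * r[1] for r in X)
    det = a * d - b * b
    inv = [[d / det, -b / det], [-b / det, a / det]]
    v = [inv[0][0] * x[0] + inv[0][1] * x[1],
         inv[1][0] * x[0] + inv[1][1] * x[1]]
    return x[0] * v[0] + x[1] * v[1]

labeled = [[1, 0.0], [1, 0.1], [1, 0.2]]        # inputs queried so far
candidates = [[1, 0.05], [1, 0.5], [1, 1.0]]
# The optimality criterion queries where the model is least certain:
best = max(candidates, key=lambda x: predictive_variance(labeled, x))
print(best)   # [1, 1.0], the point far from the labeled cluster
```

Querying high-variance inputs is what lets the learner reach good performance with sharply fewer training examples than random sampling.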

2,122 citations


Proceedings Article
03 Dec 1996
TL;DR: In an implementation of pole balancing on a complex anthropomorphic robot arm, it is demonstrated that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems.
Abstract: By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies for how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30-second demonstration by the human instructor.
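The model-priming idea can be sketched on a toy problem: demonstration transitions seed a learned model of the task dynamics, which is then solved off-line. The hypothetical 4-state chain below stands in for the robot-arm task:

```python
# Minimal sketch of model-based RL primed by a demonstration: the demo
# transitions seed a learned model, which value iteration then solves
# (hypothetical deterministic 4-state chain, state 3 is the goal).
demo = [(0, 1, 1, 0.0), (1, 1, 2, 0.0), (2, 1, 3, 1.0)]  # (s, a, s', r)

model = {}                       # (s, a) -> (s', r), learned from the demo
for s, a, s2, r in demo:
    model[(s, a)] = (s2, r)

V = {s: 0.0 for s in range(4)}
gamma = 0.9
for _ in range(50):              # value iteration on the primed model
    for (s, a), (s2, r) in model.items():
        V[s] = r + gamma * V[s2]

print(round(V[0], 3))   # value propagated back from the goal: 0.81
```

A single demonstration already yields a usable model, which is why the paper finds model-based methods benefit most from demonstrations.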

592 citations


Journal ArticleDOI
TL;DR: A novel incremental algorithm that combines Q-learning with the TD(λ) return estimation process, which is typically used in actor-critic learning, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization.
Abstract: This paper presents a novel incremental algorithm that combines Q-learning, a well-known dynamic-programming based reinforcement learning method, with the TD(λ) return estimation process, which is typically used in actor-critic learning, another well-known dynamic-programming based reinforcement learning method. The parameter λ is used to distribute credit throughout sequences of actions, leading to faster learning and also helping to alleviate the non-Markovian effect of coarse state-space quantization. The resulting algorithm, Q(λ)-learning, thus combines some of the best features of the Q-learning and actor-critic learning paradigms. The behavior of this algorithm has been demonstrated through computer simulations.
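The core update can be sketched as follows: a Q-learning TD error is distributed over recently visited state-action pairs through eligibility traces decayed by γλ. This is an illustrative accumulating-trace variant on a toy two-step task, not the paper's exact algorithm:

```python
# Sketch of a Q(lambda)-style update: the Q-learning TD error is spread
# over recently visited state-action pairs via eligibility traces
# (hypothetical toy task, not the paper's simulations).
alpha, gamma, lam = 0.5, 0.9, 0.8
Q, E = {}, {}                       # action values and eligibility traces

def q(s, a):
    return Q.get((s, a), 0.0)

def update(s, a, r, s2, actions):
    # TD error with the greedy backup, as in Q-learning:
    delta = r + gamma * max(q(s2, a2) for a2 in actions) - q(s, a)
    E[(s, a)] = E.get((s, a), 0.0) + 1.0       # accumulate the trace
    for sa in list(E):
        Q[sa] = q(*sa) + alpha * delta * E[sa]
        E[sa] *= gamma * lam                    # lambda distributes credit

update(0, 'right', 0.0, 1, ['left', 'right'])
update(1, 'right', 1.0, 2, ['left', 'right'])
# The reward at step two also credits the earlier pair:
print(round(q(0, 'right'), 3), q(1, 'right'))   # 0.36 0.5
```

With λ = 0 the second line of credit would vanish and the update reduces to plain one-step Q-learning.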

355 citations


Journal ArticleDOI
TL;DR: Results demonstrate the effectiveness of structural learning with forgetting, applied to various examples: the discovery of Boolean functions, classification of irises, discovery of recurrent networks, prediction of time series and rule extraction from mushroom data.

319 citations


Posted Content
TL;DR: This work shows how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression.
Abstract: For many types of machine learning algorithms, one can compute the statistically `optimal' way to select training data. In this paper, we review how optimal data selection techniques have been used with feedforward neural networks. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are computationally expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate. Empirically, we observe that the optimality criterion sharply decreases the number of training examples the learner needs in order to achieve good performance.

274 citations


Proceedings Article
01 Jul 1996
TL;DR: A large-scale application of the memory-based approach to part of speech tagging is shown to be feasible, obtaining a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases.
Abstract: We introduce a memory-based approach to part of speech tagging. Memory-based learning is a form of supervised learning based on similarity-based reasoning. The part of speech tag of a word in a particular context is extrapolated from the most similar cases held in memory. Supervised learning approaches are useful when a tagged corpus is available as an example of the desired output of the tagger. Based on such a corpus, the tagger-generator automatically builds a tagger which is able to tag new text the same way, diminishing development time for the construction of a tagger considerably. Memory-based tagging shares this advantage with other statistical or machine learning approaches. Additional advantages specific to a memory-based approach include (i) the relatively small tagged corpus size sufficient for training, (ii) incremental learning, (iii) explanation capabilities, (iv) flexible integration of information in case representations, (v) its non-parametric nature, (vi) reasonably good results on unknown words without morphological analysis, and (vii) fast learning and tagging. In this paper we show that a large-scale application of the memory-based approach is feasible: we obtain a tagging accuracy that is on a par with that of known statistical approaches, and with attractive space and time complexity properties when using IGTree, a tree-based formalism for indexing and searching huge case bases. The use of IGTree has the additional advantage that the optimal context size for disambiguation is dynamically computed.

1 Introduction

Part of Speech (POS) tagging is a process in which syntactic categories are assigned to words. It can be seen as a mapping from sentences to strings of tags. Automatic tagging is useful for a number of applications: as a preprocessing stage to parsing, in information retrieval, in text-to-speech systems, in corpus linguistics, etc.
The two factors determining the syntactic category of a word are its lexical probability (e.g. without context, man is more probably a noun than a verb), and its contextual probability (e.g. after a pronoun, man is more probably a verb than a noun, as in they man the boats). Several approaches have been proposed to construct automatic taggers. Most work on statistical methods has used n-gram models or Hidden Markov Model-based taggers (e.g. Church, 1988; DeRose, 1988; Cutting et al. 1992; Merialdo, 1994, etc.). In
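The extrapolation-from-similar-cases idea can be sketched with a toy case base; the hypothetical four-case memory below stands in for the large IGTree-indexed case base used in the paper:

```python
# Toy sketch of memory-based tagging: the tag of a word in context is
# extrapolated from the most similar stored case (weighted overlap).
memory = [                       # ((previous tag, focus word), tag)
    (('DET', 'man'), 'NOUN'),
    (('PRON', 'man'), 'VERB'),   # as in "they man the boats"
    (('DET', 'boats'), 'NOUN'),
    (('PRON', 'row'), 'VERB'),
]

def tag(prev_tag, word):
    def overlap(case):
        (p, w), _ = case
        # The lexical match outweighs the contextual one (cf. IGTree's
        # information-gain ordering of features).
        return (p == prev_tag) + 2 * (w == word)
    return max(memory, key=overlap)[1]

print(tag('DET', 'man'), tag('PRON', 'man'))   # NOUN VERB
```

The two calls mirror the lexical/contextual distinction above: the same word "man" receives different tags depending on the preceding tag.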

274 citations


Proceedings Article
01 Jan 1996
TL;DR: An experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context finds the statistical and neural-network methods perform the best on this particular problem.
Abstract: This paper describes an experimental comparison of seven different learning algorithms on the problem of learning to disambiguate the meaning of a word from context. The algorithms tested include statistical, neural-network, decision-tree, rule-based, and case-based classification techniques. The specific problem tested involves disambiguating six senses of the word "line" using the words in the current and preceding sentence as context. The statistical and neural-network methods perform the best on this particular problem and we discuss a potential reason for this observed difference. We also discuss the role of bias in machine learning and its importance in explaining performance differences observed on specific problems.
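The statistical approach the comparison favors can be sketched as a naive Bayes classifier over context words. The training sentences below are hypothetical, with two senses instead of the paper's six:

```python
import math
from collections import Counter, defaultdict

# Sketch of a naive Bayes word-sense disambiguator for "line",
# trained on made-up example sentences.
train = [
    ("please hold the line phone call", "phone"),
    ("the phone line was busy", "phone"),
    ("a new product line launched", "product"),
    ("the product line sells well", "product"),
]

counts = defaultdict(Counter)    # sense -> context-word counts
priors = Counter()
for text, sense in train:
    priors[sense] += 1
    counts[sense].update(text.split())

def disambiguate(context):
    vocab = len({w for c in counts.values() for w in c})
    def log_post(sense):
        total = sum(counts[sense].values())
        score = math.log(priors[sense] / sum(priors.values()))
        for w in context.split():
            # Laplace smoothing keeps unseen words from zeroing a sense.
            score += math.log((counts[sense][w] + 1) / (total + vocab))
        return score
    return max(counts, key=log_post)

print(disambiguate("call on this phone"))    # phone
print(disambiguate("new product launched"))  # product
```

The bias of such a classifier, word independence given the sense, is exactly the kind of assumption the paper's discussion of bias addresses.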

254 citations


Dissertation
01 Jan 1996
TL;DR: This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries, and proposes an active learning formulation for function approximation, and shows that the active example selection strategy learns its target with fewer data samples than random sampling.
Abstract: Object and pattern detection is a classical computer vision problem with many potential applications, ranging from automatic target recognition to image-based industrial inspection tasks in assembly lines. While there have been some successful object and pattern detection systems in the past, most such systems handle only specific rigid objects or patterns that can be accurately described by fixed geometric models or pictorial templates. This thesis presents a learning based approach for detecting classes of objects and patterns with variable image appearance but highly predictable image boundaries. Some examples of such object and pattern classes include human faces, aerial views of structured terrain features like volcanoes, localized material defect signatures in industrial parts, certain tissue anomalies in medical images, and instances of a given digit or character, which may be written or printed in many different styles. The thesis consists of two parts. In part one, we introduce our object and pattern detection approach using a concrete human face detection example. The approach first builds a distribution-based model of the target pattern class in an appropriate feature space to describe the target's variable image appearance. It then learns from examples a similarity measure for matching new patterns against the distribution-based target model. We also discuss some pertinent learning issues, including ideas on virtual example generation and example selection. The approach makes few assumptions about the target pattern class and should therefore be fairly general, as long as the target class has predictable image boundaries. We show that this is indeed the case by demonstrating the technique on two other pattern detection/recognition problems. Because our object and pattern detection approach is very much learning-based, how well a system eventually performs depends heavily on the quality of training examples it receives. 
The second part of this thesis looks at how one can select high quality examples for function approximation learning tasks. Active learning is an area of research that investigates how a learner can intelligently select future training examples to get better approximation results with less data. We propose an active learning formulation for function approximation, and show for three specific approximation function classes, that the active example selection strategy learns its target with fewer data samples than random sampling. Finally, we simplify the original active learning formulation, and show how it leads to a tractable example selection paradigm, suitable for use in many object and pattern detection problems. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617-253-5668; Fax 617-253-1690.)

254 citations


Journal ArticleDOI
TL;DR: The authors apply techniques from optimal experiment design (OED) to guide the query/action selection of a neural network learner, and demonstrate that these techniques allow the learner to minimize its generalization error by exploring its domain efficiently and completely.

249 citations


Proceedings Article
01 Jan 1996
TL;DR: The task-clustering algorithm TC clusters learning tasks into classes of mutually related tasks, and outperforms its non-selective counterpart in situations where only a small number of tasks is relevant.
Abstract: Recently, there has been an increased interest in "lifelong" machine learning methods that transfer knowledge across multiple learning tasks. Such methods have repeatedly been found to outperform conventional, single-task learning algorithms when the learning tasks are appropriately related. To increase robustness of such approaches, methods are desirable that can reason about the relatedness of individual learning tasks, in order to avoid the danger arising from tasks that are unrelated and thus potentially misleading. This paper describes the task-clustering (TC) algorithm. TC clusters learning tasks into classes of mutually related tasks. When facing a new learning task, TC first determines the most related task cluster, then exploits information selectively from this task cluster only. An empirical study carried out in a mobile robot domain shows that TC outperforms its non-selective counterpart in situations where only a small number of tasks is relevant.

Proceedings Article
03 Jul 1996
TL;DR: A method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process is introduced, using a new metric based on the Minimal Description Length principle for choosing the threshold values for the discretization while learning the Bayesian network structure.
Abstract: We introduce a method for learning Bayesian networks that handles the discretization of continuous variables as an integral part of the learning process. The main ingredient in this method is a new metric based on the Minimal Description Length principle for choosing the threshold values for the discretization while learning the Bayesian network structure. This score balances the complexity of the learned discretization and the learned network structure against how well they model the training data. This ensures that the discretization of each variable introduces just enough intervals to capture its interaction with adjacent variables in the network. We formally derive the new metric, study its main properties, and propose an iterative algorithm for learning a discretization policy. Finally, we illustrate its behavior in applications to supervised learning.

Proceedings Article
02 Aug 1996
TL;DR: This paper combines data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior, and uses a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls.
Abstract: This paper describes the automatic design of methods for detecting fraudulent behavior. Much of the design is accomplished using a series of machine learning methods. In particular, we combine data mining and constructive induction with more standard machine learning techniques to design methods for detecting fraudulent usage of cellular telephones based on profiling customer behavior. Specifically, we use a rule-learning program to uncover indicators of fraudulent behavior from a large database of cellular calls. These indicators are used to create profilers, which then serve as features to a system that combines evidence from multiple profilers to generate high-confidence alarms. Experiments indicate that this automatic approach performs nearly as well as the best hand-tuned methods for detecting fraud.


Book
15 Mar 1996
TL;DR: A collection covering, among other topics, embedded machine learning systems for natural language processing, acquiring and updating hierarchical knowledge for machine translation based on a clustering technique, and applying an existing machine learning algorithm to text categorization.
Abstract: Contents:
- Learning approaches for natural language processing
- Separating learning and representation
- Natural language grammatical inference: A comparison of recurrent neural networks and machine learning methods
- Extracting rules for grammar recognition from Cascade-2 networks
- Generating English plural determiners from semantic representations: A neural network learning approach
- Knowledge acquisition in concept and document spaces by using self-organizing neural networks
- Using hybrid connectionist learning for speech/language analysis
- SKOPE: A connectionist/symbolic architecture of spoken Korean processing
- Integrating different learning approaches into a multilingual spoken language translation system
- Learning language using genetic algorithms
- A statistical syntactic disambiguation program and what it learns
- Training stochastic grammars on semantical categories
- Learning restricted probabilistic link grammars
- Learning PP attachment from corpus statistics
- A minimum description length approach to grammar inference
- Automatic classification of dialog acts with Semantic Classification Trees and Polygrams
- Sample selection in natural language learning
- Learning information extraction patterns from examples
- Implications of an automatic lexical acquisition system
- Using learned extraction patterns for text classification
- Issues in inductive learning of domain-specific text extraction rules
- Applying machine learning to anaphora resolution
- Embedded machine learning systems for natural language processing: A general framework
- Acquiring and updating hierarchical knowledge for machine translation based on a clustering technique
- Applying an existing machine learning algorithm to text categorization
- Comparative results on using inductive logic programming for corpus-based parser construction
- Learning the past tense of English verbs using inductive logic programming
- A dynamic approach to paradigm-driven analogy
- Can punctuation help learning?
- Using parsed corpora for circumventing parsing
- A symbolic and surgical acquisition of terms through variation
- A revision learner to acquire verb selection rules from human-made rules and examples
- Learning from texts - A terminological metareasoning perspective

Proceedings Article
04 Aug 1996
TL;DR: This paper analyzes the effects of polynomial-space-bounded learning on runtime complexity of backtrack search and finds that relevance-bounded learning allows better runtime bounds than size-bounded learning on structurally restricted constraint satisfaction problems.
Abstract: Learning during backtrack search is a space-intensive process that records information (such as additional constraints) in order to avoid redundant work. In this paper, we analyze the effects of polynomial-space-bounded learning on runtime complexity of backtrack search. One space-bounded learning scheme records only those constraints with limited size, and another records arbitrarily large constraints but deletes those that become irrelevant to the portion of the search space being explored. We find that relevance-bounded learning allows better runtime bounds than size-bounded learning on structurally restricted constraint satisfaction problems. Even when restricted to linear space, our relevance-bounded learning algorithm has runtime complexity near that of unrestricted (exponential space-consuming) learning schemes.
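Size-bounded learning can be sketched in a toy backtracking solver: conflicts involving at most a fixed number of variables are recorded as nogoods and used to prune later branches. The not-equal constraints below are illustrative, not the paper's complexity analysis:

```python
# Toy sketch of size-bounded learning during backtrack search:
# conflicts over at most MAX_SIZE variables are recorded as nogoods.
MAX_SIZE = 2
nogoods = []                     # each nogood: a partial assignment dict

def violates(assign, constraints):
    return any(x in assign and y in assign and assign[x] == assign[y]
               for x, y in constraints)

def search(variables, domain, constraints, assign=None):
    assign = assign or {}
    if any(ng.items() <= assign.items() for ng in nogoods):
        return None                          # pruned by a learned nogood
    if violates(assign, constraints):
        if len(assign) <= MAX_SIZE:          # the size bound on learning
            nogoods.append(dict(assign))
        return None
    if len(assign) == len(variables):
        return assign
    var = variables[len(assign)]
    for val in domain:
        result = search(variables, domain, constraints,
                        {**assign, var: val})
        if result is not None:
            return result
    return None

# Colour a triangle with three colours (adjacent vertices must differ):
sol = search(['a', 'b', 'c'], [0, 1, 2],
             [('a', 'b'), ('b', 'c'), ('a', 'c')])
print(sol, len(nogoods))
```

Relevance-bounded learning would instead keep arbitrarily large nogoods but delete those no longer relevant to the current branch.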

Book ChapterDOI
01 Jan 1996
TL;DR: For the proposed concave minimization formulation, a successive linearization algorithm without stepsize terminates after a maximum average of 7 linear programs on problems with as many as 4192 points in 14-dimensional space, and the approach is quite effective and more efficient than other approaches.
Abstract: Two fundamental problems of machine learning, misclassification minimization [10, 24, 18] and feature selection, [25, 29, 14] are formulated as the minimization of a concave function on a polyhedral set. Other formulations of these problems utilize linear programs with equilibrium constraints [18, 1, 4, 3] which are generally intractable. In contrast, for the proposed concave minimization formulation, a successive linearization algorithm without stepsize terminates after a maximum average of 7 linear programs on problems with as many as 4192 points in 14-dimensional space. The algorithm terminates at a stationary point or a global solution to the problem. Preliminary numerical results indicate that the proposed approach is quite effective and more efficient than other approaches.

Journal ArticleDOI
TL;DR: This brief article introduces a learning controller developed by synthesizing several basic ideas from fuzzy set and control theory, self-organizing control, and conventional adaptive control that can achieve high performance learning control for a nonlinear time-varying rocket velocity control problem and a multi-input multi-output two-degree-of-freedom robot manipulator.
Abstract: A learning system possesses the capability to improve its performance over time by interaction with its environment. A learning control system is designed so that its learning controller has the ability to improve the performance of the closed-loop system by generating command inputs to the plant and utilizing feedback information from the plant. In this brief article, we introduce a learning controller that is developed by synthesizing several basic ideas from fuzzy set and control theory, self-organizing control, and conventional adaptive control. We utilize a learning mechanism that observes the plant outputs and adjusts the membership functions of the rules in a direct fuzzy controller so that the overall system behaves like the reference model. The effectiveness of this fuzzy model reference learning controller is illustrated by showing that it can achieve high performance learning control for a nonlinear time-varying rocket velocity control problem and a multi-input multi-output two-degree-of-freedom robot manipulator.

Journal ArticleDOI
TL;DR: A new incremental learning method for pattern recognition is presented, called the "incremental backpropagation learning network", which employs bounded weight modification and structural adaptation learning rules and applies initial knowledge to constrain the learning process.
Abstract: How to learn new knowledge without forgetting old knowledge is a key issue in designing an incremental-learning neural network. In this paper, we present a new incremental learning method for pattern recognition, called the "incremental backpropagation learning network", which employs bounded weight modification and structural adaptation learning rules and applies initial knowledge to constrain the learning process. The viability of this approach is demonstrated for classification problems including the iris and the promoter domains.

Journal ArticleDOI
TL;DR: This paper shows how a particular first-order learning system is modified to customize it for finding definitions of functional relations, which leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy.
Abstract: First-order learning involves finding a clause-form definition of a relation from examples of the relation and relevant background information. In this paper, a particular first-order learning system is modified to customize it for finding definitions of functional relations. This restriction leads to faster learning times and, in some cases, to definitions that have higher predictive accuracy. Other first-order learning systems might benefit from similar specialization.

Proceedings ArticleDOI
04 Nov 1996
TL;DR: A method of modular learning which coordinates multiple behaviors taking account of a trade-off between learning time and performance is presented, applied to one-to-one soccer-playing robots.
Abstract: Coordination of multiple behaviors independently obtained by a reinforcement learning method is one of the issues in order for the method to be scaled to larger and more complex robot learning tasks. Direct combination of all the state spaces for individual modules (subtasks) needs enormous learning time, and it causes hidden states. This paper presents a method of modular learning which coordinates multiple behaviors taking account of a trade-off between learning time and performance. First, in order to reduce the learning time the whole state space is classified into two categories based on the action values separately obtained by Q-learning: the area where one of the learned behaviors is directly applicable (no-more-learning area), and the area where learning is necessary due to competition of multiple behaviors (re-learning area). Second, hidden states are detected by model fitting to the learned action values based on the information criterion. Finally, the initial action values in the re-learning area are adjusted so that they can be consistent with the values in the no-more-learning area. The method is applied to one-to-one soccer-playing robots. Computer simulation and real robot experiments are given to show the validity of the proposed method.

Journal ArticleDOI
TL;DR: It is proved that incremental learning can always be simulated by inference devices that are both set-driven and conservative and feed-back learning is shown to be more powerful than iterative inference, and its learning power is incomparable to that of bounded example memory inference.

Book ChapterDOI
01 Jan 1996
TL;DR: A new approach to predicting a given example’s class by locating it in the “example space” and then choosing the best learner in that region of the example space to make predictions, which is compared to other methods for selecting from multiple learning algorithms.
Abstract: Determining the conditions for which a given learning algorithm is appropriate is an open problem in machine learning. Methods for selecting a learning algorithm for a given domain have met with limited success. This paper proposes a new approach to predicting a given example’s class by locating it in the “example space” and then choosing the best learner(s) in that region of the example space to make predictions. The regions of the example space are defined by the prediction patterns of the learners being used. The learner(s) chosen for prediction are selected according to their past performance in that region. This dynamic approach to learning algorithm selection is compared to other methods for selecting from multiple learning algorithms. The approach is then extended to weight rather than select the algorithms according to their past performance in a given region. Both approaches are further evaluated on a set of ten domains and compared to several other meta-learning strategies.
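The region-based selection idea can be sketched directly: an example's region is keyed by the joint prediction pattern of the learners, and the learner with the best past record in that region makes the prediction. The toy learners and labels below are hypothetical:

```python
# Sketch of region-based learner selection with two toy classifiers.
def learner_a(x): return x > 0           # matches the true concept
def learner_b(x): return x % 2 == 0      # unrelated to it

learners = [learner_a, learner_b]
stats = {}                               # (region, learner) -> (correct, seen)

def region(x):
    # Regions of the example space are defined by the learners'
    # joint prediction pattern, as in the paper.
    return tuple(l(x) for l in learners)

def observe(x, y):
    for i, l in enumerate(learners):
        c, t = stats.get((region(x), i), (0, 0))
        stats[(region(x), i)] = (c + (l(x) == y), t + 1)

def predict(x):
    r = region(x)
    def past_accuracy(i):
        c, t = stats.get((r, i), (0, 1))
        return c / t
    return learners[max(range(len(learners)), key=past_accuracy)](x)

for x in [-2, -1, 1, 2]:
    observe(x, x > 0)                    # true concept: is x positive?
print(predict(-4), predict(3))           # False True
```

The weighting extension described above would combine the learners' votes in proportion to these per-region accuracies instead of picking a single winner.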

Proceedings Article
03 Dec 1996
TL;DR: An adaptive on-line algorithm extending the learning of learning idea can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available.
Abstract: An adaptive on-line algorithm extending the learning of learning idea is proposed and theoretically motivated. Relying only on gradient flow information it can be applied to learning continuous functions or distributions, even when no explicit loss function is given and the Hessian is not available. Its efficiency is demonstrated for a non-stationary blind separation task of acoustic signals.
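The learning-of-the-learning-rate idea can be sketched with a simple sign-agreement rule: the step size grows while successive gradient estimates agree and shrinks when they flip sign. This is a simplified stand-in for the paper's gradient-flow derivation, on a made-up quadratic objective:

```python
# Hedged sketch of adapting the learning rate on-line from gradient
# agreement, using only gradient information (no Hessian needed).
def minimise(grad, w, eta=0.1, up=1.05, down=0.5, steps=100):
    g_prev = 0.0
    for _ in range(steps):
        g = grad(w)
        if g * g_prev > 0:
            eta *= up       # consistent direction: take bigger steps
        elif g * g_prev < 0:
            eta *= down     # overshoot detected: shrink the step
        w -= eta * g
        g_prev = g
    return w

w_star = minimise(lambda w: 2.0 * (w - 3.0), w=0.0)   # d/dw of (w-3)^2
print(round(w_star, 3))   # 3.0
```

Because the rule consults only the gradient flow, it still applies when the loss is implicit and the Hessian is unavailable, which is the setting of the blind separation task.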

Journal ArticleDOI
01 Feb 1996
TL;DR: This paper shows how the subsequent "dynamically focused learning" (DFL) can be used to enhance the performance of the "fuzzy model reference learning controller" (FMRLC) and furthermore it performs comparative analysis with a conventional adaptive control technique.
Abstract: A "learning system" possesses the capability to improve its performance over time by interacting with its environment. A learning control system is designed so that its "learning controller" has the ability to improve the performance of the closed-loop system by generating command inputs to the plant and utilizing feedback information from the plant. Learning controllers are often designed to mimic the manner in which a human in the control loop would learn how to control a system while it operates. Some characteristics of this human learning process may include: (i) a natural tendency for the human to focus their learning by paying particular attention to the current operating conditions of the system since these may be most relevant to determining how to enhance performance; (ii) after learning how to control the plant for some operating condition, if the operating conditions change, then the best way to control the system may have to be re-learned; and (iii) a human with a significant amount of experience at controlling the system in one operating region should not forget this experience if the operating condition changes. To mimic these types of human learning behavior, we introduce three strategies that can be used to dynamically focus a learning controller onto the current operating region of the system. We show how the subsequent "dynamically focused learning" (DFL) can be used to enhance the performance of the "fuzzy model reference learning controller" (FMRLC) and furthermore we perform comparative analysis with a conventional adaptive control technique. A magnetic ball suspension system is used throughout the paper to perform the comparative analyses, and to illustrate the concept of dynamically focused fuzzy learning control.

Journal Article
TL;DR: This paper formally distinguishes three types of features, primary, contextual, and irrelevant, and formally defines what it means for one feature to be context-sensitive to another feature.
Abstract: A large body of research in machine learning is concerned with supervised learning from examples. The examples are typically represented as vectors in a multi- dimensional feature space (also known as attribute-value descriptions). A teacher partitions a set of training examples into a finite number of classes. The task of the learning algorithm is to induce a concept from the training examples. In this paper, we formally distinguish three types of features: primary, contextual, and irrelevant features. We also formally define what it means for one feature to be context-sensitive to another feature. Context-sensitive features complicate the task of the learner and potentially impair the learner's performance. Our formal definitions make it possible for a learner to automatically identify context-sensitive features. After context-sensitive features have been identified, there are several strategies that the learner can employ for managing the features; however, a discussion of these strategies is outside of the scope of this paper. The formal definitions presented here correct a flaw in previously proposed definitions. We discuss the relationship between our work and a formal definition of relevance.

Journal ArticleDOI
TL;DR: Simulation results show that the learning speed achieved by the method is superior to that of other adaptive selection methods.

Patent
26 Aug 1996
TL;DR: In this article, the authors present a method, computer program product, and system for teaching or reinforcing concepts, principles, and other learned information without requiring user initiation of a learning sequence.
Abstract: In a given environment where a user may work, play, or otherwise interact, such as the environment provided by a system comprising computer hardware and software, the present invention provides a method, computer program product, and system for teaching or reinforcing concepts, principles, and other learned information without requiring user initiation of a learning sequence. Learning or reinforcement occurs by presenting "learning frames" in the environment automatically, without requiring user initiation of the learning sequence. While doing other tasks within the environment, the user receives these intrusive or non-intrusive opportunities for learning: depending on the implementation of the present invention, the user may be interrupted from the task at hand and required to respond to the presented learning frame, or may simply have the opportunity for learning without interruption of the task at hand. In this manner, learning occurs as a by-product of other useful work, play, or other interaction with the environment and does not require dedicated user time and overt effort.

01 Jan 1996
TL;DR: A survey of results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of on-line algorithms research can be found in this article.
Abstract: The areas of On-Line Algorithms and Machine Learning are both concerned with problems of making decisions about the present based only on knowledge of the past. Although these areas differ in terms of their emphasis and the problems typically studied, there are a collection of results in Computational Learning Theory that fit nicely into the 'on-line algorithms' framework. This survey article discusses some of the results, models, and open problems from Computational Learning Theory that seem particularly interesting from the point of view of on-line algorithms research. The emphasis in this article is on describing some of the simpler, more intuitive results, whose proofs can be given in their entirety. Pointers to the literature are given for more sophisticated versions of these algorithms.
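A representative example of the kind of simple, fully provable on-line result such surveys cover is the Weighted Majority algorithm of Littlestone and Warmuth: combine n "expert" predictors by weighted vote and demote the weight of every expert that errs, which bounds the learner's mistakes by O(log n + m), where m is the mistake count of the best expert. A minimal sketch (binary predictions, demotion factor beta):

```python
def weighted_majority(rounds, beta=0.5):
    """rounds: list of (expert_predictions, outcome) pairs, predictions
    and outcomes in {0, 1}. Returns (learner_mistakes, final_weights)."""
    n = len(rounds[0][0])
    weights = [1.0] * n
    mistakes = 0
    for preds, outcome in rounds:
        vote_1 = sum(w for w, p in zip(weights, preds) if p == 1)
        vote_0 = sum(w for w, p in zip(weights, preds) if p == 0)
        guess = 1 if vote_1 >= vote_0 else 0
        if guess != outcome:
            mistakes += 1
        # Demote every expert that was wrong on this round.
        weights = [w * beta if p != outcome else w
                   for w, p in zip(weights, preds)]
    return mistakes, weights
```

The mistake bound follows because total weight shrinks by a constant factor on each learner mistake, while the best expert's weight never falls below beta raised to its own mistake count.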

Journal ArticleDOI
TL;DR: In this paper, a rule-based inductive learning algorithm called multiscale classification (MSC) is proposed to classify the training data by successively splitting the feature space in half.
Abstract: Proposes a rule-based inductive learning algorithm called multiscale classification (MSC). It can be applied to any N-dimensional real or binary classification problem to classify the training data by successively splitting the feature space in half. The algorithm has several significant differences from existing rule-based approaches: learning is incremental, the tree is non-binary, and backtracking of decisions is possible to some extent. The paper first provides background on current machine learning techniques and outlines some of their strengths and weaknesses. It then describes the MSC algorithm and compares it to other inductive learning algorithms with particular reference to ID3, C4.5, and back-propagation neural networks. Its performance on a number of standard benchmark problems is then discussed and related to standard learning issues such as generalization, representational power, and over-specialization.
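The core idea the abstract names, classifying training data by successively splitting the feature space in half, can be sketched in one dimension. This is a rough illustration in the spirit of MSC, not the published algorithm (which is incremental, non-binary, and supports backtracking): any cell of the space containing mixed labels is split at its midpoint until every cell is pure or a depth limit is reached.

```python
# Illustrative successive-halving classifier over [lo, hi) in one dimension.
def halve_classify(points, lo=0.0, hi=1.0, max_depth=8):
    """points: list of (x, label). Returns a list of (lo, hi, label) cells,
    splitting any mixed-label cell at its midpoint."""
    inside = [(x, y) for x, y in points if lo <= x < hi]
    labels = {y for _, y in inside}
    if len(labels) <= 1 or max_depth == 0:
        label = labels.pop() if labels else None   # None = empty cell
        return [(lo, hi, label)]
    mid = (lo + hi) / 2
    return (halve_classify(inside, lo, mid, max_depth - 1)
            + halve_classify(inside, mid, hi, max_depth - 1))
```

Each recursion level corresponds to a finer "scale"; the depth limit plays the role of a stopping criterion against over-specialization.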