
Showing papers on "Active learning (machine learning)" published in 1999


Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today as well as methods at the leading edge of contemporary research.
* Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
* Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
* Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover: data pre-processing, classification, regression, clustering, association rules, and visualization

20,196 citations


01 Apr 1999
TL;DR: This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems and performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases.
Abstract: Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does—reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data are often significantly smaller.

1,653 citations
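
For illustration, a minimal sketch of a correlation-based forward filter in this spirit, assuming the CFS-style merit heuristic; the paper's exact algorithm may differ, and `cfs_merit`/`correlation_filter` are names invented for this sketch:

```python
import numpy as np

def cfs_merit(subset, r_cf, r_ff):
    # CFS-style merit: k*mean(feature-class corr) / sqrt(k + k(k-1)*mean(feature-feature corr))
    k = len(subset)
    rcf = np.mean([r_cf[f] for f in subset])
    rff = np.mean([r_ff[i, j] for i in subset for j in subset if i != j]) if k > 1 else 0.0
    return (k * rcf) / np.sqrt(k + k * (k - 1) * rff)

def correlation_filter(X, y):
    # Greedy forward selection of features by the merit heuristic.
    n, d = X.shape
    r_cf = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(d)])
    r_ff = np.abs(np.corrcoef(X, rowvar=False))
    selected, remaining, best = [], set(range(d)), -np.inf
    while remaining:
        f, score = max(((f, cfs_merit(selected + [f], r_cf, r_ff)) for f in remaining),
                       key=lambda fs: fs[1])
        if score <= best:
            break  # no remaining feature improves the merit: stop
        selected.append(f); remaining.discard(f); best = score
    return selected
```

Because the class enters only through correlations, the same sketch runs unchanged for continuous and discrete class variables, which is the property the abstract emphasizes.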


Book
08 Feb 1999
TL;DR: This chapter presents algorithmic and computational results developed for SVMlight V2.0, which make large-scale SVM training more practical and give guidelines for the application of SVMs to large domains.
Abstract: Training a support vector machine (SVM) leads to a quadratic optimization problem with bound constraints and one linear equality constraint. Despite the fact that this type of problem is well understood, there are many issues to be considered in designing an SVM learner. In particular, for large learning tasks with many training examples, off-the-shelf optimization techniques for general quadratic programs quickly become intractable in their memory and time requirements. SVMlight is an implementation of an SVM learner which addresses the problem of large tasks. This chapter presents algorithmic and computational results developed for SVMlight V2.0, which make large-scale SVM training more practical. The results give guidelines for the application of SVMs to large domains.

1,386 citations
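
For reference, the optimization problem the abstract refers to is the standard SVM dual, with kernel K and regularization constant C; SVMlight attacks it by repeatedly solving small subproblems over a working set of the alpha variables:

```latex
\max_{\alpha}\; W(\alpha) \;=\; \sum_{i=1}^{n} \alpha_i
  \;-\; \frac{1}{2} \sum_{i=1}^{n}\sum_{j=1}^{n} y_i y_j \alpha_i \alpha_j K(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C \;\; (i = 1, \dots, n), \qquad \sum_{i=1}^{n} y_i \alpha_i = 0
```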


Book ChapterDOI
01 May 1999
TL;DR: This framework encompasses the most common online learning algorithms in use today, as illustrated by several examples, and provides general results describing the convergence of all these learning algorithms at once.
Abstract: The convergence of online learning algorithms is analyzed using the tools of the stochastic approximation theory, and proved under very weak conditions. A general framework for online learning algorithms is first presented. This framework encompasses the most common online learning algorithms in use today, as illustrated by several examples. The stochastic approximation theory then provides general results describing the convergence of all these learning algorithms at once. Revised version, October 15th 2012.

569 citations
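
A minimal sketch of the general update such algorithms share, a stochastic-approximation step w_{t+1} = w_t - gamma_t * H(w_t, z_t) with Robbins-Monro step sizes; the LMS example at the bottom is an illustrative assumption, not from the paper:

```python
import numpy as np

def online_learn(update_direction, w0, stream, gamma0=0.1):
    # Generic online algorithm: one pass over the data stream; step sizes
    # gamma_t = gamma0/t satisfy sum(gamma_t) = inf, sum(gamma_t^2) < inf.
    w = np.asarray(w0, dtype=float)
    for t, z in enumerate(stream, start=1):
        w = w - (gamma0 / t) * update_direction(w, z)
    return w

# Example: online least-mean-squares on a synthetic stream of (x, y) pairs.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
stream = [((x := rng.normal(size=2)), x @ w_true + 0.1 * rng.normal())
          for _ in range(5000)]
lms = lambda w, z: (w @ z[0] - z[1]) * z[0]  # gradient of 0.5*(w.x - y)^2
print(online_learn(lms, np.zeros(2), stream))  # converges near w_true
```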


Proceedings Article
29 Nov 1999
TL;DR: This work considers the problem of learning a grid-based map using a robot with noisy sensors and actuators and introduces a method for approximating the Bayesian solution, called Rao-Blackwellised particle filtering, which is fast but accurate.
Abstract: We consider the problem of learning a grid-based map using a robot with noisy sensors and actuators. We compare two approaches: online EM, where the map is treated as a fixed parameter, and Bayesian inference, where the map is a (matrix-valued) random variable. We show that even on a very simple example, online EM can get stuck in local minima, which causes the robot to get "lost" and the resulting map to be useless. By contrast, the Bayesian approach, by maintaining multiple hypotheses, is much more robust. We then introduce a method for approximating the Bayesian solution, called Rao-Blackwellised particle filtering. We show that this approximation, when coupled with an active learning strategy, is fast but accurate.

523 citations
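
A skeleton of one Rao-Blackwellised particle-filter step, as a hedged sketch: the robot pose is sampled per particle while the map statistics are updated analytically. The callbacks `sample_pose`, `map_update`, and `obs_likelihood` are hypothetical placeholders for the motion model, the per-cell map update, and the sensor model:

```python
import numpy as np

rng = np.random.default_rng(1)

def rbpf_step(particles, weights, control, observation,
              sample_pose, map_update, obs_likelihood):
    # Each particle is (pose, map_stats): the pose is sampled (the hard,
    # non-linear part) while the map is maintained in closed form given the pose.
    new_particles, new_weights = [], []
    for (pose, map_stats), w in zip(particles, weights):
        pose = sample_pose(pose, control)
        w = w * obs_likelihood(observation, pose, map_stats)
        map_stats = map_update(map_stats, pose, observation)
        new_particles.append((pose, map_stats))
        new_weights.append(w)
    weights = np.asarray(new_weights)
    weights = weights / weights.sum()
    # Resample when the effective sample size degenerates; keeping several
    # distinct hypotheses alive is what makes the approach robust to ambiguity.
    if 1.0 / np.sum(weights ** 2) < len(weights) / 2:
        idx = rng.choice(len(weights), size=len(weights), p=weights)
        new_particles = [new_particles[i] for i in idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return new_particles, weights
```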


01 Jan 1999
TL;DR: It is demonstrated that a manually-constructed model that contains multiple states per extraction field outperforms a model with one state per field, and the use of distantly-labeled data to set model parameters provides a significant improvement in extraction accuracy.
Abstract: Statistical machine learning techniques, while well proven in fields such as speech recognition, are just beginning to be applied to the information extraction domain. We explore the use of hidden Markov models for information extraction tasks, specifically focusing on how to learn model structure from data and how to make the best use of labeled and unlabeled data. We show that a manually-constructed model that contains multiple states per extraction field outperforms a model with one state per field, and discuss strategies for learning the model structure automatically from data. We also demonstrate that the use of distantly-labeled data to set model parameters provides a significant improvement in extraction accuracy. Our models are applied to the task of extracting important fields from the headers of computer science research papers, and achieve an extraction accuracy of 92.9%.

449 citations
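
Once the HMM's structure and parameters are fixed, extraction reduces to finding the most likely state (field) sequence for a header's token sequence; a standard Viterbi sketch (token IDs and the log-probability tables are assumed inputs):

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    # obs: token ids (T,); log_pi: (S,) initial; log_A: (S,S) transition;
    # log_B: (S,V) emission. Returns the most likely state sequence.
    S, T = log_pi.shape[0], len(obs)
    delta = np.empty((T, S))           # best log-score ending in each state
    psi = np.zeros((T, S), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (prev state, cur state)
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    states = np.zeros(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states
```

With multiple states per extraction field, as the abstract advocates, several HMM states simply share the same field label and the decoded state sequence is mapped back to field names.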


Proceedings Article
27 Jun 1999
TL;DR: It is shown that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks: semantic parsing and information extraction.
Abstract: In natural language acquisition, it is difficult to gather the annotated data needed for supervised learning; however, unannotated data is fairly plentiful. Active learning methods attempt to select for annotation and training only the most informative examples, and therefore are potentially very useful in natural language applications. However, existing results for active learning have only considered standard classification tasks. To reduce annotation effort while maintaining accuracy, we apply active learning to two non-classification tasks in natural language processing: semantic parsing and information extraction. We show that active learning can significantly reduce the number of annotated examples required to achieve a given level of performance for these complex tasks.

356 citations
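
A minimal sketch of the selective-sampling loop underlying such certainty-based active learning; `train`, `confidence`, and `annotate` are hypothetical callbacks standing in for the learner, its certainty estimate on an example, and the human annotator:

```python
def active_learning_loop(pool, seed_labeled, train, confidence, annotate,
                         rounds=10, batch=10):
    # Repeatedly: train on what is labeled, score the unlabeled pool,
    # and send only the least-certain examples out for annotation.
    labeled = list(seed_labeled)
    unlabeled = list(pool)
    for _ in range(rounds):
        model = train(labeled)
        unlabeled.sort(key=lambda x: confidence(model, x))  # least certain first
        chosen, unlabeled = unlabeled[:batch], unlabeled[batch:]
        labeled.extend(annotate(chosen))  # human supplies the annotations
    return train(labeled)
```

For the two tasks in the paper the examples are sentences and the "labels" are parses or filled templates, but the loop itself is task-agnostic.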


Proceedings ArticleDOI
01 Aug 1999
TL;DR: Empirical results using benchmark machine learning datasets are provided to show that support vectors form a succinct and sufficient set for block-by-block incremental learning.
Abstract: With the increase in the size of real-world databases, there is an ever-increasing need to scale up inductive learning algorithms. Incremental learning techniques are one possible solution to the scalability problem. In this paper, we propose three criteria to evaluate the robustness and reliability of incremental learning methods, and use them to study the robustness of an incremental training method for Support Vector Machines. We provide empirical results using benchmark machine learning datasets to show that support vectors form a succinct and sufficient set for block-by-block incremental learning.

276 citations
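
A sketch of the block-by-block scheme the paper studies, with scikit-learn's SVC standing in as the learner: after each block, only the support vectors are carried into the next round of training.

```python
import numpy as np
from sklearn.svm import SVC

def incremental_svm(blocks, **svc_kwargs):
    # blocks: iterable of (X, y) chunks arriving one at a time.
    carry_X = carry_y = None
    model = None
    for X, y in blocks:
        if carry_X is not None:
            X = np.vstack([carry_X, X])        # support vectors so far
            y = np.concatenate([carry_y, y])   # plus the new block
        model = SVC(kernel="rbf", C=1.0, **svc_kwargs).fit(X, y)
        carry_X, carry_y = X[model.support_], y[model.support_]
    return model
```

If support vectors really are a succinct and sufficient set, the final model should approximate one trained on all blocks at once, at a fraction of the memory cost.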


Proceedings Article
27 Jun 1999
TL;DR: This paper presents an algorithm for learning a value function that maps hyperlinks to future discounted reward using a naive Bayes text classifier and shows a threefold improvement in spidering efficiency over traditional breadth-first search, and up to a two-fold improvement over reinforcement learning with immediate reward.
Abstract: Consider the task of exploring the Web in order to find pages of a particular kind or on a particular topic. This task arises in the construction of search engines and Web knowledge bases. This paper argues that the creation of efficient web spiders is best framed and solved by reinforcement learning, a branch of machine learning that concerns itself with optimal sequential decision making. One strength of reinforcement learning is that it provides a formalism for measuring the utility of actions that give benefit only in the future. We present an algorithm for learning a value function that maps hyperlinks to future discounted reward using a naive Bayes text classifier. Experiments on two real-world spidering tasks show a threefold improvement in spidering efficiency over traditional breadth-first search, and up to a two-fold improvement over reinforcement learning with immediate reward only.

262 citations
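
A sketch of the core construction under stated assumptions: future discounted reward for following a link is discretized into a few bins, a naive Bayes text classifier maps link text to bins, and links are ranked by expected reward (`bin_values`, one representative reward per bin, is an assumption of this sketch):

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def train_link_value(link_texts, discounted_rewards, n_bins=3):
    # Discretize Q-values into bins and learn P(bin | link text).
    edges = np.quantile(discounted_rewards, np.linspace(0, 1, n_bins + 1)[1:-1])
    y = np.digitize(discounted_rewards, edges)
    vec = CountVectorizer()
    clf = MultinomialNB().fit(vec.fit_transform(link_texts), y)
    return vec, clf

def link_priority(vec, clf, text, bin_values):
    # Expected discounted reward = sum over bins of P(bin | text) * value(bin).
    proba = clf.predict_proba(vec.transform([text]))[0]
    return float(proba @ np.asarray(bin_values)[clf.classes_])
```

A spider can then maintain a priority queue of frontier links ordered by this estimate instead of crawling breadth-first.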


Journal ArticleDOI
TL;DR: It is shown that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy, and that decision-tree learning often performs worse than memory-based learning.
Abstract: We show that in language learning, contrary to received wisdom, keeping exceptional training instances in memory can be beneficial for generalization accuracy. We investigate this phenomenon empirically on a selection of benchmark natural language processing tasks: grapheme-to-phoneme conversion, part-of-speech tagging, prepositional-phrase attachment, and base noun phrase chunking. In a first series of experiments we combine memory-based learning with training set editing techniques, in which instances are edited based on their typicality and class prediction strength. Results show that editing exceptional instances (with low typicality or low class prediction strength) tends to harm generalization accuracy. In a second series of experiments we compare memory-based learning and decision-tree learning methods on the same selection of tasks, and find that decision-tree learning often performs worse than memory-based learning. Moreover, the decrease in performance can be linked to the degree of abstraction from exceptions (i.e., pruning or eagerness). We provide explanations for both results in terms of the properties of the natural language processing tasks and the learning algorithms.

245 citations
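
A sketch of the class-prediction-strength editing used in the first series of experiments; note that the paper's finding is that removing such exceptional instances tends to hurt generalization on these tasks, so this illustrates the operation rather than recommends it:

```python
import numpy as np

def class_prediction_strength(X, y, i, k=5):
    # Fraction of instance i's k nearest neighbours sharing its class;
    # low values mark the 'exceptional' instances that editing removes.
    dists = np.linalg.norm(X - X[i], axis=1)
    nn = np.argsort(dists)[1:k + 1]  # skip the instance itself
    return float(np.mean(y[nn] == y[i]))

def edit_training_set(X, y, threshold=0.2, k=5):
    # Drop instances whose class-prediction strength falls below a threshold.
    keep = [i for i in range(len(y))
            if class_prediction_strength(X, y, i, k) >= threshold]
    return X[keep], y[keep]
```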


Proceedings Article
27 Jun 1999
TL;DR: Most machine learning algorithms share the drawback that they output bare predictions but not the confidence in those predictions; this paper combines ideas from algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence.
Abstract: Machine-Learning Applications of Algorithmic Randomness. Volodya Vovk, Alex Gammerman, Craig Saunders. Computer Learning Research Centre and Department of Computer Science, Royal Holloway, University of London, Egham, Surrey TW20 0EX, England. {vovk, alex, craig}@dcs.rhbnc.ac.uk. Most machine learning algorithms share the following drawback: they only output bare predictions but not the confidence in those predictions. In the 1960s algorithmic information theory supplied universal measures of confidence, but these are, unfortunately, non-computable. In this paper we combine the ideas of algorithmic information theory with the theory of Support Vector machines to obtain practicable approximations to universal measures of confidence. We show that in some standard problems of pattern recognition our approximations work well.
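
A sketch of how such confidences can be computed transductively: try each candidate label for the new example, recompute the strangeness (nonconformity) scores, and turn the rank of the new example's score into a p-value. The `nonconformity` callback (for instance, Lagrange multipliers of an SVM, as in the paper) is an assumed input:

```python
import numpy as np

def p_value(alphas, alpha_new):
    # Fraction of strangeness scores at least as large as the new example's.
    alphas = np.asarray(alphas)
    return (np.sum(alphas >= alpha_new) + 1) / (len(alphas) + 1)

def predict_with_confidence(nonconformity, train, x_new, labels):
    # For each candidate label, append (x_new, label) and score everyone;
    # predict the label with the largest p-value. Confidence is one minus
    # the second-largest p-value (how firmly the alternatives are rejected).
    pvals = {}
    for y in labels:
        alphas = nonconformity(train + [(x_new, y)])
        pvals[y] = p_value(alphas[:-1], alphas[-1])
    ranked = sorted(pvals.items(), key=lambda kv: -kv[1])
    prediction = ranked[0][0]
    confidence = 1.0 - ranked[1][1] if len(ranked) > 1 else 1.0
    return prediction, confidence, pvals
```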

Proceedings ArticleDOI
15 Mar 1999
TL;DR: This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition and presents results on several standard vowel and phonetic Classification tasks and shows better performance than Gaussian mixture classifiers.
Abstract: Support vector machines (SVMs) represent a new approach to pattern classification which has attracted a great deal of interest in the machine learning community. Their appeal lies in their strong connection to the underlying statistical learning theory, in particular the theory of structural risk minimization. SVMs have been shown to be particularly successful in fields such as image identification and face recognition; in many problems SVM classifiers have been shown to perform much better than other nonlinear classifiers such as artificial neural networks and k-nearest neighbors. This paper explores the issues involved in applying SVMs to phonetic classification as a first step to speech recognition. We present results on several standard vowel and phonetic classification tasks and show better performance than Gaussian mixture classifiers. We also present an analysis of the difficulties we foresee in applying SVMs to continuous speech recognition problems.

Book ChapterDOI
01 May 1999
TL;DR: When this family is chosen to be Gaussian, the algorithm is shown to achieve asymptotic efficiency, and an application to learning in single layer neural networks is given.
Abstract: Online learning is discussed from the viewpoint of Bayesian statistical inference. By replacing the true posterior distribution with a simpler parametric distribution, one can define an online algorithm by a repetition of two steps: An update of the approximate posterior, when a new example arrives, and an optimal projection into the parametric family. Choosing this family to be Gaussian, we show that the algorithm achieves asymptotic efficiency. An application to learning in single layer neural networks is given.
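
For the Gaussian choice the abstract mentions, the two-step recipe (update on the new example, then project back into the parametric family) is exact when the observation model is itself linear-Gaussian; a minimal sketch for online Bayesian linear regression:

```python
import numpy as np

def gaussian_online_update(mu, Sigma, x, y, noise_var=0.1):
    # Posterior over weights w of y = w.x + noise stays Gaussian, so the
    # 'projection' step is the identity here; mu, Sigma summarize the
    # approximate posterior carried from example to example.
    x = np.asarray(x, dtype=float)
    s = x @ Sigma @ x + noise_var           # predictive variance of y
    k = Sigma @ x / s                       # gain
    mu = mu + k * (y - mu @ x)              # corrected mean
    Sigma = Sigma - np.outer(k, x @ Sigma)  # shrunk covariance
    return mu, Sigma
```

For non-Gaussian likelihoods, such as a single layer network with a sigmoid output, the update leaves the Gaussian family and the projection becomes a genuine approximation step, which is the case the paper analyzes.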

Proceedings ArticleDOI
01 Jan 1999
TL;DR: This work reports high correct classification rates on unseen views, especially considering that no domain knowledge is included in the proposed system, and suggests an active learning algorithm to further reduce the required number of training views.
Abstract: Support vector machines have demonstrated excellent results in pattern recognition tasks and 3D object recognition. We confirm some of the results in 3D object recognition and compare them to other object recognition systems. We use different pixel-level representations to perform the experiments, while we extend the setting to the more challenging and practical case when only a limited number of views of the object are presented during training. We report high correct classification rates on unseen views, especially considering that no domain knowledge is included in the proposed system. Finally, we suggest an active learning algorithm to further reduce the required number of training views.

Journal ArticleDOI
TL;DR: A procedure is a specific way of making this association; a procedure is optimal if the sequence of choices it generates converges to the action that maximizes the expected payoff.

Journal ArticleDOI
TL;DR: A fuzzy learning algorithm based on the maximum information gain is proposed to manage linguistic information and generates fuzzy rules from “soft” instances, which differ from conventional instances in that they have class membership values.

Journal ArticleDOI
TL;DR: In this paper, a hierarchy of increasingly powerful feedback learners, depending on the number k of queries allowed to be asked, is established, and the union of at most k pattern languages is shown to be iteratively inferable.
Abstract: Important refinements of concept learning in the limit from positive data considerably restricting the accessibility of input data are studied. Let c be any concept; every infinite sequence of elements exhausting c is called positive presentation of c. In all learning models considered the learning machine computes a sequence of hypotheses about the target concept from a positive presentation of it. With iterative learning, the learning machine, in making a conjecture, has access to its previous conjecture and the latest data items coming in. In k-bounded example-memory inference (k is a priori fixed) the learner is allowed to access, in making a conjecture, its previous hypothesis, its memory of up to k data items it has already seen, and the next element coming in. In the case of k-feedback identification, the learning machine, in making a conjecture, has access to its previous conjecture, the latest data item coming in, and, on the basis of this information, it can compute k items and query the database of previous data to find out, for each of the k items, whether or not it is in the database (k is again a priori fixed). In all cases, the sequence of conjectures has to converge to a hypothesis correctly describing the target concept. Our results are manifold. An infinite hierarchy of more and more powerful feedback learners, depending on the number k of queries allowed to be asked, is established. However, the hierarchy collapses to 1-feedback inference if only indexed families of infinite concepts are considered, and moreover, its learning power is then equal to learning in the limit. But it remains infinite for concept classes of only infinite r.e. concepts. Both k-feedback inference and k-bounded example-memory identification are more powerful than iterative learning but incomparable to one another. Furthermore, there are cases where redundancy in the hypothesis space is shown to be a resource increasing the learning power of iterative learners. Finally, the union of at most k pattern languages is shown to be iteratively inferable.

Journal ArticleDOI
01 Dec 1999
TL;DR: An adaptive method that allows mobile robots to learn cognitive maps of indoor environments incrementally and online, by means of a variable-resolution partitioning that discretizes the world into perceptually homogeneous regions, is presented.
Abstract: This paper presents an adaptive method that allows mobile robots to learn cognitive maps of indoor environments incrementally and online. Our approach models the environment by means of a variable-resolution partitioning that discretizes the world into perceptually homogeneous regions. The resulting model incorporates both a compact geometrical representation of the environment and a topological map of the spatial relationships between its obstacle-free areas. The efficiency of the learning process is based on the use of local memory-based techniques for partitioning and of active learning techniques for selecting the most appropriate region to be explored next. In addition, a feedforward neural network is used to interpret sensor readings. We present experimental results obtained with two different mobile robots, the Nomad 200 and the Khepera. The current implementation of the method relies on the assumption that obstacles are parallel or perpendicular to each other. This results in a variable-resolution partitioning consisting of simple rectangular partitions and reduces the complexity of treating the underlying geometrical properties.
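
A hedged sketch of the rectangular variable-resolution idea under the stated parallel/perpendicular assumption; `is_homogeneous` is a hypothetical test of perceptual homogeneity over the sensor samples falling in a region:

```python
def split_region(region, samples, is_homogeneous, min_size=0.5):
    # region = (x0, y0, x1, y1). Recursively quarter a rectangle until the
    # samples inside it look perceptually homogeneous (or it gets too small).
    x0, y0, x1, y1 = region
    inside = [s for s in samples if x0 <= s[0] < x1 and y0 <= s[1] < y1]
    if is_homogeneous(inside) or min(x1 - x0, y1 - y0) < min_size:
        return [region]
    mx, my = (x0 + x1) / 2, (y0 + y1) / 2
    quads = [(x0, y0, mx, my), (mx, y0, x1, my),
             (x0, my, mx, y1), (mx, my, x1, y1)]
    return [r for q in quads
            for r in split_region(q, inside, is_homogeneous, min_size)]
```

The obstacle-free leaves of such a partition can then serve as the nodes of the topological map, with active learning choosing which leaf to explore next.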

09 Feb 1999
TL;DR: A new principle – homeokinesis – is introduced which is completely unspecific and yet induces specific, seemingly goal–oriented behaviors of an agent in a complex external world.
Abstract: It is well known that individual learning can speed up artificial evolution enormously. However, both supervised learning and reinforcement learning require specific learning goals which usually are not available or difficult to find. We introduce a new principle, homeokinesis, which is completely unspecific and yet induces specific, seemingly goal-oriented behaviors of an agent in a complex external world. The principle is based on the assumption that the agent is equipped with an adaptive model of its behavior. A learning signal for both the model and the controller is derived from the misfit between the real behavior of the agent in the world and that predicted by the model. If the structural complexity of the model is chosen adequately, this misfit is minimized if the agent exhibits a smooth controlled behavior. The principle is explicated by two examples. We moreover discuss how functional modularization emerges in a natural way in a structured system from a mechanism of competition for the best internal representation.

Book ChapterDOI
01 Jan 1999
TL;DR: This work has shown that inductive machine learning approaches such as connectionist learning algorithms, decision tree induction and case-based learning have interesting properties not present in existing statistical and rule-based approaches to the syntactic wordclass disambiguation problem.
Abstract: The usefulness and feasibility of automatically training a syntactic wordclass tagger instead of hand-crafting it motivated a large body of work on statistical and rule-learning approaches to the problem. Syntactic wordclass taggers trained on corpora are claimed to be equally accurate as, and more robust and more portable than, hand-crafted systems. Moreover, development time is considerably faster. Recently, inductive machine learning approaches such as connectionist learning algorithms, decision tree induction and case-based learning have also been applied to the syntactic wordclass disambiguation problem. In some cases these approaches have interesting properties not present in existing statistical and rule-based approaches.

Book ChapterDOI
01 Jun 1999
TL;DR: ELM-ART, an example of a substantial adaptive learning system on the WWW, uses several adaptive techniques and offers some degree of adaptability, and is based on two different types of user models.
Abstract: With the steadily growing demand for further education, the World Wide Web is becoming a more and more popular vehicle for delivering on-line learning courses. A challenging research goal is the development of advanced Web-based learning applications that can offer some amount of interactivity and adaptivity in order to support learners who start learning with different background knowledge and skills. In existing on-line learning systems, some types of adaptivity and adaptability require different types of user models. This paper briefly introduces ELM-ART, an example of a substantial adaptive learning system on the WWW. It uses several adaptive techniques and offers some degree of adaptability. The adaptive techniques are based on two different types of user models: a multi-layered overlay model that allows for sophisticated link annotation and individual curriculum sequencing; and an episodic learner model that enables the system to analyze and diagnose problem solutions and to offer individualized examples for programming problems. The last section gives an overview of empirical results with adaptive learning systems and discusses the problems concerned with the evaluation of complex learning systems in real-world learning situations.

Journal ArticleDOI
TL;DR: The present paper gives a unified framework of statistical analysis for batch and on-line learning, which includes the asymptotic learning curve, generalization error and training error, over-fitting and over-training, efficiency of learning, and an adaptive method of determining learning rate.

Proceedings ArticleDOI
01 Jan 1999
TL;DR: The adaptive margin (AM-) SVM is proposed, a reformulation of the minimization problem such that adaptive margins for each training pattern are utilized, which gives bounds on the generalization error of AM-SVMs which justify their robustness against outliers.
Abstract: We propose a learning algorithm for classification learning based on the support vector machine (SVM) approach. Existing approaches for constructing SVMs are based on minimization of a regularized margin loss where the margin is treated equivalently for each training pattern. We propose a reformulation of the minimization problem such that adaptive margins for each training pattern are utilized, which we call the adaptive margin (AM-) SVM. We give bounds on the generalization error of AM-SVMs which justify their robustness against outliers, and show experimentally that the generalization error of AM-SVMs is comparable to classical SVMs on benchmark datasets from the UCI repository.
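
A loosely related sketch, not the paper's algorithm: the flavor of a per-pattern margin can be shown with plain subgradient descent on a hinge loss whose required margin m_i varies by example (how the m_i are adapted is the paper's contribution and is not reproduced here; they are simply inputs):

```python
import numpy as np

def train_per_pattern_margin(X, y, margins, lr=0.01, lam=0.01, epochs=100, seed=0):
    # Minimize lam/2 * ||w||^2 + sum_i max(0, m_i - y_i * w.x_i) by subgradient steps.
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (w @ X[i]) < margins[i]:   # example violates its own margin
                w += lr * (y[i] * X[i] - lam * w)
            else:
                w -= lr * lam * w
    return w
```

Loosening m_i for a suspected outlier reduces its pull on the solution, which is the intuition behind the robustness claim in the abstract.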

Journal ArticleDOI
01 Sep 1999
TL;DR: This paper reviews recent advances in supervised learning with a focus on the two most important issues, performance and efficiency, and focuses on a special type of adaptive learning system with a neural architecture.
Abstract: This paper reviews recent advances in supervised learning with a focus on the two most important issues: performance and efficiency. Performance addresses the generalization capability of a learning machine on randomly chosen samples that are not included in a training set. Efficiency deals with the complexity of a learning machine in both space and time. As these two issues are general to various learning machines and learning approaches, we focus on a special type of adaptive learning system with a neural architecture. We discuss four types of learning approaches: training an individual model; combinations of several well-trained models; combinations of many weak models; and evolutionary computation of models. We explore advantages and weaknesses of each approach and their interrelations, and we pose open questions for possible future research.

Journal ArticleDOI
TL;DR: A study demonstrates that integrating these approaches can both improve the accuracy of the developed knowledge base and reduce development time; users also expected the expert systems created through the integrated approach to have higher accuracy and rated the approach less difficult to use.
Abstract: Machine learning and knowledge acquisition from experts have distinct capabilities that appear to complement one another. We report a study that demonstrates the integration of these approaches can both improve the accuracy of the developed knowledge base and reduce development time. In addition, we found that users expected the expert systems created through the integrated approach to have higher accuracy than those created without machine learning and rated the integrated approach less difficult to use. They also provided favorable evaluations of both the specific integrated software, a system called The Knowledge Factory, and of the general value of machine learning for knowledge acquisition.

Proceedings Article
27 Jun 1999
TL;DR: The knowledge required for task performance is described, how this knowledge is learned is described by KnoMic, and efforts to learn performance knowledge in the tactical air combat domain and the computer game Quake II are reported on.
Abstract: Developing automated agents that intelligently perform complex real world tasks is time consuming and expensive. The most expensive part of developing these intelligent task performance agents involves extracting knowledge from human experts and encoding it into a form useable by automated agents. Machine learning from a sufficiently rich and focused knowledge source can significantly reduce the cost of developing intelligent performance agents by automating the knowledge acquisition and encoding process. Potential knowledge sources include instructions from human experts, experiments performed in the task environment and observation of an expert performing the task. Observation is particularly well suited to learning hierarchical performance knowledge for tasks that require realistic, human-like behavior. Our learning by observation system, called KnoMic (Knowledge Mimic), extracts knowledge from observations of an expert performing a task and generalizes this knowledge into rules that an agent can use to perform the same task. Learning performance knowledge by observation is more efficient than hand-coding the knowledge in a number of ways. Knowledge can be encoded directly from the expert without the need for a knowledge engineer to act as an intermediary. Also, the expert only needs to demonstrate the task rather than organize and communicate all the relevant information. This paper will describe the knowledge required for task performance, describe how this knowledge is learned by KnoMic, and report on our efforts to learn performance knowledge in the tactical air combat domain and the computer game Quake II.

Book ChapterDOI
15 Sep 1999
TL;DR: The aim of meta-level learning is to relate the performance of different machine learning algorithms to the characteristics of the dataset; this relation is induced on the basis of empirical data about the algorithms' performance on different datasets.
Abstract: When considering new datasets for analysis with machine learning algorithms, we encounter the problem of choosing the algorithm which is best suited for the task at hand. The aim of meta-level learning is to relate the performance of different machine learning algorithms to the characteristics of the dataset. The relation is induced on the basis of empirical data about the performance of machine learning algorithms on the different datasets.
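
A minimal sketch of this meta-level setup; the particular meta-features and the decision-tree meta-learner are illustrative assumptions:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def meta_features(X, y):
    # A few simple dataset characteristics of the kind used in meta-level learning:
    # size, dimensionality, number of classes, class entropy.
    n, d = X.shape
    _, counts = np.unique(y, return_counts=True)
    p = counts / n
    return [n, d, len(counts), -float(np.sum(p * np.log2(p)))]

def train_meta_learner(datasets, best_algorithm_per_dataset):
    # Induce the (dataset characteristics -> best algorithm) relation from
    # empirical records of algorithm performance on past datasets.
    meta_X = [meta_features(X, y) for X, y in datasets]
    return DecisionTreeClassifier().fit(meta_X, best_algorithm_per_dataset)

# For a new dataset: recommended = meta_model.predict([meta_features(X_new, y_new)])
```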

Journal ArticleDOI
TL;DR: This article reviews the work over the last 10 years in the area of supervised learning, focusing on three interlinked directions of research -- theory, engineering applications, and neuroscience -- that contribute to and complement each other.
Abstract: The problem of learning is arguably at the very core of the problem of intelligence, both biological and artificial. In this article, we review our work over the last 10 years in the area of supervised learning, focusing on three interlinked directions of research -- (1) theory, (2) engineering applications (making intelligent software), and (3) neuroscience (understanding the brain's mechanisms of learnings) -- that contribute to and complement each other.

Proceedings Article
31 Jul 1999
TL;DR: An integrated problem-solving model that introspectively learns feature weights in a case base in order to respond dynamically to its users, and that has the advantage of capturing accurate learning information from interactions with its users.
Abstract: Recently more and more researchers have been supporting the view that learning is a goal-driven process. One of the key properties of a goal-driven learner is introspectiveness: the ability to notice the gaps in its knowledge and to reason about the information required to fill in those gaps. In this paper, we introduce a quantitative introspective learning paradigm into case-based reasoning (CBR). The result is an integrated problem-solving model which introspectively learns feature weights in a case base in order to respond dynamically to its users. In contrast to the existing qualitative methods for introspective learning, our model has the advantage of being able to capture accurate learning information in the interactions with its users. A CBR system equipped with quantitative introspective learning ability can allow the feature weights to be captured automatically and can track its users' changing preferences continuously. In such a system, while the reasoning part is still case-based, the learning part is shouldered by a quantitative introspective learning model. Weight learning and evolution are accomplished in the background. The effectiveness of this integration will be demonstrated through a series of empirical experiments.
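
A hedged sketch of the flavor of such quantitative weight learning running in the background of case retrieval; the update rule below is an illustrative assumption, not the paper's formula:

```python
import numpy as np

def weighted_retrieve(case_base, weights, query, k=3):
    # case_base: list of (feature_vector, solution); weighted Euclidean retrieval.
    dists = [np.sqrt(np.sum(weights * (np.asarray(c) - query) ** 2))
             for c, _ in case_base]
    return [case_base[i] for i in np.argsort(dists)[:k]]

def update_weights(weights, query, retrieved_case, success, lr=0.05):
    # After user feedback: strengthen the weights of features on which a
    # successful retrieval matched the query well, weaken them on failure.
    diff = np.abs(np.asarray(retrieved_case) - query)
    well_matched = diff < diff.mean()
    weights = weights + (lr if success else -lr) * well_matched
    weights = np.clip(weights, 0.01, None)
    return weights / weights.sum()
```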

Book ChapterDOI
01 Jun 1999
TL;DR: This paper presents and compares adaptive systems that use either knowledge representation or machine learning for user modeling, and presents the LaboUr (Learning about the User) approach to user modeling which attempts to take an ideal position in the resulting multi-dimensional space by combining machine learning and knowledge representation techniques.
Abstract: In early user-adaptive systems, the use of knowledge representation methods for user modeling has often been the focus of research. In recent years, however, the application of machine learning techniques to control user-adapted interaction has become popular. In this paper, we present and compare adaptive systems that use either knowledge representation or machine learning for user modeling. Based on this comparison, several dimensions are identified that can be used to distinguish both approaches, but also to characterize user modeling systems in general. The LaboUr (Learning about the User) approach to user modeling is presented which attempts to take an ideal position in the resulting multi-dimensional space by combining machine learning and knowledge representation techniques. Finally, an implementation of LaboUr ideas into the information server ELFI is sketched.