
Showing papers on "Algorithmic learning theory published in 2008"


Journal ArticleDOI
TL;DR: This paper examines learning of complex motor skills with human-like limbs, and combines the idea of modular motor control by means of motor primitives, as a suitable way to generate parameterized control policies for reinforcement learning, with the theory of stochastic policy gradient learning.

921 citations


Book ChapterDOI
11 Jan 2008

779 citations


Proceedings Article
13 Jul 2008
TL;DR: The main contributions of this work lie in the presentation of a general formalization of zero-data learning, in an experimental analysis of its properties and in empirical evidence showing that generalization is possible and significant in this context.
Abstract: We introduce the problem of zero-data learning, where a model must generalize to classes or tasks for which no training data are available and only a description of the classes or tasks is provided. Zero-data learning is useful for problems where the set of classes to distinguish or tasks to solve is very large and is not entirely covered by the training data. The main contributions of this work lie in the presentation of a general formalization of zero-data learning, in an experimental analysis of its properties and in empirical evidence showing that generalization is possible and significant in this context. The experimental work of this paper addresses two classification problems of character recognition and a multitask ranking problem in the context of drug discovery. Finally, we conclude by discussing how this new framework could lead to a novel perspective on how to extend machine learning towards AI, where an agent can be given a specification for a learning problem before attempting to solve it (with very few or even zero examples).

437 citations
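
To make the zero-data setting concrete, the sketch below shows one minimal reading of it: classes are identified only by description vectors, and an input is labeled by matching it against those descriptions. The matching rule, feature values, and class names are illustrative assumptions, not the formalization or experiments of the paper.

```python
# Minimal sketch of zero-data (zero-shot) classification, assuming each class
# is given only a description vector in the same feature space as the inputs.
# The model must label "unseen" classes by matching inputs to class
# descriptions (hypothetical data, not the paper's experiments).
import numpy as np

def zero_data_predict(x, class_descriptions):
    """Assign x to the class whose description vector is closest (cosine)."""
    names = list(class_descriptions)
    D = np.array([class_descriptions[n] for n in names], dtype=float)
    sims = D @ x / (np.linalg.norm(D, axis=1) * np.linalg.norm(x) + 1e-12)
    return names[int(np.argmax(sims))]

# Unseen classes described only by attribute vectors (e.g., stroke counts,
# curvature) -- purely illustrative values.
descriptions = {"zero": np.array([1.0, 0.0, 1.0]),
                "seven": np.array([0.0, 1.0, 0.0])}
print(zero_data_predict(np.array([0.9, 0.1, 0.8]), descriptions))  # -> "zero"
```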


Book
12 Sep 2008
TL;DR: A new view on logical and relational learning and its role in machine learning and artificial intelligence is presented, identifying some of the lessons learned and formulating some challenges for future developments.
Abstract: I use the term logical and relational learning (LRL) to refer to the subfield of machine learning and data mining that is concerned with learning in expressive logical or relational representations. It is the union of inductive logic programming, (statistical) relational learning and multi-relational data mining and constitutes a general class of techniques and methodology for learning from structured data (such as graphs, networks, relational databases) and background knowledge. During the course of its existence, logical and relational learning has changed dramatically. Whereas early work was mainly concerned with logical issues (and even program synthesis from examples), in the 90s its focus was on the discovery of new and interpretable knowledge from structured data, often in the form of rules or patterns. Since then the range of tasks to which logical and relational learning has been applied has significantly broadened and now covers almost all machine learning problems and settings. Today, there exist logical and relational learning methods for reinforcement learning, statistical learning, distance- and kernel-based learning in addition to traditional symbolic machine learning approaches. At the same time, logical and relational learning problems are appearing everywhere. Advances in intelligent systems are enabling the generation of high-level symbolic and structured data in a wide variety of domains, including the semantic web, robotics, vision, social networks, and the life sciences, which in turn raises new challenges and opportunities for logical and relational learning. These developments have led to a new view on logical and relational learning and its role in machine learning and artificial intelligence. In this talk, I shall reflect on this view by identifying some of the lessons learned in logical and relational learning and formulating some challenges for future developments.

414 citations


Book ChapterDOI
08 Dec 2008
TL;DR: This paper extends previous work on policy learning from the immediate reward case to episodic reinforcement learning, resulting in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives.
Abstract: Many motor skills in humanoid robotics can be learned using parametrized motor primitives as done in imitation learning. However, most interesting motor learning problems are high-dimensional reinforcement learning problems often beyond the reach of current methods. In this paper, we extend previous work on policy learning from the immediate reward case to episodic reinforcement learning. We show that this results in a general, common framework also connected to policy gradient methods and yielding a novel algorithm for policy learning that is particularly well-suited for dynamic motor primitives. The resulting algorithm is an EM-inspired algorithm applicable to complex motor learning tasks. We compare this algorithm to several well-known parametrized policy search methods and show that it outperforms them. We apply it in the context of motor learning and show that it can learn a complex Ball-in-a-Cup task using a real Barrett WAM™ robot arm.

411 citations
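
The abstract describes an EM-inspired, reward-weighted update for parametrized policies. The sketch below illustrates the general flavor of such episodic, reward-weighted policy search on a toy objective; it is not the authors' algorithm, and the rollout function, noise level, and dimensionality are arbitrary assumptions.

```python
# Hedged sketch of reward-weighted policy updates for a parametrized policy,
# in the spirit of EM-like episodic policy search (not the authors' exact
# update rule). The "rollout" reward is a stand-in toy objective.
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(5)                      # policy parameters (e.g., primitive weights)
sigma = 0.3                              # exploration noise

def rollout_return(params):
    # Toy episodic return: peaks at an arbitrary target parameter vector.
    target = np.array([1.0, -0.5, 0.2, 0.0, 0.8])
    return np.exp(-np.sum((params - target) ** 2))

for iteration in range(200):
    eps = rng.normal(0.0, sigma, size=(20, theta.size))   # sampled perturbations
    returns = np.array([rollout_return(theta + e) for e in eps])
    weights = returns / (returns.sum() + 1e-12)           # reward weighting
    theta = theta + weights @ eps                         # weighted-average update

print(np.round(theta, 2))   # drifts toward the toy target
```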


Journal ArticleDOI
Yaochu Jin, Bernhard Sendhoff
01 May 2008
TL;DR: An overview of the existing research on multiobjective machine learning, focusing on supervised learning, is provided, and a number of case studies illustrate the major benefits of the Pareto-based approach to machine learning.
Abstract: Machine learning is inherently a multiobjective task. Traditionally, however, either only one of the objectives is adopted as the cost function or multiple objectives are aggregated to a scalar cost function. This can be mainly attributed to the fact that most conventional learning algorithms can only deal with a scalar cost function. Over the last decade, efforts on solving machine learning problems using the Pareto-based multiobjective optimization methodology have gained increasing impetus, particularly due to the great success of multiobjective optimization using evolutionary algorithms and other population-based stochastic search methods. It has been shown that Pareto-based multiobjective learning approaches are more powerful compared to learning algorithms with a scalar cost function in addressing various topics of machine learning, such as clustering, feature selection, improvement of generalization ability, knowledge extraction, and ensemble generation. One common benefit of the different multiobjective learning approaches is that a deeper insight into the learning problem can be gained by analyzing the Pareto front composed of multiple Pareto-optimal solutions. This paper provides an overview of the existing research on multiobjective machine learning, focusing on supervised learning. In addition, a number of case studies are provided to illustrate the major benefits of the Pareto-based approach to machine learning, e.g., how to identify interpretable models and models that can generalize on unseen data from the obtained Pareto-optimal solutions. Three approaches to Pareto-based multiobjective ensemble generation are compared and discussed in detail. Finally, potentially interesting topics in multiobjective machine learning are suggested.

399 citations
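
A central operation in Pareto-based multiobjective learning is extracting the non-dominated (Pareto-optimal) models from a set of candidates scored on several objectives. The sketch below does this for two objectives to be minimized, say error and complexity; the candidate names and scores are invented for illustration.

```python
# Minimal sketch of identifying Pareto-optimal models among candidates scored
# on two objectives to be minimized (e.g., error and complexity). The scores
# are made up for illustration; they are not from the survey.
def pareto_front(candidates):
    """Return candidates not dominated by any other candidate."""
    front = []
    for name, obj in candidates.items():
        dominated = any(
            all(o2 <= o1 for o1, o2 in zip(obj, other)) and other != obj
            for other_name, other in candidates.items() if other_name != name
        )
        if not dominated:
            front.append(name)
    return front

models = {"small_net": (0.18, 12), "medium_net": (0.11, 40),
          "large_net": (0.10, 200), "overfit_net": (0.12, 500)}
print(pareto_front(models))   # overfit_net is dominated by medium_net
```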


Book ChapterDOI
01 Jan 2008
TL;DR: This chapter outlines three classical settings for inductive logic programming, namely learning from entailment, learning from interpretations, and learning from proofs or traces, and shows how they can be adapted to cover state-of-the-art statistical relational learning approaches.
Abstract: Probabilistic inductive logic programming aka. statistical relational learning addresses one of the central questions of artificial intelligence: the integration of probabilistic reasoning with machine learning and first order and relational logic representations. A rich variety of different formalisms and learning techniques have been developed. A unifying characterization of the underlying learning settings, however, is missing so far. In this chapter, we start from inductive logic programming and sketch how the inductive logic programming formalisms, settings and techniques can be extended to the statistical case. More precisely, we outline three classical settings for inductive logic programming, namely learning from entailment, learning from interpretations, and learning from proofs or traces, and show how they can be adapted to cover state-of-the-art statistical relational learning approaches.

350 citations


Journal Article
TL;DR: A general theory of which samples should be used to learn models for each source of "nearby" data is provided; it is applicable in a broad decision-theoretic learning framework and yields results for classification and regression generally, as well as for density estimation within the exponential family.
Abstract: We consider the problem of learning accurate models from multiple sources of "nearby" data. Given distinct samples from multiple data sources and estimates of the dissimilarities between these sources, we provide a general theory of which samples should be used to learn models for each source. This theory is applicable in a broad decision-theoretic learning framework, and yields general results for classification and regression. A key component of our approach is the development of approximate triangle inequalities for expected loss, which may be of independent interest. We discuss the related problem of learning parameters of a distribution from multiple data sources. Finally, we illustrate our theory through a series of synthetic simulations.

303 citations
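
As a rough illustration of the pooling question the paper studies, the sketch below applies a simple bias-variance style rule: add a source's samples only if its estimated disparity to the target is smaller than the reduction in estimation error those samples would buy. This heuristic, including the c/sqrt(n) error model, is an assumption for illustration and not the paper's theory or bounds.

```python
# Hedged sketch of a pooling rule for learning from multiple "nearby" sources:
# include source j when its estimated disparity to the target is smaller than
# the estimation-error reduction its samples would buy. A simplified heuristic
# in the spirit of the paper, not its actual theory.
import math

def sources_to_pool(target, disparities, sample_sizes, c=1.0):
    """disparities[j]: estimated dissimilarity between source j and the target.
    sample_sizes[j]: number of samples available from source j."""
    chosen = [target]
    n = sample_sizes[target]
    for j, d in sorted(disparities.items(), key=lambda kv: kv[1]):
        if j == target:
            continue
        # Gain from extra samples (~ c/sqrt(n)) must outweigh the added bias d.
        gain = c / math.sqrt(n) - c / math.sqrt(n + sample_sizes[j])
        if d < gain:
            chosen.append(j)
            n += sample_sizes[j]
    return chosen

print(sources_to_pool("A", {"A": 0.0, "B": 0.01, "C": 0.4},
                      {"A": 50, "B": 200, "C": 1000}))   # -> ['A', 'B']
```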


Journal ArticleDOI
25 Apr 2008-Science
TL;DR: Undergraduate students may benefit more from learning mathematics through a single abstract, symbolic representation than from learning multiple concrete examples.
Abstract: Undergraduate students may benefit more from learning mathematics through a single abstract, symbolic representation than from learning multiple concrete examples.

263 citations


Journal ArticleDOI
Dana Ron
01 Mar 2008
TL;DR: This survey takes the learning-theory point of view and focuses on results for testing properties of functions that are of interest to the learning theory community, and covers results for testing algebraic properties of functions such as linearity, testing properties defined by concise representations, such as having a small DNF representation, and more.
Abstract: Property testing deals with tasks where the goal is to distinguish between the case that an object (e.g., function or graph) has a prespecified property (e.g., the function is linear or the graph is bipartite) and the case that it differs significantly from any such object. The task should be performed by observing only a very small part of the object, in particular by querying the object, and the algorithm is allowed a small failure probability. One view of property testing is as a relaxation of learning the object (obtaining an approximate representation of the object). Thus property testing algorithms can serve as a preliminary step to learning. That is, they can be applied in order to select, very efficiently, what hypothesis class to use for learning. This survey takes the learning-theory point of view and focuses on results for testing properties of functions that are of interest to the learning theory community. In particular, we cover results for testing algebraic properties of functions such as linearity, testing properties defined by concise representations, such as having a small DNF representation, and more.

157 citations
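
A canonical example of the kind of result surveyed here is linearity testing. The sketch below implements the classical Blum-Luby-Rubinfeld test over GF(2)^n, which queries the function at random pairs of points and checks additivity; the functions being tested are toy examples.

```python
# Example of a property test in the query model: the Blum-Luby-Rubinfeld
# linearity test over GF(2)^n. It queries the function at random points and
# checks f(x) + f(y) = f(x + y); a function far from linear fails with
# noticeable probability per trial. The tested functions are toy examples.
import random

def blr_linearity_test(f, n, trials=200):
    for _ in range(trials):
        x = [random.randint(0, 1) for _ in range(n)]
        y = [random.randint(0, 1) for _ in range(n)]
        xy = [a ^ b for a, b in zip(x, y)]
        if (f(x) ^ f(y)) != f(xy):
            return False          # reject: found a violated triple
    return True                   # accept (may err with small probability)

linear = lambda x: x[0] ^ x[2]      # a parity function: linear over GF(2)
nonlinear = lambda x: x[0] & x[1]   # AND: far from linear
print(blr_linearity_test(linear, 4), blr_linearity_test(nonlinear, 4))
```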


Book ChapterDOI
10 Sep 2008
TL;DR: A FOIL-like algorithm is presented that can be applied to general DL languages, discussing related theoretical aspects of learning with the inherent incompleteness underlying the semantics of this representation.
Abstract: In this paper we focus on learning concept descriptions expressed in Description Logics. After stating the learning problem in this context, a FOIL-like algorithm is presented that can be applied to general DL languages, discussing related theoretical aspects of learning with the inherent incompleteness underlying the semantics of this representation. Subsequently we present an experimental evaluation of the implementation of this algorithm performed on some real ontologies in order to empirically assess its performance.

Journal ArticleDOI
TL;DR: This work presents a stopping criterion for active learning based on the way instances are selected during uncertainty-based sampling and verifies its applicability in a variety of settings.

Proceedings ArticleDOI
25 Oct 2008
TL;DR: Exploratory results on learning to predict potential code-switching points in Spanish-English are presented, using a transcription of code-switched discourse to evaluate the performance of the classifiers.
Abstract: Predicting possible code-switching points can help develop more accurate methods for automatically processing mixed-language text, such as multilingual language models for speech recognition systems and syntactic analyzers. We present in this paper exploratory results on learning to predict potential code-switching points in Spanish-English. We trained different learning algorithms using a transcription of code-switched discourse. To evaluate the performance of the classifiers, we used two different criteria: 1) measuring precision, recall, and F-measure of the predictions against the reference in the transcription, and 2) rating the naturalness of artificially generated code-switched sentences. Average scores for the code-switched sentences generated by our machine learning approach were close to the scores of those generated by humans.
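
The first evaluation criterion above reduces to computing precision, recall, and F-measure of predicted switch points against the reference transcription. A minimal sketch, with invented token positions:

```python
# Sketch of the first evaluation criterion described above: precision, recall,
# and F-measure of predicted code-switching points against reference points.
# The token indices below are invented for illustration.
def prf(predicted, reference):
    predicted, reference = set(predicted), set(reference)
    tp = len(predicted & reference)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = (2 * precision * recall / (precision + recall)) if tp else 0.0
    return precision, recall, f

# Positions (token indices) where a switch to the other language begins.
print(prf(predicted=[3, 7, 12, 20], reference=[3, 12, 15]))
# -> (0.5, 0.666..., 0.571...)
```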

Proceedings ArticleDOI
06 Apr 2008
TL;DR: This paper presents interviews of eleven researchers experienced in applying statistical machine learning algorithms and techniques to human-computer interaction problems, as well as a study of ten participants working during a five-hour study to apply statistical machine learning algorithms and techniques to a realistic problem.
Abstract: As statistical machine learning algorithms and techniques continue to mature, many researchers and developers see statistical machine learning not only as a topic of expert study, but also as a tool for software development. Extensive prior work has studied software development, but little prior work has studied software developers applying statistical machine learning. This paper presents interviews of eleven researchers experienced in applying statistical machine learning algorithms and techniques to human-computer interaction problems, as well as a study of ten participants working during a five-hour study to apply statistical machine learning algorithms and techniques to a realistic problem. We distill three related categories of difficulties that arise in applying statistical machine learning as a tool for software development: (1) difficulty pursuing statistical machine learning as an iterative and exploratory process, (2) difficulty understanding relationships between data and the behavior of statistical machine learning algorithms, and (3) difficulty evaluating the performance of statistical machine learning algorithms and techniques in the context of applications. This paper provides important new insight into these difficulties and the need for development tools that better support the application of statistical machine learning.

Journal ArticleDOI
TL;DR: A new understanding of adaptive and generative learning within organizations is put forward, grounded in some ideas from complexity theories: mainly self-organization and implicate order.
Abstract: One of the most important classical typologies within the organizational learning literature is the distinction between adaptive and generative learning. However, the processes of these types of learning, particularly the latter, have not been widely analyzed and incorporated into the organizational learning process. This paper puts forward a new understanding of adaptive and generative learning within organizations, grounded in some ideas from complexity theories: mainly self-organization and implicate order. Adaptive learning involves any improvement or development of the explicate order through a process of self-organization. Self-organization is a self-referential process characterized by logical deductive reasoning, concentration, discussion and improvement. Generative learning involves any approach to the implicate order through a process of self-transcendence. Self-transcendence is a holo-organizational process characterized by intuition, attention, dialogue and inquiry. The main implications of the two types of learning for organizational learning are discussed.

Proceedings ArticleDOI
21 Apr 2008
TL;DR: Experimental results show that the proposed method outperforms the baseline methods for two ranking tasks (Pseudo Relevance Feedback and Topic Distillation) in web search, indicating that the suggested method can indeed make effective use of relation information and content information in ranking.
Abstract: Learning to rank is a new statistical learning technology for creating a ranking model for sorting objects. The technology has been successfully applied to web search, and is becoming one of the key machineries for building search engines. Existing approaches to learning to rank, however, have not considered the cases in which there exist relationships between the objects to be ranked, despite the fact that such situations are very common in practice. For example, in web search, given a query, certain relationships usually exist among the retrieved documents, e.g., URL hierarchy, similarity, etc., and sometimes it is necessary to utilize the information in ranking of the documents. This paper addresses the issue and formulates it as a novel learning problem, referred to as 'learning to rank relational objects'. In the new learning task, the ranking model is defined as a function of not only the contents (features) of objects but also the relations between objects. The paper further focuses on one setting of the learning problem in which the way of using relation information is predetermined. It formalizes the learning task as an optimization problem in the setting. The paper then proposes a new method to perform the optimization task, particularly an implementation based on SVM. Experimental results show that the proposed method outperforms the baseline methods for two ranking tasks (Pseudo Relevance Feedback and Topic Distillation) in web search, indicating that the proposed method can indeed make effective use of relation information and content information in ranking.
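
The paper's key move is to score objects using both their content features and their relations. The sketch below is a generic stand-in for that idea, assuming a linear scorer trained with a pairwise hinge loss plus a penalty that pulls the scores of related objects together; it is not the paper's SVM formulation, and all data are invented.

```python
# Hedged sketch of ranking with both content and relations: a linear scoring
# function trained with a pairwise hinge loss, plus a penalty that pulls the
# scores of related objects (e.g., similar documents) together. A generic
# illustration of the idea, not the paper's formulation.
import numpy as np

def train_relational_ranker(X, pairs, relations, lam=0.1, lr=0.01, epochs=200):
    """X: feature matrix; pairs: (i, j) meaning i should rank above j;
    relations: (i, j, strength) meaning i and j should score similarly."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for i, j in pairs:                         # pairwise hinge loss
            if 1.0 - (X[i] - X[j]) @ w > 0.0:
                w += lr * (X[i] - X[j])
        for i, j, s in relations:                  # relation smoothness penalty
            w -= lr * lam * s * ((X[i] - X[j]) @ w) * (X[i] - X[j])
    return w

X = np.array([[1.0, 0.2], [0.8, 0.1], [0.2, 0.9]])
w = train_relational_ranker(X, pairs=[(0, 2), (1, 2)], relations=[(0, 1, 1.0)])
print(X @ w)   # documents 0 and 1 outrank document 2
```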

Journal ArticleDOI
Chen Yu
TL;DR: This work suggests that lexical knowledge accumulated through prior statistical learning could play an important role in vocabulary growth, and presents the first model that attempts to simulate the effects of cumulative knowledge on subsequent learning using realistic data collected from child-caregiver interactions.
Abstract: There are an infinite number of possible word-to-world pairings. One way children could learn words at an early stage is by computing statistical regularities across different modalities—pairing spoken words with possible referents in the co-occurring extralinguistic environment, collecting a number of such pairs, and then figuring out the common elements. This paper provides computational evidence that such a statistical mechanism is possible for object name learning. Moreover, young children learn words much more effectively and efficiently at later stages. Could statistical learning account for this behavioral change? The current paper explores this question by presenting a developmental model of word learning that relies on a general associative mechanism and recruits previously learned words to guide subsequent word learning. This mechanism leads to increasingly fast learning and corresponding behavioral changes. Simulation studies are conducted using the data collected from a series of picture-book ...
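
A bare-bones version of the cross-situational mechanism described above can be sketched as co-occurrence counting over word-object pairings, with already-learned words used to narrow the candidate referents in later scenes. The attention rule and the scenes below are simplifying assumptions, not the paper's associative model or corpus.

```python
# Bare-bones sketch of cross-situational word learning: accumulate word-object
# co-occurrence counts over scenes and read off the strongest association.
# Known words are used here to discount already-explained objects, a rough
# stand-in for the paper's use of prior lexical knowledge.
from collections import defaultdict

cooc = defaultdict(lambda: defaultdict(float))
known = {}                                  # word -> referent learned so far

def observe(words, objects):
    for w in words:
        candidates = [o for o in objects if o not in known.values()] or objects
        for o in candidates:                # prior knowledge narrows candidates
            cooc[w][o] += 1.0

def best_referent(word):
    return max(cooc[word], key=cooc[word].get)

# Invented scenes: spoken words paired with visible objects.
observe(["ball", "dog"], ["BALL", "DOG"])
observe(["ball"], ["BALL", "CUP"])
known["ball"] = best_referent("ball")       # "BALL" wins after two scenes
observe(["cup"], ["BALL", "CUP"])           # BALL is already explained
print(known["ball"], best_referent("cup"))  # -> BALL CUP
```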

Proceedings Article
13 Jul 2008
TL;DR: In this paper, three variants of the same evolutionary learning algorithm (NeuroEvolution of Augmenting Topologies), whose representations vary in their capacity to encode geometry, are compared in checkers.
Abstract: An important feature of many problem domains in machine learning is their geometry. For example, adjacency relationships, symmetries, and Cartesian coordinates are essential to any complete description of board games, visual recognition, or vehicle control. Yet many approaches to learning ignore such information in their representations, instead inputting flat parameter vectors with no indication of how those parameters are situated geometrically. This paper argues that such geometric information is critical to the ability of any machine learning approach to effectively generalize; even a small shift in the configuration of the task in space from what was experienced in training can go wholly unrecognized unless the algorithm is able to learn the regularities in decision-making across the problem geometry. To demonstrate the importance of learning from geometry, three variants of the same evolutionary learning algorithm (NeuroEvolution of Augmenting Topologies), whose representations vary in their capacity to encode geometry, are compared in checkers. The result is that the variant that can learn geometric regularities produces a significantly more general solution. The conclusion is that it is important to enable machine learning to detect and thereby learn from the geometry of its problems.

Journal ArticleDOI
TL;DR: This chapter presents an approach to system identification based on viewing identification as a problem in statistical learning theory, and a result is derived showing that in the case of systems with fading memory, it is possible to combine standard results in statistical learning theory with some fading memory arguments to obtain finite time estimates of the desired kind.

Proceedings Article
08 Dec 2008
TL;DR: A rational analysis of function learning is provided, using the equivalence of Bayesian linear regression and Gaussian processes, to define a Gaussian process model of human function learning that combines the strengths of both approaches.
Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches.
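
The model rests on standard Gaussian process regression, which the paper connects to Bayesian linear regression. A minimal sketch of the GP posterior mean with an RBF kernel, on made-up training points:

```python
# Small illustration of the Gaussian-process machinery the paper builds on:
# GP regression with an RBF kernel, predicting the posterior mean at new
# inputs. Training points and hyperparameters are arbitrary choices.
import numpy as np

def rbf(a, b, length_scale=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length_scale ** 2)

def gp_posterior_mean(x_train, y_train, x_test, noise=0.1):
    K = rbf(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    K_star = rbf(x_test, x_train)
    return K_star @ np.linalg.solve(K, y_train)

x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.sin(x_train)                       # observed function values
x_test = np.array([0.5, 1.5, 2.5])
print(np.round(gp_posterior_mean(x_train, y_train, x_test), 2))
```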

Proceedings Article
08 Dec 2008
TL;DR: This first quantitative study comparing human category learning in active versus passive settings indicates that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory.
Abstract: We investigate a topic at the interface of machine learning and cognitive science. Human active learning, where learners can actively query the world for information, is contrasted with passive learning from random examples. Furthermore, we compare human active learning performance with predictions from statistical learning theory. We conduct a series of human category learning experiments inspired by a machine learning task for which active and passive learning error bounds are well understood, and dramatically distinct. Our results indicate that humans are capable of actively selecting informative queries, and in doing so learn better and faster than if they are given random training data, as predicted by learning theory. However, the improvement over passive learning is not as dramatic as that achieved by machine active learning algorithms. To the best of our knowledge, this is the first quantitative study comparing human category learning in active versus passive settings.
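
Assuming the underlying machine learning task resembles learning a one-dimensional category boundary, the contrast between the active and passive bounds can be sketched directly: an active learner can binary-search the boundary in roughly log(1/eps) queries, while a passive learner must wait for random examples to fall near it, needing on the order of 1/eps. The boundary value and tolerance below are arbitrary.

```python
# Sketch of why active and passive error bounds differ so sharply, assuming a
# task like a one-dimensional category-boundary problem: active learning can
# binary-search the boundary, passive learning relies on random examples.
import math, random

def active_queries(boundary, eps):
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        queries += 1                          # ask for the label of `mid`
        lo, hi = (mid, hi) if mid < boundary else (lo, mid)
    return queries                            # ~ log2(1/eps)

def passive_samples(boundary, eps):
    lo, hi, samples = 0.0, 1.0, 0
    while hi - lo > eps:
        x = random.random()                   # labeled examples at random points
        samples += 1
        if x < boundary and x > lo:
            lo = x
        elif x >= boundary and x < hi:
            hi = x
    return samples                            # ~ on the order of 1/eps

print(active_queries(0.37, 0.01), passive_samples(0.37, 0.01))
```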

Book ChapterDOI
01 Jun 2008
TL;DR: This work proposes framing machine learning problems as Stackelberg games, which allows for efficient systematic search of large numbers of hyper-parameters in the resulting bilevel optimization problem.
Abstract: We examine the interplay of optimization and machine learning. Great progress has been made in machine learning by cleverly reducing machine learning problems to convex optimization problems with one or more hyper-parameters. The availability of powerful convex-programming theory and algorithms has enabled a flood of new research in machine learning models and methods. But many of the steps necessary for successful machine learning models fall outside of the convex machine learning paradigm. Thus we now propose framing machine learning problems as Stackelberg games. The resulting bilevel optimization problem allows for efficient systematic search of large numbers of hyper-parameters. We discuss recent progress in solving these bilevel problems and the many interesting optimization challenges that remain. Finally, we investigate the intriguing possibility of novel machine learning models enabled by bilevel programming.
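
In the bilevel view, the inner problem trains a model for fixed hyper-parameters and the outer problem chooses the hyper-parameters that minimize validation loss. The sketch below makes this concrete with ridge regression (whose inner problem has a closed form) and a plain grid for the outer search, rather than the gradient-based bilevel methods the paper discusses; the data are synthetic.

```python
# Minimal sketch of the bilevel view of model selection: the inner problem
# trains a model for fixed hyper-parameters, the outer problem picks the
# hyper-parameters minimizing validation loss. Inner problem: ridge regression
# (closed form); outer search: a simple grid, not a bilevel solver.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 5))
w_true = np.array([1.0, -2.0, 0.0, 0.5, 3.0])
y = X @ w_true + rng.normal(scale=0.5, size=80)
X_tr, y_tr, X_val, y_val = X[:60], y[:60], X[60:], y[60:]

def inner_solve(lam):
    # Inner (lower-level) problem: ridge regression for a fixed lambda.
    d = X_tr.shape[1]
    return np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)

def outer_objective(lam):
    # Outer (upper-level) problem: validation loss of the inner solution.
    w = inner_solve(lam)
    return np.mean((X_val @ w - y_val) ** 2)

grid = [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
best = min(grid, key=outer_objective)
print(best, round(outer_objective(best), 3))
```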


Journal ArticleDOI
TL;DR: A neural network with a multilayer perceptron (MLP) structure is used as the base learning model, and simulation results show the effectiveness of this method in various video stream data sets.
Abstract: This paper proposes an incremental multiple-object recognition and localization (IMORL) method. The objective of IMORL is to adaptively learn multiple interesting objects in an image. Unlike the conventional multiple-object learning algorithms, the proposed method can automatically and adaptively learn from continuous video streams over the entire learning life. This kind of incremental learning capability enables the proposed approach to accumulate experience and use such knowledge to benefit future learning and the decision making process. Furthermore, IMORL can effectively handle variations in the number of instances in each data chunk over the learning life. Another important aspect analyzed in this paper is the concept drifting issue. In multiple-object learning scenarios, it is a common phenomenon that new interesting objects may be introduced during the learning life. To handle this situation, IMORL uses an adaptive learning principle to autonomously adjust to such new information. The proposed approach is independent of the base learning models, such as decision tree, neural networks, support vector machines, and others, which provide the flexibility of using this method as a general learning methodology in multiple-object learning scenarios. In this paper, we use a neural network with a multilayer perceptron (MLP) structure as the base learning model and test the performance of this method in various video stream data sets. Simulation results show the effectiveness of this method.
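
The chunk-wise, base-learner-agnostic update at the heart of the approach can be sketched with scikit-learn's partial_fit on an MLP classifier, with each chunk standing in for the features extracted from one segment of a video stream. The synthetic features, architecture, and number of chunks are assumptions for illustration, not the IMORL system itself.

```python
# Sketch of chunk-wise incremental learning with an MLP base learner, in the
# spirit of learning from a continuous stream: each data chunk updates the
# same model via partial_fit. The "chunks" are synthetic feature vectors,
# not real video-frame features, and the architecture is arbitrary.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
classes = np.array([0, 1, 2])                  # object categories seen so far
model = MLPClassifier(hidden_layer_sizes=(32,), random_state=0)

def make_chunk(n=60):
    y = rng.integers(0, 3, size=n)
    X = rng.normal(size=(n, 8)) + 2.0 * y[:, None]   # class-dependent features
    return X, y

for t in range(10):                            # stream of data chunks
    X_chunk, y_chunk = make_chunk()
    model.partial_fit(X_chunk, y_chunk, classes=classes)

X_test, y_test = make_chunk(200)
print(round(model.score(X_test, y_test), 2))   # held-out accuracy after the stream
```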

Journal ArticleDOI
TL;DR: The last four decades of research in that area are surveyed, with a special focus on Rolf Wiehagen's work, which has made him one of the most influential scientists in the theory of learning recursive functions.

Book ChapterDOI
01 Jan 2008
TL;DR: This chapter describes first how studies investigating this form of learning in laboratory situations have shifted from a rule-based interpretation to interpretations assuming a progressive tuning to the statistical regularities of the environment.
Abstract: All of us have learned much about language, music, physical or social environment, and other complex domains, without any intentional attempts to acquire information. This chapter first describes how studies investigating this form of learning in laboratory situations have shifted from a rule-based interpretation to interpretations assuming a progressive tuning to the statistical regularities of the environment. The next section examines the potential of statistical learning and whether statistical learning stems from statistical computations or chunk formation. Then the senses in which this form of learning may be qualified as implicit are analyzed. Finally, implications for the nativist/empiricist debate are discussed.

Journal ArticleDOI
TL;DR: A case is made for separating the acquisition and classification phases in semi-supervised machine learning, and a probabilistic acquisition model is presented and evaluated both theoretically and experimentally.

Proceedings ArticleDOI
19 Jun 2008
TL;DR: An extensive experimental study of a Statistical Machine Translation system, Moses, is presented from the point of view of its learning capabilities, and several research directions are suggested, most notably the integration of linguistic rules into the model inference phase and the development of active learning procedures.
Abstract: We present an extensive experimental study of a Statistical Machine Translation system, Moses (Koehn et al., 2007), from the point of view of its learning capabilities. Very accurate learning curves are obtained, by using high-performance computing, and extrapolations are provided of the projected performance of the system under different conditions. We provide a discussion of learning curves, and we suggest that: 1) the representation power of the system is not currently a limitation to its performance, 2) the inference of its models from finite sets of i.i.d. data is responsible for current performance limitations, 3) it is unlikely that increasing dataset sizes will result in significant improvements (at least in traditional i.i.d. setting), 4) it is unlikely that novel statistical estimation methods will result in significant improvements. The current performance wall is mostly a consequence of Zipf's law, and this should be taken into account when designing a statistical machine translation system. A few possible research directions are discussed as a result of this investigation, most notably the integration of linguistic rules into the model inference phase, and the development of active learning procedures.
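
The learning-curve methodology can be illustrated by fitting a simple power-law curve to scores measured at increasing training-set sizes and extrapolating it. The BLEU-like numbers and the specific functional form below are assumptions for illustration; the paper's curves and fitting procedure are more careful.

```python
# Sketch of the learning-curve extrapolation idea: fit a power-law curve
# score(n) = a - b * n**(-c) to measured scores at increasing training-set
# sizes and extrapolate to a larger size. The BLEU-like numbers are invented.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    return a - b * n ** (-c)

sizes = np.array([1e4, 3e4, 1e5, 3e5, 1e6])
scores = np.array([18.2, 21.5, 24.1, 25.9, 27.0])      # hypothetical BLEU

params, _ = curve_fit(power_law, sizes, scores, p0=[30.0, 50.0, 0.3],
                      bounds=([0.0, 0.0, 0.0], [100.0, 1e6, 2.0]))
a, b, c = params
print(f"asymptote ~ {a:.1f} BLEU, predicted at 10M sentences: "
      f"{power_law(1e7, a, b, c):.1f}")
```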


Posted Content
TL;DR: In this article, the authors reanalyzed the results of two previous studies on strategy selection from a learning perspective, which argues that people learn to select strategies for making probabilistic inferences.
Abstract: The assumption that people possess a repertoire of strategies to solve the inference problems they face has been made repeatedly. The experimental findings of two previous studies on strategy selection are reexamined from a learning perspective, which argues that people learn to select strategies for making probabilistic inferences. This learning process is modeled with the strategy selection learning (SSL) theory, which assumes that people develop subjective expectancies for the strategies they have. They select strategies proportional to their expectancies, which are updated on the basis of experience. For the study by Newell, Weston, and Shanks (2003) it can be shown that people did not anticipate the success of a strategy from the beginning of the experiment. Instead, the behavior observed at the end of the experiment was the result of a learning process that can be described by the SSL theory. For the second study, by Broder and Schiffer (2006), the SSL theory is able to provide an explanation for why participants only slowly adapted to new environments in a dynamic inference situation. The reanalysis of the previous studies illustrates the importance of learning for probabilistic inferences.
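
The SSL account described in the abstract (subjective expectancies, selection proportional to expectancy, and updates from experienced payoffs) can be written down in a few lines. The strategies, initial expectancies, and payoffs below are invented, and the published model includes additional parameters omitted here.

```python
# Small simulation of the strategy selection learning (SSL) idea described
# above: each strategy carries an expectancy, strategies are chosen with
# probability proportional to their expectancies, and the chosen strategy's
# expectancy is updated with the payoff it earned. Payoffs are invented.
import random

expectancy = {"take-the-best": 5.0, "weighted-additive": 5.0}
payoff = {"take-the-best": 1.0, "weighted-additive": 0.2}   # toy environment

def choose():
    total = sum(expectancy.values())
    r = random.uniform(0.0, total)
    for strategy, value in expectancy.items():
        r -= value
        if r <= 0.0:
            return strategy
    return strategy

for trial in range(300):
    s = choose()
    expectancy[s] += payoff[s]        # experience raises the expectancy

total = sum(expectancy.values())
print({s: round(v / total, 2) for s, v in expectancy.items()})
# take-the-best ends up selected most of the time in this toy environment
```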