
Showing papers on "Algorithmic learning theory published in 2009"


Book
01 Jan 2009
TL;DR: This book surveys previous comparisons and theoretical work, describes the methods and datasets used, sets out the criteria for comparison and the evaluation methodology (including validation), and reports empirical results, including machine learning applied to machine learning itself.
Abstract: Survey of previous comparisons and theoretical work; descriptions of methods; dataset descriptions; criteria for comparison and methodology (including validation); empirical results; machine learning on machine learning.

2,325 citations


Book
29 Jun 2009
TL;DR: This introductory book presents some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi-supervised support vector machines, and discusses their basic mathematical formulation.
Abstract: Semi-supervised learning is a learning paradigm concerned with the study of how computers and natural systems such as humans learn in the presence of both labeled and unlabeled data. Traditionally, learning has been studied either in the unsupervised paradigm (e.g., clustering, outlier detection) where all the data is unlabeled, or in the supervised paradigm (e.g., classification, regression) where all the data is labeled.The goal of semi-supervised learning is to understand how combining labeled and unlabeled data may change the learning behavior, and design algorithms that take advantage of such a combination. Semi-supervised learning is of great interest in machine learning and data mining because it can use readily available unlabeled data to improve supervised learning tasks when the labeled data is scarce or expensive. Semi-supervised learning also shows potential as a quantitative tool to understand human category learning, where most of the input is self-evidently unlabeled. In this introductory book, we present some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi-supervised support vector machines. For each model, we discuss its basic mathematical formulation. The success of semi-supervised learning depends critically on some underlying assumptions. We emphasize the assumptions made by each model and give counterexamples when appropriate to demonstrate the limitations of the different models. In addition, we discuss semi-supervised learning for cognitive psychology. Finally, we give a computational learning theoretic perspective on semi-supervised learning, and we conclude the book with a brief discussion of open questions in the field.
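
As a concrete illustration of one of the models surveyed here, below is a minimal self-training sketch in Python. It assumes scikit-learn is available; the synthetic data, the 50 initially labeled points, the 0.8 confidence threshold and the ten rounds are illustrative choices, not values from the book.

```python
# Self-training: fit on the labeled pool, pseudo-label confident unlabeled
# points, add them to the pool, and repeat.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:50] = True                          # pretend only 50 labels are known

clf = LogisticRegression(max_iter=1000)
for _ in range(10):                          # a few self-training rounds
    clf.fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.8      # pseudo-label only confident points
    if not confident.any():
        break
    idx = np.flatnonzero(~labeled)[confident]
    y[idx] = proba[confident].argmax(axis=1) # accept the model's own labels
    labeled[idx] = True
```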

1,913 citations


Proceedings ArticleDOI
14 Jun 2009
TL;DR: This work presents a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions that uses importance weighting to correct sampling bias, and is able to give rigorous label complexity bounds for the learning process.
Abstract: We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process.
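
A minimal sketch of the importance-weighting idea, under assumptions not in the abstract: a streaming linear classifier, a placeholder query rule that samples more often near the decision boundary, and synthetic data. The paper instead derives the query probability p_t from a variance-control argument, which is what yields its label complexity bounds; the 1/p_t weighting of queried examples is the part carried over faithfully.

```python
# Importance-weighted active learning sketch: query the label with probability
# p_t; if queried, train on the example with weight 1/p_t so the weighted
# sample stays unbiased.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = (X[:, 0] + 0.3 * rng.normal(size=2000) > 0).astype(int)

clf = SGDClassifier()
clf.partial_fit(X[:20], y[:20], classes=[0, 1])       # seed with a few labels
for x_t, y_t in zip(X[20:], y[20:]):
    margin = abs(clf.decision_function(x_t.reshape(1, -1))[0])
    p_t = 1.0 / (1.0 + margin)                         # placeholder query rule
    if rng.random() < p_t:                             # label is requested
        clf.partial_fit(x_t.reshape(1, -1), [y_t],
                        sample_weight=[1.0 / p_t])     # importance weight 1/p_t
```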

335 citations


Journal ArticleDOI
TL;DR: An ensemble of online sequential extreme learning machines (EOS-ELM) based on OS-ELM is proposed, which is more stable and accurate than the original OS-ELM and can be further improved.

323 citations


Journal ArticleDOI
TL;DR: This article argues that the pragmatic goal of computer-assisted language learning (CALL) developers and researchers pushes them to consider a variety of theoretical approaches to second language acquisition (SLA), which have developed, in part, in response to the need to theorize the role of instruction.
Abstract: The point of departure for this article is the contrast between the theoretical landscape within view of language teaching professionals in 1991 and that of today. I argue that the pragmatic goal of computer-assisted language learning (CALL) developers and researchers to create and evaluate learning opportunities pushes them to consider a variety of theoretical approaches to second language acquisition (SLA), which have developed, in part, in response to the need to theorize the role of instruction in SLA. To illustrate connections between SLA and CALL, I touch on multiple theoretical perspectives grouped into four general approaches: cognitive linguistic (Universal Grammar, autonomous induction theory, and the concept-oriented approach); psycholinguistic (processibility theory, input processing theory, interactionist theory); human learning (associative–cognitive CREED, skill acquisition theory); and language in social context (sociocultural, language socialization, conversation analysis, systemic–functional, complexity theory). I suggest that such theoretical approaches can be useful in the development and evaluation of CALL materials and tasks. Finally, I propose that the expanding use of technology changes the nature of communicative competence theory, challenges SLA theory, and increases the number of consumers for SLA research. [ABSTRACT FROM AUTHOR]

273 citations


Proceedings Article
07 Dec 2009
TL;DR: In this article, the authors prove that online learning with delayed updates converges well, thereby facilitating parallel online learning that can take advantage of modern multi-core architectures.
Abstract: Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning.
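
A toy simulation of the delayed-update setting in plain numpy: the gradient applied at step t is computed at the parameters from tau steps earlier, mimicking workers that read a stale model. The logistic loss, the delay tau = 8 and the step size are illustrative choices, not values from the paper.

```python
# Online gradient descent where gradients are computed at a stale copy of the
# parameters (delay tau), as when several workers update a shared model.
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
d, T, tau, eta = 5, 5000, 8, 0.05
w_true = rng.normal(size=d)
w = np.zeros(d)
history = deque([w.copy()], maxlen=tau + 1)        # parameter snapshots

for t in range(T):
    x = rng.normal(size=d)
    y = 1.0 if x @ w_true > 0 else -1.0
    w_stale = history[0]                           # model from up to tau steps ago
    g = -y * x / (1.0 + np.exp(y * (x @ w_stale))) # logistic gradient at stale model
    w -= eta * g                                   # applied to the current model
    history.append(w.copy())

X_test = rng.normal(size=(1000, d))
print(np.mean(np.sign(X_test @ w) == np.sign(X_test @ w_true)))  # close to 1.0
```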

189 citations


01 Jan 2009
TL;DR: Novel and distinct stability-based generalization bounds for stationary φ-mixing and β-mixing sequences are proved, which can be viewed as the first theoretical basis for the use of these algorithms in non-i.i.d. scenarios.
Abstract: Most generalization bounds in learning theory are based on some measure of the complexity of the hypothesis class used, independently of any algorithm. In contrast, the notion of algorithmic stability can be used to derive tight generalization bounds that are tailored to specific learning algorithms by exploiting their particular properties. However, as in much of learning theory, existing stability analyses and bounds apply only in the scenario where the samples are independently and identically distributed. In many machine learning applications, however, this assumption does not hold. The observations received by the learning algorithm often have some inherent temporal dependence. This paper studies the scenario where the observations are drawn from a stationary ϕ-mixing or β-mixing sequence, a widely adopted assumption in the study of non-i.i.d. processes that implies a dependence between observations weakening over time. We prove novel and distinct stability-based generalization bounds for stationary ϕ-mixing and β-mixing sequences. These bounds strictly generalize the bounds given in the i.i.d. case and apply to all stable learning algorithms, thereby extending the use of stability bounds to non-i.i.d. scenarios. We also illustrate the application of our ϕ-mixing generalization bounds to general classes of learning algorithms, including Support Vector Regression, Kernel Ridge Regression, and Support Vector Machines, and many other kernel regularization-based and relative entropy-based regularization algorithms. These novel bounds can thus be viewed as the first theoretical basis for the use of these algorithms in non-i.i.d. scenarios.
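
For reference, the standard definitions behind this abstract, stated in their usual textbook form (the paper's non-i.i.d. bounds are expressed in terms of these quantities, with constants and the dependence on the sample size m differing from the i.i.d. bound quoted last):

```latex
% phi-mixing coefficient of a stationary sequence (Z_t): dependence decays with the gap k.
\[
\varphi(k) \;=\; \sup_{n}\ \sup_{A \in \sigma(Z_1^{\,n}),\; B \in \sigma(Z_{n+k}^{\,\infty})}
\bigl|\Pr[B \mid A] - \Pr[B]\bigr|
\]
% Uniform stability of a learning algorithm: removing one training point changes the loss by at most beta.
\[
\bigl|\ell(h_S, z) - \ell(h_{S^{\setminus i}}, z)\bigr| \;\le\; \beta
\qquad \text{for all } S,\ i,\ z
\]
% I.i.d. stability bound (Bousquet & Elisseeff, 2002), which the paper generalizes:
\[
R(h_S) \;\le\; \widehat{R}(h_S) + 2\beta + (4m\beta + M)\sqrt{\frac{\ln(1/\delta)}{2m}}
\qquad \text{with probability at least } 1-\delta .
\]
```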

125 citations


Proceedings ArticleDOI
14 Jun 2009
TL;DR: This paper proposes a general framework, called EigenTransfer, to tackle a variety of transfer learning problems, e.g. cross-domain learning, self-taught learning, etc., and applies it on three different transfer learning tasks to demonstrate its unifying ability, showing through experiments that EigenTransfer can greatly outperform several representative non-transfer learners.
Abstract: This paper proposes a general framework, called EigenTransfer, to tackle a variety of transfer learning problems, e.g. cross-domain learning, self-taught learning, etc. Our basic idea is to construct a graph to represent the target transfer learning task. By learning the spectra of a graph which represents a learning task, we obtain a set of eigenvectors that reflect the intrinsic structure of the task graph. These eigenvectors can be used as the new features which transfer the knowledge from auxiliary data to help classify target data. Given an arbitrary non-transfer learner (e.g. SVM) and a particular transfer learning task, EigenTransfer can produce a transfer learner accordingly for the target transfer learning task. We apply EigenTransfer on three different transfer learning tasks, cross-domain learning, cross-category learning and self-taught learning, to demonstrate its unifying ability, and show through experiments that EigenTransfer can greatly outperform several representative non-transfer learners.
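
A loose sketch of the spectral step in Python: build a graph relating instances and features, take the smallest eigenvectors of its normalized Laplacian, and use the instance rows as new features for any base learner. The bipartite instance-feature graph and the choice of 10 eigenvectors below are illustrative simplifications of the task graph described in the abstract.

```python
# Spectral features from an instance-feature graph (simplified EigenTransfer-style step).
import numpy as np

rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(100, 40)).astype(float)   # instance-feature counts
n, d = X.shape
W = np.block([[np.zeros((n, n)), X],
              [X.T, np.zeros((d, d))]])              # bipartite adjacency matrix
deg = W.sum(axis=1) + 1e-12
D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(n + d) - D_inv_sqrt @ W @ D_inv_sqrt      # normalized graph Laplacian
vals, vecs = np.linalg.eigh(L)                       # eigenvalues in ascending order
new_features = vecs[:n, :10]   # instance rows of the 10 smallest eigenvectors
# new_features can now be fed to any non-transfer learner (e.g. an SVM).
```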

118 citations


01 Jan 2009
TL;DR: Bounds on the rates of convergence achievable by active learning are derived, under various noise models and under general conditions on the hypothesis class.
Abstract: I study the informational complexity of active learning in a statistical learning theory framework. Specifically, I derive bounds on the rates of convergence achievable by active learning, under various noise models and under general conditions on the hypothesis class. I also study the theoretical advantages of active learning over passive learning, and develop procedures for transforming passive learning algorithms into active learning algorithms with asymptotically superior label complexity. Finally, I study generalizations of active learning to more general forms of interactive statistical learning.

96 citations


Journal ArticleDOI
TL;DR: Experimental results indicate that the evaluation results of the proposed scheme are very close to summative assessment results, and that the factor analysis provides simple and clear learning performance assessment rules.
Abstract: Current trends clearly indicate that online learning has become an important learning mode. However, no effective assessment mechanism for learning performance yet exists for e-learning systems. Learning performance assessment aims to evaluate what learners learned during the learning process. Traditional summative evaluation only considers final learning outcomes, without considering the learning processes of learners. With the evolution of learning technology, learning portfolios in a web-based learning environment can be beneficially adopted to record the learning process, evaluate the learning performance of learners and produce feedback to learners in ways that enhance their learning. Accordingly, this study presents a mobile formative assessment tool using data mining, which involves six computational intelligence theories, i.e. statistical correlation analysis, fuzzy clustering analysis, grey relational analysis, K-means clustering, fuzzy association rule mining and fuzzy inference, in order to identify the key formative assessment rules according to the web-based learning portfolios of an individual learner for the performance promotion of web-based learning. Restated, the proposed method can help teachers to precisely assess the learning performance of an individual learner utilizing only the learning portfolios in a web-based learning environment. Hence, teachers can devote themselves to teaching and designing courseware, since they save a lot of time in measuring learning performance. More importantly, teachers can understand the main factors influencing learning performance in a web-based learning environment based on the interpretable learning performance assessment rules obtained. Experimental results indicate that the evaluation results of the proposed scheme are very close to summative assessment results, and that the factor analysis provides simple and clear learning performance assessment rules. Furthermore, the proposed learning feedback with formative assessment can clearly promote the learning performances and interests of learners.

91 citations


DOI
01 Jan 2009
TL;DR: The thesis is that the KWIK learning model provides a flexible, modularized, and unifying way for creating and analyzing reinforcement-learning algorithms with provably efficient exploration, and that it facilitates the development of new algorithms with smaller sample complexity, which have demonstrated empirically faster learning in real-world problems.
Abstract: Computational learning theory studies mathematical models that allow one to formally analyze and compare the performance of supervised-learning algorithms such as their sample complexity. While existing models such as PAC (Probably Approximately Correct) have played an influential role in understanding the nature of supervised learning, they have not been as successful in reinforcement learning (RL). Here, the fundamental barrier is the need for active exploration in sequential decision problems. An RL agent tries to maximize long-term utility by exploiting its knowledge about the problem, but this knowledge has to be acquired by the agent itself through exploring the problem that may reduce short-term utility. The need for active exploration is common in many problems in daily life, engineering, and sciences. For example, a Backgammon program strives to take good moves to maximize the probability of winning a game, but sometimes it may try novel and possibly harmful moves to discover how the opponent reacts in the hope of discovering a better game-playing strategy. It has been known since the early days of RL that a good tradeoff between exploration and exploitation is critical for the agent to learn fast (i.e., to reach near-optimal strategies with a small sample complexity), but a general theoretical analysis of this tradeoff remained open until recently. In this dissertation, we introduce a novel computational learning model called KWIK (Knows What It Knows) that is designed particularly for its utility in analyzing learning problems like RL where active exploration can impact the training data the learner is exposed to. My thesis is that the KWIK learning model provides a flexible, modularized, and unifying way for creating and analyzing reinforcement-learning algorithms with provably efficient exploration. In particular, we show how the KWIK perspective can be used to unify the analysis of existing RL algorithms with polynomial sample complexity. It also facilitates the development of new algorithms with smaller sample complexity, which have demonstrated empirically faster learning speed in real-world problems. Furthermore, we provide an improved, matching sample complexity lower bound, which suggests the optimality (in a sense) of one of the KWIK-based algorithms known as delayed Q-learning.
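
To make the protocol concrete, here is the textbook memorization learner for deterministic functions over a finite input set, the simplest KWIK example: the learner must either predict correctly or answer "I don't know" (None) and observe the truth, and it can say "I don't know" at most |X| times. The inputs and target function below are made up for illustration.

```python
# Minimal KWIK ("Knows What It Knows") illustration: memorization learner.
class MemorizationKWIK:
    def __init__(self):
        self.table = {}

    def predict(self, x):
        # Return a prediction if the input has been seen, else signal "I don't know".
        return self.table.get(x, None)

    def observe(self, x, y):
        # The true outcome is revealed only after an "I don't know".
        self.table[x] = y

learner = MemorizationKWIK()
target = lambda x: x % 3                 # unknown deterministic function
dont_knows = 0
for x in [1, 4, 1, 2, 4, 7, 2]:
    y_hat = learner.predict(x)
    if y_hat is None:                    # at most |X| of these can ever occur
        dont_knows += 1
        learner.observe(x, target(x))
    else:
        assert y_hat == target(x)        # any prediction made must be correct
print(dont_knows)                        # 4 distinct inputs -> 4 "I don't know"s
```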

Journal ArticleDOI
TL;DR: A model for an interactive, introductory secondary- or tertiary-level statistics course that is designed to develop students’ statistical reasoning is described, built on the constructivist theory of learning.
Abstract: This article describes a model for an interactive, introductory secondary- or tertiary-level statistics course that is designed to develop students' statistical reasoning. This model is called a 'Statistical Reasoning Learning Environment' and is built on the constructivist theory of learning.

Journal ArticleDOI
TL;DR: A learning model is proposed which considers both machine and human learning effects simultaneously; the position-based and sum-of-processing-time-based learning models in the literature are special cases of the proposed model.
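
For context, the two models named in the summary, in the form they are usually stated in the scheduling literature (the paper's unified model contains both as special cases; the notation here is the standard one, not necessarily the paper's):

```latex
% Position-based learning effect (actual time of job j in position r):
\[
p_{j,r} \;=\; p_j\, r^{\,a}, \qquad a \le 0 ,
\]
% Sum-of-processing-time-based learning effect:
\[
p_{j,r} \;=\; p_j \Bigl(1 + \sum_{k=1}^{r-1} p_{[k]}\Bigr)^{a}, \qquad a \le 0 ,
\]
% where p_j is the normal processing time of job j, p_{j,r} its actual processing
% time when scheduled in position r, and p_{[k]} the normal processing time of the
% job scheduled in position k.
```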

Journal ArticleDOI
TL;DR: A new learning model is proposed that unifies the two main approaches to scheduling with learning effect and shows that some single-machine problems and some specified flowshop problems are polynomially solvable.

Proceedings Article
18 Jun 2009
TL;DR: This work presents an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints, and optimizes it using a procedure that alternates between information and moment projections.
Abstract: We present an objective function for learning with unlabeled data that utilizes auxiliary expectation constraints. We optimize this objective function using a procedure that alternates between information and moment projections. Our method provides an alternate interpretation of the posterior regularization framework (Graca et al., 2008), maintains uncertainty during optimization unlike constraint-driven learning (Chang et al., 2007), and is more efficient than generalized expectation criteria (Mann & McCallum, 2008). Applications of this framework include minimally supervised learning, semi-supervised learning, and learning with constraints that are more expressive than the underlying model. In experiments, we demonstrate comparable accuracy to generalized expectation criteria for minimally supervised learning, and use expressive structural constraints to guide semi-supervised learning, providing a 3%-6% improvement over state-of-the-art constraint-driven learning.
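
One plausible reading of such an objective, written in posterior-regularization style (our paraphrase under assumed notation, not the paper's exact formulation): an auxiliary distribution q is kept close to the model p_theta while being pushed to satisfy expectation constraints on features phi, and the two projections alternate.

```latex
\[
\min_{\theta,\; q}\ \ \mathrm{KL}\bigl(q(\mathbf{y}\mid\mathbf{x}) \,\big\|\, p_\theta(\mathbf{y}\mid\mathbf{x})\bigr)
\quad \text{s.t.} \quad
\bigl\| \mathbb{E}_{q}\bigl[\phi(\mathbf{x},\mathbf{y})\bigr] - \mathbf{b} \bigr\| \;\le\; \epsilon ,
\]
% alternating an information (I-) projection of q onto the constraint set with a
% moment (M-) projection / likelihood step that moves p_theta toward q.
```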


Journal ArticleDOI
TL;DR: It is proved that the minimization problem with an S-shaped learning model is strongly NP-hard even if the experience provided by each job is equal to its normal processing time.

Proceedings Article
04 Jul 2009
TL;DR: This research focuses on an adaptive course sequencing method that uses soft computing techniques as an alternative to rule-based adaptation in an adaptive learning system, exploiting the ability of soft computing techniques to handle uncertainty and incompleteness in a problem.
Abstract: Advancements in technology have led to a paradigm shift from traditional to personalized learning methods with varied implementation strategies. Presenting an optimal personalized learning path in an educational hypermedia system is one strategy that is important for increasing the effectiveness of a learning session for each student. However, this task requires much effort and cost, particularly in defining rules for the adaptation of learning materials. This research focuses on an adaptive course sequencing method that uses soft computing techniques as an alternative to rule-based adaptation in an adaptive learning system, exploiting the ability of soft computing techniques to handle uncertainty and incompleteness in a problem. In this paper we present recent work concerning concept-based classification of learning objects using artificial neural networks (ANN). The Self-Organizing Map (SOM) and Back Propagation (BP) algorithms were employed to discover the connection between the domain concepts contained in a learning object and the learner's learning needs. The experimental results show that this approach is promising for determining a suitable learning object for a particular student in an adaptive and dynamic learning environment.

Proceedings ArticleDOI
09 Nov 2009
TL;DR: In this article, the authors propose a framework for quantitative security analysis of machine learning methods, where the key parts are the formal specification of a deployed learning model and attacker's constraints, the computation of an optimal attack, and the derivation of an upper bound on adversarial impact.
Abstract: We propose a framework for quantitative security analysis of machine learning methods. The key parts of this framework are the formal specification of a deployed learning model and attacker's constraints, the computation of an optimal attack, and the derivation of an upper bound on adversarial impact. We exemplarily apply the framework for the analysis of one specific learning scenario, online centroid anomaly detection, and experimentally verify the tightness of obtained theoretical bounds.
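
A minimal sketch of the analyzed learning scenario in Python: an online centroid detector flags a point when its distance to the centroid of previously accepted points exceeds a radius, and accepted points shift the centroid. The radius, the update rate and the attack stream are illustrative assumptions; the paper's contribution is the formal attacker model and the bound on how far such an attacker can displace the centroid.

```python
# Online centroid anomaly detection with a simple poisoning stream.
import numpy as np

class OnlineCentroid:
    def __init__(self, dim, radius, eta):
        self.center = np.zeros(dim)
        self.radius = radius          # acceptance radius (assumed value)
        self.eta = eta                # online update rate (assumed value)

    def process(self, x):
        if np.linalg.norm(x - self.center) > self.radius:
            return "anomaly"          # rejected; centroid unchanged
        self.center = (1 - self.eta) * self.center + self.eta * x
        return "normal"

rng = np.random.default_rng(0)
det = OnlineCentroid(dim=2, radius=3.0, eta=0.05)
benign = rng.normal(size=(200, 2))
attack = np.tile([2.9, 0.0], (50, 1))   # attacker feeds points just inside the radius
labels = [det.process(x) for x in np.vstack([benign, attack])]
print(det.center)                       # the centroid has drifted toward the attack direction
```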

Journal ArticleDOI
TL;DR: This work models the acquisition of the English anaphoric pronoun one in order to identify the constraints necessary for successful acquisition, and argues that those constraints are domain-specific in nature.
Abstract: We identify three components of any learning theory: the representations, the learner's data intake, and the learning algorithm. With these in mind, we model the acquisition of the English anaphoric pronoun one in order to identify necessary constraints for successful acquisition, and the nature of those constraints. Whereas previous modeling efforts have succeeded by using a domain-general learning algorithm that implicitly restricts the data intake to be a subset of the input, we show that the same kind of domain-general learning algorithm fails when it does not restrict the data intake. We argue that the necessary data intake restrictions are domain-specific in nature. Thus, while a domain-general algorithm can be quite powerful, a successful learner must also rely on domain-specific learning mechanisms when learning anaphoric one.

Proceedings ArticleDOI
25 Apr 2009
TL;DR: This paper elaborates the concept, significance and main strategies of machine learning as well as the basic structure of a machine learning system, introducing several machine learning methods such as rote learning, explanation-based learning, learning from instruction, learning by deduction, learning by analogy and inductive learning.
Abstract: This paper elaborates the concept, significance and main strategies of machine learning as well as the basic structure of a machine learning system. By combining several basic ideas of the main strategies, considerable effort is devoted to introducing several machine learning methods, such as rote learning, explanation-based learning, learning from instruction, learning by deduction, learning by analogy and inductive learning. Their respective advantages and limitations are compared and analyzed. At the end of the article, the research objectives of machine learning are proposed and its development trends are pointed out. Machine learning is a fundamental way of endowing computers with intelligence; its applications, which rely mainly on induction and synthesis rather than deduction, have already reached many fields of Artificial Intelligence.

Proceedings ArticleDOI
Sam Chao1, Fai Wong1
12 Jul 2009
TL;DR: i+Learning (Intelligent, Incremental and Interactive Learning) theory is proposed to complement traditional incremental decision tree learning algorithms by considering newly available attributes in addition to new incoming instances.
Abstract: The decision tree is a kind of inductive learning algorithm that offers an efficient and practical method for generalizing classification rules from previous concrete cases already solved by domain experts. It is considered attractive for many real-life applications, mostly due to its interpretability. Recently, much research has been reported that endows decision trees with incremental learning ability, which is able to address the learning task with a stream of training instances. However, few works discuss algorithms that can learn incrementally with respect to new attributes. In this paper, i+Learning (Intelligent, Incremental and Interactive Learning) theory is proposed to complement traditional incremental decision tree learning algorithms by considering newly available attributes in addition to new incoming instances. The experimental results reveal that the i+Learning method offers the promise of making decision trees a more powerful, flexible, accurate and valuable paradigm, especially in the medical data mining community.

Proceedings Article
10 May 2009
TL;DR: An algorithm, Reinforcement Learning with Decision Trees (rl-dt), that uses supervised learning techniques to learn the model by generalizing the relative effect of actions across states, which consistently accrues high cumulative rewards in comparison with the other algorithms tested.
Abstract: Improving the sample efficiency of reinforcement learning algorithms to scale up to larger and more realistic domains is a current research challenge in machine learning. Model-based methods use experiential data more efficiently than model-free approaches but often require exhaustive exploration to learn an accurate model of the domain. We present an algorithm, Reinforcement Learning with Decision Trees (rl-dt), that uses supervised learning techniques to learn the model by generalizing the relative effect of actions across states. Specifically, rl-dt uses decision trees to model the relative effects of actions in the domain. The agent explores the environment exhaustively in early episodes when its model is inaccurate. Once it believes it has developed an accurate model, it exploits its model, taking the optimal action at each step. The combination of the learning approach with the targeted exploration policy enables fast learning of the model. The sample efficiency of the algorithm is evaluated empirically in comparison to five other algorithms across three domains. rl-dt consistently accrues high cumulative rewards in comparison with the other algorithms tested.
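
A sketch of the model-learning idea only (planning and the targeted exploration policy are omitted), under assumptions not in the abstract: a made-up deterministic gridworld and scikit-learn decision trees. One tree per state variable predicts the relative effect (change) of an action, so an effect observed in some states generalizes to unseen ones.

```python
# Learning relative effects of actions with decision trees (RL-DT-style model step).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
states = rng.integers(0, 5, size=(300, 2))                  # (x, y) on a 5x5 grid
actions = rng.integers(0, 2, size=300)                      # 0 = right, 1 = up
moves = np.where(actions[:, None] == 0, [1, 0], [0, 1])     # intended effect
next_states = np.clip(states + moves, 0, 4)                 # walls clip the move
deltas = next_states - states                               # observed relative effect

X = np.hstack([states, actions[:, None]])
dx_tree = DecisionTreeClassifier().fit(X, deltas[:, 0])     # one tree per state variable
dy_tree = DecisionTreeClassifier().fit(X, deltas[:, 1])

query = np.array([[4, 0, 1]])                               # action "up" in state (4, 0)
print(dx_tree.predict(query), dy_tree.predict(query))       # expected: [0] [1]
```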

Journal ArticleDOI
TL;DR: This paper presents an iterative algorithm for enhancing the performance of any inductive learning process through the use of feature construction as a pre-processing step and applies it on three learning methods, namely genetic algorithms, C4.5 and lazy learner, and shows improvement in performance.
Abstract: Inductive learning algorithms, in general, perform well on data that have been pre-processed to reduce complexity. By themselves they are not particularly effective in reducing data complexity while learning difficult concepts. Feature construction has been shown to reduce complexity of space spanned by input data. In this paper, we present an iterative algorithm for enhancing the performance of any inductive learning process through the use of feature construction as a pre-processing step. We apply the procedure on three learning methods, namely genetic algorithms, C4.5 and lazy learner, and show improvement in performance.
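
A compact sketch of feature construction as a pre-processing loop, with assumed ingredients (scikit-learn, a product-of-two-columns constructor, cross-validated accuracy as the acceptance test) standing in for the operators used in the paper:

```python
# Greedy feature construction: keep a constructed feature only if it improves
# the downstream learner's cross-validated accuracy.
import numpy as np
from itertools import combinations
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
learner = DecisionTreeClassifier(random_state=0)
best = cross_val_score(learner, X, y, cv=5).mean()

for _ in range(3):                                        # a few construction rounds
    scores = []
    for i, j in combinations(range(X.shape[1]), 2):
        cand = np.hstack([X, (X[:, i] * X[:, j])[:, None]])
        scores.append((cross_val_score(learner, cand, y, cv=5).mean(), i, j))
    score, i, j = max(scores)
    if score <= best:
        break                                             # no constructed feature helps
    X = np.hstack([X, (X[:, i] * X[:, j])[:, None]])      # keep the best new feature
    best = score
```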

Journal ArticleDOI
TL;DR: The results show that the power of the adaptive learning sequence lies in the way it takes into account students' personal characteristics and performance; for this reason, it constitutes an important innovation in the field of Teaching English as a Second Language (TESL).
Abstract: The purpose of this paper is to propose an adaptive system analysis for optimizing learning sequences. The analysis employs a decision tree algorithm, based on students' profiles, to discover the most adaptive learning sequences for a particular teaching content. The profiles were created on the basis of pretesting and posttesting, and from a set of five student characteristics: gender, personality type, cognitive style, learning style, and the students' grades from the previous semester. This paper addresses the problem of adhering to a fixed learning sequence in the traditional method of teaching English, and recommends a rule for setting up an optimal learning sequence for facilitating students' learning processes and for maximizing their learning outcome. By using the technique proposed in this paper, teachers will be able both to lower the cost of teaching and to achieve an optimally adaptive learning sequence for students. The results show that the power of the adaptive learning sequence lies in the way it takes into account students' personal characteristics and performance; for this reason, it constitutes an important innovation in the field of Teaching English as a Second Language (TESL).

Proceedings ArticleDOI
14 Jun 2009
TL;DR: A new setting is considered: given training vectors in space X along with labels and a description of this data in another space X*, find in space X a decision rule better than the one found in the classical paradigm.
Abstract: In this paper we consider a new paradigm of learning: learning using hidden information. The classical paradigm of supervised learning is to learn a decision rule from labeled data (x_i, y_i), x_i ∈ X, y_i ∈ {−1, 1}, i = 1, …, ℓ. In this paper we consider a new setting: given training vectors in space X along with labels and a description of this data in another space X*, find in space X a decision rule better than the one found in the classical paradigm.
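
The SVM+ optimization problem commonly used to realize this paradigm, restated here from the learning-using-privileged-information literature (our notation; the paper's formulation may differ in details): the slack of each training example is modeled by a correcting function living in the hidden space X*.

```latex
\[
\begin{aligned}
\min_{w,\,b,\,w^{*},\,b^{*}}\quad
& \tfrac{1}{2}\|w\|^{2} \;+\; \tfrac{\gamma}{2}\|w^{*}\|^{2}
  \;+\; C\sum_{i=1}^{\ell}\bigl(w^{*}\!\cdot\phi^{*}(x_i^{*}) + b^{*}\bigr) \\
\text{s.t.}\quad
& y_i\bigl(w\cdot\phi(x_i) + b\bigr) \;\ge\; 1 - \bigl(w^{*}\!\cdot\phi^{*}(x_i^{*}) + b^{*}\bigr), \\
& w^{*}\!\cdot\phi^{*}(x_i^{*}) + b^{*} \;\ge\; 0, \qquad i = 1,\dots,\ell ,
\end{aligned}
\]
% where x_i lives in the decision space X and x_i^* is the additional description
% in X*, available only at training time.
```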

Proceedings ArticleDOI
06 Aug 2009
TL;DR: This paper proposes a novel form of semantic analysis called reading to learn, where the goal is to obtain a high-level semantic abstract of multiple documents in a representation that facilitates learning.
Abstract: Machine learning offers a range of tools for training systems from data, but these methods are only as good as the underlying representation. This paper proposes to acquire representations for machine learning by reading text written to accommodate human learning. We propose a novel form of semantic analysis called reading to learn, where the goal is to obtain a high-level semantic abstract of multiple documents in a representation that facilitates learning. We obtain this abstract through a generative model that requires no labeled data, instead leveraging repetition across multiple documents. The semantic abstract is converted into a transformed feature space for learning, resulting in improved generalization on a relational learning task.

Journal ArticleDOI
TL;DR: Interpreting the core of mathematical economic theory to be defined by General Equilibrium Theory and Game Theory, a general but concise analysis of the computable and decidable content of the implications of these two areas is presented.

Posted Content
TL;DR: Introduction to Machine learning covering Statistical Inference, algebraic and spectral methods, and PAC learning (the Formal model, VC dimension, Double Sampling theorem).
Abstract: Introduction to Machine learning covering Statistical Inference (Bayes, EM, ML/MaxEnt duality), algebraic and spectral methods (PCA, LDA, CCA, Clustering), and PAC learning (the Formal model, VC dimension, Double Sampling theorem).
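
As a pointer to the PAC material listed above, the classical sample complexity statement it builds toward (a standard result, stated up to constants): in the realizable case, any consistent learner over a hypothesis class of VC dimension d returns, with probability at least 1 − δ, a hypothesis of error at most ε once the sample size satisfies

```latex
\[
m \;=\; O\!\left(\frac{1}{\epsilon}\left(d\,\ln\frac{1}{\epsilon} \;+\; \ln\frac{1}{\delta}\right)\right),
\]
% a bound whose proof is the classical double-sampling (symmetrization / growth-function) argument.
```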

Journal ArticleDOI
TL;DR: The theory of direct learning portrays learning as specificity between higher order informational quantities, referred to as information for learning, and the change in performance that occurs with practice; this study illustrates and further develops the theory.
Abstract: The theory of direct learning portrays learning as specificity between higher order informational quantities, referred to as information for learning, and change in performance that occurs with practice (Jacobs & Michaels, 2007). The focus of the theory is on the lawful generation and possible use of information for learning. This study illustrates and further develops the theory. Participants in the study were asked to judge the mass of unseen handheld objects. In Experiment 1, different participants received feedback on different mechanical properties of the objects, and in Experiment 2, different participants practiced with different sets of objects. The practice led to changes in performance that, in the present portrayal, show up as movements through manifolds. As predicted by the theory, these movements are specific to information for learning, the most precise description of which is obtained with difference equations. A second and more theoretical part of the article provides a tentative formaliza...