Showing papers on "Unsupervised learning published in 2009"

PDF

Open Access

Book•

[...]

Yoshua Bengio¹•Institutions (1)

01 Jan 2009

TL;DR: The motivations and principles regarding learning algorithms for deep architectures, in particular those exploiting as building blocks unsupervised learning of single-layer modelssuch as Restricted Boltzmann Machines, used to construct deeper models such as Deep Belief Networks are discussed.

...read moreread less

Abstract: Can machine learning deliver AI? Theoretical results, inspiration from the brain and cognition, as well as machine learning experiments suggest that in order to learn the kind of complicated functions that can represent high-level abstractions (e.g. in vision, language, and other AI-level tasks), one would need deep architectures. Deep architectures are composed of multiple levels of non-linear operations, such as in neural nets with many hidden layers, graphical models with many levels of latent variables, or in complicated propositional formulae re-using many sub-formulae. Each level of the architecture represents features at a different level of abstraction, defined as a composition of lower-level features. Searching the parameter space of deep architectures is a difficult task, but new algorithms have been discovered and a new sub-area has emerged in the machine learning community since 2006, following these discoveries. Learning algorithms such as those for Deep Belief Networks and other related unsupervised learning algorithms have recently been proposed to train deep architectures, yielding exciting results and beating the state-of-the-art in certain areas. Learning Deep Architectures for AI discusses the motivations for and principles of learning algorithms for deep architectures. By analyzing and comparing recent results with different learning algorithms for deep architectures, explanations for their success are proposed and discussed, highlighting challenges and suggesting avenues for future explorations in this area.

...read moreread less

7,767 citations

Proceedings Article•DOI•

Convolutional deep belief networks for scalable unsupervised learning of hierarchical representations

[...]

Honglak Lee¹, Roger Grosse¹, Rajesh Ranganath¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

14 Jun 2009

TL;DR: The convolutional deep belief network is presented, a hierarchical generative model which scales to realistic image sizes and is translation-invariant and supports efficient bottom-up and top-down probabilistic inference.

...read moreread less

Abstract: There has been much interest in unsupervised learning of hierarchical generative models such as deep belief networks. Scaling such models to full-sized, high-dimensional images remains a difficult problem. To address this problem, we present the convolutional deep belief network, a hierarchical generative model which scales to realistic image sizes. This model is translation-invariant and supports efficient bottom-up and top-down probabilistic inference. Key to our approach is probabilistic max-pooling, a novel technique which shrinks the representations of higher layers in a probabilistically sound way. Our experiments show that the algorithm learns useful high-level visual features, such as object parts, from unlabeled images of objects and natural scenes. We demonstrate excellent performance on several visual recognition tasks and show that our model can perform hierarchical (bottom-up and top-down) inference over full-sized images.

...read moreread less

2,668 citations

Book•

Machine Learning: Neural and Statistical Classification

[...]

Donald Michie¹, David Spiegelhalter, Charles C. Taylor², John A. Campbell³•Institutions (3)

University of Edinburgh¹, University of Leeds², University College London³

01 Jan 2009

TL;DR: A survey of previous comparisons and theoretical work descriptions of methods dataset descriptions criteria for comparison and methodology (including validation) empirical results machine learning on machine learning can be found in this article, where the authors also discuss their own work.

...read moreread less

Abstract: Survey of previous comparisons and theoretical work descriptions of methods dataset descriptions criteria for comparison and methodology (including validation) empirical results machine learning on machine learning.

...read moreread less

2,325 citations

Proceedings Article•DOI•

What is the best multi-stage architecture for object recognition?

[...]

Kevin Jarrett¹, Koray Kavukcuoglu¹, Marc'Aurelio Ranzato¹, Yann LeCun¹•Institutions (1)

Courant Institute of Mathematical Sciences¹

01 Sep 2009

TL;DR: It is shown that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks and that two stages of feature extraction yield better accuracy than one.

...read moreread less

Abstract: In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer Most systems use only one stage of feature extraction in which the filters are hard-wired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode This paper addresses three questions: 1 How does the non-linearities that follow the filter banks influence the recognition accuracy? 2 does learning the filter banks in an unsupervised or supervised manner improve the performance over random filters or hardwired filters? 3 Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks We show that two stages of feature extraction yield better accuracy than one Most surprisingly, we show that a two-stage system with random filters can yield almost 63% recognition rate on Caltech-101, provided that the proper non-linearities and pooling layers are used Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on NORB dataset (56%) and unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (≫ 65%), and the lowest known error rate on the undistorted, unprocessed MNIST dataset (053%)

...read moreread less

2,317 citations

Book•

Introduction to Semi-Supervised Learning

[...]

Xiaojin Zhu¹, Andrew Goldberg¹, Ronald Brachman, Thomas G. Dietterich•Institutions (1)

University of Wisconsin-Madison¹

29 Jun 2009

TL;DR: This introductory book presents some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi- supervised support vector machines, and discusses their basic mathematical formulation.

...read moreread less

Abstract: Semi-supervised learning is a learning paradigm concerned with the study of how computers and natural systems such as humans learn in the presence of both labeled and unlabeled data. Traditionally, learning has been studied either in the unsupervised paradigm (e.g., clustering, outlier detection) where all the data is unlabeled, or in the supervised paradigm (e.g., classification, regression) where all the data is labeled.The goal of semi-supervised learning is to understand how combining labeled and unlabeled data may change the learning behavior, and design algorithms that take advantage of such a combination. Semi-supervised learning is of great interest in machine learning and data mining because it can use readily available unlabeled data to improve supervised learning tasks when the labeled data is scarce or expensive. Semi-supervised learning also shows potential as a quantitative tool to understand human category learning, where most of the input is self-evidently unlabeled. In this introductory book, we present some popular semi-supervised learning models, including self-training, mixture models, co-training and multiview learning, graph-based methods, and semi-supervised support vector machines. For each model, we discuss its basic mathematical formulation. The success of semi-supervised learning depends critically on some underlying assumptions. We emphasize the assumptions made by each model and give counterexamples when appropriate to demonstrate the limitations of the different models. In addition, we discuss semi-supervised learning for cognitive psychology. Finally, we give a computational learning theoretic perspective on semi-supervised learning, and we conclude the book with a brief discussion of open questions in the field.

...read moreread less

1,913 citations

Journal Article•DOI•

Transfer Learning for Reinforcement Learning Domains: A Survey

[...]

Matthew D. Taylor¹, Peter Stone•Institutions (1)

University of Southern California¹

01 Dec 2009-Journal of Machine Learning Research

TL;DR: This article presents a framework that classifies transfer learning methods in terms of their capabilities and goals, and then uses it to survey the existing literature, as well as to suggest future directions for transfer learning work.

...read moreread less

Abstract: The reinforcement learning paradigm is a popular way to address problems that have only limited environmental feedback, rather than correctly labeled examples, as is common in other machine learning contexts. While significant progress has been made to improve learning in a single task, the idea of transfer learning has only recently been applied to reinforcement learning tasks. The core idea of transfer is that experience gained in learning to perform one task can help improve learning performance in a related, but different, task. In this article we present a framework that classifies transfer learning methods in terms of their capabilities and goals, and then use it to survey the existing literature, as well as to suggest future directions for transfer learning work.

...read moreread less

1,634 citations

Journal Article•DOI•

Exploring Strategies for Training Deep Neural Networks

[...]

Hugo Larochelle, Yoshua Bengio, Jérôme Louradour, Pascal Lamblin

01 Dec 2009-Journal of Machine Learning Research

TL;DR: These experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input.

...read moreread less

Abstract: Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly non-linear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization often appears to get stuck in poor solutions. Hinton et al. recently proposed a greedy layer-wise unsupervised learning procedure relying on the training algorithm of restricted Boltzmann machines (RBM) to initialize the parameters of a deep belief network (DBN), a generative model with many layers of hidden causal variables. This was followed by the proposal of another greedy layer-wise procedure, relying on the usage of autoassociator networks. In the context of the above optimization problem, we study these algorithms empirically to better understand their success. Our experiments confirm the hypothesis that the greedy layer-wise unsupervised training strategy helps the optimization by initializing weights in a region near a good local minimum, but also implicitly acts as a sort of regularization that brings better generalization and encourages internal distributed representations that are high-level abstractions of the input. We also present a series of experiments aimed at evaluating the link between the performance of deep neural networks and practical aspects of their topology, for example, demonstrating cases where the addition of more depth helps. Finally, we empirically explore simple variants of these training algorithms, such as the use of different RBM input unit distributions, a simple way of combining gradient estimators to improve performance, as well as on-line versions of those algorithms.

...read moreread less

1,124 citations

Proceedings Article•DOI•

Large-scale deep unsupervised learning using graphics processors

[...]

Rajat Raina¹, Anand Madhavan¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

14 Jun 2009

TL;DR: It is argued that modern graphics processors far surpass the computational capabilities of multicore CPUs, and have the potential to revolutionize the applicability of deep unsupervised learning methods.

...read moreread less

Abstract: The promise of unsupervised learning methods lies in their potential to use vast amounts of unlabeled data to learn complex, highly nonlinear models with millions of free parameters. We consider two well-known unsupervised learning models, deep belief networks (DBNs) and sparse coding, that have recently been applied to a flurry of machine learning applications (Hinton & Salakhutdinov, 2006; Raina et al., 2007). Unfortunately, current learning algorithms for both models are too slow for large-scale applications, forcing researchers to focus on smaller-scale models, or to use fewer training examples.In this paper, we suggest massively parallel methods to help resolve these problems. We argue that modern graphics processors far surpass the computational capabilities of multicore CPUs, and have the potential to revolutionize the applicability of deep unsupervised learning methods. We develop general principles for massively parallelizing unsupervised learning tasks using graphics processors. We show that these principles can be applied to successfully scaling up learning algorithms for both DBNs and sparse coding. Our implementation of DBN learning is up to 70 times faster than a dual-core CPU implementation for large models. For example, we are able to reduce the time required to learn a four-layer DBN with 100 million free parameters from several weeks to around a single day. For sparse coding, we develop a simple, inherently parallel algorithm, that leads to a 5 to 15-fold speedup over previous methods.

...read moreread less

711 citations

Journal Article•DOI•

Sentiment analysis: A combined approach

[...]

Rudy Prabowo¹, Mike Thelwall¹•Institutions (1)

Information Technology University¹

01 Apr 2009-Journal of Informetrics

TL;DR: This paper combines rule-based classification, supervised learning and machine learning into a new combined method, and proposes a semi-automatic, complementary approach in which each classifier can contribute to other classifiers to achieve a good level of effectiveness.

...read moreread less

700 citations

Journal Article•DOI•

Unsupervised Activity Perception in Crowded and Complicated Scenes Using Hierarchical Bayesian Models

[...]

Xiaogang Wang¹, Xiaoxu Ma¹, W.E.L. Grimson¹•Institutions (1)

Massachusetts Institute of Technology¹

01 Mar 2009-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes with many kinds of activities co-occurring, and three hierarchical Bayesian models are proposed that advance existing language models, such as LDA and HDP.

...read moreread less

Abstract: We propose a novel unsupervised learning framework to model activities and interactions in crowded and complicated scenes. Hierarchical Bayesian models are used to connect three elements in visual surveillance: low-level visual features, simple "atomic" activities, and interactions. Atomic activities are modeled as distributions over low-level visual features, and multi-agent interactions are modeled as distributions over atomic activities. These models are learnt in an unsupervised way. Given a long video sequence, moving pixels are clustered into different atomic activities and short video clips are clustered into different interactions. In this paper, we propose three hierarchical Bayesian models, Latent Dirichlet Allocation (LDA) mixture model, Hierarchical Dirichlet Process (HDP) mixture model, and Dual Hierarchical Dirichlet Processes (Dual-HDP) model. They advance existing language models, such as LDA [1] and HDP [2]. Our data sets are challenging video sequences from crowded traffic scenes and train station scenes with many kinds of activities co-occurring. Without tracking and human labeling effort, our framework completes many challenging visual surveillance tasks of board interest such as: (1) discovering typical atomic activities and interactions; (2) segmenting long video sequences into different interactions; (3) segmenting motions into different activities; (4) detecting abnormality; and (5) supporting high-level queries on activities and interactions.

...read moreread less

522 citations

Proceedings Article•DOI•

Unsupervised Learning of Narrative Schemas and their Participants

[...]

Nathanael Chambers¹, Dan Jurafsky¹•Institutions (1)

Stanford University¹

02 Aug 2009

TL;DR: An unsupervised system for learning narrative schemas, coherent sequences or sets of events whose arguments are filled with participant semantic roles defined over words to improve on previous results in narrative/frame learning and induce rich frame-specific semantic roles.

...read moreread less

Abstract: We describe an unsupervised system for learning narrative schemas, coherent sequences or sets of events (arrested(POLICE, SUSPECT), convicted(JUDGE, SUSPECT)) whose arguments are filled with participant semantic roles defined over words (Judge = {judge, jury, court}, Police = {police, agent, authorities}). Unlike most previous work in event structure or semantic role learning, our system does not use supervised techniques, hand-built knowledge, or predefined classes of events or roles. Our unsupervised learning algorithm uses coreferring arguments in chains of verbs to learn both rich narrative event structure and argument roles. By jointly addressing both tasks, we improve on previous results in narrative/frame learning and induce rich frame-specific semantic roles.

...read moreread less

Journal Article•DOI•

Neural evidence of statistical learning: Efficient detection of visual regularities without awareness

[...]

Nicholas B. Turk-Browne¹, Brian J. Scholl¹, Marvin M. Chun¹, Marcia K. Johnson¹•Institutions (1)

Yale University¹

21 Aug 2009-Journal of Cognitive Neuroscience

TL;DR: Evidence of learning emerged early during familiarization, showing that statistical learning can operate very quickly and with little exposure, and the findings help elucidate the underlying nature of statistical learning.

...read moreread less

Abstract: Our environment contains regularities distributed in space and time that can be detected by way of statistical learning. This unsupervised learning occurs without intent or awareness, but little is known about how it relates to other types of learning, how it affects perceptual processing, and how quickly it can occur. Here we use fMRI during statistical learning to explore these questions. Participants viewed statistically structured versus unstructured sequences of shapes while performing a task unrelated to the structure. Robust neural responses to statistical structure were observed, and these responses were notable in four ways: First, responses to structure were observed in the striatum and medial temporal lobe, suggesting that statistical learning may be related to other forms of associative learning and relational memory. Second, statistical regularities yielded greater activation in category-specific visual regions (object-selective lateral occipital cortex and word-selective ventral occipito-temporal cortex), demonstrating that these regions are sensitive to information distributed in time. Third, evidence of learning emerged early during familiarization, showing that statistical learning can operate very quickly and with little exposure. Finally, neural signatures of learning were dissociable from subsequent explicit familiarity, suggesting that learning can occur in the absence of awareness. Overall, our findings help elucidate the underlying nature of statistical learning.

...read moreread less

Proceedings Article•DOI•

Named entity recognition in query

[...]

Jiafeng Guo¹, Gu Xu², Xueqi Cheng¹, Hang Li²•Institutions (2)

Chinese Academy of Sciences¹, Microsoft²

19 Jul 2009

TL;DR: Experimental results show that the proposed method based on WS-LDA can accurately perform NERQ, and outperform the baseline methods.

...read moreread less

Abstract: This paper addresses the problem of Named Entity Recognition in Query (NERQ), which involves detection of the named entity in a given query and classification of the named entity into predefined classes. NERQ is potentially useful in many applications in web search. The paper proposes taking a probabilistic approach to the task using query log data and Latent Dirichlet Allocation. We consider contexts of a named entity (i.e., the remainders of the named entity in queries) as words of a document, and classes of the named entity as topics. The topic model is constructed by a novel and general learning method referred to as WS-LDA (Weakly Supervised Latent Dirichlet Allocation), which employs weakly supervised learning (rather than unsupervised learning) using partially labeled seed entities. Experimental results show that the proposed method based on WS-LDA can accurately perform NERQ, and outperform the baseline methods.

...read moreread less

Proceedings Article•DOI•

Deep learning from temporal coherence in video

[...]

Hossein Mobahi¹, Ronan Collobert², Jason Weston²•Institutions (2)

University of Illinois at Urbana–Champaign¹, Princeton University²

14 Jun 2009

TL;DR: A learning method for deep architectures that takes advantage of sequential data, in particular from the temporal coherence that naturally exists in unlabeled video recordings, and is used to improve the performance on a supervised task of interest.

...read moreread less

Abstract: This work proposes a learning method for deep architectures that takes advantage of sequential data, in particular from the temporal coherence that naturally exists in unlabeled video recordings. That is, two successive frames are likely to contain the same object or objects. This coherence is used as a supervisory signal over the unlabeled data, and is used to improve the performance on a supervised task of interest. We demonstrate the effectiveness of this method on some pose invariant object and face recognition tasks.

...read moreread less

Journal Article•DOI•

Agnostic active learning

[...]

Maria-Florina Balcan¹, Alina Beygelzimer², John Langford³•Institutions (3)

Carnegie Mellon University¹, IBM², Yahoo!³

01 Jan 2009-Journal of Computer and System Sciences

TL;DR: This work state and analyze the first active learning algorithm that finds an @e-optimal hypothesis in any hypothesis class, when the underlying distribution has arbitrary forms of noise, and achieves an exponential improvement over the usual sample complexity of supervised learning.

...read moreread less

Journal Article•DOI•

Robotic neurorehabilitation: a computational motor learning perspective

[...]

Vincent S. Huang¹, John W. Krakauer¹•Institutions (1)

Columbia University¹

25 Feb 2009-Journal of Neuroengineering and Rehabilitation

TL;DR: Robots are particularly suitable for both rigorous testing and application of motor learning principles to neurorehabilitation and are considered as a general learning problem from the perspective of theoretical learning frameworks such as supervised and unsupervised learning.

...read moreread less

Abstract: Conventional neurorehabilitation appears to have little impact on impairment over and above that of spontaneous biological recovery Robotic neurorehabilitation has the potential for a greater impact on impairment due to easy deployment, its applicability across of a wide range of motor impairment, its high measurement reliability, and the capacity to deliver high dosage and high intensity training protocols We first describe current knowledge of the natural history of arm recovery after stroke and of outcome prediction in individual patients Rehabilitation strategies and outcome measures for impairment versus function are compared The topics of dosage, intensity, and time of rehabilitation are then discussed Robots are particularly suitable for both rigorous testing and application of motor learning principles to neurorehabilitation Computational motor control and learning principles derived from studies in healthy subjects are introduced in the context of robotic neurorehabilitation Particular attention is paid to the idea of context, task generalization and training schedule The assumptions that underlie the choice of both movement trajectory programmed into the robot and the degree of active participation required by subjects are examined We consider rehabilitation as a general learning problem, and examine it from the perspective of theoretical learning frameworks such as supervised and unsupervised learning We discuss the limitations of current robotic neurorehabilitation paradigms and suggest new research directions from the perspective of computational motor learning

...read moreread less

Proceedings Article•DOI•

Unsupervised spoken keyword spotting via segmental DTW on Gaussian posteriorgrams

[...]

Yaodong Zhang¹, James Glass¹•Institutions (1)

Vassar College¹

01 Dec 2009

TL;DR: An unsupervised learning framework is presented to address the problem of detecting spoken keywords by using segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances and obtaining the keyword detection result.

...read moreread less

Abstract: In this paper, we present an unsupervised learning framework to address the problem of detecting spoken keywords. Without any transcription information, a Gaussian Mixture Model is trained to label speech frames with a Gaussian posteriorgram. Given one or more spoken examples of a keyword, we use segmental dynamic time warping to compare the Gaussian posteriorgrams between keyword samples and test utterances. The keyword detection result is then obtained by ranking the distortion scores of all the test utterances. We examine the TIMIT corpus as a development set to tune the parameters in our system, and the MIT Lecture corpus for more substantial evaluation. The results demonstrate the viability and effectiveness of our unsupervised learning framework on the keyword spotting task.

...read moreread less

Proceedings Article•DOI•

Importance weighted active learning

[...]

Alina Beygelzimer¹, Sanjoy Dasgupta², John Langford³•Institutions (3)

IBM¹, University of California, San Diego², Yahoo!³

14 Jun 2009

TL;DR: This work presents a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions that uses importance weighting to correct sampling bias, and is able to give rigorous label complexity bounds for the learning process.

...read moreread less

Abstract: We present a practical and statistically consistent scheme for actively learning binary classifiers under general loss functions. Our algorithm uses importance weighting to correct sampling bias, and by controlling the variance, we are able to give rigorous label complexity bounds for the learning process.

...read moreread less

Journal Article•DOI•

SemiBoost: Boosting for Semi-Supervised Learning

[...]

Pavan Kumar Mallapragada¹, Rong Jin¹, Anil K. Jain¹, Yi Liu¹•Institutions (1)

Michigan State University¹

01 Nov 2009-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: A boosting framework for semi-supervised learning, termed as SemiBoost, that improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples and is comparable to the state-of-the-art semi- supervised learning algorithms.

...read moreread less

Abstract: Semi-supervised learning has attracted a significant amount of attention in pattern recognition and machine learning. Most previous studies have focused on designing special algorithms to effectively exploit the unlabeled data in conjunction with labeled data. Our goal is to improve the classification accuracy of any given supervised learning algorithm by using the available unlabeled examples. We call this as the Semi-supervised improvement problem, to distinguish the proposed approach from the existing approaches. We design a metasemi-supervised learning algorithm that wraps around the underlying supervised algorithm and improves its performance using unlabeled data. This problem is particularly important when we need to train a supervised learning algorithm with a limited number of labeled examples and a multitude of unlabeled examples. We present a boosting framework for semi-supervised learning, termed as SemiBoost. The key advantages of the proposed semi-supervised learning approach are: 1) performance improvement of any supervised learning algorithm with a multitude of unlabeled data, 2) efficient computation by the iterative boosting algorithm, and 3) exploiting both manifold and cluster assumption in training classification models. An empirical study on 16 different data sets and text categorization demonstrates that the proposed framework improves the performance of several commonly used supervised learning algorithms, given a large number of unlabeled examples. We also show that the performance of the proposed algorithm, SemiBoost, is comparable to the state-of-the-art semi-supervised learning algorithms.

...read moreread less

Journal Article•DOI•

Letters: Ensemble of online sequential extreme learning machine

[...]

Yuan Lan¹, Yeng Chai Soh¹, Guang-Bin Huang¹•Institutions (1)

Nanyang Technological University¹

01 Aug 2009-Neurocomputing

TL;DR: An ensemble of online sequential extreme learning machine (EOS-ELM) based on OS- ELM is proposed, which is more stable and accurate than the original OS-ELm and can be further improved.

...read moreread less

Proceedings Article•DOI•

Reinforcement Learning for Mapping Instructions to Actions

[...]

S.R.K. Branavan¹, Harr Chen¹, Luke Zettlemoyer¹, Regina Barzilay¹•Institutions (1)

Massachusetts Institute of Technology¹

02 Aug 2009

TL;DR: This paper presents a reinforcement learning approach for mapping natural language instructions to sequences of executable actions, and uses a policy gradient algorithm to estimate the parameters of a log-linear model for action selection.

...read moreread less

Abstract: In this paper, we present a reinforcement learning approach for mapping natural language instructions to sequences of executable actions. We assume access to a reward function that defines the quality of the executed actions. During training, the learner repeatedly constructs action sequences for a set of documents, executes those actions, and observes the resulting reward. We use a policy gradient algorithm to estimate the parameters of a log-linear model for action selection. We apply our method to interpret instructions in two domains --- Windows troubleshooting guides and game tutorials. Our results demonstrate that this method can rival supervised learning techniques while requiring few or no annotated training examples.

...read moreread less

Book•

Unsupervised Learning of Natural Languages

[...]

Zach Solan¹, David Horn, Eytan Ruppin, Shimon Edelman•Institutions (1)

Tel Aviv University¹

26 Jun 2009

TL;DR: This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.

...read moreread less

Abstract: We address the problem, fundamental to linguistics, bioinformatics, and certain other disciplines, of using corpora of raw symbolic sequential data to infer underlying rules that govern their production. Given a corpus of strings (such as text, transcribed speech, chromosome or protein sequence data, sheet music, etc.), our unsupervised algorithm recursively distills from it hierarchically structured patterns. The adios (automatic distillation of structure) algorithm relies on a statistical method for pattern extraction and on structured generalization, two processes that have been implicated in language acquisition. It has been evaluated on artificial context-free grammars with thousands of rules, on natural languages as diverse as English and Chinese, and on protein data correlating sequence with function. This unsupervised algorithm is capable of learning complex syntax, generating grammatical novel sentences, and proving useful in other fields that call for structure discovery from raw data, such as bioinformatics.

...read moreread less

Proceedings Article•

Semi-supervised Learning by Sparse Representation.

[...]

Shuicheng Yan¹, Huan Wang²•Institutions (2)

National University of Singapore¹, Yale University²

01 Jan 2009

TL;DR: This paper proposes a semi-supervised learning framework based on `1 graph to utilize both labeled and unlabeled data for inference on a graph and demonstrates the superiority of this framework over the counterparts based on traditional graphs.

...read moreread less

Abstract: In this paper, we present a novel semi-supervised learning framework based on `1 graph. The `1 graph is motivated by that each datum can be reconstructed by the sparse linear superposition of the training data. The sparse reconstruction coefficients, used to deduce the weights of the directed `1 graph, are derived by solving an `1 optimization problem on sparse representation. Different from conventional graph construction processes which are generally divided into two independent steps, i.e., adjacency searching and weight selection, the graph adjacency structure as well as the graph weights of the `1 graph is derived simultaneously and in a parameter-free manner. Illuminated by the validated discriminating power of sparse representation in [16], we propose a semi-supervised learning framework based on `1 graph to utilize both labeled and unlabeled data for inference on a graph. Extensive experiments on semi-supervised face recognition and image classification demonstrate the superiority of our proposed semi-supervised learning framework based on `1 graph over the counterparts based on traditional graphs.

...read moreread less

Proceedings Article•DOI•

A Markov Clustering Topic Model for mining behaviour in video

[...]

Timothy M. Hospedales¹, Shaogang Gong¹, Tao Xiang¹•Institutions (1)

Queen Mary University of London¹

01 Sep 2009

TL;DR: A novel Markov Clustering Topic Model (MCTM) is introduced which builds on existing Dynamic Bayesian Network models and Bayesian topic models, and overcomes their drawbacks on accuracy, robustness and computational efficiency.

...read moreread less

Abstract: This paper addresses the problem of fully automated mining of public space video data. A novel Markov Clustering Topic Model (MCTM) is introduced which builds on existing Dynamic Bayesian Network models (e.g. HMMs) and Bayesian topic models (e.g. Latent Dirichlet Allocation), and overcomes their drawbacks on accuracy, robustness and computational efficiency. Specifically, our model profiles complex dynamic scenes by robustly clustering visual events into activities and these activities into global behaviours, and correlates behaviours over time. A collapsed Gibbs sampler is derived for offline learning with unlabeled training data, and significantly, a new approximation to online Bayesian inference is formulated to enable dynamic scene understanding and behaviour mining in new video data online in real-time. The strength of this model is demonstrated by unsupervised learning of dynamic scene models, mining behaviours and detecting salient events in three complex and crowded public scenes.

...read moreread less

A literature survey of active machine learning in the context of natural language processing

[...]

Fredrik Olsson¹•Institutions (1)

Swedish Institute of Computer Science¹

01 Apr 2009

TL;DR: Active learning has been successfully applied to a number of natural language processing tasks, such as, information extraction, named entity recognition, text categorization, part-of-speech tagging, parsing, and word sense disambiguation.

...read moreread less

Abstract: Active learning is a supervised machine learning technique in which the learner is in control of the data used for learning. That control is utilized by the learner to ask an oracle, typically a human with extensive knowledge of the domain at hand, about the classes of the instances for which the model learned so far makes unreliable predictions. The active learning process takes as input a set of labeled examples, as well as a larger set of unlabeled examples, and produces a classifier and a relatively small set of newly labeled data. The overall goal is to create as good a classifier as possible, without having to mark-up and supply the learner with more data than necessary. The learning process aims at keeping the human annotation effort to a minimum, only asking for advice where the training utility of the result of such a query is high. Active learning has been successfully applied to a number of natural language processing tasks, such as, information extraction, named entity recognition, text categorization, part-of-speech tagging, parsing, and word sense disambiguation. This report is a literature survey of active learning from the perspective of natural language processing.

...read moreread less

Journal Article•DOI•

Migration and stopover in a small pelagic seabird, the Manx shearwater Puffinus puffinus: insights from machine learning.

[...]

Tim Guilford¹, Jessica Meade, Jay Willis¹, Richard A. Phillips², Dave Boyle, Stephen J. Roberts¹, Matthew Collett¹, Robin Freeman³, Christopher M. Perrins⁴ - Show less +5 more•Institutions (4)

University of Oxford¹, British Antarctic Survey², Microsoft³, Edward Grey Institute of Field Ornithology⁴

07 Apr 2009-Proceedings of The Royal Society B: Biological Sciences

TL;DR: A successful attempt, using miniature archival light loggers, to elucidate the migratory behaviour of the Manx shearwater Puffinus puffinus, a small Northern Hemisphere breeding procellariform that undertakes a trans-equatorial, trans-Atlantic migration.

...read moreread less

Abstract: The migratory movements of seabirds (especially smaller species) remain poorly understood, despite their role as harvesters of marine ecosystems on a global scale and their potential as indicators of ocean health. Here we report a successful attempt, using miniature archival light loggers (geolocators), to elucidate the migratory behaviour of the Manx shearwater Puffinus puffinus, a small (400 g) Northern Hemisphere breeding procellariform that undertakes a trans-equatorial, trans-Atlantic migration. We provide details of over-wintering areas, of previously unobserved marine stopover behaviour, and the long-distance movements of females during their pre-laying exodus. Using salt-water immersion data from a subset of loggers, we introduce a method of behaviour classification based on Bayesian machine learning techniques. We used both supervised and unsupervised machine learning to classify each bird's daily activity based on simple properties of the immersion data. We show that robust activity states emerge, characteristic of summer feeding, winter feeding and active migration. These can be used to classify probable behaviour throughout the annual cycle, highlighting the likely functional significance of stopovers as refuelling stages.

...read moreread less

Journal Article•DOI•

Reinforcement Learning Versus Model Predictive Control: A Comparison on a Power System Problem

[...]

Damien Ernst, Mevludin Glavic¹, Florin Capitanescu¹, Louis Wehenkel¹•Institutions (1)

University of Liège¹

01 Apr 2009

TL;DR: This paper compares reinforcement learning with model predictive control in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear and deterministic electrical power oscillations damping problem.

...read moreread less

Abstract: This paper compares reinforcement learning (RL) with model predictive control (MPC) in a unified framework and reports experimental results of their application to the synthesis of a controller for a nonlinear and deterministic electrical power oscillations damping problem. Both families of methods are based on the formulation of the control problem as a discrete-time optimal control problem. The considered MPC approach exploits an analytical model of the system dynamics and cost function and computes open-loop policies by applying an interior-point solver to a minimization problem in which the system dynamics are represented by equality constraints. The considered RL approach infers in a model-free way closed-loop policies from a set of system trajectories and instantaneous cost values by solving a sequence of batch-mode supervised learning problems. The results obtained provide insight into the pros and cons of the two approaches and show that RL may certainly be competitive with MPC even in contexts where a good deterministic system model is available.

...read moreread less

Journal Article•DOI•

Rule induction for forecasting method selection: Meta-learning the characteristics of univariate time series

[...]

Xiaozhe Wang¹, Kate Smith-Miles², Rob J. Hyndman²•Institutions (2)

La Trobe University¹, Monash University²

01 Jun 2009-Neurocomputing

TL;DR: A novel approach for selecting a forecasting method for univariate time series based on measurable data characteristics is presented that combines elements of data mining, meta-learning, clustering, classification and statistical measurement.

...read moreread less

Proceedings Article•

Streaming k-means approximation

[...]

Nir Ailon¹, Ragesh Jaiswal², Claire Monteleoni²•Institutions (2)

Google¹, Columbia University²

07 Dec 2009

TL;DR: A clustering algorithm that approximately optimizes the k-means objective, in the one-pass streaming setting, which is applicable to unsupervised learning on massive data sets, or resource-constrained devices.

...read moreread less

Abstract: We provide a clustering algorithm that approximately optimizes the k-means objective, in the one-pass streaming setting. We make no assumptions about the data, and our algorithm is very light-weight in terms of memory, and computation. This setting is applicable to unsupervised learning on massive data sets, or resource-constrained devices. The two main ingredients of our theoretical work are: a derivation of an extremely simple pseudo-approximation batch algorithm for k-means (based on the recent k-means++), in which the algorithm is allowed to output more than k centers, and a streaming clustering algorithm in which batch clustering algorithms are performed on small inputs (fitting in memory) and combined in a hierarchical manner. Empirical evaluations on real and simulated data reveal the practical utility of our method.

...read moreread less

Journal Article•DOI•

Supervised learning with decision tree-based methods in computational and systems biology

[...]

Pierre Geurts¹, Alexandre Irrthum¹, Louis Wehenkel¹•Institutions (1)

University of Liège¹

12 Nov 2009-Molecular BioSystems

TL;DR: The goal of this paper is to provide an accessible and comprehensive introduction to decision tree-based methods and a survey of their applications in the context of computational and systems biology.

...read moreread less

Abstract: At the intersection between artificial intelligence and statistics, supervised learning allows algorithms to automatically build predictive models from just observations of a system. During the last twenty years, supervised learning has been a tool of choice to analyze the always increasing and complexifying data generated in the context of molecular biology, with successful applications in genome annotation, function prediction, or biomarker discovery. Among supervised learning methods, decision tree-based methods stand out as non parametric methods that have the unique feature of combining interpretability, efficiency, and, when used in ensembles of trees, excellent accuracy. The goal of this paper is to provide an accessible and comprehensive introduction to this class of methods. The first part of the review is devoted to an intuitive but complete description of decision tree-based methods and a discussion of their strengths and limitations with respect to other supervised learning methods. The second part of the review provides a survey of their applications in the context of computational and systems biology.

...read moreread less

Collapse