Showing papers on "Active learning (machine learning)" published in 2016


Journal ArticleDOI
TL;DR: This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications of transfer learning; the solutions surveyed are independent of data size and can be applied to big data environments.
Abstract: Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications applied to transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.

2,900 citations


Journal ArticleDOI
TL;DR: Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods, and multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme.
Abstract: Extreme learning machine (ELM) is an emerging learning algorithm for the generalized single hidden layer feedforward neural networks, of which the hidden node parameters are randomly generated and the output weights are analytically computed. However, due to its shallow architecture, feature learning using ELM may not be effective for natural signals (e.g., images/videos), even with a large number of hidden nodes. To address this issue, in this paper, a new ELM-based hierarchical learning framework is proposed for multilayer perceptron. The proposed architecture is divided into two main components: 1) self-taught feature extraction followed by supervised feature classification, and 2) the two are bridged by randomly initialized hidden weights. The novelties of this paper are as follows: 1) unsupervised multilayer encoding is conducted for feature extraction, and an ELM-based sparse autoencoder is developed via an $\ell_{1}$ constraint. By doing so, it achieves more compact and meaningful feature representations than the original ELM; 2) by exploiting the advantages of ELM random feature mapping, the hierarchically encoded outputs are randomly projected before final decision making, which leads to better generalization with faster learning speed; and 3) unlike the greedy layerwise training of deep learning (DL), the hidden layers of the proposed framework are trained in a forward manner. Once the previous layer is established, the weights of the current layer are fixed without fine-tuning. Therefore, it has much better learning efficiency than DL. Extensive experiments on various widely used classification data sets show that the proposed algorithm achieves better and faster convergence than the existing state-of-the-art hierarchical learning methods. Furthermore, multiple applications in computer vision further confirm the generality and capability of the proposed learning scheme.
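The core ELM step described above (random hidden weights, analytically computed output weights) is easy to illustrate. Below is a minimal NumPy sketch of a single-hidden-layer ELM trained by ridge-regularized least squares; the layer width, activation, regularization constant, and toy data are illustrative choices, not values from the paper.

```python
import numpy as np

def train_elm(X, Y, n_hidden=256, reg=1e-3, seed=0):
    """Basic ELM: random input-to-hidden weights, closed-form output weights."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.standard_normal((d, n_hidden))      # random, never trained
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                      # hidden-layer activations
    # Output weights via ridge-regularized least squares: (H^T H + reg*I) beta = H^T Y
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Toy usage: one-hot targets for a 3-class problem on random features.
X = np.random.default_rng(1).standard_normal((500, 20))
labels = (X[:, 0] > 0).astype(int) + (X[:, 1] > 0).astype(int)   # 0, 1, or 2
Y = np.eye(3)[labels]
W, b, beta = train_elm(X, Y)
acc = (predict_elm(X, W, b, beta).argmax(1) == labels).mean()
print(f"training accuracy: {acc:.2f}")
```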

1,166 citations


Posted Content
TL;DR: By embracing deep neural networks, this work is able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability.
Abstract: We consider the problem of multiple agents sensing and acting in environments with the goal of maximising their shared utility. In these environments, agents must learn communication protocols in order to share information that is needed to solve the tasks. By embracing deep neural networks, we are able to demonstrate end-to-end learning of protocols in complex environments inspired by communication riddles and multi-agent computer vision problems with partial observability. We propose two approaches for learning in these domains: Reinforced Inter-Agent Learning (RIAL) and Differentiable Inter-Agent Learning (DIAL). The former uses deep Q-learning, while the latter exploits the fact that, during learning, agents can backpropagate error derivatives through (noisy) communication channels. Hence, this approach uses centralised learning but decentralised execution. Our experiments introduce new environments for studying the learning of communication protocols and present a set of engineering innovations that are essential for success in these domains.

1,007 citations


Posted Content
TL;DR: A new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning, is proposed, and a certain instantiation of this framework draws an analogy between imitation learning and generative adversarial networks.
Abstract: Consider learning a policy from example expert behavior, without interaction with the expert or access to reinforcement signal. One approach is to recover the expert's cost function with inverse reinforcement learning, then extract a policy from that cost function with reinforcement learning. This approach is indirect and can be slow. We propose a new general framework for directly extracting a policy from data, as if it were obtained by reinforcement learning following inverse reinforcement learning. We show that a certain instantiation of our framework draws an analogy between imitation learning and generative adversarial networks, from which we derive a model-free imitation learning algorithm that obtains significant performance gains over existing model-free methods in imitating complex behaviors in large, high-dimensional environments.

964 citations


Posted Content
TL;DR: This work employs ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories to learn a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types.
Abstract: Learning from a few examples remains a key challenge in machine learning. Despite recent advances in important domains such as vision and language, the standard supervised deep learning paradigm does not offer a satisfactory solution for learning new concepts rapidly from little data. In this work, we employ ideas from metric learning based on deep neural features and from recent advances that augment neural networks with external memories. Our framework learns a network that maps a small labelled support set and an unlabelled example to its label, obviating the need for fine-tuning to adapt to new class types. We then define one-shot learning problems on vision (using Omniglot, ImageNet) and language tasks. Our algorithm improves one-shot accuracy on ImageNet from 87.6% to 93.2% and from 88.0% to 93.8% on Omniglot compared to competing approaches. We also demonstrate the usefulness of the same model on language modeling by introducing a one-shot task on the Penn Treebank.
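A stripped-down version of the metric-based idea above (classify a query by attention over an embedded, labelled support set) can be written in a few lines. This sketch uses cosine similarity over raw feature vectors instead of a learned deep embedding, so it only illustrates the attention/label-propagation step, not the full matching-network architecture; all data here are synthetic.

```python
import numpy as np

def one_shot_predict(support_x, support_y, query_x, n_classes):
    """Label a query by softmax-weighted cosine similarity to a labelled support set."""
    s = support_x / np.linalg.norm(support_x, axis=1, keepdims=True)
    q = query_x / np.linalg.norm(query_x)
    sims = s @ q                                   # cosine similarity to each support item
    attn = np.exp(sims) / np.exp(sims).sum()       # attention weights over the support set
    onehot = np.eye(n_classes)[support_y]
    return (attn[:, None] * onehot).sum(axis=0)    # weighted vote = predicted label distribution

# Toy usage: 5-way, 1-shot with random "embeddings".
rng = np.random.default_rng(0)
support_x = rng.standard_normal((5, 64))
support_y = np.arange(5)                           # one example per class
query_x = support_x[3] + 0.1 * rng.standard_normal(64)
print(one_shot_predict(support_x, support_y, query_x, n_classes=5).argmax())  # likely 3
```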

856 citations


Journal ArticleDOI
TL;DR: Interactive machine learning (iML) is defined as “algorithms that can interact with agents and can optimize their learning behavior through these interactions, where the agents can also be human.”
Abstract: Machine learning (ML) is the fastest growing field in computer science, and health informatics is among the greatest challenges. The goal of ML is to develop algorithms which can learn and improve over time and can be used for predictions. Most ML researchers concentrate on automatic machine learning (aML), where great advances have been made, for example, in speech recognition, recommender systems, or autonomous vehicles. Automatic approaches greatly benefit from big data with many training sets. However, in the health domain, sometimes we are confronted with a small number of data sets or rare events, where aML approaches suffer from insufficient training samples. Here interactive machine learning (iML) may be of help, having its roots in reinforcement learning, preference learning, and active learning. The term iML is not yet well used, so we define it as “algorithms that can interact with agents and can optimize their learning behavior through these interactions, where the agents can also be human.” This “human-in-the-loop” approach can be beneficial in solving computationally hard problems, e.g., subspace clustering, protein folding, or k-anonymization of health data, where human expertise can help to reduce an exponential search space through heuristic selection of samples. Therefore, what would otherwise be an NP-hard problem reduces greatly in complexity through the input and the assistance of a human agent involved in the learning phase.

651 citations


Journal ArticleDOI
Junfei Qiu, Qihui Wu, Guoru Ding, Yuhua Xu, Shuo Feng
TL;DR: A literature survey of the latest advances in research on machine learning for big data processing finds some promising learning methods in recent studies, such as representation learning, deep learning, distributed and parallel learning, transfer learning, active learning, and kernel-based learning.
Abstract: There is no doubt that big data are now rapidly expanding in all science and engineering domains. While the potential of these massive data is undoubtedly significant, fully making sense of them requires new ways of thinking and novel learning techniques to address the various challenges. In this paper, we present a literature survey of the latest advances in research on machine learning for big data processing. First, we review the machine learning techniques and highlight some promising learning methods in recent studies, such as representation learning, deep learning, distributed and parallel learning, transfer learning, active learning, and kernel-based learning. Next, we focus on the analysis and discussions about the challenges and possible solutions of machine learning for big data. Following that, we investigate the close connections of machine learning with signal processing techniques for big data processing. Finally, we outline several open issues and research trends.

636 citations


Proceedings Article
11 Nov 2016
TL;DR: This work considers jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks and shows that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs.
Abstract: Learning to navigate in complex environments with dynamic elements is an important milestone in developing AI agents. In this work we formulate the navigation question as a reinforcement learning problem and show that data efficiency and task performance can be dramatically improved by relying on additional auxiliary tasks leveraging multimodal sensory inputs. In particular we consider jointly learning the goal-driven reinforcement learning problem with auxiliary depth prediction and loop closure classification tasks. This approach can learn to navigate from raw sensory input in complicated 3D mazes, approaching human-level performance even under conditions where the goal location changes frequently. We provide detailed analysis of the agent behaviour, its ability to localise, and its network activity dynamics, showing that the agent implicitly learns key navigation abilities.

556 citations


Book
07 Nov 2016
TL;DR: As statistical machine learning matures, it is time to make a major effort to break the isolated learning tradition and to study lifelong learning to bring machine learning to new heights.
Abstract: Lifelong Machine Learning (or Lifelong Learning) is an advanced machine learning paradigm that learns continuously, accumulates the knowledge learned in previous tasks, and uses it to help future learning. In the process, the learner becomes more and more knowledgeable and effective at learning. This learning ability is one of the hallmarks of human intelligence. However, the current dominant machine learning paradigm learns in isolation: given a training dataset, it runs a machine learning algorithm on the dataset to produce a model. It makes no attempt to retain the learned knowledge and use it in future learning. Although this isolated learning paradigm has been very successful, it requires a large number of training examples, and is only suitable for well-defined and narrow tasks. In comparison, we humans can learn effectively with a few examples because we have accumulated so much knowledge in the past which enables us to learn with little data or effort. Lifelong learning aims to achieve this capability. As statistical machine learning matures, it is time to make a major effort to break the isolated learning tradition and to study lifelong learning to bring machine learning to new heights. Applications such as intelligent assistants, chatbots, and physical robots that interact with humans and systems in real-life environments are also calling for such lifelong learning capabilities. Without the ability to accumulate the learned knowledge and use it to learn more knowledge incrementally, a system will probably never be truly intelligent. This book serves as an introductory text and survey to lifelong learning.

542 citations


Journal ArticleDOI
TL;DR: Experimental analyses show that machine learning algorithms can detect attacks with better performance than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.
Abstract: Attack detection problems in the smart grid are posed as statistical learning problems for different attack scenarios in which the measurements are observed in batch or online settings. In this approach, machine learning algorithms are used to classify measurements as being either secure or attacked. An attack detection framework is provided to exploit any available prior knowledge about the system and surmount constraints arising from the sparse structure of the problem in the proposed approach. Well-known batch and online learning algorithms (supervised and semisupervised) are employed with decision- and feature-level fusion to model the attack detection problem. The relationships between statistical and geometric properties of attack vectors employed in the attack scenarios and learning algorithms are analyzed to detect unobservable attacks using statistical learning methods. The proposed algorithms are examined on various IEEE test systems. Experimental analyses show that machine learning algorithms can detect attacks with performances higher than attack detection algorithms that employ state vector estimation methods in the proposed attack detection framework.

470 citations


Journal ArticleDOI
TL;DR: A set of algorithms is proposed to design signal timing plans via deep reinforcement learning; the core idea is to set up a deep neural network to learn the Q-function of reinforcement learning from sampled traffic state/control inputs and the corresponding traffic system performance output.
Abstract: In this paper, we propose a set of algorithms to design signal timing plans via deep reinforcement learning. The core idea of this approach is to set up a deep neural network (DNN) to learn the Q-function of reinforcement learning from the sampled traffic state/control inputs and the corresponding traffic system performance output. Based on the obtained DNN, we can find the appropriate signal timing policies by implicitly modeling the control actions and the change of system states. We explain the possible benefits and implementation tricks of this new approach. The relationships between this new approach and some existing approaches are also carefully discussed.
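To make the Q-learning idea above concrete, here is a hedged sketch of the kind of update the paper builds on, using linear function approximation and synthetic transitions instead of a deep network and a real traffic simulator; the state encoding, actions, and reward below are placeholders, not the paper's design.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_actions = 8, 2          # e.g., queue lengths per lane; actions = keep/switch phase
W = np.zeros((n_actions, n_features)) # linear Q-function: Q(s, a) = W[a] @ s
gamma, lr = 0.95, 0.01

def q_values(state):
    return W @ state

# Repeated TD updates on synthetic transitions (state, action, reward, next_state).
for step in range(1000):
    s = rng.random(n_features)                    # placeholder traffic state
    a = int(rng.integers(n_actions))
    r = -s.sum()                                  # placeholder reward: negative total queue
    s_next = np.clip(s + rng.normal(0, 0.1, n_features), 0, None)
    td_target = r + gamma * q_values(s_next).max()
    td_error = td_target - q_values(s)[a]
    W[a] += lr * td_error * s                     # semi-gradient Q-learning step

print("learned Q-values for a sample state:", q_values(rng.random(n_features)))
```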

Journal ArticleDOI
Xin Geng
TL;DR: This paper proposes six working LDL algorithms in three ways: problem transformation, algorithm adaptation, and specialized algorithm design; experimental results show clear advantages of the specialized algorithms, which indicates the importance of special design for the characteristics of the LDL problem.
Abstract: Although multi-label learning can deal with many problems with label ambiguity, it does not fit some real applications well where the overall distribution of the importance of the labels matters. This paper proposes a novel learning paradigm named label distribution learning (LDL) for such kind of applications. The label distribution covers a certain number of labels, representing the degree to which each label describes the instance. LDL is a more general learning framework which includes both single-label and multi-label learning as its special cases. This paper proposes six working LDL algorithms in three ways: problem transformation, algorithm adaptation, and specialized algorithm design. In order to compare the performance of the LDL algorithms, six representative and diverse evaluation measures are selected via a clustering analysis, and the first batch of label distribution datasets are collected and made publicly available. Experimental results on one artificial and 15 real-world datasets show clear advantages of the specialized algorithms, which indicates the importance of special design for the characteristics of the LDL problem.
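As a concrete illustration of the label-distribution idea, the sketch below fits a softmax model to full label distributions with a KL-divergence objective on synthetic data; it is only a minimal "algorithm adaptation"-style example, not one of the six algorithms proposed in the paper, and the data, dimensions, and learning rate are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 200, 10, 4                       # samples, features, labels
X = rng.standard_normal((n, d))
true_W = rng.standard_normal((k, d))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

D = softmax(X @ true_W.T)                  # target label distributions (degree of each label)

# Fit W by gradient descent on KL(D || softmax(X W^T)); the gradient w.r.t. the logits is (p - d).
W = np.zeros((k, d))
lr = 0.1
for epoch in range(500):
    P = softmax(X @ W.T)
    grad = (P - D).T @ X / n
    W -= lr * grad

kl = np.mean(np.sum(D * (np.log(D) - np.log(softmax(X @ W.T))), axis=1))
print(f"mean KL after training: {kl:.4f}")
```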

Proceedings ArticleDOI
01 Jun 2016
TL;DR: This work presents a multi-task dictionary learning method which is able to learn a dataset-shared but target-data-biased representation, and demonstrates that the method significantly outperforms the state-of-the-art.
Abstract: Most existing person re-identification (Re-ID) approaches follow a supervised learning framework, in which a large number of labelled matching pairs are required for training. This severely limits their scalability in real-world applications. To overcome this limitation, we develop a novel cross-dataset transfer learning approach to learn a discriminative representation. It is unsupervised in the sense that the target dataset is completely unlabelled. Specifically, we present a multi-task dictionary learning method which is able to learn a dataset-shared but target-data-biased representation. Experimental results on five benchmark datasets demonstrate that the method significantly outperforms the state-of-the-art.

Journal ArticleDOI
TL;DR: This work proposes an approach for the systematic treatment of machine learning, from the perspective of quantum information, and shows that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.
Abstract: The emerging field of quantum machine learning has the potential to substantially aid in the problems and scope of artificial intelligence. This is only enhanced by recent successes in the field of classical machine learning. In this work we propose an approach for the systematic treatment of machine learning, from the perspective of quantum information. Our approach is general and covers all three main branches of machine learning: supervised, unsupervised, and reinforcement learning. While quantum improvements in supervised and unsupervised learning have been reported, reinforcement learning has received much less attention. Within our approach, we tackle the problem of quantum enhancements in reinforcement learning as well, and propose a systematic scheme for providing improvements. As an example, we show that quadratic improvements in learning efficiency, and exponential improvements in performance over limited time periods, can be obtained for a broad class of learning problems.


Proceedings Article
16 Mar 2016
TL;DR: The efficacy of supervised machine learning algorithms is discussed in terms of accuracy, speed of learning, complexity, and risk of overfitting.
Abstract: Supervised machine learning is the construction of algorithms that are able to produce general patterns and hypotheses by using externally supplied instances to predict the fate of future instances. Supervised machine learning classification algorithms aim at categorizing data from prior information. Classification is carried out very frequently in data science problems. Various successful techniques have been proposed to solve such problems, viz. rule-based techniques, logic-based techniques, instance-based techniques, and stochastic techniques. This paper discusses the efficacy of supervised machine learning algorithms in terms of accuracy, speed of learning, complexity, and risk of overfitting. The main objective of this paper is to provide a general comparison with state-of-the-art machine learning algorithms.

Proceedings Article
01 Jun 2016
TL;DR: A cooperative inverse reinforcement learning (CIRL) game with two agents, a human and a robot, is considered, in which both are rewarded according to the human's reward function but the robot does not initially know what it is; unlike classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching and active learning.
Abstract: For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, partial-information game with two agents, human and robot; both are rewarded according to the human’s reward function, but the robot does not initially know what this is. In contrast to classical IRL, where the human is assumed to act optimally in isolation, optimal CIRL solutions produce behaviors such as active teaching, active learning, and communicative actions that are more effective in achieving value alignment. We show that computing optimal joint policies in CIRL games can be reduced to solving a POMDP, prove that optimality in isolation is suboptimal in CIRL, and derive an approximate CIRL algorithm.

Journal ArticleDOI
TL;DR: AK–SS, an active learning method combining the Kriging model and subset simulation (SS), can provide accurate solutions more efficiently, making it a promising approach for structural reliability analyses involving small failure probabilities, high-dimensional performance functions, and time-consuming simulation codes in practical engineering.
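A rough sketch of the Kriging-based active learning loop behind methods of this kind (fit a Gaussian-process surrogate to the performance function, then add the candidate sample whose sign is most uncertain) is shown below. It uses scikit-learn's GP regressor and the common U = |mu|/sigma learning function as stand-ins, omits the subset-simulation part entirely, and uses a made-up performance function, so it should be read as an illustration of the general idea rather than the paper's algorithm.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def g(x):                       # placeholder performance function; failure when g(x) < 0
    return 3.0 - x[:, 0] ** 2 - x[:, 1]

rng = np.random.default_rng(0)
X_doe = rng.uniform(-3, 3, (6, 2))           # small initial design of experiments
candidates = rng.uniform(-3, 3, (2000, 2))   # Monte Carlo candidate pool

for it in range(20):
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
    gp.fit(X_doe, g(X_doe))
    mu, sigma = gp.predict(candidates, return_std=True)
    U = np.abs(mu) / np.maximum(sigma, 1e-12)   # small U = sign of g most uncertain
    X_doe = np.vstack([X_doe, candidates[np.argmin(U)]])

pf = np.mean(gp.predict(candidates) < 0)        # crude failure-probability estimate
print(f"estimated failure probability: {pf:.3f}")
```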

Proceedings ArticleDOI
Martín Abadi
04 Sep 2016
TL;DR: This talk describes TensorFlow and outlines some of its applications, and discusses the question of what TensorFlow and deep learning may have to do with functional programming.
Abstract: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. Its computational model is based on dataflow graphs with mutable state. Graph nodes may be mapped to different machines in a cluster, and within each machine to CPUs, GPUs, and other devices. TensorFlow supports a variety of applications, but it particularly targets training and inference with deep neural networks. It serves as a platform for research and for deploying machine learning systems across many areas, such as speech recognition, computer vision, robotics, information retrieval, and natural language processing. In this talk, we describe TensorFlow and outline some of its applications. We also discuss the question of what TensorFlow and deep learning may have to do with functional programming. Although TensorFlow is not purely functional, many of its uses are concerned with optimizing functions (during training), then with applying those functions (during inference). These functions are defined as compositions of simple primitives (as is common in functional programming), with internal data representations that are learned rather than manually designed. TensorFlow is joint work with many other people in the Google Brain team and elsewhere. More information is available at tensorflow.org.
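For readers unfamiliar with TensorFlow, the small sketch below shows the "optimize a function during training, then apply it during inference" pattern the talk refers to, using the public tf.keras API on synthetic regression data; it is an illustration, not code from the talk, and the model shape and training settings are arbitrary.

```python
import numpy as np
import tensorflow as tf

# Synthetic regression data: y = 3x + noise.
x = np.random.randn(256, 1).astype("float32")
y = 3.0 * x + 0.1 * np.random.randn(256, 1).astype("float32")

# Define a function as a composition of simple primitives with learnable parameters...
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")

# ...optimize it during training...
model.fit(x, y, epochs=20, verbose=0)

# ...then apply the learned function during inference.
print(model.predict(np.array([[1.0]], dtype="float32")))
```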

Journal ArticleDOI
Gang Luo
TL;DR: This paper reviews automatic selection methods for machine learning algorithms and hyper-parameter values, identifies several of their limitations in the big biomedical data environment, and establishes a foundation for future research on automatically selecting algorithms and hyper-parameter values for analyzing big biomedical data.
Abstract: Machine learning studies automatic algorithms that improve themselves through experience. It is widely used for analyzing and extracting value from large biomedical data sets, or “big biomedical data,” advancing biomedical research, and improving healthcare. Before a machine learning model is trained, the user of a machine learning software tool typically must manually select a machine learning algorithm and set one or more model parameters termed hyper-parameters. The algorithm and hyper-parameter values used can greatly impact the resulting model’s performance, but their selection requires special expertise as well as many labor-intensive manual iterations. To make machine learning accessible to layman users with limited computing expertise, computer science researchers have proposed various automatic selection methods for algorithms and/or hyper-parameter values for a given supervised machine learning problem. This paper reviews these methods, identifies several of their limitations in the big biomedical data environment, and provides preliminary thoughts on how to address these limitations. These findings establish a foundation for future research on automatically selecting algorithms and hyper-parameter values for analyzing big biomedical data.
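The kind of automatic hyper-parameter selection the review covers can be approximated with off-the-shelf tools. The sketch below runs a randomized, cross-validated search over a few random-forest hyper-parameters with scikit-learn, purely to illustrate the problem being reviewed; the search space, model, and dataset are arbitrary choices, not the paper's recommendations.

```python
from scipy.stats import randint
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_breast_cancer(return_X_y=True)

# Search over a few hyper-parameters with 5-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions={
        "n_estimators": randint(50, 300),
        "max_depth": randint(2, 20),
        "min_samples_leaf": randint(1, 10),
    },
    n_iter=20,
    cv=5,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```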

Proceedings ArticleDOI
11 Jan 2016
TL;DR: The first learning algorithms in this model with implication counter-examples that are based on machine learning techniques are proposed, and a decision-tree learning algorithm is developed that is guaranteed to converge to the right concept (invariant) if one exists.
Abstract: Inductive invariants can be robustly synthesized using a learning model where the teacher is a program verifier who instructs the learner through concrete program configurations, classified as positive, negative, and implications. We propose the first learning algorithms in this model with implication counter-examples that are based on machine learning techniques. In particular, we extend classical decision-tree learning algorithms in machine learning to handle implication samples, building new scalable ways to construct small decision trees using statistical measures. We also develop a decision-tree learning algorithm in this model that is guaranteed to converge to the right concept (invariant) if one exists. We implement the learners and an appropriate teacher, and show that the resulting invariant synthesis is efficient and convergent for a large suite of programs.

Journal Article
TL;DR: In this paper, the authors discuss a general method to learn data representations from multiple tasks and provide a justification for this method in both settings of multitask learning and learning-to-learn.
Abstract: We discuss a general method to learn data representations from multiple tasks. We provide a justification for this method in both settings of multitask learning and learning-to-learn. The method is illustrated in detail in the special case of linear feature learning. Conditions on the theoretical advantage offered by multitask representation learning over independent task learning are established. In particular, focusing on the important example of half-space learning, we derive the regime in which multitask representation learning is beneficial over independent task learning, as a function of the sample size, the number of tasks and the intrinsic data dimensionality. Other potential applications of our results include multitask feature learning in reproducing kernel Hilbert spaces and multilayer, deep networks.

Posted Content
TL;DR: In this article, the authors propose a method to learn the parameters of a deep model in one shot by minimizing a one-shot classification objective in a learning-to-learn formulation.
Abstract: One-shot learning is usually tackled by using generative models or discriminative embeddings. Discriminative methods based on deep learning, which are very effective in other learning scenarios, are ill-suited for one-shot learning as they need large amounts of training data. In this paper, we propose a method to learn the parameters of a deep model in one shot. We construct the learner as a second deep network, called a learnet, which predicts the parameters of a pupil network from a single exemplar. In this manner we obtain an efficient feed-forward one-shot learner, trained end-to-end by minimizing a one-shot classification objective in a learning to learn formulation. In order to make the construction feasible, we propose a number of factorizations of the parameters of the pupil network. We demonstrate encouraging results by learning characters from single exemplars in Omniglot, and by tracking visual objects from a single initial exemplar in the Visual Object Tracking benchmark.

Posted Content
TL;DR: A major theme of this study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter, leading to a discussion about the next generation of optimization methods for large- scale machine learning.
Abstract: This paper provides a review and commentary on the past, present, and future of numerical optimization algorithms in the context of machine learning applications. Through case studies on text classification and the training of deep neural networks, we discuss how optimization problems arise in machine learning and what makes them challenging. A major theme of our study is that large-scale machine learning represents a distinctive setting in which the stochastic gradient (SG) method has traditionally played a central role while conventional gradient-based nonlinear optimization techniques typically falter. Based on this viewpoint, we present a comprehensive theory of a straightforward, yet versatile SG algorithm, discuss its practical behavior, and highlight opportunities for designing algorithms with improved performance. This leads to a discussion about the next generation of optimization methods for large-scale machine learning, including an investigation of two main streams of research on techniques that diminish noise in the stochastic directions and methods that make use of second-order derivative approximations.
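The stochastic gradient method the paper centers on reduces to a very small loop; below is a minimal mini-batch SGD sketch for logistic regression on synthetic data. The step size, batch size, and the problem itself are illustrative, and none of the variance-reduction or second-order techniques the paper discusses are included.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 20
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = (X @ w_true + 0.1 * rng.standard_normal(n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(d)
lr, batch = 0.1, 32
for epoch in range(10):
    perm = rng.permutation(n)
    for start in range(0, n, batch):
        idx = perm[start:start + batch]
        # Stochastic gradient of the logistic loss on one mini-batch.
        grad = X[idx].T @ (sigmoid(X[idx] @ w) - y[idx]) / len(idx)
        w -= lr * grad

acc = ((sigmoid(X @ w) > 0.5) == y).mean()
print(f"training accuracy: {acc:.3f}")
```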

Book
18 Jul 2016
TL;DR: The authors provide a more practical approach by explaining the concepts of machine learning algorithms and describing the areas of application for each algorithm, using simple practical examples to demonstrate each algorithm and to show how different issues related to these algorithms are addressed.
Abstract: Machine learning, one of the top emerging sciences, has an extremely broad range of applications. However, many books on the subject provide only a theoretical approach, making it difficult for a newcomer to grasp the subject material. This book provides a more practical approach by explaining the concepts of machine learning algorithms and describing the areas of application for each algorithm, using simple practical examples to demonstrate each algorithm and showing how different issues related to these algorithms are applied.

Proceedings ArticleDOI
12 Mar 2016
TL;DR: TABLA provides a template-based framework that generates accelerators for a class of machine learning algorithms and rigorously compares the benefits of FPGA acceleration to multi-core CPUs and many-core GPUs using real hardware measurements.
Abstract: A growing number of commercial and enterprise systems increasingly rely on compute-intensive Machine Learning (ML) algorithms. While the demand for these compute-intensive applications is growing, the performance benefits from general-purpose platforms are diminishing. Field Programmable Gate Arrays (FPGAs) provide a promising path forward to accommodate the needs of machine learning algorithms and represent an intermediate point between the efficiency of ASICs and the programmability of general-purpose processors. However, acceleration with FPGAs still requires long development cycles and extensive expertise in hardware design. To tackle this challenge, instead of designing an accelerator for a machine learning algorithm, we present TABLA, a framework that generates accelerators for a class of machine learning algorithms. The key is to identify the commonalities across a wide range of machine learning algorithms and utilize this commonality to provide a high-level abstraction for programmers. TABLA leverages the insight that many learning algorithms can be expressed as a stochastic optimization problem. Therefore, learning becomes solving an optimization problem using stochastic gradient descent that minimizes an objective function over the training data. The gradient descent solver is fixed while the objective function changes for different learning algorithms. TABLA provides a template-based framework to accelerate this class of learning algorithms. Therefore, a developer can specify the learning task by only expressing the gradient of the objective function using our high-level language. Tabla then automatically generates the synthesizable implementation of the accelerator for FPGA realization using a set of hand-optimized templates. We use Tabla to generate accelerators for ten different learning tasks targeted at a Xilinx Zynq FPGA platform. We rigorously compare the benefits of FPGA acceleration to multi-core CPUs (ARM Cortex A15 and Xeon E3) and many-core GPUs (Tegra K1, GTX 650 Ti, and Tesla K40) using real hardware measurements. TABLA-generated accelerators provide 19.4x and 2.9x average speedup over the ARM and Xeon processors, respectively. These accelerators provide 17.57x, 20.2x, and 33.4x higher Performance-per-Watt in comparison to Tegra, GTX 650 Ti and Tesla, respectively. These benefits are achieved while the programmers write less than 50 lines of code.

Journal ArticleDOI
TL;DR: This paper provides quantitative evidence to validate the flattening hypothesis: it proposes a few quantities for measuring manifold entanglement under certain assumptions and conducts experiments with both synthetic and real-world data, which validate the proposition and lead to new insights on deep learning.
Abstract: Deep hierarchical representations of the data have been found to provide better informative features for several machine learning applications. In addition, multilayer neural networks surprisingly tend to achieve better performance when they are subject to an unsupervised pretraining. The booming of deep learning motivates researchers to identify the factors that contribute to its success. One possible reason identified is the flattening of manifold-shaped data in higher layers of neural networks. However, it is not clear how to measure the flattening of such manifold-shaped data and what amount of flattening a deep neural network can achieve. For the first time, this paper provides quantitative evidence to validate the flattening hypothesis. To achieve this, we propose a few quantities for measuring manifold entanglement under certain assumptions and conduct experiments with both synthetic and real-world data. Our experimental results validate the proposition and lead to new insights on deep learning.

Posted Content
TL;DR: In this paper, an active deep learning algorithm based on weighted incremental dictionary learning is proposed for hyperspectral image classification, which selects training samples that maximize two selection criteria, namely representativeness and uncertainty.
Abstract: Active deep learning classification of hyperspectral images is considered in this paper. Deep learning has achieved success in many applications, but good-quality labeled samples are needed to construct a deep learning network. It is expensive to get good labeled samples in hyperspectral images for remote sensing applications. An active learning algorithm based on weighted incremental dictionary learning is proposed for such applications. The proposed algorithm selects training samples that maximize two selection criteria, namely representativeness and uncertainty. This algorithm trains a deep network efficiently by actively selecting training samples at each iteration. The proposed algorithm is applied for the classification of hyperspectral images, and compared with other classification algorithms employing active learning. It is shown that the proposed algorithm is efficient and effective in classifying hyperspectral images.
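The selection rule described above (pick unlabeled samples that are both uncertain and representative) can be sketched with generic ingredients. The example below uses entropy-based uncertainty and mean cosine similarity to the pool as the representativeness term, with scikit-learn's logistic regression as the base learner on a standard digits dataset rather than the paper's deep network and dictionary-learning machinery; the batch size, number of rounds, and combination rule are all illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import cosine_similarity

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
labeled = list(rng.choice(len(X), 20, replace=False))
pool = [i for i in range(len(X)) if i not in labeled]

for round_ in range(10):
    clf = LogisticRegression(max_iter=2000).fit(X[labeled], y[labeled])
    proba = clf.predict_proba(X[pool])
    uncertainty = -np.sum(proba * np.log(proba + 1e-12), axis=1)        # entropy
    representativeness = cosine_similarity(X[pool], X[pool]).mean(axis=1)
    score = uncertainty * representativeness                            # simple combination
    picked = [pool[i] for i in np.argsort(score)[-5:]]                  # query 5 samples
    labeled += picked                                                   # oracle provides y[picked]
    pool = [i for i in pool if i not in picked]

clf = LogisticRegression(max_iter=2000).fit(X[labeled], y[labeled])
print("accuracy on the remaining pool:", round(clf.score(X[pool], y[pool]), 3))
```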

Book
30 Sep 2016
TL;DR: This book offers an account of how digital technology advanced from number-crunching mainframes to mobile devices, putting today's machine learning boom in context; it also considers some future directions for machine learning and the new field of "data science," and discusses the ethical and legal implications for data privacy and security.
Abstract: Today, machine learning underlies a range of applications we use every day, from product recommendations to voice recognition -- as well as some we don't yet use every day, including driverless cars. It is the basis of the new approach in computing where we do not write programs but collect data; the idea is to learn the algorithms for the tasks automatically from data. As computing devices grow more ubiquitous, a larger part of our lives and work is recorded digitally, and as "Big Data" has gotten bigger, the theory of machine learning -- the foundation of efforts to process that data into knowledge -- has also advanced. In this book, machine learning expert Ethem Alpaydin offers a concise overview of the subject for the general reader, describing its evolution, explaining important learning algorithms, and presenting example applications. Alpaydin offers an account of how digital technology advanced from number-crunching mainframes to mobile devices, putting today's machine learning boom in context. He describes the basics of machine learning and some applications; the use of machine learning algorithms for pattern recognition; artificial neural networks inspired by the human brain; algorithms that learn associations between instances, with such applications as customer segmentation and learning recommendations; and reinforcement learning, when an autonomous agent learns to act so as to maximize reward and minimize penalty. Alpaydin then considers some future directions for machine learning and the new field of "data science," and discusses the ethical and legal implications for data privacy and security.

Journal ArticleDOI
TL;DR: A novel cross-media active learning algorithm is proposed to reduce the effort of labeling images for training: classifiers are trained on both visual features and privileged information, and the uncertainty of unlabeled data is measured by exploiting the learned classifiers and a slacking function.
Abstract: In this paper, we propose a novel cross-media active learning algorithm to reduce the effort on labeling images for training. The Internet images are often associated with rich textual descriptions. Even though such textual information is not available in test images, it is still useful for learning robust classifiers. In light of this, we apply the recently proposed supervised learning paradigm, learning using privileged information, to the active learning task. Specifically, we train classifiers on both visual features and privileged information, and measure the uncertainty of unlabeled data by exploiting the learned classifiers and slacking function. Then, we propose to select unlabeled samples by jointly measuring the cross-media uncertainty and the visual diversity. Our method automatically learns the optimal tradeoff parameter between the two measurements, which in turn makes our algorithms particularly suitable for real-world applications. Extensive experiments demonstrate the effectiveness of our approach.
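A simplified version of the joint uncertainty/diversity selection described above is sketched below: margin-based uncertainty is combined with a greedy farthest-point diversity term when choosing a batch of queries. The fixed tradeoff weight here stands in for the tradeoff parameter the paper learns automatically, no privileged textual information is used, and the base learner and dataset are generic scikit-learn stand-ins.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import euclidean_distances

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(1)
labeled = list(rng.choice(len(X), 30, replace=False))
pool = np.array([i for i in range(len(X)) if i not in labeled])

clf = LogisticRegression(max_iter=2000).fit(X[labeled], y[labeled])
proba = np.sort(clf.predict_proba(X[pool]), axis=1)
uncertainty = 1.0 - (proba[:, -1] - proba[:, -2])       # small top-2 margin = high uncertainty

batch, alpha = [], 0.5                                  # alpha: fixed uncertainty/diversity tradeoff
for _ in range(10):                                     # greedily assemble a batch of 10 queries
    if batch:
        diversity = euclidean_distances(X[pool], X[pool[batch]]).min(axis=1)
        diversity = diversity / (diversity.max() + 1e-12)
    else:
        diversity = np.ones(len(pool))
    score = alpha * uncertainty + (1 - alpha) * diversity
    score[batch] = -np.inf                              # do not pick the same sample twice
    batch.append(int(np.argmax(score)))

print("queried pool indices:", pool[batch])
```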