
Showing papers on "Active learning (machine learning)" published in 2015


Book
01 Jan 2015
TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.
Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.

3,857 citations


Proceedings ArticleDOI
10 Dec 2015
TL;DR: Experimental results indicate that the proposed feature selection method based on a mutual information criterion is capable of improving the performance of the machine learning models in terms of prediction accuracy and reduction in training time.
Abstract: The application of machine learning models such as support vector machines (SVM) and artificial neural networks (ANN) to predicting reservoir properties has been effective in recent years when compared with the traditional empirical methods. Despite this, machine learning models suffer considerably in the face of uncertain data, a common characteristic of well log datasets. Sources of uncertainty in well log datasets include missing scales, data interpretation issues, and measurement errors. Feature selection aims to select a feature subset that is relevant to the property being predicted. In this paper, a feature selection method based on a mutual information criterion is proposed; its strong point is a threshold chosen on a statistically sound criterion for the typical greedy forward method of feature selection. Experimental results indicate that the proposed method is capable of improving the performance of the machine learning models in terms of prediction accuracy and reduction in training time.
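
As a rough illustration, mutual-information feature selection with a stopping threshold might look like the sketch below; the function name, the use of scikit-learn's MI estimator, and the fixed cutoff are assumptions, and the paper's statistically grounded choice of threshold is not reproduced here.

    # Hedged sketch: keep the features whose estimated mutual information
    # with the target clears a threshold, taking the strongest first.
    import numpy as np
    from sklearn.feature_selection import mutual_info_regression

    def mi_threshold_selection(X, y, threshold=0.01):
        mi = mutual_info_regression(X, y)    # MI of each feature with y
        order = np.argsort(mi)[::-1]         # greedily take the best first
        return [int(j) for j in order if mi[j] >= threshold]

The paper's contribution is precisely a principled rule for setting that threshold; the plain cutoff above just shows where such a rule plugs in.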

825 citations


Journal ArticleDOI
TL;DR: This paper systematically examines computational intelligence-based transfer learning techniques, clusters related technique developments into four main categories, and provides state-of-the-art knowledge that will directly support researchers and practice-based professionals in understanding the developments in computational intelligence-based transfer learning research and applications.
Abstract: Transfer learning aims to provide a framework to utilize previously-acquired knowledge to solve new but similar problems much more quickly and effectively. In contrast to classical machine learning methods, transfer learning methods exploit the knowledge accumulated from data in auxiliary domains to facilitate predictive modeling consisting of different data patterns in the current domain. To improve the performance of existing transfer learning methods and handle the knowledge transfer process in real-world systems, computational intelligence has recently been applied in transfer learning. This paper systematically examines computational intelligence-based transfer learning techniques and clusters related technique developments into four main categories: (a) neural network-based transfer learning; (b) Bayes-based transfer learning; (c) fuzzy transfer learning; and (d) applications of computational intelligence-based transfer learning. By providing state-of-the-art knowledge, this survey directly supports researchers and practice-based professionals in understanding the developments in computational intelligence-based transfer learning research and applications.

662 citations


Journal ArticleDOI
TL;DR: A systematic overview of the emerging field of quantum machine learning can be found in this paper, which presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.
Abstract: Machine learning algorithms learn a desired input-output relation from examples in order to interpret new inputs. This is important for tasks such as image and speech recognition or strategy optimisation, with growing applications in the IT industry. In the last couple of years, researchers investigated if quantum computing can help to improve classical machine learning algorithms. Ideas range from running computationally costly algorithms or their subroutines efficiently on a quantum computer to the translation of stochastic methods into the language of quantum theory. This contribution gives a systematic overview of the emerging field of quantum machine learning. It presents the approaches as well as technical details in an accessible way, and discusses the potential of a future theory of quantum learning.

580 citations


Journal ArticleDOI
TL;DR: This paper learns a probabilistic, non-parametric Gaussian process transition model of the system and applies it to autonomous learning in real robot and control tasks, achieving an unprecedented speed of learning.
Abstract: Autonomous learning has been a promising direction in control and robotics for more than a decade, since data-driven learning allows one to reduce the amount of engineering knowledge that is otherwise required. However, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers, which is a practical limitation in real systems, such as robots, where many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, realistic simulators, pre-shaped policies, or specific knowledge about the underlying dynamics. In this paper, we follow a different approach and speed up learning by extracting more information from data. In particular, we learn a probabilistic, non-parametric Gaussian process transition model of the system. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.
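
A minimal sketch of the core ingredient, a Gaussian process transition model whose predictive uncertainty is available to the planner, assuming scikit-learn as a stand-in for the paper's GP machinery (the data, kernel, and dimensions are purely illustrative):

    # Fit a GP to observed transitions (state, action) -> next state, then
    # query mean AND standard deviation: the paper's point is that planning
    # should use the full predictive distribution, not only the mean.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, WhiteKernel

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))   # [state, action] inputs (illustrative)
    y = rng.normal(size=(200, 2))   # next-state targets (illustrative)

    gp = GaussianProcessRegressor(RBF() + WhiteKernel()).fit(X, y)
    mean, std = gp.predict(rng.normal(size=(1, 3)), return_std=True)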

575 citations


Proceedings ArticleDOI
18 Mar 2015
TL;DR: An empirical evaluation shows that Explanatory Debugging increased participants' understanding of the learning system by 52% and allowed participants to correct its mistakes up to twice as efficiently as participants using a traditional learning system.
Abstract: How can end users efficiently influence the predictions that machine learning systems make on their behalf? This paper presents Explanatory Debugging, an approach in which the system explains to users how it made each of its predictions, and the user then explains any necessary corrections back to the learning system. We present the principles underlying this approach and a prototype instantiating it. An empirical evaluation shows that Explanatory Debugging increased participants' understanding of the learning system by 52% and allowed participants to correct its mistakes up to twice as efficiently as participants using a traditional learning system.

445 citations




Proceedings ArticleDOI
07 Jun 2015
TL;DR: This paper attempts to model deep learning in a weakly supervised (multiple instance learning) framework, in which each image follows a dual multi-instance assumption: its object proposals and possible text annotations can be regarded as two instance sets.
Abstract: The recent development in learning deep representations has demonstrated its wide applications in traditional vision tasks like classification and detection. However, there has been little investigation on how we could build up a deep learning framework in a weakly supervised setting. In this paper, we attempt to model deep learning in a weakly supervised learning (multiple instance learning) framework. In our setting, each image follows a dual multi-instance assumption, where its object proposals and possible text annotations can be regarded as two instance sets. We thus design effective systems to exploit the MIL property with deep learning strategies from the two ends; we also try to jointly learn the relationship between object and annotation proposals. We conduct extensive experiments and prove that our weakly supervised deep learning framework not only achieves convincing performance in vision tasks including classification and image annotation, but also extracts reasonable region-keyword pairs with little supervision, on widely used benchmarks like PASCAL VOC and MIT Indoor Scene 67, as well as a dataset for image- and patch-level annotations.
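
The multi-instance assumption can be made concrete with a small sketch: an image is a bag of instance scores (e.g., per-proposal classifier outputs), and bag-level scores are obtained by pooling. Max-pooling is one common choice and is an assumption here, not necessarily the paper's exact aggregation.

    # An image = a bag of instances (object proposals); the bag-level score
    # for each class is the maximum over its instances, so an image is
    # positive for a class if at least one proposal is.
    import numpy as np

    proposals = np.random.rand(50, 20)      # 50 proposals x 20 classes
    image_scores = proposals.max(axis=0)    # one score per class per image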

406 citations


Journal ArticleDOI
TL;DR: This paper proposes a semi-supervised batch mode multi-class active learning algorithm for visual concept recognition that exploits the whole active pool to evaluate the uncertainty of the data, and proposes to make the selected data as diverse as possible.
Abstract: As a way to relieve the tedious work of manual annotation, active learning plays an important role in many applications of visual concept recognition. In typical active learning scenarios, the amount of labelled data in the seed set is usually small. However, most existing active learning algorithms only exploit the labelled data, and so often suffer from over-fitting due to the small number of labelled examples. Besides, while much progress has been made in binary-class active learning, little research attention has been focused on multi-class active learning. In this paper, we propose a semi-supervised batch mode multi-class active learning algorithm for visual concept recognition. Our algorithm exploits the whole active pool to evaluate the uncertainty of the data. Considering that uncertain data are often similar to each other, we propose to make the selected data as diverse as possible, for which we explicitly impose a diversity constraint on the objective function. As a multi-class active learning algorithm, our algorithm is able to exploit uncertainty across multiple classes. An efficient algorithm is used to optimize the objective function. Extensive experiments on action recognition, object classification, scene recognition, and event detection demonstrate its advantages.
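
In the spirit of the paper's objective, a batch selector that trades off multi-class uncertainty against diversity could be sketched as follows; the entropy score, the cosine penalty, and the weight lam are illustrative assumptions, not the paper's exact formulation.

    # Greedy batch selection: prefer points whose predicted class
    # distribution is uncertain (high entropy) but which are dissimilar
    # to points already chosen for the batch.
    import numpy as np

    def select_batch(probs, X, batch_size, lam=0.5):
        entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)
        chosen = []
        for _ in range(batch_size):
            if chosen:
                sims = X @ X[chosen].T
                norms = (np.linalg.norm(X, axis=1, keepdims=True)
                         * np.linalg.norm(X[chosen], axis=1))
                penalty = (sims / (norms + 1e-12)).max(axis=1)
            else:
                penalty = 0.0
            score = entropy - lam * penalty
            score[chosen] = -np.inf   # never re-select a point
            chosen.append(int(np.argmax(score)))
        return chosen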

401 citations


Proceedings ArticleDOI
Xinli Yang, David Lo, Xin Xia, Yun Zhang, Jianling Sun
03 Aug 2015
TL;DR: An approach named Deeper is proposed, which leverages deep learning techniques to predict defect-prone changes: expressive features are built with a deep belief network algorithm, and a machine learning classifier is built on the selected features.
Abstract: Defect prediction is a very meaningful topic, particularly at the change level. Change-level defect prediction, also referred to as just-in-time defect prediction, can not only ensure software quality in the development process, but also help developers check and fix defects in time. Nowadays, deep learning is a hot topic in the machine learning literature, yet whether deep learning can be used to improve the performance of just-in-time defect prediction is still uninvestigated. In this paper, to bridge this research gap, we propose an approach named Deeper which leverages deep learning techniques to predict defect-prone changes. We first build a set of expressive features from a set of initial change features by leveraging a deep belief network algorithm. Next, a machine learning classifier is built on the selected features. To evaluate the performance of our approach, we use datasets from six large open source projects, i.e., Bugzilla, Columba, JDT, Platform, Mozilla, and PostgreSQL, containing a total of 137,417 changes. We compare our approach with the approach proposed by Kamei et al. The experimental results show that on average across the 6 projects, Deeper discovers 32.22% more bugs than Kamei et al.'s approach (51.04% versus 18.82% on average). In addition, Deeper achieves F1-scores of 0.22-0.63, which are statistically significantly higher than those of Kamei et al.'s approach on 4 out of the 6 projects.
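
As a rough stand-in for the Deeper pipeline, the sketch below pairs a single RBM layer (instead of the paper's deep belief network) with a standard classifier; the component choices and hyperparameters are assumptions made for illustration.

    # Unsupervised feature learning followed by a conventional classifier,
    # mirroring Deeper's "DBN features + ML classifier" structure.
    # Note: BernoulliRBM expects inputs scaled to [0, 1].
    from sklearn.neural_network import BernoulliRBM
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    deeper_like = Pipeline([
        ("rbm", BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=20)),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # deeper_like.fit(change_features, defect_labels)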

305 citations


Proceedings ArticleDOI
01 Dec 2015
TL;DR: This paper proposes an architecture to create a flexible and scalable machine learning as a service, using real-world sensor and weather data by running different algorithms at the same time.
Abstract: The demand for knowledge extraction has been increasing. With the growing amount of data being generated by global data sources (e.g., social media and mobile apps) and the popularization of context-specific data (e.g., the Internet of Things), companies and researchers need to connect all these data and extract valuable information. Machine learning has been gaining much attention in data mining, giving birth to new solutions. This paper proposes an architecture to create a flexible and scalable machine learning as a service. An open source solution was implemented and is presented. As a case study, a forecast of electricity demand was generated using real-world sensor and weather data by running different algorithms at the same time.

Posted Content
TL;DR: This paper develops a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions, focusing on the problem of intrinsically-motivated learning.
Abstract: The mutual information is a core statistical quantity that has applications in all areas of machine learning, whether this is in training of density models over multiple data modalities, in maximising the efficiency of noisy transmission channels, or when learning behaviour policies for exploration by artificial agents. Most learning algorithms that involve optimisation of the mutual information rely on the Blahut-Arimoto algorithm, an enumerative algorithm with exponential complexity that is not suitable for modern machine learning applications. This paper provides a new approach for scalable optimisation of the mutual information by merging techniques from variational inference and deep learning. We develop our approach by focusing on the problem of intrinsically-motivated learning, where the mutual information forms the definition of a well-known internal drive known as empowerment. Using a variational lower bound on the mutual information, combined with convolutional networks for handling visual input streams, we develop a stochastic optimisation algorithm that allows for scalable information maximisation and empowerment-based reasoning directly from pixels to actions.
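
The variational lower bound mentioned here is, in its standard Barber-Agakov form (notation assumed; the paper specializes it to actions and future states for empowerment):

    % For any variational decoder q(x|y), with equality when q = p(x|y):
    I(X;Y) = H(X) - H(X \mid Y)
           \ge H(X) + \mathbb{E}_{p(x,y)}\left[\log q(x \mid y)\right]

Maximizing the right-hand side over the parameters of q (a convolutional network in the paper) replaces the enumerative Blahut-Arimoto step with stochastic gradient ascent.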

Proceedings ArticleDOI
12 Jul 2015
TL;DR: The rectified linear unit (ReLU), proposed to speed up learning convergence in deep learning, is analyzed using a simpler network called the soft-committee machine, and the reasons for the speedup are clarified.
Abstract: Deep learning is attracting much attention in object recognition and speech processing. A benefit of using deep learning is that it provides automatic pre-training. Several proposed methods that include auto-encoders are being successfully used in various applications. Moreover, deep learning uses a multilayer network that consists of many layers, a huge number of units, and a huge amount of data. Thus, executing deep learning requires heavy computation, so deep learning is usually utilized with parallel computation on many cores or many machines. Deep learning employs the gradient algorithm; however, this traps the learning in saddle points or local minima. To avoid this difficulty, the rectified linear unit (ReLU) was proposed to speed up the learning convergence. However, the reasons the convergence is sped up are not well understood. In this paper, we analyze the ReLU by using a simpler network called the soft-committee machine and clarify the reason for the speedup. We also train the network in an on-line manner. The soft-committee machine provides a good test bed to analyze deep learning. The results provide some reasons for the speedup of the convergence of deep learning.
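
For reference, a soft-committee machine is a two-layer network whose hidden-to-output weights are fixed to one, so only the input-to-hidden weights are learned; the analysis then comes down to the choice of activation g (notation assumed):

    % Soft-committee machine with K hidden units; g is the activation,
    % classically sigmoidal, here replaced by the ReLU.
    y(\mathbf{x}) = \sum_{k=1}^{K} g(\mathbf{w}_k \cdot \mathbf{x}),
    \qquad g(u) = \max(0, u) \ \text{for the ReLU}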

Proceedings Article
25 Jan 2015
TL;DR: The reader's attention is drawn to machine teaching, the problem of finding an optimal training set given a machine learning algorithm and a target model, and the Socratic dialogue style aims to stimulate critical thinking.
Abstract: I draw the reader's attention to machine teaching, the problem of finding an optimal training set given a machine learning algorithm and a target model. In addition to generating fascinating mathematical questions for computer scientists to ponder, machine teaching holds the promise of enhancing education and personnel training. The Socratic dialogue style aims to stimulate critical thinking.
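
In its simplest form, the machine teaching problem stated here can be written as an optimization over training sets (a schematic formulation; the notation is assumed):

    % Find the cheapest training set D that drives learner A to target theta*.
    \min_{D}\ \mathrm{TeachingCost}(D)
    \quad \text{subject to} \quad A(D) = \theta^{*}

Here A maps a training set to the model the learner would produce, theta* is the target model, and TeachingCost is often simply the size of D.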

Journal ArticleDOI
TL;DR: Two new statistical learning methods for estimating the optimal DTR are introduced, termed backward outcome weighted learning (BOWL) and simultaneous outcome weighted learning (SOWL); it is proved that the resulting rules are consistent, and finite sample bounds are provided for the errors using the estimated rules.
Abstract: Dynamic treatment regimes (DTRs) are sequential decision rules for individual patients that can adapt over time to an evolving illness. The goal is to accommodate heterogeneity among patients and find the DTR which will produce the best long-term outcome if implemented. We introduce two new statistical learning methods for estimating the optimal DTR, termed backward outcome weighted learning (BOWL) and simultaneous outcome weighted learning (SOWL). These approaches convert individualized treatment selection into either a sequential or a simultaneous classification problem, and can thus be applied by modifying existing machine learning techniques. The proposed methods are based on directly maximizing over all DTRs a nonparametric estimator of the expected long-term outcome; this is fundamentally different from regression-based methods, for example Q-learning, which indirectly attempt such maximization and rely heavily on the correctness of postulated regression models. We prove that the resulting rules are consistent, and provide finite sample bounds for the errors using the estimated rules.
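
Both methods build on outcome weighted learning, where the value of a decision rule d is estimated by inverse-probability weighting and maximized directly (single-stage form shown; notation assumed, with R the reward, A the assigned treatment, X the covariates, and pi the treatment randomization probability):

    % Pick the rule whose recommendations agree with high-reward observed
    % treatments, reweighted by 1/pi.
    \hat{d} = \arg\max_{d}\
    \mathbb{E}_n\!\left[\frac{R\,\mathbf{1}\{A = d(X)\}}{\pi(A \mid X)}\right]

BOWL applies this stage by stage backward through the treatment sequence, while SOWL optimizes all stages simultaneously.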

Journal ArticleDOI
TL;DR: This work considers an attacker that aims to maximize the SVM's classification error by flipping a number of labels in the training data; it formalizes a corresponding optimal attack strategy and solves it by means of heuristic approaches to keep the computational complexity tractable.
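
A brute-force greedy version of such an attack can be sketched as follows; the paper's heuristics are more refined, and the retrain-and-evaluate loop here is purely illustrative (and expensive).

    # Greedy label-flip attack: repeatedly flip the single training label
    # whose flip most increases the retrained SVM's error.
    from sklearn.svm import SVC

    def greedy_label_flips(X_tr, y_tr, X_val, y_val, budget):
        y_adv = y_tr.copy()
        for _ in range(budget):
            best_i, best_err = None, -1.0
            for i in range(len(y_adv)):
                y_try = y_adv.copy()
                y_try[i] = -y_try[i]   # binary labels in {-1, +1}
                model = SVC(kernel="linear").fit(X_tr, y_try)
                err = 1.0 - model.score(X_val, y_val)
                if err > best_err:
                    best_i, best_err = i, err
            y_adv[best_i] = -y_adv[best_i]
        return y_adv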

Journal ArticleDOI
TL;DR: This paper presents a complete approach to a successful utilization of a high-performance extreme learning machine (ELM) toolbox for Big Data; it summarizes recent advances in algorithmic performance, gives a fresh view on the ELM solution in relation to the traditional linear algebraic performance, and reaps the latest software and hardware performance achievements.
Abstract: This paper presents a complete approach to a successful utilization of a high-performance extreme learning machine (ELM) toolbox for Big Data. It summarizes recent advances in algorithmic performance; gives a fresh view on the ELM solution in relation to the traditional linear algebraic performance; and reaps the latest software and hardware performance achievements. The results are applicable to a wide range of machine learning problems and thus provide a solid ground for tackling numerous Big Data challenges. The included toolbox is targeted at enabling the full potential of ELMs to the widest range of users.
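
The core ELM computation that such a toolbox accelerates is small enough to sketch directly: random, fixed hidden-layer weights, with only the output weights solved in closed form. This is a minimal sketch; the toolbox adds the big-data and performance machinery.

    # Basic ELM: random hidden layer + least-squares output weights.
    import numpy as np

    def elm_fit(X, y, n_hidden=100, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights
        b = rng.standard_normal(n_hidden)                # random biases
        H = np.tanh(X @ W + b)                           # hidden activations
        beta, *_ = np.linalg.lstsq(H, y, rcond=None)     # output weights
        return W, b, beta

    def elm_predict(X, W, b, beta):
        return np.tanh(X @ W + b) @ beta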

Book ChapterDOI
01 Jan 2015
TL;DR: The ability of machine learning algorithms to learn from the current context and generalize to unseen tasks would allow improvements in both the safety and efficacy of radiotherapy practice, leading to better outcomes.
Abstract: Machine learning is an evolving branch of computational algorithms that are designed to emulate human intelligence by learning from the surrounding environment. These algorithms are considered the workhorse of the new era of so-called big data. Techniques based on machine learning have been applied successfully in diverse fields ranging from pattern recognition, computer vision, spacecraft engineering, finance, entertainment, and computational biology to biomedical and medical applications. More than half of the patients with cancer receive ionizing radiation (radiotherapy) as part of their treatment, and it is the main treatment modality at advanced stages of local disease. Radiotherapy involves a large set of processes that not only span the period from consultation to treatment but also extend beyond that to ensure that the patients have received the prescribed radiation dose and are responding well. The degree of complexity of these processes can vary and may involve several stages of sophisticated human-machine interactions and decision making, which would naturally invite the use of machine learning algorithms to optimize and automate these processes, including but not limited to radiation physics quality assurance, contouring and treatment planning, image-guided radiotherapy, respiratory motion management, treatment response modeling, and outcomes prediction. The ability of machine learning algorithms to learn from the current context and generalize to unseen tasks would allow improvements in both the safety and efficacy of radiotherapy practice, leading to better outcomes.

Journal ArticleDOI
TL;DR: A new learning function based on information entropy is proposed; it can help select the next point effectively and add it to the design of experiments to update the metamodel in a more efficient way.
Abstract: In structural reliability, an important challenge is to reduce the number of calls to the performance function, especially when it is a finite element model in an engineering problem, which usually involves complex computer codes and requires time-consuming computations. To solve this problem, a metamodel, Kriging, is introduced as a surrogate for the original model. Kriging presents interesting characteristics such as exact interpolation and a local index of uncertainty on the prediction, which can be used in an active learning method. In this paper, a new learning function based on information entropy is proposed. The new learning criterion can help select the next point effectively and add it to the design of experiments to update the metamodel. It is then applied in a new method constructed in this paper, which combines Kriging and Line Sampling to estimate the reliability of structures in a more efficient way. In the end, several examples involving non-linearity, high dimensionality, and engineering problems are used to demonstrate the efficiency of the methods with the proposed learning function.
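
Under the Gaussian predictor that Kriging provides, an entropy-based learning function can be sketched as the entropy of the predicted failure indicator. This is a plausible reading of the criterion; the paper's exact function may differ in detail.

    # At each candidate x, the Kriging model gives mean mu and std sigma of
    # the performance function g(x). The next point to evaluate is the one
    # whose failure indicator 1{g(x) < 0} is most uncertain.
    import numpy as np
    from scipy.stats import norm

    def entropy_criterion(mu, sigma):
        p = norm.cdf(-mu / np.maximum(sigma, 1e-12))   # P(g(x) < 0)
        p = np.clip(p, 1e-12, 1 - 1e-12)
        return -(p * np.log(p) + (1 - p) * np.log(1 - p))

    # next_index = np.argmax(entropy_criterion(mu, sigma)); evaluate the
    # true model there, add the point to the design, and refit the Kriging.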

Journal ArticleDOI
TL;DR: The first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors to different clusters using a small-scale photonic quantum computer is reported, which can be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.
Abstract: Machine learning, a branch of artificial intelligence, learns from previous experience to optimize performance, and is ubiquitous in various fields such as computer science, financial analysis, robotics, and bioinformatics. A challenge is that machine learning with the rapidly growing "big data" could become intractable for classical computers. Recently, quantum machine learning algorithms [Lloyd, Mohseni, and Rebentrost, arXiv:1307.0411] were proposed which could offer an exponential speedup over classical algorithms. Here, we report the first experimental entanglement-based classification of two-, four-, and eight-dimensional vectors into different clusters using a small-scale photonic quantum computer; the classifications are then used to implement supervised and unsupervised machine learning. The results demonstrate the working principle of using quantum computers to manipulate and classify high-dimensional vectors, the core mathematical routine in machine learning. The method can, in principle, be scaled to larger numbers of qubits, and may provide a new route to accelerate machine learning.

Proceedings Article
25 Jul 2015
TL;DR: This paper investigates the intersection of reinforcement learning and expert demonstrations, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping.
Abstract: Reinforcement learning describes how a learning agent can achieve optimal behaviour based on interactions with its environment and reward feedback. A limiting factor in reinforcement learning as employed in artificial intelligence is the need for an often prohibitively large number of environment samples before the agent reaches a desirable level of performance. Learning from demonstration is an approach that provides the agent with demonstrations by a supposed expert, from which it should derive suitable behaviour. Yet, one of the challenges of learning from demonstration is that no guarantees can be provided for the quality of the demonstrations, and thus the learned behavior. In this paper, we investigate the intersection of these two approaches, leveraging the theoretical guarantees provided by reinforcement learning, and using expert demonstrations to speed up this learning by biasing exploration through a process called reward shaping. This approach allows us to leverage human input without making an erroneous assumption regarding demonstration optimality. We show experimentally that this approach requires significantly fewer demonstrations, is more robust against suboptimality of demonstrations, and achieves much faster learning than the recently developed HAT algorithm.
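
The shaping mechanism referred to here is potential-based reward shaping, which provably leaves the optimal policy unchanged (Ng, Harada & Russell, 1999). One illustrative way to turn demonstrations into a potential is a kernel similarity to expert-visited states; that construction is an assumption, not the paper's.

    # Potential-based shaping bonus F(s, s') = gamma * phi(s') - phi(s),
    # with phi large near expert-visited states, biasing exploration
    # toward the demonstrations without changing the optimal policy.
    import numpy as np

    def make_shaping(demo_states, gamma=0.99, bandwidth=1.0):
        demo = np.asarray(demo_states, dtype=float)
        def phi(s):
            d2 = ((demo - s) ** 2).sum(axis=1)
            return np.exp(-d2 / (2 * bandwidth ** 2)).max()
        def bonus(s, s_next):
            return gamma * phi(s_next) - phi(s)
        return bonus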

Book
01 Oct 2015
TL;DR: The aim of this dissertation is to take inspiration from the literature of active learning for classification (regression) problems and develop new methods for the new-user problem in recommender systems.
Abstract: Recommender systems learn user preferences and provide users with personalized recommendations. Evidently, the performance of recommender systems depends on the amount of information that users provide regarding items, most often in the form of ratings. This problem is amplified for new users, because they have not provided any ratings, which impacts negatively on the quality of generated recommendations. This is called the new-user problem. A simple and effective way to overcome it is posing queries to new users so that they express their preferences about selected items, e.g., by rating them. Nevertheless, the selection of items must take into consideration that users are not willing to answer a lot of such queries. To address this problem, active learning methods have been proposed to acquire the most informative ratings, i.e., ratings from users that will help most in determining their interests. Active learning refers to learning algorithms that can interactively query the Oracle to obtain labels for data instances; the Oracle is a user or teacher who knows the labels. The aim of this dissertation [8] is to take inspiration from the literature of active learning for classification (regression) problems and develop new methods for the new-user problem in recommender systems. In the recommender system context, new users play the role of the Oracle and provide ratings (labels) for items (data instances). Specifically, the following questions are addressed in this dissertation: (1) which recommendation model is suitable for active-learning purposes? (Sect. 2) (2) how can active learning criteria be adapted and customized for the new-user problem, and which one is the best? (Sect. 3) (3) what are the specific requirements and properties of the new-user problem that do not exist in active learning, and how can new active learning methods be developed based on these properties? (Sects. 4, 5)

Posted Content
TL;DR: It is shown that training with cyclical learning rates achieves near optimal classification accuracy without tuning and often in many fewer iterations.
Abstract: It is known that the learning rate is the most important hyper-parameter to tune for training deep convolutional neural networks (i.e., a "guessing game"). This report describes a new method for setting the learning rate, named cyclical learning rates, that eliminates the need to experimentally find the best values and schedule for the learning rates. Instead of setting the learning rate to fixed values, this method lets the learning rate cyclically vary within reasonable boundary values. This report shows that training with cyclical learning rates achieves near optimal classification accuracy without tuning and often in many fewer iterations. This report also describes a simple way to estimate "reasonable bounds" - by linearly increasing the learning rate in one training run of the network for only a few epochs. In addition, cyclical learning rates are demonstrated on training with the CIFAR-10 dataset and the AlexNet and GoogLeNet architectures on the ImageNet dataset. These methods are practical tools for everyone who trains convolutional neural networks.
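
The triangular policy described in the report reduces to a few lines; this sketch follows the formula given in the report, with base_lr, max_lr, and stepsize as the user-chosen bounds and half-cycle length.

    # Triangular cyclical learning rate: sweeps linearly between base_lr
    # and max_lr, completing one full cycle every 2 * stepsize iterations.
    import numpy as np

    def triangular_clr(it, base_lr=1e-4, max_lr=1e-2, stepsize=2000):
        cycle = np.floor(1 + it / (2 * stepsize))
        x = np.abs(it / stepsize - 2 * cycle + 1)
        return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

The "reasonable bounds" themselves come from the one-run range test the report describes: increase the learning rate linearly for a few epochs and note where accuracy starts to improve and where it begins to degrade.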

Book
27 Apr 2015
TL;DR: Efficient Learning Machines as mentioned in this paper explores the major topics of machine learning, including knowledge discovery, classifications, genetic algorithms, neural networks, kernel methods, and biologically-inspired techniques.
Abstract: Machine learning techniques provide cost-effective alternatives to traditional methods for extracting underlying relationships between information and data and for predicting future events by processing existing information to train models. Efficient Learning Machines explores the major topics of machine learning, including knowledge discovery, classification, genetic algorithms, neural networks, kernel methods, and biologically-inspired techniques. Mariette Awad and Rahul Khanna's synthetic approach weaves together the theoretical exposition, design principles, and practical applications of efficient machine learning. Their experiential emphasis, expressed in their close analysis of sample algorithms throughout the book, aims to equip engineers, students of engineering, and system designers to design and create new and more efficient machine learning systems. Readers of Efficient Learning Machines will learn how to recognize and analyze the problems that machine learning technology can solve for them, how to implement and deploy standard solutions to sample problems, and how to design new systems and solutions. Advances in computing performance, storage, memory, unstructured information retrieval, and cloud computing have coevolved with a new generation of machine learning paradigms and big data analytics, which the authors present in the conceptual context of their traditional precursors. Awad and Khanna explore current developments in the deep learning techniques of deep neural networks, hierarchical temporal memory, and cortical algorithms. Nature suggests sophisticated learning techniques that deploy simple rules to generate highly intelligent and organized behaviors with adaptive, evolutionary, and distributed properties. The authors examine the most popular biologically-inspired algorithms, together with a sample application to distributed datacenter management. They also discuss machine learning techniques for addressing problems of multi-objective optimization in which solutions in real-world systems are constrained and evaluated based on how well they perform with respect to multiple objectives in aggregate. Two chapters on support vector machines and their extensions focus on recent improvements to the classification and regression techniques at the core of machine learning. What you'll learn: Efficient Learning Machines systematically guides readers to an understanding and practical mastery of the following techniques: the machine learning techniques most commonly used to solve complex real-world problems; recent improvements to classification and regression techniques; the application of bio-inspired techniques to real-life problems; new deep learning techniques that exploit advances in computing performance and storage; and machine learning techniques for solving multi-objective optimization problems with nondominated methods that minimize distance to the Pareto front. Who this book is for: Efficient Learning Machines equips engineers, students of engineering, and system designers with the knowledge and guidance to design and create new and more efficient machine learning systems.

Journal ArticleDOI
TL;DR: A general learning framework, termed multiple kernel extreme learning machines (MK-ELM), is proposed to address the lack of a general framework for ELM to integrate multiple heterogeneous data sources for classification; it can achieve comparable or even better classification performance than state-of-the-art MKL algorithms, while incurring much less computational cost.

Posted Content
TL;DR: G-learning, as proposed in this paper, regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of the learning process, which enables naturally incorporating prior distributions over optimal actions when available.
Abstract: Model-free reinforcement learning algorithms such as Q-learning perform poorly in the early stages of learning in noisy environments, because much effort is spent on unlearning biased estimates of the state-action function. The bias comes from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of the learning process. Moreover, it enables naturally incorporating prior distributions over optimal actions when available. The stochastic nature of G-learning also makes it more cost-effective than Q-learning in noiseless but exploration-risky domains. We illustrate these ideas in several examples where G-learning results in significant improvements of the learning rate and the learning cost.
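
The penalty is implemented as a divergence from a prior policy rho, which softens Q-learning's hard max. A generic KL-regularized backup of this kind looks as follows; the sign and notation conventions are assumptions, not the paper's exact update.

    % Soft backup with inverse temperature beta and prior policy rho;
    % beta -> infinity recovers the hard max of Q-learning.
    G(s,a) \leftarrow r(s,a) + \gamma\,
    \mathbb{E}_{s'}\left[\frac{1}{\beta}\log \sum_{a'}
    \rho(a' \mid s')\, e^{\beta G(s',a')}\right]

Scheduling beta from small (near the prior, strongly penalizing determinism) to large over the course of learning yields the early-stage regularization the abstract describes.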

Journal ArticleDOI
TL;DR: The effectiveness of applying DELM to EEG classification is confirmed; MLELM not only approximates complicated functions but also does not need to iterate during the training process.
Abstract: Recently, deep learning has aroused wide interest in machine learning fields. Deep learning is a multilayer perceptron artificial neural network algorithm. Deep learning has the advantage of approximating complicated functions and alleviating the optimization difficulty associated with deep models. The multilayer extreme learning machine (MLELM) is a learning algorithm for artificial neural networks which takes advantage of both deep learning and the extreme learning machine. Not only does MLELM approximate complicated functions but it also does not need to iterate during the training process. In this paper, we combine MLELM with the kernel extreme learning machine (KELM) to put forward the deep extreme learning machine (DELM) and apply it to EEG classification. This paper focuses on the application of DELM to the classification of the visual feedback experiment, using MATLAB and the second brain-computer interface (BCI) competition datasets. By simulating and analyzing the results of the experiments, the effectiveness of applying DELM to EEG classification is confirmed.

Proceedings ArticleDOI
01 Sep 2015
TL;DR: The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet and commercial companies that collect user data on a large scale have been the main beneficiaries.
Abstract: Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text. The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet. Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training.

Proceedings ArticleDOI
27 May 2015
TL;DR: A new approach named factorized learning is introduced that pushes ML computations through joins and avoids redundancy in both I/O and computations; it is often substantially faster than the alternatives, but not always the fastest, necessitating a cost-based approach.
Abstract: Enterprise data analytics is a booming area in the data management industry. Many companies are racing to develop toolkits that closely integrate statistical and machine learning techniques with data management systems. Almost all such toolkits assume that the input to a learning algorithm is a single table. However, most relational datasets are not stored as single tables due to normalization. Thus, analysts often perform key-foreign key joins before learning on the join output. This strategy of learning after joins introduces redundancy avoided by normalization, which could lead to poorer end-to-end performance and maintenance overheads due to data duplication. In this work, we take a step towards enabling and optimizing learning over joins for a common class of machine learning techniques called generalized linear models that are solved using gradient descent algorithms in an RDBMS setting. We present alternative approaches to learn over a join that are easy to implement over existing RDBMSs. We introduce a new approach named factorized learning that pushes ML computations through joins and avoids redundancy in both I/O and computations. We study the tradeoff space for all our approaches both analytically and empirically. Our results show that factorized learning is often substantially faster than the alternatives, but is not always the fastest, necessitating a cost-based approach. We also discuss extensions of all our approaches to multi-table joins as well as to Hive.
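
The essence of factorized learning, pushing the gradient computation through a key-foreign key join instead of materializing the join, can be sketched for squared loss as follows; the table layouts and names are illustrative, and general GLMs replace the residual with the derivative of their loss.

    # Gradient of a linear model over S JOIN R without forming the join:
    # R's contribution to each score is a lookup of a precomputed partial
    # inner product, and R's gradient uses residuals grouped by key.
    import numpy as np

    def factorized_gradient(S_feats, S_fk, S_y, R_feats, w_S, w_R):
        partial_R = R_feats @ w_R                 # one pass over R alone
        scores = S_feats @ w_S + partial_R[S_fk]  # join replaced by lookup
        resid = scores - S_y                      # squared-loss residuals
        grad_S = S_feats.T @ resid
        grouped = np.zeros(R_feats.shape[0])
        np.add.at(grouped, S_fk, resid)           # sum residuals per key
        grad_R = R_feats.T @ grouped
        return grad_S, grad_R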

Journal ArticleDOI
TL;DR: This work discusses two strategies towards making machine learning algorithms more autonomous: automated optimization of hyperparameters (including mechanisms for feature selection, preprocessing, model selection, etc.) and the development of algorithms with reduced sets of hyperparameters.
Abstract: The success of hand-crafted machine learning systems in many applications raises the question of making machine learning algorithms more autonomous, i.e., reducing the requirement of expert input to a minimum. We discuss two strategies towards this goal: (1) automated optimization of hyperparameters (including mechanisms for feature selection, preprocessing, model selection, etc.) and (2) the development of algorithms with reduced sets of hyperparameters. Since many research directions (e.g., deep learning) show a tendency towards increasingly complex algorithms with more and more hyperparameters, the demand for both of these strategies continuously increases. We review recent hyperparameter optimization methods and discuss data-driven approaches to avoid the introduction of hyperparameters using unsupervised learning. We end by discussing how these complementary strategies can work hand-in-hand, representing a very promising approach towards autonomous machine learning.
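
The first strategy, automated hyperparameter optimization, is easy to illustrate with its simplest instance, random search over a log-scaled space; the model, dataset, and search ranges below are illustrative choices, not from the paper.

    # Random search: sample hyperparameters, score by cross-validation,
    # keep the best. More elaborate optimizers replace only the sampler.
    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score
    from sklearn.datasets import load_iris

    X, y = load_iris(return_X_y=True)
    rng = np.random.default_rng(0)
    best_params, best_score = None, -np.inf
    for _ in range(20):
        params = {"C": 10 ** rng.uniform(-2, 2),
                  "gamma": 10 ** rng.uniform(-3, 1)}
        score = cross_val_score(SVC(**params), X, y, cv=3).mean()
        if score > best_score:
            best_params, best_score = params, score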