
Showing papers on "Active learning (machine learning)" published in 2017


Journal ArticleDOI
13 Sep 2017-Nature
TL;DR: The field of quantum machine learning explores how to devise and implement quantum software that could enable machine learning that is faster than that of classical computers.
Abstract: Recent progress implies that a crossover between machine learning and quantum information processing benefits both fields. Traditional machine learning has dramatically improved the benchmarking an ...

2,162 citations


Proceedings ArticleDOI
27 Nov 2017
TL;DR: This paper develops an active learning framework for high dimensional data, a task which has been extremely challenging so far, with very sparse existing literature, and demonstrates its active learning techniques with image data, obtaining a significant improvement on existing active learning approaches.
Abstract: Even though active learning forms an important pillar of machine learning, deep learning tools are not prevalent within it. Deep learning poses several difficulties when used in an active learning setting. First, active learning (AL) methods generally rely on being able to learn and update models from small amounts of data. Recent advances in deep learning, on the other hand, are notorious for their dependence on large amounts of data. Second, many AL acquisition functions rely on model uncertainty, yet deep learning methods rarely represent such model uncertainty. In this paper we combine recent advances in Bayesian deep learning into the active learning framework in a practical way. We develop an active learning framework for high dimensional data, a task which has been extremely challenging so far, with very sparse existing literature. Taking advantage of specialised models such as Bayesian convolutional neural networks, we demonstrate our active learning techniques with image data, obtaining a significant improvement on existing active learning approaches. We demonstrate this on both the MNIST dataset, as well as for skin cancer diagnosis from lesion images (ISIC2016 task).

1,139 citations
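
The acquisition loop this abstract describes is compact enough to sketch. Below is a minimal, hypothetical PyTorch illustration of the core mechanism (Monte Carlo dropout as approximate Bayesian inference, with a max-entropy acquisition function); `model` and the pool tensor are placeholders, and the paper evaluates several acquisition functions beyond this one.

```python
import torch
import torch.nn.functional as F

def mc_dropout_entropy(model, x, T=20):
    """Predictive entropy from T stochastic forward passes with dropout on."""
    model.train()  # keeps dropout active at inference (a simplification: real code enables only the dropout layers)
    with torch.no_grad():
        probs = torch.stack([F.softmax(model(x), dim=1) for _ in range(T)])
    mean_probs = probs.mean(dim=0)  # approximate predictive distribution
    return -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)

def select_queries(model, pool_x, n_queries=10):
    """Pick the pool points whose predictions are most uncertain."""
    scores = mc_dropout_entropy(model, pool_x)
    return scores.topk(n_queries).indices  # indices to send to the human labeller
```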


Journal ArticleDOI
TL;DR: This work presents a way of thinking about machine learning that gives it its own place in the econometric toolbox, and aims to make machine learning algorithms conceptually easier to use by providing a crisper understanding of how they work, where they excel, and where they can stumble.
Abstract: Machines are increasingly doing “intelligent” things. Face recognition algorithms use a large dataset of photos labeled as having a face or not to estimate a function that predicts the pre...

1,055 citations


Journal ArticleDOI
TL;DR: The goal is to assist the readers in refining the motivation, problem formulation, and methodology of powerful machine learning algorithms in the context of future networks in order to tap into hitherto unexplored applications and services.
Abstract: Next-generation wireless networks are expected to support extremely high data rates and radically new applications, which require a new wireless radio technology paradigm. The challenge is that of assisting the radio in intelligent adaptive learning and decision making, so that the diverse requirements of next-generation wireless networks can be satisfied. Machine learning is one of the most promising artificial intelligence tools, conceived to support smart radio terminals. Future smart 5G mobile terminals are expected to autonomously access the most meritorious spectral bands with the aid of sophisticated spectral efficiency learning and inference, in order to control the transmission power, while relying on energy efficiency learning/inference and simultaneously adjusting the transmission protocols with the aid of quality of service learning/inference. Hence we briefly review the rudimentary concepts of machine learning and propose their employment in the compelling applications of 5G networks, including cognitive radios, massive MIMOs, femto/small cells, heterogeneous networks, smart grid, energy harvesting, device-to-device communications, and so on. Our goal is to assist the readers in refining the motivation, problem formulation, and methodology of powerful machine learning algorithms in the context of future networks in order to tap into hitherto unexplored applications and services.

958 citations


Proceedings ArticleDOI
30 Oct 2017
TL;DR: In this article, the authors show that any privacy-preserving collaborative deep learning model is susceptible to a powerful attack that exploits the real-time nature of the learning process that allows the adversary to train a Generative Adversarial Network (GAN) that generates prototypical samples of the targeted training set that was meant to be private (the samples generated by the GAN are intended to come from the same distribution as the training data).
Abstract: Deep Learning has recently become hugely popular in machine learning for its ability to solve end-to-end learning systems, in which the features and the classifiers are learned simultaneously, providing significant improvements in classification accuracy in the presence of highly-structured and large databases. Its success is due to a combination of recent algorithmic breakthroughs, increasingly powerful computers, and access to significant amounts of data. Researchers have also considered privacy implications of deep learning. Models are typically trained in a centralized manner with all the data being processed by the same training algorithm. If the data is a collection of users' private data, including habits, personal pictures, geographical positions, interests, and more, the centralized server will have access to sensitive information that could potentially be mishandled. To tackle this problem, collaborative deep learning models have recently been proposed where parties locally train their deep learning structures and only share a subset of the parameters in the attempt to keep their respective training sets private. Parameters can also be obfuscated via differential privacy (DP) to make information extraction even more challenging, as proposed by Shokri and Shmatikov at CCS'15. Unfortunately, we show that any privacy-preserving collaborative deep learning is susceptible to a powerful attack that we devise in this paper. In particular, we show that a distributed, federated, or decentralized deep learning approach is fundamentally broken and does not protect the training sets of honest participants. The attack we developed exploits the real-time nature of the learning process that allows the adversary to train a Generative Adversarial Network (GAN) that generates prototypical samples of the targeted training set that was meant to be private (the samples generated by the GAN are intended to come from the same distribution as the training data). Interestingly, we show that record-level differential privacy applied to the shared parameters of the model, as suggested in previous work, is ineffective (i.e., record-level DP is not designed to address our attack).

832 citations
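
The attack's inner loop can be sketched compactly. The following is a heavily simplified, hypothetical PyTorch illustration, not the authors' code: the adversary treats the downloaded global model as the GAN discriminator and trains a generator toward a class only the victim holds. The paper additionally introduces an artificial "fake" class and injects the generated, mislabelled samples back into the collaborative training; `G`, `global_model`, and `target_class` are placeholders.

```python
import torch
import torch.nn.functional as F

def adversary_gan_step(G, global_model, target_class, opt_G, z_dim=100, batch=64):
    """One generator update: push samples toward the victim's private class."""
    global_model.eval()  # the shared model plays the discriminator's role
    z = torch.randn(batch, z_dim)
    fake = G(z)
    logits = global_model(fake)
    labels = torch.full((batch,), target_class, dtype=torch.long)
    loss = F.cross_entropy(logits, labels)
    opt_G.zero_grad()
    loss.backward()
    opt_G.step()
    return fake.detach()  # later injected, mislabelled, into the adversary's local updates
```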


Posted Content
TL;DR: A graph neural network architecture is defined that generalizes several of the recently proposed few-shot learning models and provides improved numerical performance, and is easily extended to variants of few-shot learning, such as semi-supervised or active learning, demonstrating the ability of graph-based models to operate well on 'relational' tasks.
Abstract: We propose to study the problem of few-shot learning with the prism of inference on a partially observed graphical model, constructed from a collection of input images whose label can be either observed or not. By assimilating generic message-passing inference algorithms with their neural-network counterparts, we define a graph neural network architecture that generalizes several of the recently proposed few-shot learning models. Besides providing improved numerical performance, our framework is easily extended to variants of few-shot learning, such as semi-supervised or active learning, demonstrating the ability of graph-based models to operate well on 'relational' tasks.

724 citations
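
A single layer of such a graph network is straightforward to sketch. The snippet below is an illustrative simplification, not the paper's architecture (shapes and module names are assumptions): edges are predicted from pairwise node-feature differences, and message passing over the learned adjacency propagates label information to the unlabelled nodes of a few-shot task.

```python
import torch
import torch.nn as nn

class GraphLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.edge_mlp = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        self.node_lin = nn.Linear(dim, dim)

    def forward(self, h):  # h: (batch, n_nodes, dim) image embeddings + label encodings
        diff = (h.unsqueeze(2) - h.unsqueeze(1)).abs()               # pairwise |h_i - h_j|
        adj = torch.softmax(self.edge_mlp(diff).squeeze(-1), dim=-1)  # learned edge weights
        return torch.relu(self.node_lin(adj @ h))                     # aggregate neighbours
```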


Journal ArticleDOI
22 Mar 2017
TL;DR: The idea of using continuous dynamical systems to model general high-dimensional nonlinear functions used in machine learning and the connection with deep learning is discussed.
Abstract: We discuss the idea of using continuous dynamical systems to model general high-dimensional nonlinear functions used in machine learning. We also discuss the connection with deep learning.

608 citations
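
The connection can be stated in one line: a residual block x_{k+1} = x_k + h·f(x_k) is exactly one forward-Euler step of the ODE dx/dt = f(x), so a deep residual network is a discretized continuous-time dynamical system. A toy sketch (layer width, step size, and depth are arbitrary illustrations):

```python
import torch
import torch.nn as nn

f = nn.Sequential(nn.Linear(8, 8), nn.Tanh(), nn.Linear(8, 8))  # the vector field

def deep_net_as_ode_solver(x, steps=10, h=0.1):
    for _ in range(steps):      # each "layer" is one Euler step of dx/dt = f(x)
        x = x + h * f(x)
    return x
```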


Journal ArticleDOI
TL;DR: This paper compiles, summarizes, and organizes machine learning challenges with Big Data, highlighting the cause–effect relationship by organizing challenges according to Big Data Vs or dimensions that instigated the issue: volume, velocity, variety, or veracity.
Abstract: The Big Data revolution promises to transform how we live, work, and think by enabling process optimization, empowering insight discovery and improving decision making. The realization of this grand potential relies on the ability to extract value from such massive data through data analytics; machine learning is at its core because of its ability to learn from data and provide data-driven insights, decisions, and predictions. However, traditional machine learning approaches were developed in a different era, and thus are based upon multiple assumptions, such as the data set fitting entirely into memory, which unfortunately no longer holds true in this new context. These broken assumptions, together with the Big Data characteristics, are creating obstacles for the traditional techniques. Consequently, this paper compiles, summarizes, and organizes machine learning challenges with Big Data. In contrast to other research that discusses challenges, this work highlights the cause–effect relationship by organizing challenges according to the Big Data Vs or dimensions that instigated the issue: volume, velocity, variety, or veracity. Moreover, emerging machine learning approaches and techniques are discussed in terms of how they are capable of handling the various challenges with the ultimate objective of helping practitioners select appropriate solutions for their use cases. Finally, a matrix relating the challenges and approaches is presented. Through this process, this paper provides a perspective on the domain, identifies research gaps and opportunities, and provides a strong foundation and encouragement for further research in the field of machine learning with Big Data.

592 citations


Journal ArticleDOI
TL;DR: This paper proposes a novel active learning (AL) framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner and incorporates deep convolutional neural networks into AL.
Abstract: Recent successes in learning-based image classification heavily rely on a large number of annotated training samples, which may require considerable human effort. In this paper, we propose a novel active learning (AL) framework, which is capable of building a competitive classifier with optimal feature representation via a limited amount of labeled training instances in an incremental learning manner. Our approach advances the existing AL methods in two aspects. First, we incorporate deep convolutional neural networks into AL. Through the properly designed framework, the feature representation and the classifier can be simultaneously updated with progressively annotated informative samples. Second, we present a cost-effective sample selection strategy to improve the classification performance with less manual annotation. Unlike traditional methods focusing on only the uncertain samples of low prediction confidence, we especially discover the large amount of high-confidence samples from the unlabeled set for feature learning. Specifically, these high-confidence samples are automatically selected and iteratively assigned pseudolabels. We thus call our framework cost-effective AL (CEAL), standing for these two advantages. Extensive experiments demonstrate that the proposed CEAL framework can achieve promising results on two challenging image classification data sets: face recognition on the Cross-Age Celebrity Dataset and object categorization on Caltech-256.

581 citations
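
CEAL's selection step combines two streams, which a short sketch makes concrete. The snippet below is an illustrative simplification (the paper thresholds entropy and decays the threshold over iterations; plain softmax confidence stands in here), assuming `probs` holds the current CNN's softmax outputs on the unlabelled pool:

```python
import numpy as np

def ceal_select(probs, k_uncertain=100, confidence_thresh=0.99):
    confidence = probs.max(axis=1)
    uncertain_idx = np.argsort(confidence)[:k_uncertain]      # send to the human oracle
    pseudo_idx = np.where(confidence > confidence_thresh)[0]  # auto pseudo-label, no human check
    pseudo_labels = probs[pseudo_idx].argmax(axis=1)
    return uncertain_idx, pseudo_idx, pseudo_labels
```

Both streams then feed the next fine-tuning round: the uncertain samples with their new human labels, the high-confidence samples with their pseudo-labels.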


Journal ArticleDOI
TL;DR: This article surveys imitation learning methods and presents design options in different steps of the learning process, and extensively discusses combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation.
Abstract: Imitation learning techniques aim to mimic human behavior in a given task. An agent (a learning machine) is trained to perform a task from demonstrations by learning a mapping between observations and actions. The idea of teaching by imitation has been around for many years; however, the field is gaining attention recently due to advances in computing and sensing as well as rising demand for intelligent applications. The paradigm of learning by imitation is gaining popularity because it facilitates teaching complex tasks with minimal expert knowledge of the tasks. Generic imitation learning methods could potentially reduce the problem of teaching a task to that of providing demonstrations, without the need for explicit programming or designing reward functions specific to the task. Modern sensors are able to collect and transmit high volumes of data rapidly, and processors with high computational power allow fast processing that maps the sensory data to actions in a timely manner. This opens the door for many potential AI applications that require real-time perception and reaction, such as humanoid robots, self-driving vehicles, human-computer interaction, and computer games, to name a few. However, specialized algorithms are needed to effectively and robustly learn models, as learning by imitation poses its own set of challenges. In this article, we survey imitation learning methods and present design options in different steps of the learning process. We introduce a background and motivation for the field as well as highlight challenges specific to the imitation problem. Methods for designing and evaluating imitation learning tasks are categorized and reviewed. Special attention is given to learning methods in robotics and games as these domains are the most popular in the literature and provide a wide array of problems and methodologies. We extensively discuss combining imitation learning approaches using different sources and methods, as well as incorporating other motion learning methods to enhance imitation. We also discuss the potential impact on industry, present major applications, and highlight current and future research directions.

535 citations


Book ChapterDOI
Lin Yang, Yizhe Zhang, Jianxu Chen, Siyuan Zhang, Danny Z. Chen
10 Sep 2017
TL;DR: A deep active learning framework that combines fully convolutional network (FCN) and active learning to significantly reduce annotation effort by making judicious suggestions on the most effective annotation areas is presented.
Abstract: Image segmentation is a fundamental problem in biomedical image analysis. Recent advances in deep learning have achieved promising results on many biomedical image segmentation benchmarks. However, due to large variations in biomedical images (different modalities, image settings, objects, noise, etc.), to utilize deep learning on a new application, it usually needs a new set of training data. This can incur a great deal of annotation effort and cost, because only biomedical experts can annotate effectively, and often there are too many instances in images (e.g., cells) to annotate. In this paper, we aim to address the following question: With limited effort (e.g., time) for annotation, what instances should be annotated in order to attain the best performance? We present a deep active learning framework that combines fully convolutional network (FCN) and active learning to significantly reduce annotation effort by making judicious suggestions on the most effective annotation areas. We utilize uncertainty and similarity information provided by FCN and formulate a generalized version of the maximum set cover problem to determine the most representative and uncertain areas for annotation. Extensive experiments using the 2015 MICCAI Gland Challenge dataset and a lymph node ultrasound image segmentation dataset show that, using annotation suggestions by our method, state-of-the-art segmentation performance can be achieved by using only 50% of training data.
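
The suggestion step can be sketched as uncertainty filtering followed by greedy representative selection. This is a rough illustration under assumptions (`uncertainty` comes from an FCN ensemble's disagreement, `sim` is a pairwise cosine-similarity matrix of FCN image descriptors), with a greedy heuristic standing in for the paper's generalized maximum set cover formulation:

```python
import numpy as np

def suggest_annotations(uncertainty, sim, n_candidates=40, n_select=8):
    cand = np.argsort(-uncertainty)[:n_candidates]  # most uncertain images first
    chosen, covered = [], np.zeros(sim.shape[0])
    for _ in range(n_select):
        # pick the candidate that most increases how well the whole pool is "covered"
        gains = [np.maximum(covered, sim[c]).sum() for c in cand]
        best = cand[int(np.argmax(gains))]
        covered = np.maximum(covered, sim[best])
        chosen.append(int(best))
        cand = cand[cand != best]
    return chosen  # areas suggested to the expert for annotation
```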

Journal ArticleDOI
TL;DR: It is argued that, by employing model-based reinforcement learning, the—now limited—adaptability characteristics of robotic systems can be expanded, and that model-based reinforcement learning exhibits advantages that make it more applicable to real-life use cases than model-free methods.
Abstract: Reinforcement learning is an appealing approach for allowing robots to learn new tasks. Relevant literature reveals a plethora of methods, but at the same time makes clear the lack of implementations for dealing with real-life challenges. Current expectations raise the demand for adaptable robots. We argue that, by employing model-based reinforcement learning, the currently limited adaptability characteristics of robotic systems can be expanded. Also, model-based reinforcement learning exhibits advantages that make it more applicable to real-life use cases than model-free methods. Thus, in this survey, model-based methods that have been applied in robotics are covered. We categorize them based on the derivation of an optimal policy, the definition of the returns function, the type of the transition model, and the learned task. Finally, we discuss the applicability of model-based reinforcement learning approaches in new applications, taking into consideration the state of the art in both algorithms and hardware.

Journal ArticleDOI
TL;DR: It is shown that the proposed active learning approach to the fitting of machine learning interatomic potentials is highly efficient in training potentials on the fly, ensuring that no extrapolation is attempted and leading to a completely reliable atomistic simulation without any significant decrease in accuracy.

Journal ArticleDOI
TL;DR: Smart augmentation works by creating a network that learns how to generate augmented data during the training process of a target network in a way that reduces that network's loss, allowing augmentations that minimize the error of that network to be learned.
Abstract: A recurring problem faced when training neural networks is that there is typically not enough data to maximize the generalization capability of deep neural networks. There are many techniques to address this, including data augmentation, dropout, and transfer learning. In this paper, we introduce an additional method, which we call smart augmentation, and we show how to use it to increase the accuracy and reduce overfitting on a target network. Smart augmentation works by creating a network that learns how to generate augmented data during the training process of a target network in a way that reduces that network's loss. This allows us to learn augmentations that minimize the error of that network. Smart augmentation has shown the potential to increase accuracy by demonstrably significant measures on all data sets tested. In addition, it has shown potential to achieve similar or improved performance levels with significantly smaller network sizes in a number of tested cases.
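
The mechanism is easy to sketch end to end. Below is a condensed, hypothetical illustration (flattened 28x28 inputs and all layer sizes are arbitrary; the paper also adds a similarity loss between the generated sample and a real one): an augmenter network A merges two same-class samples and is trained through the target network's loss, so A learns augmentations that actually help.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

A = nn.Sequential(nn.Linear(2 * 784, 512), nn.ReLU(), nn.Linear(512, 784))   # augmenter
target = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))   # task network
opt = torch.optim.Adam(list(A.parameters()) + list(target.parameters()), lr=1e-3)

def smart_aug_step(x1, x2, y):
    """x1, x2: two flattened images of the same class y."""
    x_aug = A(torch.cat([x1, x2], dim=1))     # generated training sample
    loss = F.cross_entropy(target(x_aug), y)  # gradient flows into A as well
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```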

Proceedings ArticleDOI
12 Jul 2017
TL;DR: This work builds on work in label ranking and proposes to learn from preferences (or comparisons) instead: the person provides the system a relative preference between two trajectories. It also takes an active learning approach, in which the system decides what preference queries to make.
Abstract: Our goal is to efficiently learn reward functions encoding a human's preferences for how a dynamical system should act. There are two challenges with this. First, in many problems it is difficult for people to provide demonstrations of the desired system trajectory (like a high-DOF robot arm motion or an aggressive driving maneuver), or to even assign how much numerical reward an action or trajectory should get. We build on work in label ranking and propose to learn from preferences (or comparisons) instead: the person provides the system a relative preference between two trajectories. Second, the learned reward function strongly depends on what environments and trajectories were experienced during the training phase. We thus take an active learning approach, in which the system decides on what preference queries to make. A novel aspect of our work is the complexity and continuous nature of the queries: continuous trajectories of a dynamical system in environments with other moving agents (humans or robots). We contribute a method for actively synthesizing queries that satisfy the dynamics of the system. Further, we learn the reward function from a continuous hypothesis space by maximizing the volume removed from the hypothesis space by each query. We assign weights to the hypothesis space in the form of a log-concave distribution and provide a bound on the number of iterations required to converge. We show that our algorithm converges faster to the desired reward compared to approaches that are not active or that do not synthesize queries in an autonomous driving domain. We then run a user study to put our method to the test with real people.
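
The core update lends itself to a small numeric sketch. Assuming (as an illustration, not the paper's exact formulation) rewards linear in trajectory features, R(xi) = w·phi(xi), and a particle set `W` of sampled weight vectors, a hard-cut version of the volume-removal criterion looks like this; the paper instead uses a soft, log-concave update and synthesizes dynamically feasible queries:

```python
import numpy as np

def volume_removed(W, phi_a, phi_b):
    """Fraction of weight samples removed in the worst case, whichever answer comes back."""
    prefers_a = W @ (phi_a - phi_b) > 0
    return min(prefers_a.mean(), (~prefers_a).mean())

def best_query(W, candidate_pairs):
    """Ask the comparison that is guaranteed to shrink the hypothesis space the most."""
    return max(candidate_pairs, key=lambda p: volume_removed(W, p[0], p[1]))

def update(W, phi_a, phi_b, human_prefers_a):
    keep = (W @ (phi_a - phi_b) > 0) == human_prefers_a
    return W[keep]  # only hypotheses consistent with the answer survive
```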

Proceedings ArticleDOI
29 May 2017
TL;DR: A recurrent deep neural network learns patterns from sequences of network traffic and traces network attack activities, reducing the error rate compared with conventional machine learning methods on the larger data set.
Abstract: Distributed Denial of Service (DDoS) attacks grow rapidly and have become one of the fatal threats to the Internet. Automatically detecting DDoS attack packets is one of the main defense mechanisms. Conventional solutions monitor network traffic and identify attack activities from legitimate network traffic based on statistical divergence. Machine learning is another method to improve identification performance based on statistical features. However, conventional machine learning techniques are limited by the shallow representation models. In this paper, we propose a deep learning based DDoS attack detection approach (DeepDefense). The deep learning approach can automatically extract high-level features from low-level ones and gain powerful representation and inference. We design a recurrent deep neural network to learn patterns from sequences of network traffic and trace network attack activities. The experimental results demonstrate a better performance of our model compared with conventional machine learning models. We reduce the error rate from 7.517% to 2.103% compared with the conventional machine learning method on the larger data set.
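
A hedged sketch of the kind of recurrent detector described (the paper evaluates several recurrent variants; the feature count and window length below are placeholder assumptions): an LSTM reads a window of per-packet feature vectors and emits an attack/benign decision.

```python
import torch
import torch.nn as nn

class DDoSDetector(nn.Module):
    def __init__(self, n_features=20, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # attack vs. legitimate

    def forward(self, x):                 # x: (batch, window_len, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # classify from the last time step
```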

Journal ArticleDOI
TL;DR: This paper contributes a comprehensive survey and analysis of current methods designed for performing heterogeneous transfer learning tasks to provide an updated, centralized outlook into current methodologies.
Abstract: Transfer learning has been demonstrated to be effective for many real-world applications as it exploits knowledge present in labeled training data from a source domain to enhance a model’s performance in a target domain, which has little or no labeled target training data. Utilizing a labeled source, or auxiliary, domain for aiding a target task can greatly reduce the cost and effort of collecting sufficient training labels to create an effective model in the new target distribution. Currently, most transfer learning methods assume the source and target domains consist of the same feature spaces which greatly limits their applications. This is because it may be difficult to collect auxiliary labeled source domain data that shares the same feature space as the target domain. Recently, heterogeneous transfer learning methods have been developed to address such limitations. This, in effect, expands the application of transfer learning to many other real-world tasks such as cross-language text categorization, text-to-image classification, and many others. Heterogeneous transfer learning is characterized by the source and target domains having differing feature spaces, but may also be combined with other issues such as differing data distributions and label spaces. These can present significant challenges, as one must develop a method to bridge the feature spaces, data distributions, and other gaps which may be present in these cross-domain learning tasks. This paper contributes a comprehensive survey and analysis of current methods designed for performing heterogeneous transfer learning tasks to provide an updated, centralized outlook into current methodologies.

Journal ArticleDOI
01 Dec 2017
TL;DR: The support vector machine can perform better than the stacked auto-encoder in some circumstances, and active learning schemes can be used to achieve high classification accuracy with both methods.
Abstract: With constant advancements in remote sensing technologies resulting in higher image resolution, there is a corresponding need to be able to mine useful data and information from remote sensing images. In this paper, we study the stacked auto-encoder (SAE) and the support vector machine (SVM), and to examine their sensitivity we include an additional number of training samples using the active learning framework. We then conduct a comparative evaluation. When classifying remote sensing images, SVM can perform better than SAE in some circumstances, and active learning schemes can be used to achieve high classification accuracy in both methods.

Journal ArticleDOI
TL;DR: The proposed active learning algorithm, based on weighted incremental dictionary learning, trains a deep network efficiently by actively selecting training samples at each iteration, and is shown to be efficient and effective in classifying hyperspectral images.
Abstract: Active deep learning classification of hyperspectral images is considered in this paper. Deep learning has achieved success in many applications, but good-quality labeled samples are needed to construct a deep learning network. Getting good labeled samples in hyperspectral images for remote sensing applications is expensive. An active learning algorithm based on a weighted incremental dictionary learning is proposed for such applications. The proposed algorithm selects training samples that maximize two selection criteria, namely representativeness and uncertainty. The algorithm trains a deep network efficiently by actively selecting training samples at each iteration. The proposed algorithm is applied to the classification of hyperspectral images and compared with other classification algorithms employing active learning. It is shown that the proposed algorithm is efficient and effective in classifying hyperspectral images.

Proceedings ArticleDOI
01 Aug 2017
TL;DR: In this article, the authors combine deep learning with active learning and show that the combination can outperform classical methods even with a significantly smaller amount of training data, without requiring a large public dataset or a large budget for manually labeling data.
Abstract: Deep neural networks have advanced the state of the art in named entity recognition. However, under typical training procedures, advantages over classical methods emerge only with large datasets. As a result, deep learning is employed only when large public datasets or a large budget for manually labeling data is available. In this work, we show otherwise: by combining deep learning with active learning, we can outperform classical methods even with a significantly smaller amount of training data.

Proceedings ArticleDOI
24 May 2017
TL;DR: This paper deals with the field of computer vision, mainly the application of deep learning to the object detection task; a new dataset is built from commonly used datasets.
Abstract: This paper deals with the field of computer vision, mainly the application of deep learning to the object detection task. On the one hand, it briefly summarizes the datasets and deep learning algorithms commonly used in computer vision. On the other hand, a new dataset is built from those commonly used datasets, and one of the networks, Faster R-CNN, is chosen to work on this new dataset. The experiments strengthen the understanding of these networks, and the analysis of the results shows the importance of deep learning technology and of the dataset for deep learning.

Journal ArticleDOI
TL;DR: This paper investigates the classification of the quality of wood boards based on their images, comparing deep learning, particularly convolutional neural networks, with the combination of texture-based feature extraction techniques and traditional techniques: decision tree induction algorithms, neural networks, nearest neighbors, and support vector machines.
Abstract: A number of industries use human inspection to visually classify the quality of their products and the raw materials used in the production process; this process could be done automatically through digital image processing. Industries are not always interested in the most accurate technique for a given problem, but in the most appropriate one for the expected results; there must be a balance between accuracy and computational cost. This paper investigates the classification of the quality of wood boards based on their images. For such, it compares the use of deep learning, particularly convolutional neural networks, with the combination of texture-based feature extraction techniques and traditional techniques: decision tree induction algorithms, neural networks, nearest neighbors, and support vector machines. Reported studies show that deep learning techniques applied to image processing tasks have achieved predictive performance superior to traditional classification techniques, mainly in highly complex scenarios. One of the reasons pointed out is their embedded feature extraction mechanism. Deep learning techniques directly identify and extract features, considered by them to be relevant, in a given image dataset. However, empirical results for this image dataset have shown that the texture descriptor method proposed, regardless of the strategy employed, is very competitive when compared with the convolutional neural network for all the performed experiments. The better performance of the texture descriptor method could be caused by the nature of the image dataset. Finally, some perspectives on future developments with the application of active learning and semi-supervised methods are pointed out.
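
The texture-descriptor baseline discussed here can be sketched with standard tools; the descriptor below (uniform local binary patterns fed to an SVM) is an illustrative stand-in for the paper's proposed descriptor, not a reproduction of it, and all parameters are assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from sklearn.svm import SVC

def lbp_histogram(gray_image, P=8, R=1.0):
    """Uniform LBP histogram as a compact texture descriptor."""
    lbp = local_binary_pattern(gray_image, P, R, method="uniform")
    hist, _ = np.histogram(lbp, bins=P + 2, range=(0, P + 2), density=True)
    return hist

def train_texture_classifier(images, labels):
    feats = np.array([lbp_histogram(im) for im in images])
    return SVC(kernel="rbf").fit(feats, labels)
```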

Posted Content
TL;DR: Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance.
Abstract: Thompson sampling is an algorithm for online decision problems where actions are taken sequentially in a manner that must balance between exploiting what is known to maximize immediate performance and investing to accumulate new information that may improve future performance. The algorithm addresses a broad range of problems in a computationally efficient manner and is therefore enjoying wide use. This tutorial covers the algorithm and its application, illustrating concepts through a range of examples, including Bernoulli bandit problems, shortest path problems, product recommendation, assortment, active learning with neural networks, and reinforcement learning in Markov decision processes. Most of these problems involve complex information structures, where information revealed by taking an action informs beliefs about other actions. We will also discuss when and why Thompson sampling is or is not effective and relations to alternative algorithms.
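
The tutorial's canonical first example, the Bernoulli bandit, fits in a few lines (the arm probabilities below are arbitrary): sample one model of the world from the posterior, act greedily with respect to that sample, then update the posterior with the observed reward.

```python
import numpy as np

def thompson_bernoulli(true_probs, n_rounds=1000, rng=np.random.default_rng(0)):
    k = len(true_probs)
    wins, losses = np.ones(k), np.ones(k)   # Beta(1, 1) priors on each arm
    for _ in range(n_rounds):
        theta = rng.beta(wins, losses)      # sample success probabilities
        arm = int(np.argmax(theta))         # greedy w.r.t. the sampled model
        reward = rng.random() < true_probs[arm]
        wins[arm] += reward
        losses[arm] += 1 - reward
    return wins / (wins + losses)           # posterior mean per arm

print(thompson_bernoulli([0.2, 0.5, 0.7]))  # mass concentrates on the best arm
```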

Journal ArticleDOI
TL;DR: It is found that record-wise CV often massively overestimates the prediction accuracy of the algorithms, and this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes.
Abstract: The availability of smartphone and wearable sensor technology is leading to a rapid accumulation of human subject data, and machine learning is emerging as a technique to map those data into clinical predictions. As machine learning algorithms are increasingly used to support clinical decision making, it is vital to reliably quantify their prediction accuracy. Cross-validation (CV) is the standard approach where the accuracy of such algorithms is evaluated on part of the data the algorithm has not seen during training. However, for this procedure to be meaningful, the relationship between the training and the validation set should mimic the relationship between the training set and the dataset expected for the clinical use. Here we compared two popular CV methods: record-wise and subject-wise. While the subject-wise method mirrors the clinically relevant use-case scenario of diagnosis in newly recruited subjects, the record-wise strategy has no such interpretation. Using both a publicly available dataset and a simulation, we found that record-wise CV often massively overestimates the prediction accuracy of the algorithms. We also conducted a systematic review of the relevant literature, and found that this overly optimistic method was used by almost half of the retrieved studies that used accelerometers, wearable sensors, or smartphones to predict clinical outcomes. As we move towards an era of machine learning-based diagnosis and treatment, using proper methods to evaluate their accuracy is crucial, as inaccurate results can mislead both clinicians and data scientists.
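
The paper's comparison is easy to reproduce in miniature with scikit-learn (the classifier choice and fold counts are illustrative): record-wise CV splits individual records, so records from one subject can land in both training and validation folds, while subject-wise CV (GroupKFold) keeps each subject's records together. With subject-specific signal in the data, the record-wise score comes out optimistically biased.

```python
from sklearn.model_selection import cross_val_score, KFold, GroupKFold
from sklearn.ensemble import RandomForestClassifier

def compare_cv(X, y, subject_ids):
    clf = RandomForestClassifier(random_state=0)
    record_wise = cross_val_score(clf, X, y, cv=KFold(5, shuffle=True, random_state=0))
    subject_wise = cross_val_score(clf, X, y, groups=subject_ids, cv=GroupKFold(5))
    return record_wise.mean(), subject_wise.mean()  # the first is typically inflated
```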

Journal ArticleDOI
TL;DR: A stacked ELM architecture in the CNN framework is proposed using an extreme learning machine (ELM), and the backpropagation algorithm is modified to find the targets of hidden layers and to effectively learn network weights while maintaining performance.

Posted Content
TL;DR: A novel data-driven approach to active learning is suggested to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state by formulating the query selection procedure as a regression problem.
Abstract: In this paper, we suggest a novel data-driven approach to active learning (AL). The key idea is to train a regressor that predicts the expected error reduction for a candidate sample in a particular learning state. By formulating the query selection procedure as a regression problem we are not restricted to working with existing AL heuristics; instead, we learn strategies based on experience from previous AL outcomes. We show that a strategy can be learnt either from simple synthetic 2D datasets or from a subset of domain-specific data. Our method yields strategies that work well on real data from a wide range of domains.
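
The proposal reduces to "regress the expected error reduction, then query its argmax", which a short sketch captures. Everything here is an illustrative assumption; the feature construction and the offline collection of training episodes are the substance of the paper and are omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

lal = RandomForestRegressor(random_state=0)

def fit_strategy(state_features, error_reductions):
    """Learn the query strategy from logged (state, observed gain) pairs."""
    lal.fit(state_features, error_reductions)

def next_query(candidate_features):
    """Score the unlabelled pool and pick the sample expected to help most."""
    predicted_gain = lal.predict(candidate_features)
    return int(np.argmax(predicted_gain))
```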

Proceedings ArticleDOI
21 Jul 2017
TL;DR: This paper introduces a deep transfer learning scheme, called selective joint fine-tuning, for improving the performance of deep learning tasks with insufficient training data, and can improve the classification accuracy by 2% - 10% using a single model.
Abstract: Deep neural networks require a large amount of labeled training data during supervised learning. However, collecting and labeling so much data might be infeasible in many cases. In this paper, we introduce a deep transfer learning scheme, called selective joint fine-tuning, for improving the performance of deep learning tasks with insufficient training data. In this scheme, a target learning task with insufficient training data is carried out simultaneously with another source learning task with abundant training data. However, the source learning task does not use all existing training data. Our core idea is to identify and use a subset of training images from the original source learning task whose low-level characteristics are similar to those from the target learning task, and jointly fine-tune shared convolutional layers for both tasks. Specifically, we compute descriptors from linear or nonlinear filter bank responses on training images from both tasks, and use such descriptors to search for a desired subset of training samples for the source learning task. Experiments demonstrate that our deep transfer learning scheme achieves state-of-the-art performance on multiple visual classification tasks with insufficient training data for deep learning. Such tasks include Caltech 256, MIT Indoor 67, and fine-grained classification problems (Oxford Flowers 102 and Stanford Dogs 120). In comparison to fine-tuning without a source domain, the proposed method can improve the classification accuracy by 2% - 10% using a single model. Codes and models are available at https://github.com/ZYYSzj/Selective-Joint-Fine-tuning.
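
The source-selection step can be sketched as a nearest-neighbour search over low-level descriptors (the descriptor arrays below are placeholders; the paper builds them from filter bank responses on the training images):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def select_source_subset(source_desc, target_desc, k=50):
    """desc arrays: (n_images, d) low-level descriptors per image."""
    nn = NearestNeighbors(n_neighbors=k).fit(source_desc)
    _, idx = nn.kneighbors(target_desc)   # k nearest source images per target image
    return np.unique(idx.ravel())         # their union is kept for joint fine-tuning
```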

Journal ArticleDOI
TL;DR: This paper proposes an evolutionary cost-sensitive ELM which, to the best of the authors' knowledge, is the first proposal of ELM in an evolutionary cost-sensitive classification scenario, and well addresses the open issue of how to define the cost matrix in cost-sensitive learning tasks.
Abstract: Conventional extreme learning machines (ELMs) solve a Moore–Penrose generalized inverse of hidden layer activated matrix and analytically determine the output weights to achieve generalized performance, by assuming the same loss from different types of misclassification. The assumption may not hold in cost-sensitive recognition tasks, such as face recognition-based access control system, where misclassifying a stranger as a family member may result in more serious disaster than misclassifying a family member as a stranger. Though recent cost-sensitive learning can reduce the total loss with a given cost matrix that quantifies how severe one type of mistake against another, in many realistic cases, the cost matrix is unknown to users. Motivated by these concerns, this paper proposes an evolutionary cost-sensitive ELM, with the following merits: 1) to the best of our knowledge, it is the first proposal of ELM in evolutionary cost-sensitive classification scenario; 2) it well addresses the open issue of how to define the cost matrix in cost-sensitive learning tasks; and 3) an evolutionary backtracking search algorithm is induced for adaptive cost matrix optimization. Experiments in a variety of cost-sensitive tasks well demonstrate the effectiveness of the proposed approaches, with about 5%–10% improvements.
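
The cost-sensitive half of the idea can be sketched in plain numpy: an ELM's output weights come from a closed-form least-squares solve, and per-sample costs enter as weights in that solve. The cost values here are assumed given, whereas the paper's contribution is to optimize the cost matrix itself with an evolutionary backtracking search.

```python
import numpy as np

def cost_weighted_elm(X, T, sample_costs, n_hidden=100, rng=np.random.default_rng(0)):
    """X: (n, d) inputs; T: (n, c) one-hot targets; sample_costs: (n,) misclassification weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))   # random, untrained hidden weights
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)                        # hidden-layer activations
    s = np.sqrt(sample_costs)[:, None]
    beta = np.linalg.pinv(s * H) @ (s * T)        # weighted least-squares output weights
    return lambda X_new: np.tanh(X_new @ W + b) @ beta
```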