
Showing papers on "Active learning (machine learning) published in 2019"


Journal ArticleDOI
08 Aug 2019
TL;DR: A comprehensive overview and analysis of the most recent research in machine learning principles, algorithms, descriptors, and databases in materials science, proposing solutions and future research paths for various challenges in computational materials science.
Abstract: One of the most exciting tools that have entered the materials science toolbox in recent years is machine learning. This collection of statistical methods has already proved capable of considerably speeding up both fundamental and applied research. At present, we are witnessing an explosion of works that develop and apply machine learning to solid-state systems. We provide a comprehensive overview and analysis of the most recent research in this topic. As a starting point, we introduce machine learning principles, algorithms, descriptors, and databases in materials science. We continue with the description of different machine learning approaches for the discovery of stable materials and the prediction of their crystal structure. Then we discuss research in numerous quantitative structure–property relationships and various approaches for the replacement of first-principles methods by machine learning. We review how active learning and surrogate-based optimization can be applied to improve the rational design process, along with related examples of applications. Two major recurring questions are the interpretability of, and the physical understanding gained from, machine learning models; we therefore consider the different facets of interpretability and their importance in materials science. Finally, we propose solutions and future research paths for various challenges in computational materials science.

1,301 citations


Proceedings ArticleDOI
Donggeun Yoo, In So Kweon
15 Jun 2019
TL;DR: In this article, the authors propose a novel active learning method that is simple but task-agnostic and works efficiently with deep networks: a small parametric module, named ``loss prediction module,'' is attached to a target network and learned to predict the target losses of unlabeled inputs.
Abstract: The performance of deep neural networks improves with more annotated data. The problem is that the budget for annotation is limited. One solution to this is active learning, where a model asks a human to annotate data that it perceives as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks, but most of them are either designed specifically for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic and works efficiently with deep networks. We attach a small parametric module, named ``loss prediction module,'' to a target network and learn it to predict the target losses of unlabeled inputs. This module can then suggest data on which the target model is likely to produce a wrong prediction. The method is task-agnostic because networks are learned from a single loss regardless of the target task. We rigorously validate our method on image classification, object detection, and human pose estimation with recent network architectures. The results demonstrate that our method consistently outperforms previous methods across these tasks.
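To make the mechanism concrete, here is a minimal PyTorch-style sketch of a loss prediction module and the resulting query rule. It follows the paper's description only loosely; the branch architecture, pooling, and function names are our assumptions, not the authors' released code.

import torch
import torch.nn as nn

class LossPredictionModule(nn.Module):
    """Maps intermediate features of a target network to a scalar
    predicted loss (illustrative sketch, not the paper's exact code)."""
    def __init__(self, feature_dims, hidden=128):
        super().__init__()
        self.branches = nn.ModuleList([nn.Linear(d, hidden) for d in feature_dims])
        self.out = nn.Linear(hidden * len(feature_dims), 1)

    def forward(self, features):
        # features: list of globally pooled activations, one per tapped layer
        h = [torch.relu(fc(f)) for fc, f in zip(self.branches, features)]
        return self.out(torch.cat(h, dim=1)).squeeze(1)

@torch.no_grad()
def select_for_annotation(loss_module, feature_batches, k):
    """Rank unlabeled samples by predicted loss and query the top-k."""
    scores = torch.cat([loss_module(f) for f in feature_batches])
    return torch.topk(scores, k).indices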

429 citations


Journal ArticleDOI
TL;DR: In this article, a materials design strategy combining a machine learning (ML) surrogate model with experimental design algorithms to search for high entropy alloys (HEAs) with large hardness in a model Al-Co-Cr-Cu-Fe-Ni system was proposed.

387 citations


Journal ArticleDOI
18 Feb 2019
TL;DR: Reviews how methods from the information sciences enable us to accelerate the search and discovery of new materials; in particular, active learning allows us to effectively navigate the search space iteratively to identify promising candidates for guiding experiments and computations.
Abstract: One of the main challenges in materials discovery is efficiently exploring the vast search space for targeted properties as approaches that rely on trial-and-error are impractical. We review how methods from the information sciences enable us to accelerate the search and discovery of new materials. In particular, active learning allows us to effectively navigate the search space iteratively to identify promising candidates for guiding experiments and computations. The approach relies on the use of uncertainties and making predictions from a surrogate model together with a utility function that prioritizes the decision making process on unexplored data. We discuss several utility functions and demonstrate their use in materials science applications, impacting both experimental and computational research. We summarize by indicating generalizations to multiple properties and multifidelity data, and identify challenges, future directions and opportunities in the emerging field of materials informatics.
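As one concrete example of such a utility function, the snippet below computes expected improvement from a surrogate's predictive mean and uncertainty. The review discusses several such functions; the Gaussian predictive distribution here is our simplifying assumption.

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best_so_far):
    """Expected improvement of each candidate over the current best
    (maximization), given a surrogate's predictive mean and std."""
    sigma = np.maximum(sigma, 1e-12)   # guard against zero uncertainty
    z = (mu - best_so_far) / sigma
    return (mu - best_so_far) * norm.cdf(z) + sigma * norm.pdf(z)

# Typical use with a Gaussian process surrogate (hypothetical objects):
# mu, sigma = surrogate.predict(candidates, return_std=True)
# next_experiment = candidates[np.argmax(expected_improvement(mu, sigma, best))]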

297 citations


Journal ArticleDOI
Linfeng Zhang, Deye Lin, Han Wang, Roberto Car, Weinan E
TL;DR: Application to the sample systems of Al, Mg and Al-Mg alloys demonstrates that DP-GEN can produce uniformly accurate PES models with a minimal number of reference data.
Abstract: An active learning procedure called deep potential generator (DP-GEN) is proposed for the construction of accurate and transferable machine learning-based models of the potential energy surface (PES) for the molecular modeling of materials. This procedure consists of three main components: exploration, generation of accurate reference data, and training. Application to the sample systems of Al, Mg, and Al-Mg alloys demonstrates that DP-GEN can produce uniformly accurate PES models with a minimal number of reference data.
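The exploration–labeling–training cycle can be summarized schematically as below. Every name here (the callables, the deviation band) is a hypothetical paraphrase of the procedure described in the abstract, not DP-GEN's actual interface.

import numpy as np

def concurrent_learning_loop(initial_data, train_ensemble, explore,
                             reference_label, n_iterations,
                             lo=0.05, hi=0.20):
    """Schematic loop in the spirit of DP-GEN: train an ensemble of PES
    models, explore new configurations, and send only configurations on
    which the ensemble disagrees (within a band) to the reference solver."""
    data = list(initial_data)
    for _ in range(n_iterations):
        models = train_ensemble(data)          # several independently seeded models
        configs = explore(models[0])           # e.g. molecular dynamics sampling
        preds = np.array([[m(c) for c in configs] for m in models])
        deviation = preds.std(axis=0)          # ensemble spread per configuration
        picked = [c for c, d in zip(configs, deviation) if lo < d < hi]
        data.extend(reference_label(picked))   # e.g. DFT reference calculations
    return data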

282 citations


Posted Content
TL;DR: BADGE as discussed by the authors samples groups of points that are disparate and high-magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch.
Abstract: We design a new algorithm for batch active learning with deep neural network models. Our algorithm, Batch Active learning by Diverse Gradient Embeddings (BADGE), samples groups of points that are disparate and high-magnitude when represented in a hallucinated gradient space, a strategy designed to incorporate both predictive uncertainty and sample diversity into every selected batch. Crucially, BADGE trades off between diversity and uncertainty without requiring any hand-tuned hyperparameters. We show that while other approaches sometimes succeed for particular batch sizes or architectures, BADGE consistently performs as well or better, making it a versatile option for practical active learning problems.
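A minimal sketch of the two steps: hallucinated-gradient embeddings (the gradient of the cross-entropy loss with respect to the last linear layer, under the model's own predicted label) followed by k-means++ seeding, which favors batches that are both high-magnitude and mutually distant. The embedding form follows the paper's construction; the surrounding plumbing is our assumption.

import numpy as np

def gradient_embeddings(probs, feats):
    """BADGE-style embedding: gradient of the cross-entropy loss w.r.t. the
    last linear layer, using the model's own argmax as a pseudo-label.
    probs: (n, classes) softmax outputs; feats: (n, d) penultimate features."""
    g = probs.copy()
    g[np.arange(len(g)), probs.argmax(axis=1)] -= 1.0   # p - one_hot(y_hat)
    return (g[:, :, None] * feats[:, None, :]).reshape(len(g), -1)

def kmeanspp_batch(emb, k, seed=0):
    """k-means++ seeding over the embeddings doubles as batch selection."""
    rng = np.random.default_rng(seed)
    idx = [int(rng.integers(len(emb)))]
    d2 = ((emb - emb[idx[0]]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(rng.choice(len(emb), p=d2 / d2.sum()))
        idx.append(nxt)
        d2 = np.minimum(d2, ((emb - emb[nxt]) ** 2).sum(axis=1))
    return idx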

262 citations


Proceedings ArticleDOI
31 Mar 2019
TL;DR: In this article, a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner is proposed, where the VAE tries to trick the adversarial network into predicting that all data points are from the labeled pool.
Abstract: Active learning aims to develop label-efficient algorithms by sampling the most representative queries to be labeled by an oracle. We describe a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. Our method learns a latent space using a variational autoencoder (VAE) and an adversarial network trained to discriminate between unlabeled and labeled data. The mini-max game between the VAE and the adversarial network is played such that while the VAE tries to trick the adversarial network into predicting that all data points are from the labeled pool, the adversarial network learns how to discriminate between dissimilarities in the latent space. We extensively evaluate our method on various image classification and semantic segmentation benchmark datasets and establish a new state of the art on CIFAR10/100, Caltech-256, ImageNet, Cityscapes, and BDD100K. Our results demonstrate that our adversarial approach learns an effective low dimensional latent space in large-scale settings and provides for a computationally efficient sampling method. Our code is available at https://github.com/sinhasam/vaal.
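Once training ends, sampling reduces to scoring unlabeled points by the discriminator. A sketch of that selection step, assuming hypothetical vae.encode and discriminator interfaces rather than the released code:

import torch

@torch.no_grad()
def vaal_select(vae, discriminator, unlabeled_batches, budget):
    """Query the unlabeled points the discriminator is least convinced
    belong to the labeled pool (sketch; interfaces are assumptions)."""
    scores = []
    for x in unlabeled_batches:
        mu, _ = vae.encode(x)                        # latent mean per input
        scores.append(discriminator(mu).squeeze(1))  # P(point is labeled)
    scores = torch.cat(scores)
    return torch.topk(-scores, budget).indices       # least labeled-looking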

254 citations


Posted Content
TL;DR: BatchBALD as discussed by the authors is a tractable approximation to the mutual information between a batch of points and model parameters, which is used as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning.
Abstract: We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time $1 - \frac{1}{e}$-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state of the art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.
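The greedy selection can be illustrated with an exact, and therefore exponential-in-batch-size, toy version of the joint mutual information. The paper's linear-time algorithm replaces the exact joint with Monte Carlo estimates and caching, so treat this purely as a small-scale illustration.

import numpy as np

def batchbald_greedy(probs, batch_size):
    """Exact small-batch BatchBALD illustration. probs: (K, N, C) class
    probabilities from K posterior samples (e.g. MC dropout) for N points.
    Greedily adds the point maximizing the joint mutual information between
    the batch's labels and the model parameters."""
    K, N, C = probs.shape
    # Expected conditional entropy E_w[H(y_i | w)] for every point.
    cond_ent = -(probs * np.log(probs + 1e-12)).sum(-1).mean(0)
    chosen, joint = [], np.ones((K, 1))   # running prod_i p_w(y_i), per sample w
    for _ in range(batch_size):
        best_i, best_score, best_joint = None, -np.inf, None
        for i in range(N):
            if i in chosen:
                continue
            cand = (joint[:, :, None] * probs[:, i, None, :]).reshape(K, -1)
            marginal = cand.mean(0)       # P(y_1..y_b), marginalized over w
            joint_ent = -(marginal * np.log(marginal + 1e-12)).sum()
            score = joint_ent - cond_ent[chosen + [i]].sum()
            if score > best_score:
                best_i, best_score, best_joint = i, score, cand
        chosen.append(best_i)
        joint = best_joint
    return chosen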

238 citations


Posted Content
TL;DR: A pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner that learns an effective low dimensional latent space in large-scale settings and provides for a computationally efficient sampling method.
Abstract: Active learning aims to develop label-efficient algorithms by sampling the most representative queries to be labeled by an oracle. We describe a pool-based semi-supervised active learning algorithm that implicitly learns this sampling mechanism in an adversarial manner. Unlike conventional active learning algorithms, our approach is task agnostic, i.e., it does not depend on the performance of the task for which we are trying to acquire labeled data. Our method learns a latent space using a variational autoencoder (VAE) and an adversarial network trained to discriminate between unlabeled and labeled data. The mini-max game between the VAE and the adversarial network is played such that while the VAE tries to trick the adversarial network into predicting that all data points are from the labeled pool, the adversarial network learns how to discriminate between dissimilarities in the latent space. We extensively evaluate our method on various image classification and semantic segmentation benchmark datasets and establish a new state of the art on CIFAR10/100, Caltech-256, ImageNet, Cityscapes, and BDD100K. Our results demonstrate that our adversarial approach learns an effective low dimensional latent space in large-scale settings and provides for a computationally efficient sampling method. Our code is available at this https URL.

194 citations


Journal ArticleDOI
TL;DR: The results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting.
Abstract: For many applications the collection of labeled data is expensive and laborious. Exploitation of unlabeled data during training is thus a long-pursued objective of machine learning. Self-supervised learning addresses this by positing an auxiliary task (different, but related to the supervised task) for which data is abundantly available. In this paper, we show how ranking can be used as a proxy task for some regression problems. As another contribution, we propose an efficient backpropagation technique for Siamese networks which prevents the redundant computation introduced by the multi-branch network architecture. We apply our framework to two regression problems: Image Quality Assessment (IQA) and Crowd Counting. For both we show how to automatically generate ranked image sets from unlabeled data. Our results show that networks trained to regress to the ground truth targets for labeled data and to simultaneously learn to rank unlabeled data obtain significantly better, state-of-the-art results for both IQA and crowd counting. In addition, we show that measuring network uncertainty on the self-supervised proxy task is a good measure of informativeness of unlabeled data. This can be used to drive an algorithm for active learning and we show that this reduces labeling effort by up to 50 percent.

161 citations


Proceedings ArticleDOI
01 Oct 2019
TL;DR: This paper proposes a novel noisy label detection approach, named O2U-net, for deep neural networks without human annotations, which only requires adjusting the hyper-parameters of the deep network to make its status transfer from overfitting to underfitting (O2U) cyclically.
Abstract: This paper proposes a novel noisy label detection approach, named O2U-net, for deep neural networks without human annotations. Different from prior work, which requires specifically designed noise-robust loss functions or networks, O2U-net is easy to implement yet effective. It only requires adjusting the hyper-parameters of the deep network to make its status transfer from overfitting to underfitting (O2U) cyclically. The losses of each sample are recorded during iterations. The higher the normalized average loss of a sample, the higher the probability of it being a noisy label. O2U-net is naturally compatible with active learning and other human annotation approaches, which introduces extra flexibility for learning with noisy labels. We conduct extensive experiments on multiple datasets in various settings. The experimental results demonstrate the state-of-the-art performance of O2U-net.
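The bookkeeping behind that ranking fits in a few lines. A hedged sketch with one plausible cyclic learning rate schedule; the specific schedule and normalization below are our assumptions based on the abstract, not the paper's exact settings.

import numpy as np

def cyclic_lr(step, r1=0.01, r2=0.001, cycle_len=10):
    """Learning rate that linearly decays within each cycle, repeatedly
    pushing the network from underfitting toward overfitting and back."""
    t = (step % cycle_len) / cycle_len
    return r1 + (r2 - r1) * t

def rank_noisy_labels(losses_per_epoch):
    """losses_per_epoch: (epochs, n_samples) per-sample losses recorded
    across the cycles. Returns indices sorted from most to least likely
    noisy: a high normalized average loss suggests a noisy label."""
    normalized = losses_per_epoch - losses_per_epoch.mean(axis=1, keepdims=True)
    return np.argsort(-normalized.mean(axis=0))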

Proceedings Article
19 Jun 2019
TL;DR: BatchBALD is a tractable approximation to the mutual information between a batch of points and model parameters, which is used as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning.
Abstract: We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian active learning. BatchBALD is a greedy linear-time $1 - \frac{1}{e}$-approximate algorithm amenable to dynamic programming and efficient caching. We compare BatchBALD to the commonly used approach for batch data acquisition and find that the current approach acquires similar and redundant points, sometimes performing worse than randomly acquiring data. We finish by showing that, using BatchBALD to consider dependencies within an acquisition batch, we achieve new state of the art performance on standard benchmarks, providing substantial data efficiency improvements in batch acquisition.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry.
Abstract: Machine learning (ML) models, such as artificial neural networks, have emerged as a complement to high-throughput screening, enabling characterization of new compounds in seconds instead of hours. The promise of ML models to enable large-scale chemical space exploration can only be realized if it is straightforward to identify when molecules and materials are outside the model's domain of applicability. Established uncertainty metrics for neural network models are either costly to obtain (e.g., ensemble models) or rely on feature engineering (e.g., feature space distances), and each has limitations in estimating prediction errors for chemical space exploration. We introduce the distance to available data in the latent space of a neural network ML model as a low-cost, quantitative uncertainty metric that works for both inorganic and organic chemistry. The calibrated performance of this approach exceeds widely used uncertainty metrics and is readily applied to models of increasing complexity at no additional cost. Tightening latent distance cutoffs systematically drives down predicted model errors below training errors, thus enabling predictive error control in chemical discovery or identification of useful data points for active learning.
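The metric itself is inexpensive to compute once latent representations are in hand. A minimal sketch, assuming Euclidean distance and an average over the k nearest training points (both our choices):

import numpy as np

def latent_distance(latent_train, latent_query, k=1):
    """Average Euclidean distance from each query point to its k nearest
    training points in the model's latent space."""
    d = np.linalg.norm(latent_query[:, None, :] - latent_train[None, :, :],
                       axis=-1)                       # (n_query, n_train)
    return np.sort(d, axis=1)[:, :k].mean(axis=1)

# Queries beyond a calibrated cutoff are flagged as outside the domain of
# applicability, or queued for labeling in an active learning loop:
# uncertain = latent_distance(Z_train, Z_query) > cutoff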

Proceedings ArticleDOI
27 Jan 2019
TL;DR: This work proposes the novel framework of explanatory interactive learning where, in each step, the learner explains its query to the user, and the user interacts by both answering the query and correcting the explanation.
Abstract: Although interactive learning puts the user into the loop, the learner remains mostly a black box for the user. Understanding the reasons behind predictions and queries is important when assessing how the learner works and, in turn, trust. Consequently, we propose the novel framework of explanatory interactive learning where, in each step, the learner explains its query to the user, and the user interacts by both answering the query and correcting the explanation. We demonstrate that this can boost the predictive and explanatory powers of, and the trust into, the learned model, using text (e.g. SVMs) and image classification (e.g. neural networks) experiments as well as a user study.

Posted Content
TL;DR: In this article, an ensemble of dynamics models is used to incentivize the agent to explore such that the disagreement of those ensembles is maximized, which results in a sample-efficient exploration.
Abstract: Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either get stuck in environments with stochastic dynamics or are too inefficient to be scalable to real robotics setups. In this paper, we propose a formulation for exploration inspired by the work in active learning literature. Specifically, we train an ensemble of dynamics models and incentivize the agent to explore such that the disagreement of those ensembles is maximized. This allows the agent to learn skills by exploring in a self-supervised manner without any external reward. Notably, we further leverage the disagreement objective to optimize the agent's policy in a differentiable manner, without using reinforcement learning, which results in a sample-efficient exploration. We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at this https URL
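The disagreement objective itself is compact. A sketch of the intrinsic reward and its differentiable use, with hypothetical ensemble and policy objects standing in for the paper's implementation:

import torch

def disagreement_reward(next_state_preds):
    """Intrinsic reward: per-sample variance across an ensemble's
    next-state predictions, shaped (n_models, batch, state_dim)."""
    return next_state_preds.var(dim=0).mean(dim=-1)   # (batch,)

# Differentiable policy optimization against the reward (sketch):
# action = policy(state)                              # reparameterized action
# preds = torch.stack([m(state, action) for m in ensemble])
# (-disagreement_reward(preds).mean()).backward()     # ascend disagreement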

Journal ArticleDOI
TL;DR: The results demonstrate that the proposed model can prevent overfitting effectively and aggregate the labeled samples to train the parameters of the deep computation model with crowdsourcing for industrial IoT big data feature learning.
Abstract: Deep computation, as an advanced machine learning model, has achieved the state-of-the-art performance for feature learning on big data in industrial Internet of Things (IoT). However, the current deep computation model usually suffers from overfitting due to the lack of publicly available labeled training samples, limiting its performance for big data feature learning. Motivated by the idea of active learning, an adaptive dropout deep computation model (ADDCM) with crowdsourcing to cloud is proposed for industrial IoT big data feature learning in this paper. First, a distribution function is designed to set the dropout rate for each hidden layer to prevent overfitting for the deep computation model. Furthermore, the outsourcing selection algorithm based on the maximum entropy is employed to choose appropriate samples from the training set to crowdsource on the cloud platform. Finally, an improved supervised learning from multiple experts scheme is presented to aggregate answers given by human workers and to update the parameters of the ADDCM simultaneously. Extensive experiments are conducted to evaluate the performance of the presented model by comparing with the dropout deep computation model and other state-of-the-art crowdsourcing algorithms. The results demonstrate that the proposed model can prevent overfitting effectively and aggregate the labeled samples to train the parameters of the deep computation model with crowdsourcing for industrial IoT big data feature learning.

Journal ArticleDOI
27 Jun 2019
TL;DR: It is demonstrated that it is possible to significantly reduce human labeling effort without compromising final model performance by using a semitrained CNN model (i.e., trained with limited labeled data) to perform synthetic annotation.
Abstract: The yield of cereal crops such as sorghum (Sorghum bicolor L. Moench) depends on the distribution of crop-heads in varying branching arrangements. Therefore, counting the head number per unit area is critical for plant breeders to correlate with the genotypic variation in a specific breeding field. However, measuring such phenotypic traits manually is an extremely labor-intensive process and suffers from low efficiency and human errors. Moreover, the process is almost infeasible for large-scale breeding plantations or experiments. Machine learning-based approaches like deep convolutional neural network (CNN) based object detectors are promising tools for efficient object detection and counting. However, a significant limitation of such deep learning-based approaches is that they typically require a massive amount of hand-labeled images for training, which is still a tedious process. Here, we propose an active learning inspired weakly supervised deep learning framework for sorghum head detection and counting from UAV-based images. We demonstrate that it is possible to significantly reduce human labeling effort without compromising final model performance (the agreement between human count and machine count is 0.88) by using a semitrained CNN model (i.e., trained with limited labeled data) to perform synthetic annotation. In addition, we also visualize key features that the network learns. This improves trustworthiness by enabling users to better understand and trust the decisions that the trained deep learning model makes.

Proceedings Article
24 May 2019
TL;DR: This paper proposes a formulation for exploration inspired by the work in active learning literature and trains an ensemble of dynamics models and incentivizes the agent to explore such that the disagreement of those ensembles is maximized, which results in a sample-efficient exploration.
Abstract: Efficient exploration is a long-standing problem in sensorimotor learning. Major advances have been demonstrated in noise-free, non-stochastic domains such as video games and simulation. However, most of these formulations either get stuck in environments with stochastic dynamics or are too inefficient to be scalable to real robotics setups. In this paper, we propose a formulation for exploration inspired by the work in active learning literature. Specifically, we train an ensemble of dynamics models and incentivize the agent to explore such that the disagreement of those ensembles is maximized. This allows the agent to learn skills by exploring in a self-supervised manner without any external reward. Notably, we further leverage the disagreement objective to optimize the agent's policy in a differentiable manner, without using reinforcement learning, which results in a sample-efficient exploration. We demonstrate the efficacy of this formulation across a variety of benchmark environments including stochastic-Atari, Mujoco and Unity. Finally, we implement our differentiable exploration on a real robot which learns to interact with objects completely from scratch. Project videos and code are at this https URL

Journal ArticleDOI
TL;DR: In this article, an active learning strategy for robotic systems that takes into account task information, enables fast learning, and allows control to be readily synthesized by taking advantage of the Koopman operator representation is presented.
Abstract: This paper presents an active learning strategy for robotic systems that takes into account task information, enables fast learning, and allows control to be readily synthesized by taking advantage of the Koopman operator representation. We first motivate the use of representing nonlinear systems as linear Koopman operator systems by illustrating the improved model-based control performance with an actuated Van der Pol system. Information-theoretic methods are then applied to the Koopman operator formulation of dynamical systems where we derive a controller for active learning of robot dynamics. The active learning controller is shown to increase the rate of information about the Koopman operator. In addition, our active learning controller can readily incorporate policies built on the Koopman dynamics, enabling the benefits of fast active learning and improved control. Results using a quadcopter illustrate single-execution active learning and stabilization capabilities during free fall. The results for active learning are extended for automating Koopman observables and we implement our method on real robotic systems.

Journal ArticleDOI
TL;DR: This work proposes a failure-pursuing sampling framework, which is able to adopt various surrogate models or active learning strategies, and takes into account the joint probability density function of random variables, the individual information at candidate points and the improvement of the accuracy of predicted failure probability.

Posted Content
TL;DR: Experimental results show the proposed batch mode active learning algorithm, Discriminative Active Learning, to be on par with state of the art methods in medium and large query batch sizes, while being simple to implement and also extend to other domains besides classification tasks.
Abstract: We propose a new batch mode active learning algorithm designed for neural networks and large query batch sizes. The method, Discriminative Active Learning (DAL), poses active learning as a binary classification task, attempting to choose examples to label in such a way as to make the labeled set and the unlabeled pool indistinguishable. Experimenting on image classification tasks, we empirically show our method to be on par with state of the art methods in medium and large query batch sizes, while being simple to implement and also extend to other domains besides classification tasks. Our experiments also show that none of the state of the art methods of today are clearly better than uncertainty sampling when the batch size is relatively large, negating some of the reported results in the recent literature.
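A compact sketch of the idea, framing selection as labeled-vs-unlabeled classification over fixed feature representations; the classifier choice and single-shot (rather than iterative) selection are our simplifications.

import numpy as np
from sklearn.neural_network import MLPClassifier

def discriminative_active_learning(feats_labeled, feats_unlabeled, budget):
    """DAL-style sketch: train a binary classifier to separate labeled from
    unlabeled representations, then query the unlabeled points it is most
    confident are 'unlabeled-like' (i.e., least like the labeled set)."""
    X = np.vstack([feats_labeled, feats_unlabeled])
    y = np.concatenate([np.zeros(len(feats_labeled)),
                        np.ones(len(feats_unlabeled))])
    clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=300).fit(X, y)
    p_unlabeled = clf.predict_proba(feats_unlabeled)[:, 1]
    return np.argsort(-p_unlabeled)[:budget]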

Journal ArticleDOI
TL;DR: Zhang et al. as discussed by the authors proposed a unified deep network, combined with active transfer learning (TL) that can be well-trained for hyperspectral images classification using only minimally labeled training data.
Abstract: Deep learning has recently attracted significant attention in the field of hyperspectral images (HSIs) classification. However, the construction of an efficient deep neural network mostly relies on a large number of labeled samples being available. To address this problem, this paper proposes a unified deep network, combined with active transfer learning (TL) that can be well-trained for HSIs classification using only minimally labeled training data. More specifically, deep joint spectral–spatial feature is first extracted through hierarchical stacked sparse autoencoder (SSAE) networks. Active TL is then exploited to transfer the pretrained SSAE network and the limited training samples from the source domain to the target domain, where the SSAE network is subsequently fine-tuned using the limited labeled samples selected from both source and target domains by the corresponding active learning (AL) strategies. The advantages of our proposed method are threefold: 1) the network can be effectively trained using only limited labeled samples with the help of novel AL strategies; 2) the network is flexible and scalable enough to function across various transfer situations, including cross data set and intraimage; and 3) the learned deep joint spectral–spatial feature representation is more generic and robust than many joint spectral–spatial feature representations. Extensive comparative evaluations demonstrate that our proposed method significantly outperforms many state-of-the-art approaches, including both traditional and deep network-based methods, on three popular data sets.

Journal ArticleDOI
TL;DR: In this paper, Bayesian semi-supervised graph convolutional neural networks are used to estimate uncertainty in a statistically principled way through sampling from the posterior distribution, which disentangles representation learning and regression, keeping uncertainty estimates accurate in the low data limit.
Abstract: Predicting bioactivity and physical properties of small molecules is a central challenge in drug discovery. Deep learning is becoming the method of choice but studies to date focus on mean accuracy as the main metric. However, to replace costly and mission-critical experiments by models, a high mean accuracy is not enough: outliers can derail a discovery campaign, thus models need to reliably predict when it will fail, even when the training data is biased; experiments are expensive, thus models need to be data-efficient and suggest informative training sets using active learning. We show that uncertainty quantification and active learning can be achieved by Bayesian semi-supervised graph convolutional neural networks. The Bayesian approach estimates uncertainty in a statistically principled way through sampling from the posterior distribution. Semi-supervised learning disentangles representation learning and regression, keeping uncertainty estimates accurate in the low data limit and allowing the model to start active learning from a small initial pool of training data. Our study highlights the promise of Bayesian deep learning for chemistry.

Posted Content
TL;DR: This work shows that the computational efficiency of data selection in deep learning can be significantly improved by using a much smaller proxy model to perform data selection for tasks that will eventually require a large target model (e.g., selecting data points to label for active learning).
Abstract: Data selection methods, such as active learning and core-set selection, are useful tools for machine learning on large datasets. However, they can be prohibitively expensive to apply in deep learning because they depend on feature representations that need to be learned. In this work, we show that we can greatly improve the computational efficiency by using a small proxy model to perform data selection (e.g., selecting data points to label for active learning). By removing hidden layers from the target model, using smaller architectures, and training for fewer epochs, we create proxies that are an order of magnitude faster to train. Although these small proxy models have higher error rates, we find that they empirically provide useful signals for data selection. We evaluate this "selection via proxy" (SVP) approach on several data selection tasks across five datasets: CIFAR10, CIFAR100, ImageNet, Amazon Review Polarity, and Amazon Review Full. For active learning, applying SVP can give an order of magnitude improvement in data selection runtime (i.e., the time it takes to repeatedly train and select points) without significantly increasing the final error (often within 0.1%). For core-set selection on CIFAR10, proxies that are over 10x faster to train than their larger, more accurate targets can remove up to 50% of the data without harming the final accuracy of the target, leading to a 1.6x end-to-end training time improvement.
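The selection step is deliberately simple once a cheap proxy exists. A minimal sketch with predictive entropy as the uncertainty score; the proxy_predict_proba callable is hypothetical.

import numpy as np

def predictive_entropy(probs):
    """Uncertainty score from the proxy's class probabilities."""
    return -(probs * np.log(probs + 1e-12)).sum(axis=1)

def select_via_proxy(proxy_predict_proba, unlabeled_X, budget):
    """Rank unlabeled points by the cheap proxy's uncertainty and return
    the indices to send for labeling; the large target model never runs
    during selection, which is the source of the speedup."""
    scores = predictive_entropy(proxy_predict_proba(unlabeled_X))
    return np.argsort(-scores)[:budget]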

Journal ArticleDOI
TL;DR: In this article, various machine learning (ML) implementations and plans to make the maximal use of the large data set by taking advantage of the temporal nature of the data, and further combining it with other data sets.
Abstract: The Zwicky Transient Facility is a large optical survey in multiple filters producing hundreds of thousands of transient alerts per night. We describe here various machine learning (ML) implementations and plans to make the maximal use of the large data set by taking advantage of the temporal nature of the data, and further combining it with other data sets. We start with the initial steps of separating bogus candidates from real ones, separating stars and galaxies, and go on to the classification of real objects into various classes. Besides the usual methods (e.g., based on features extracted from light curves) we also describe early plans for alternate methods including the use of domain adaptation, and deep learning. In a similar fashion we describe efforts to detect fast moving asteroids. We also describe the use of the Zooniverse platform for helping with classifications through the creation of training samples, and active learning. Finally we mention the synergistic aspects of ZTF and LSST from the ML perspective.

Posted Content
Donggeun Yoo, In So Kweon
TL;DR: A novel active learning method that is simple but task-agnostic and works efficiently with deep networks, obtained by attaching a small parametric module, named "loss prediction module," to a target network and learning it to predict the target losses of unlabeled inputs.
Abstract: The performance of deep neural networks improves with more annotated data. The problem is that the budget for annotation is limited. One solution to this is active learning, where a model asks a human to annotate data that it perceives as uncertain. A variety of recent methods have been proposed to apply active learning to deep networks, but most of them are either designed specifically for their target tasks or computationally inefficient for large networks. In this paper, we propose a novel active learning method that is simple but task-agnostic and works efficiently with deep networks. We attach a small parametric module, named "loss prediction module," to a target network and learn it to predict the target losses of unlabeled inputs. This module can then suggest data on which the target model is likely to produce a wrong prediction. The method is task-agnostic because networks are learned from a single loss regardless of the target task. We rigorously validate our method on image classification, object detection, and human pose estimation with recent network architectures. The results demonstrate that our method consistently outperforms previous methods across these tasks.

Proceedings Article
01 Jan 2019
TL;DR: A novel Bayesian batch active learning approach that mitigates standard greedy procedures for large-scale regression and classification tasks and derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections.
Abstract: Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the most informative data points to be labeled. However, for many large-scale problems standard greedy procedures become computationally infeasible and suffer from negligible model change. In this paper, we introduce a novel Bayesian batch active learning approach that mitigates these issues. Our approach is motivated by approximating the complete data posterior of the model parameters. While naive batch construction methods result in correlated queries, our algorithm produces diverse batches that enable efficient active learning at scale. We derive interpretable closed-form solutions akin to existing active learning procedures for linear models, and generalize to arbitrary models using random projections. We demonstrate the benefits of our approach on several large-scale regression and classification tasks.

Posted Content
TL;DR: This work studies the problem of reducing the amount of labeled training data required to train supervised classification models by leveraging Active Learning, through sequential selection of examples which benefit the model most, and considers the mini-batch Active Learning setting, where several examples are selected at once.
Abstract: We study the problem of reducing the amount of labeled training data required to train supervised classification models. We approach it by leveraging Active Learning, through sequential selection of examples which benefit the model most. Selecting examples one by one is not practical for the amount of training examples required by the modern Deep Learning models. We consider the mini-batch Active Learning setting, where several examples are selected at once. We present an approach which takes into account both informativeness of the examples for the model, as well as the diversity of the examples in a mini-batch. By using the well studied K-means clustering algorithm, this approach scales better than the previously proposed approaches, and achieves comparable or better performance.
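A sketch of the weighted-clustering variant of this idea: prefilter by informativeness, cluster with K-means using informativeness as sample weights, and query the point nearest each centroid. The prefiltering factor and distance details are our assumptions.

import numpy as np
from sklearn.cluster import KMeans

def diverse_minibatch(feats, informativeness, k, beta=10):
    """Select a mini-batch of k points balancing informativeness and
    diversity via informativeness-weighted K-means clustering."""
    top = np.argsort(-informativeness)[: beta * k]   # most informative pool
    km = KMeans(n_clusters=k, n_init=10).fit(
        feats[top], sample_weight=informativeness[top])
    dists = np.linalg.norm(feats[top][:, None, :] -
                           km.cluster_centers_[None, :, :], axis=-1)
    chosen = []
    for j in range(k):                               # nearest unchosen point
        for idx in np.argsort(dists[:, j]):
            if idx not in chosen:
                chosen.append(int(idx))
                break
    return top[chosen]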

Journal ArticleDOI
TL;DR: A novel importance learning method (ILM) is proposed on the basis of an active learning technique using a Kriging metamodel, which builds the Kriging model accurately and efficiently by considering the influence of the most concerned point.
Abstract: With the time-consuming computations incurred by the nested double-loop strategy and multiple performance functions, the enhancement of computational efficiency for non-probabilistic reliability estimation and optimization is a challenging problem in the assessment of structural safety. In this study, a novel importance learning method (ILM) is proposed on the basis of an active learning technique using a Kriging metamodel, which builds the Kriging model accurately and efficiently by considering the influence of the most concerned point. To further accelerate the convergence rate of non-probabilistic reliability analysis, a new stopping criterion is constructed to ensure accuracy of the Kriging model. For solving non-probabilistic reliability-based design optimization (NRBDO) problems with multiple non-probabilistic constraints, a new active learning function is further developed based upon the ILM for dealing with this problem efficiently. The proposed ILM is verified by two non-probabilistic reliability estimation examples and three NRBDO examples. Compared with existing active learning methods, the optimal results calculated by the proposed ILM show high performance in terms of efficiency and accuracy.
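For context, a standard acquisition step in this family of Kriging-based reliability methods is the classic U learning function, which is not the paper's ILM (the ILM additionally weights the most concerned point); a minimal sketch:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def next_training_point(gp: GaussianProcessRegressor, candidates):
    """Pick the candidate whose sign of the performance function is most
    uncertain: minimal U(x) = |mu(x)| / sigma(x) under the Kriging model."""
    mu, sigma = gp.predict(candidates, return_std=True)
    u = np.abs(mu) / np.maximum(sigma, 1e-12)
    return int(np.argmin(u))   # index of the next point to evaluate exactly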

Posted Content
Jungo Kasai, Kun Qian, Sairam Gurajada, Yunyao Li, Lucian Popa
TL;DR: This paper develops a deep learning-based method that targets low-resource settings for ER through a novel combination of transfer learning and active learning and designs an architecture that allows us to learn a transferable model from a high-resource setting to a low- resource one.
Abstract: Entity resolution (ER) is the task of identifying different representations of the same real-world entities across databases. It is a key step for knowledge base creation and text mining. Recent adaptation of deep learning methods for ER mitigates the need for dataset-specific feature engineering by constructing distributed representations of entity records. While these methods achieve state-of-the-art performance over benchmark data, they require large amounts of labeled data, which are typically unavailable in realistic ER applications. In this paper, we develop a deep learning-based method that targets low-resource settings for ER through a novel combination of transfer learning and active learning. We design an architecture that allows us to learn a transferable model from a high-resource setting to a low-resource one. To further adapt to the target dataset, we incorporate active learning that carefully selects a few informative examples to fine-tune the transferred model. Empirical evaluation demonstrates that our method achieves comparable, if not better, performance compared to state-of-the-art learning-based methods while using an order of magnitude fewer labels.