Showing papers on "MNIST database published in 2018"

PDF

Open Access

Journal Article•DOI•

A systematic study of the class imbalance problem in convolutional neural networks

[...]

Mateusz Buda¹, Atsuto Maki², Maciej A. Mazurowski¹•Institutions (2)

Duke University¹, Royal Institute of Technology²

01 Oct 2018-Neural Networks

TL;DR: The effect of class imbalance on classification performance is detrimental; the method of addressing class imbalance that emerged as dominant in almost all analyzed scenarios was oversampling; and thresholding should be applied to compensate for prior class probabilities when overall number of properly classified cases is of interest.

...read moreread less

1,777 citations

Posted Content•

Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels

[...]

Bo Han¹, Quanming Yao², Xingrui Yu³, Gang Niu⁴, Miao Xu⁵, Weihua Hu⁴, Ivor W. Tsang¹, Masashi Sugiyama⁴ - Show less +4 more•Institutions (5)

University of Technology, Sydney¹, Hong Kong University of Science and Technology², China University of Petroleum³, University of Tokyo⁴, University of Queensland⁵

18 Apr 2018-arXiv: Learning

TL;DR: Co-teaching as discussed by the authors trains two deep neural networks simultaneously, and let them teach each other given every mini-batch: first, each network feeds forward all data and selects some data of possibly clean labels; secondly, two networks communicate with each other what data in this minibatch should be used for training; finally, each networks back propagates the data selected by its peer network and updates itself.

...read moreread less

Abstract: Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data of clean labels and then those of noisy labels. Therefore in this paper, we propose a new deep learning paradigm called Co-teaching for combating with noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other given every mini-batch: firstly, each network feeds forward all data and selects some data of possibly clean labels; secondly, two networks communicate with each other what data in this mini-batch should be used for training; finally, each network back propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.

...read moreread less

866 citations

Proceedings Article•

Certified Defenses against Adversarial Examples

[...]

Aditi Raghunathan¹, Jacob Steinhardt¹, Percy Liang¹•Institutions (1)

Stanford University¹

29 Jan 2018

TL;DR: This work proposes a method based on a semidefinite relaxation that outputs a certificate that for a given network and test input, no attack can force the error to exceed a certain value, providing an adaptive regularizer that encourages robustness against all attacks.

...read moreread less

Abstract: While neural networks have achieved high accuracy on standard image classification benchmarks, their accuracy drops to nearly zero in the presence of small adversarial perturbations to test inputs. Defenses based on regularization and adversarial training have been proposed, but often followed by new, stronger attacks that defeat these defenses. Can we somehow end this arms race? In this work, we study this problem for neural networks with one hidden layer. We first propose a method based on a semidefinite relaxation that outputs a certificate that for a given network and test input, no attack can force the error to exceed a certain value. Second, as this certificate is differentiable, we jointly optimize it with the network parameters, providing an adaptive regularizer that encourages robustness against all attacks. On MNIST, our approach produces a network and a certificate that no attack that perturbs each pixel by at most \epsilon = 0.1 can cause more than 35% test error.

...read moreread less

758 citations

Proceedings Article•

Deep Neural Networks as Gaussian Processes

[...]

Jaehoon Lee¹, Yasaman Bahri², Roman Novak², Samuel S. Schoenholz², Jeffrey Pennington², Jascha Sohl-Dickstein² - Show less +2 more•Institutions (2)

University of British Columbia¹, Google²

15 Feb 2018

TL;DR: The exact equivalence between infinitely wide deep networks and GPs is derived and it is found that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite- width networks.

...read moreread less

Abstract: It has long been known that a single-layer fully-connected neural network with an i.i.d. prior over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network width. This correspondence enables exact Bayesian inference for infinite width neural networks on regression tasks by means of evaluating the corresponding GP. Recently, kernel functions which mimic multi-layer random neural networks have been developed, but only outside of a Bayesian framework. As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network. In this work, we derive the exact equivalence between infinitely wide deep networks and GPs. We further develop a computationally efficient pipeline to compute the covariance function for these GPs. We then use the resulting GPs to perform Bayesian inference for wide deep neural networks on MNIST and CIFAR-10. We observe that trained neural network accuracy approaches that of the corresponding GP with increasing layer width, and that the GP uncertainty is strongly correlated with trained network prediction error. We further find that test performance increases as finite-width trained networks are made wider and more similar to a GP, and thus that GP predictions typically outperform those of finite-width networks. Finally we connect the performance of these GPs to the recent theory of signal propagation in random neural networks.

...read moreread less

757 citations

Journal Article•DOI•

Equivalent-accuracy accelerated neural-network training using analogue memory

[...]

Stefano Ambrogio¹, Pritish Narayanan¹, Hsinyu Tsai¹, Robert M. Shelby¹, Irem Boybat¹, Irem Boybat², Carmelo di Nolfo¹, Carmelo di Nolfo², Severin Sidler¹, Severin Sidler², Massimo Giordano¹, Martina Bodini², Martina Bodini¹, Nathan C. P. Farinha¹, Benjamin Killeen¹, Christina Cheng¹, Yassine Jaoudi¹, Geoffrey W. Burr¹ - Show less +14 more•Institutions (2)

IBM¹, École Polytechnique Fédérale de Lausanne²

06 Jun 2018-Nature

TL;DR: Mixed hardware–software neural-network implementations that involve up to 204,900 synapses and that combine long-term storage in phase-change memory, near-linear updates of volatile capacitors and weight-data transfer with ‘polarity inversion’ to cancel out inherent device-to-device variations are demonstrated.

...read moreread less

Abstract: Neural-network training can be slow and energy intensive, owing to the need to transfer the weight data for the network between conventional digital memory chips and processor chips. Analogue non-volatile memory can accelerate the neural-network training algorithm known as backpropagation by performing parallelized multiply-accumulate operations in the analogue domain at the location of the weight data. However, the classification accuracies of such in situ training using non-volatile-memory hardware have generally been less than those of software-based training, owing to insufficient dynamic range and excessive weight-update asymmetry. Here we demonstrate mixed hardware-software neural-network implementations that involve up to 204,900 synapses and that combine long-term storage in phase-change memory, near-linear updates of volatile capacitors and weight-data transfer with 'polarity inversion' to cancel out inherent device-to-device variations. We achieve generalization accuracies (on previously unseen data) equivalent to those of software-based training on various commonly used machine-learning test datasets (MNIST, MNIST-backrand, CIFAR-10 and CIFAR-100). The computational energy efficiency of 28,065 billion operations per second per watt and throughput per area of 3.6 trillion operations per second per square millimetre that we calculate for our implementation exceed those of today's graphical processing units by two orders of magnitude. This work provides a path towards hardware accelerators that are both fast and energy efficient, particularly on fully connected neural-network layers.

...read moreread less

693 citations

Proceedings Article•DOI•

Co-teaching: Robust Training of Deep Neural Networks with Extremely Noisy Labels

[...]

Bo Han¹, Quanming Yao², Xingrui Yu³, Gang Niu⁴, Miao Xu⁵, Weihua Hu⁴, Ivor W. Tsang¹, Masashi Sugiyama⁴ - Show less +4 more•Institutions (5)

University of Technology, Sydney¹, Hong Kong University of Science and Technology², China University of Petroleum³, University of Tokyo⁴, University of Queensland⁵

01 Jan 2018

TL;DR: Empirical results on noisy versions of MNIST, CIFar-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.

...read moreread less

Abstract: Deep learning with noisy labels is practically challenging, as the capacity of deep models is so high that they can totally memorize these noisy labels sooner or later during training. Nonetheless, recent studies on the memorization effects of deep neural networks show that they would first memorize training data of clean labels and then those of noisy labels. Therefore in this paper, we propose a new deep learning paradigm called ''Co-teaching'' for combating with noisy labels. Namely, we train two deep neural networks simultaneously, and let them teach each other given every mini-batch: firstly, each network feeds forward all data and selects some data of possibly clean labels; secondly, two networks communicate with each other what data in this mini-batch should be used for training; finally, each network back propagates the data selected by its peer network and updates itself. Empirical results on noisy versions of MNIST, CIFAR-10 and CIFAR-100 demonstrate that Co-teaching is much superior to the state-of-the-art methods in the robustness of trained deep models.

...read moreread less

657 citations

Proceedings Article•

Attention-based Deep Multiple Instance Learning

[...]

Maximilian Ilse¹, Jakub M. Tomczak¹, Max Welling¹•Institutions (1)

University of Amsterdam¹

03 Jul 2018

TL;DR: In this paper, a neural network-based permutation-invariant aggregation operator is proposed to learn the Bernoulli distribution of the bag label, where the bag-label probability is fully parameterized by neural networks.

...read moreread less

Abstract: Multiple instance learning (MIL) is a variation of supervised learning where a single class label is assigned to a bag of instances. In this paper, we state the MIL problem as learning the Bernoulli distribution of the bag label where the bag label probability is fully parameterized by neural networks. Furthermore, we propose a neural network-based permutation-invariant aggregation operator that corresponds to the attention mechanism. Notably, an application of the proposed attention-based operator provides insight into the contribution of each instance to the bag label. We show empirically that our approach achieves comparable performance to the best MIL methods on benchmark MIL datasets and it outperforms other methods on a MNIST-based MIL dataset and two real-life histopathology datasets without sacrificing interpretability.

...read moreread less

621 citations

Proceedings Article•DOI•

Generate to Adapt: Aligning Domains Using Generative Adversarial Networks

[...]

Swami Sankaranarayanan¹, Yogesh Balaji¹, Carlos D. Castillo¹, Rama Chellappa¹•Institutions (1)

University of Maryland, College Park¹

18 Jun 2018

TL;DR: This work proposes an approach that leverages unsupervised data to bring the source and target distributions closer in a learned joint feature space by inducing a symbiotic relationship between the learned embedding and a generative adversarial network.

...read moreread less

Abstract: Domain Adaptation is an actively researched problem in Computer Vision. In this work, we propose an approach that leverages unsupervised data to bring the source and target distributions closer in a learned joint feature space. We accomplish this by inducing a symbiotic relationship between the learned embedding and a generative adversarial network. This is in contrast to methods which use the adversarial framework for realistic data generation and retraining deep models with such data. We demonstrate the strength and generality of our approach by performing experiments on three different tasks with varying levels of difficulty: (1) Digit classification (MNIST, SVHN and USPS datasets) (2) Object recognition using OFFICE dataset and (3) Domain adaptation from synthetic to real data. Our method achieves state-of-the art performance in most experimental settings and by far the only GAN-based method that has been shown to work well across different datasets such as OFFICE and DIGITS.

...read moreread less

616 citations

Proceedings Article•

Evaluating Robustness of Neural Networks with Mixed Integer Programming

[...]

Vincent Tjeng, Kai Xiao¹, Russ Tedrake¹•Institutions (1)

Massachusetts Institute of Technology¹

27 Sep 2018

TL;DR: Verification of piecewise-linear neural networks as a mixed integer program that is able to certify more samples than the state-of-the-art and find more adversarial examples than a strong first-order attack for every network.

...read moreread less

Abstract: Neural networks have demonstrated considerable success on a wide variety of real-world problems. However, neural networks can be fooled by adversarial examples – slightly perturbed inputs that are misclassified with high confidence. Verification of networks enables us to gauge their vulnerability to such adversarial examples. We formulate verification of piecewise-linear neural networks as a mixed integer program. Our verifier finds minimum adversarial distortions two to three orders of magnitude more quickly than the state-of-the-art. We achieve this via tight formulations for non-linearities, as well as a novel presolve algorithm that makes full use of all information available. The computational speedup enables us to verify properties on convolutional networks with an order of magnitude more ReLUs than had been previously verified by any complete verifier, and we determine for the first time the exact adversarial accuracy of an MNIST classifier to perturbations with bounded l[infinity] norm e = 0:1. On this network, we find an adversarial example for 4.38% of samples, and a certificate of robustness for the remainder. Across a variety of robust training procedures, we are able to certify more samples than the state-of-the-art and find more adversarial examples than a strong first-order attack for every network.

...read moreread less

600 citations

Proceedings Article•

Thermometer Encoding: One Hot Way To Resist Adversarial Examples

[...]

Jacob Buckman¹, Aurko Roy¹, Colin Raffel¹, Ian Goodfellow¹•Institutions (1)

Google¹

15 Feb 2018

TL;DR: A simple modification to standard neural network ar3 chitectures, thermometer encoding is proposed, which significantly increases the robustness of the network to adversarial examples, and the proper ties of these networks are explored, providing evidence that thermometer encodings help neural networks to find more-non-linear decision boundaries.

...read moreread less

Abstract: It is well known that it is possible to construct "adversarial examples" for neural networks: inputs which are misclassified by the network yet indistinguishable from true data We propose a simple modification to standard neural network architectures, thermometer encoding, which significantly increases the robustness of the network to adversarial examples We demonstrate this robustness with experiments on the MNIST, CIFAR-10, CIFAR-100, and SVHN datasets, and show that models with thermometer-encoded inputs consistently have higher accuracy on adversarial examples, without decreasing generalization State-of-the-art accuracy under the strongest known white-box attack was increased from 9320% to 9430% on MNIST and 5000% to 7916% on CIFAR-10 We explore the properties of these networks, providing evidence that thermometer encodings help neural networks to find more-non-linear decision boundaries

...read moreread less

548 citations

Proceedings Article•DOI•

The Power of Ensembles for Active Learning in Image Classification

[...]

William H. Beluch¹, Tim Genewein, Andreas Nürnberger², Jan M. Kohler•Institutions (2)

Bosch¹, Otto-von-Guericke University Magdeburg²

18 Jun 2018

TL;DR: It is found that ensembles perform better and lead to more calibrated predictive uncertainties, which are the basis for many active learning algorithms, and Monte-Carlo Dropout uncertainties perform worse.

...read moreread less

Abstract: Deep learning methods have become the de-facto standard for challenging image processing tasks such as image classification. One major hurdle of deep learning approaches is that large sets of labeled data are necessary, which can be prohibitively costly to obtain, particularly in medical image diagnosis applications. Active learning techniques can alleviate this labeling effort. In this paper we investigate some recently proposed methods for active learning with high-dimensional data and convolutional neural network classifiers. We compare ensemble-based methods against Monte-Carlo Dropout and geometric approaches. We find that ensembles perform better and lead to more calibrated predictive uncertainties, which are the basis for many active learning algorithms. To investigate why Monte-Carlo Dropout uncertainties perform worse, we explore potential differences in isolation in a series of experiments. We show results for MNIST and CIFAR-10, on which we achieve a test set accuracy of 90% with roughly 12,200 labeled images, and initial results on ImageNet. Additionally, we show results on a large, highly class-imbalanced diabetic retinopathy dataset. We observe that the ensemble-based active learning effectively counteracts this imbalance during acquisition.

...read moreread less

Book Chapter•DOI•

Structured Sequence Modeling with Graph Convolutional Recurrent Networks

[...]

Youngjoo Seo¹, Michaël Defferrard¹, Pierre Vandergheynst¹, Xavier Bresson²•Institutions (2)

École Polytechnique Fédérale de Lausanne¹, Nanyang Technological University²

13 Dec 2018

TL;DR: The proposed model combines convolutional neural networks on graphs to identify spatial structures and RNN to find dynamic patterns in data structured by an arbitrary graph.

...read moreread less

Abstract: This paper introduces Graph Convolutional Recurrent Network (GCRN), a deep learning model able to predict structured sequences of data. Precisely, GCRN is a generalization of classical recurrent neural networks (RNN) to data structured by an arbitrary graph. The structured sequences can represent series of frames in videos, spatio-temporal measurements on a network of sensors, or random walks on a vocabulary graph for natural language modeling. The proposed model combines convolutional neural networks (CNN) on graphs to identify spatial structures and RNN to find dynamic patterns. We study two possible architectures of GCRN, and apply the models to two practical problems: predicting moving MNIST data, and modeling natural language with the Penn Treebank dataset. Experiments show that exploiting simultaneously graph spatial and dynamic information about data can improve both precision and learning speed.

...read moreread less

Proceedings Article•DOI•

Adversarially Learned One-Class Classifier for Novelty Detection

[...]

Mohammad Sabokrou, Mohammad Khalooei¹, Mahmood Fathy, Ehsan Adeli²•Institutions (2)

Amirkabir University of Technology¹, Stanford University²

18 Jun 2018

TL;DR: In this paper, the authors proposed an end-to-end architecture for one-class classification, which consists of two deep networks, each of which trained by competing with each other while collaborating to understand the underlying concept in the target class, and then classify the testing samples.

...read moreread less

Abstract: Novelty detection is the process of identifying the observation(s) that differ in some respect from the training observations (the target class). In reality, the novelty class is often absent during training, poorly sampled or not well defined. Therefore, one-class classifiers can efficiently model such problems. However, due to the unavailability of data from the novelty class, training an end-to-end deep network is a cumbersome task. In this paper, inspired by the success of generative adversarial networks for training deep models in unsupervised and semi-supervised settings, we propose an end-to-end architecture for one-class classification. Our architecture is composed of two deep networks, each of which trained by competing with each other while collaborating to understand the underlying concept in the target class, and then classify the testing samples. One network works as the novelty detector, while the other supports it by enhancing the inlier samples and distorting the outliers. The intuition is that the separability of the enhanced inliers and distorted outliers is much better than deciding on the original samples. The proposed framework applies to different related applications of anomaly and outlier detection in images and videos. The results on MNIST and Caltech-256 image datasets, along with the challenging UCSD Ped2 dataset for video anomaly detection illustrate that our proposed method learns the target class effectively and is superior to the baseline and state-of-the-art methods.

...read moreread less

Journal Article•DOI•

STDP-based spiking deep convolutional neural networks for object recognition

[...]

Saeed Reza Kheradpisheh¹, Saeed Reza Kheradpisheh², Mohammad Ganjtabesh², Simon J. Thorpe¹, Timothée Masquelier¹ - Show less +1 more•Institutions (2)

University of Toulouse¹, University of Tehran²

01 Mar 2018-Neural Networks

TL;DR: The results suggest that the combination of STDP with latency coding may be a key to understanding the way that the primate visual system learns, its remarkable processing speed and its low energy consumption.

...read moreread less

Proceedings Article•DOI•

Image to Image Translation for Domain Adaptation

[...]

Zak Murez¹, Soheil Kolouri², David J. Kriegman¹, Ravi Ramamoorthi¹, Kyungnam Kim² - Show less +1 more•Institutions (2)

University of California, San Diego¹, HRL Laboratories²

18 Jun 2018

TL;DR: This work proposes the novel use of the recently proposed unpaired image-to-image translation framework to constrain the features extracted by the backbone encoder network, and applies it to domain adaptation between MNIST, USPS, and SVHN datasets, and Amazon, Webcam and DSLR Office datasets in classification tasks, and also between GTA5 and Cityscapes datasets for a segmentation task.

...read moreread less

Abstract: We propose a general framework for unsupervised domain adaptation, which allows deep neural networks trained on a source domain to be tested on a different target domain without requiring any training annotations in the target domain. This is achieved by adding extra networks and losses that help regularize the features extracted by the backbone encoder network. To this end we propose the novel use of the recently proposed unpaired image-to-image translation framework to constrain the features extracted by the encoder network. Specifically, we require that the features extracted are able to reconstruct the images in both domains. In addition we require that the distribution of features extracted from images in the two domains are indistinguishable. Many recent works can be seen as specific cases of our general framework. We apply our method for domain adaptation between MNIST, USPS, and SVHN datasets, and Amazon, Webcam and DSLR Office datasets in classification tasks, and also between GTA5 and Cityscapes datasets for a segmentation task. We demonstrate state of the art performance on each of these datasets.

...read moreread less

Proceedings Article•DOI•

Predictive uncertainty estimation via prior networks

[...]

Andrey Malinin¹, Mark J. F. Gales¹•Institutions (1)

University of Cambridge¹

03 Dec 2018

TL;DR: This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs) which explicitly models distributional uncertainty by parameterizing a prior distribution over predictive distributions and evaluates PNs on the tasks of identifying out-of-distribution samples and detecting misclassification on the MNIST dataset, where they are found to outperform previous methods.

...read moreread less

Abstract: Estimating how uncertain an AI system is in its predictions is important to improve the safety of such systems. Uncertainty in predictive can result from uncertainty in model parameters, irreducible data uncertainty and uncertainty due to distributional mismatch between the test and training data distributions. Different actions might be taken depending on the source of the uncertainty so it is important to be able to distinguish between them. Recently, baseline tasks and metrics have been defined and several practical methods to estimate uncertainty developed. These methods, however, attempt to model uncertainty due to distributional mismatch either implicitly through model uncertainty or as data uncertainty. This work proposes a new framework for modeling predictive uncertainty called Prior Networks (PNs) which explicitly models distributional uncertainty. PNs do this by parameterizing a prior distribution over predictive distributions. This work focuses on uncertainty for classification and evaluates PNs on the tasks of identifying out-of-distribution (OOD) samples and detecting misclassification on the MNIST and CIFAR-10 datasets, where they are found to outperform previous methods. Experiments on synthetic and MNIST and CIFAR-10 data show that unlike previous non-Bayesian methods PNs are able to distinguish between data and distributional uncertainty.

...read moreread less

Proceedings Article•

SLAYER: Spike Layer Error Reassignment in Time

[...]

Sumit Bam Shrestha¹, Garrick Orchard²•Institutions (2)

Nanyang Technological University¹, National University of Singapore²

05 Sep 2018

TL;DR: A new general back Propagation mechanism for learning synaptic weights and axonal delays which overcomes the problem of non-differentiability of the spike function and uses a temporal credit assignment policy for backpropagating error to preceding layers is introduced.

...read moreread less

Abstract: Configuring deep Spiking Neural Networks (SNNs) is an exciting research avenue for low power spike event based computation. However, the spike generation function is non-differentiable and therefore not directly compatible with the standard error backpropagation algorithm. In this paper, we introduce a new general backpropagation mechanism for learning synaptic weights and axonal delays which overcomes the problem of non-differentiability of the spike function and uses a temporal credit assignment policy for backpropagating error to preceding layers. We describe and release a GPU accelerated software implementation of our method which allows training both fully connected and convolutional neural network (CNN) architectures. Using our software, we compare our method against existing SNN based learning approaches and standard ANN to SNN conversion techniques and show that our method achieves state of the art performance for an SNN on the MNIST, NMNIST, DVS Gesture, and TIDIGITS datasets.

...read moreread less

Proceedings Article•

An intriguing failing of convolutional neural networks and the CoordConv solution

[...]

Rosanne Liu¹, Joel Lehman², Piero Molino¹, Felipe Petroski Such³, Eric Frank, Alex Sergeev, Jason Yosinski⁴ - Show less +3 more•Institutions (4)

Uber ¹, IT University of Copenhagen², Rochester Institute of Technology³, Cornell University⁴

01 Jan 2018

TL;DR: CoordConv as discussed by the authors proposes to give convolution access to its own input coordinates through the use of extra coordinate channels, allowing networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task.

...read moreread less

Abstract: Few ideas have enjoyed as large an impact on deep learning as convolution. For any problem involving pixels or spatial representations, common intuition holds that convolutional neural networks may be appropriate. In this paper we show a striking counterexample to this intuition via the seemingly trivial coordinate transform problem, which simply requires learning a mapping between coordinates in (x,y) Cartesian space and coordinates in one-hot pixel space. Although convolutional networks would seem appropriate for this task, we show that they fail spectacularly. We demonstrate and carefully analyze the failure first on a toy problem, at which point a simple fix becomes obvious. We call this solution CoordConv, which works by giving convolution access to its own input coordinates through the use of extra coordinate channels. Without sacrificing the computational and parametric efficiency of ordinary convolution, CoordConv allows networks to learn either complete translation invariance or varying degrees of translation dependence, as required by the end task. CoordConv solves the coordinate transform problem with perfect generalization and 150 times faster with 10--100 times fewer parameters than convolution. This stark contrast raises the question: to what extent has this inability of convolution persisted insidiously inside other tasks, subtly hampering performance from within? A complete answer to this question will require further investigation, but we show preliminary evidence that swapping convolution for CoordConv can improve models on a diverse set of tasks. Using CoordConv in a GAN produced less mode collapse as the transform between high-level spatial latents and pixels becomes easier to learn. A Faster R-CNN detection model trained on MNIST detection showed 24% better IOU when using CoordConv, and in the Reinforcement Learning (RL) domain agents playing Atari games benefit significantly from the use of CoordConv layers.

...read moreread less

Proceedings Article•DOI•

Generating Adversarial Examples with Adversarial Networks

[...]

Chaowei Xiao¹, Bo Li², Jun-Yan Zhu³, Jun-Yan Zhu², Warren He², Mingyan Liu¹, Dawn Song² - Show less +3 more•Institutions (3)

University of Michigan¹, University of California, Berkeley², Massachusetts Institute of Technology³

15 Feb 2018

TL;DR: In this paper, the authors proposed AdvGAN to generate adversarial examples with Generative Adversarial Networks (GANs), which can learn and approximate the distribution of original instances.

...read moreread less

Abstract: Deep neural networks (DNNs) have been found to be vulnerable to adversarial examples resulting from adding small-magnitude perturbations to inputs. Such adversarial examples can mislead DNNs to produce adversary-selected results. Different attack strategies have been proposed to generate adversarial examples, but how to produce them with high perceptual quality and more efficiently requires more research efforts. In this paper, we propose AdvGAN to generate adversarial examples with generative adversarial networks (GANs), which can learn and approximate the distribution of original instances. For AdvGAN, once the generator is trained, it can generate adversarial perturbations efficiently for any instance, so as to potentially accelerate adversarial training as defenses. We apply AdvGAN in both semi-whitebox and black-box attack settings. In semi-whitebox attacks, there is no need to access the original target model after the generator is trained, in contrast to traditional white-box attacks. In black-box attacks, we dynamically train a distilled model for the black-box model and optimize the generator accordingly. Adversarial examples generated by AdvGAN on different target models have high attack success rate under state-of-the-art defenses compared to other attacks. Our attack has placed the first with 92.76% accuracy on a public MNIST black-box attack challenge.

...read moreread less

Journal Article•DOI•

Spatio-Temporal Backpropagation for Training High-Performance Spiking Neural Networks.

[...]

Yujie Wu¹, Lei Deng², Lei Deng¹, Guoqi Li¹, Jun Zhu¹, Luping Shi¹ - Show less +2 more•Institutions (2)

Tsinghua University¹, University of California, Santa Barbara²

23 May 2018-Frontiers in Neuroscience

TL;DR: A spatio-temporal backpropagation (STBP) algorithm for training high-performance SNNs is proposed, which combines the layer-by-layer spatial domain (SD) and the timing-dependent temporal domain (TD), and does not require any additional complicated skill.

...read moreread less

Abstract: Spiking neural networks (SNNs) are promising in ascertaining brain-like behaviors since spikes are capable of encoding spatio-temporal information. Recent schemes, e.g. pre-training from artificial neural networks (ANNs) or direct training based on backpropagation (BP), make the high-performance supervised training of SNNs possible. However, these methods primarily fasten more attention on its spatial domain information, and the dynamics in temporal domain are attached less significance. Consequently, this might lead to the performance bottleneck, and scores of training techniques shall be additionally required. Another underlying problem is that the spike activity is naturally non-differentiable, raising more difficulties in supervised training of SNNs. In this paper, we propose a spatio-temporal backpropagation (STBP) algorithm for training high-performance spiking neural networks. In order to solve the non-differentiable problem of SNNs, an approximated derivative for spike activity is proposed, being appropriate for gradient descent training. The STBP algorithm combines the layer-by-layer spatial domain (SD) and the timing-dependent temporal domain (TD), and does not require any additional complicated skill. We evaluate this method through adopting both the fully connected and convolutional architecture on the static MNIST dataset, a custom object detection dataset, and the dynamic N-MNIST dataset. Results bespeak that our approach achieves the best accuracy compared with existing state-of-the-art algorithms on spiking networks. This work provides a new perspective to investigate the high-performance SNNs for future brain-like computing paradigm with rich spatio-temporal dynamics.

...read moreread less

Posted Content•

On the Effectiveness of Interval Bound Propagation for Training Verifiably Robust Models

[...]

Sven Gowal, Krishnamurthy Dvijotham, Robert Stanforth, Rudy Bunel, Chongli Qin, Jonathan Uesato, Relja Arandjelovic, Timothy A. Mann, Pushmeet Kohli - Show less +5 more

30 Oct 2018-arXiv: Learning

TL;DR: This work shows how a simple bounding technique, interval bound propagation (IBP), can be exploited to train large provably robust neural networks that beat the state-of-the-art in verified accuracy and allows the largest model to be verified beyond vacuous bounds on a downscaled version of ImageNet.

...read moreread less

Abstract: Recent work has shown that it is possible to train deep neural networks that are provably robust to norm-bounded adversarial perturbations. Most of these methods are based on minimizing an upper bound on the worst-case loss over all possible adversarial perturbations. While these techniques show promise, they often result in difficult optimization procedures that remain hard to scale to larger networks. Through a comprehensive analysis, we show how a simple bounding technique, interval bound propagation (IBP), can be exploited to train large provably robust neural networks that beat the state-of-the-art in verified accuracy. While the upper bound computed by IBP can be quite weak for general networks, we demonstrate that an appropriate loss and clever hyper-parameter schedule allow the network to adapt such that the IBP bound is tight. This results in a fast and stable learning algorithm that outperforms more sophisticated methods and achieves state-of-the-art results on MNIST, CIFAR-10 and SVHN. It also allows us to train the largest model to be verified beyond vacuous bounds on a downscaled version of ImageNet.

...read moreread less

Posted Content•

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

[...]

Yuanzhi Li¹, Yingyu Liang²•Institutions (2)

Stanford University¹, University of Wisconsin-Madison²

03 Aug 2018-arXiv: Learning

TL;DR: In this article, the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization is studied.

...read moreread less

Abstract: Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of learning a two-layer overparameterized ReLU neural network for multi-class classification via stochastic gradient descent (SGD) from random initialization. In the overparameterized setting, when the data comes from mixtures of well-separated distributions, we prove that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels. Furthermore, the analysis provides interesting insights into several aspects of learning neural networks and can be verified based on empirical studies on synthetic data and on the MNIST dataset.

...read moreread less

Proceedings Article•

VAE with a VampPrior

[...]

Jakub M. Tomczak¹, Max Welling¹•Institutions (1)

University of Amsterdam¹

01 Jan 2018

TL;DR: In this article, the VampPrior consists of a mixture distribution (e.g., a mixture of Gaussians) with components given by variational posteriors conditioned on learnable pseudo-inputs.

...read moreread less

Abstract: Many different methods to train deep generative models have been introduced in the past. In this paper, we propose to extend the variational auto-encoder (VAE) framework with a new type of prior which we call "Variational Mixture of Posteriors" prior, or VampPrior for short. The VampPrior consists of a mixture distribution (e.g., a mixture of Gaussians) with components given by variational posteriors conditioned on learnable pseudo-inputs. We further extend this prior to a two layer hierarchical model and show that this architecture with a coupled prior and posterior, learns significantly better models. The model also avoids the usual local optima issues related to useless latent dimensions that plague VAEs. We provide empirical studies on six datasets, namely, static and binary MNIST, OMNIGLOT, Caltech 101 Silhouettes, Frey Faces and Histopathology patches, and show that applying the hierarchical VampPrior delivers state-of-the-art results on all datasets in the unsupervised permutation invariant setting and the best results or comparable to SOTA methods for the approach with convolutional networks.

...read moreread less

Proceedings Article•

Learning Overparameterized Neural Networks via Stochastic Gradient Descent on Structured Data

[...]

Yuanzhi Li¹, Yingyu Liang²•Institutions (2)

Stanford University¹, University of Wisconsin-Madison²

03 Aug 2018

TL;DR: It is proved that SGD learns a network with a small generalization error, albeit the network has enough capacity to fit arbitrary labels, when the data comes from mixtures of well-separated distributions.

...read moreread less

Proceedings Article•DOI•

Learning Steerable Filters for Rotation Equivariant CNNs

[...]

Maurice Weiler¹, Fred A. Hamprecht², Martin Storath³•Institutions (3)

University of Amsterdam¹, Heidelberg University², Interdisciplinary Center for Scientific Computing³

01 Jun 2018

TL;DR: In this article, steerable filter convolutional neural networks (SFCNNs) are proposed to achieve joint equivariance under translations and rotations by design, which achieves state-of-the-art performance on the rotated MNIST benchmark and on the ISBI 2012 2D EM segmentation challenge.

...read moreread less

Abstract: In many machine learning tasks it is desirable that a model's prediction transforms in an equivariant way under transformations of its input. Convolutional neural networks (CNNs) implement translational equivariance by construction; for other transformations, however, they are compelled to learn the proper mapping. In this work, we develop Steerable Filter CNNs (SFCNNs) which achieve joint equivariance under translations and rotations by design. The proposed architecture employs steerable filters to efficiently compute orientation dependent responses for many orientations without suffering interpolation artifacts from filter rotation. We utilize group convolutions which guarantee an equivariant mapping. In addition, we generalize He's weight initialization scheme to filters which are defined as a linear combination of a system of atomic filters. Numerical experiments show a substantial enhancement of the sample complexity with a growing number of sampled filter orientations and confirm that the network generalizes learned patterns over orientations. The proposed approach achieves state-of-the-art on the rotated MNIST benchmark and on the ISBI 2012 2D EM segmentation challenge.

...read moreread less

Journal Article•DOI•

An Analysis Of Convolutional Neural Networks For Image Classification

[...]

Neha Sharma¹, Vibhor Jain¹, Anju Mishra¹•Institutions (1)

Amity University¹

01 Jan 2018-Procedia Computer Science

TL;DR: The analysis of the performance of popular convolutional neural networks for identifying objects in real time video feeds shows that GoogLeNet and ResNet50 are able to recognize objects with better precision compared to Alex Net.

...read moreread less

Proceedings Article•

Snip: single-shot network pruning based on connection sensitivity

[...]

Namhoon Lee¹, Thalaiyasingam Ajanthan¹, Philip H. S. Torr¹•Institutions (1)

University of Oxford¹

27 Sep 2018

TL;DR: This work presents a new approach that prunes a given network once at initialization prior to training, and introduces a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task.

...read moreread less

Abstract: Pruning large neural networks while maintaining their performance is often desirable due to the reduced space and time complexity. In existing methods, pruning is done within an iterative optimization procedure with either heuristically designed pruning schedules or additional hyperparameters, undermining their utility. In this work, we present a new approach that prunes a given network once at initialization prior to training. To achieve this, we introduce a saliency criterion based on connection sensitivity that identifies structurally important connections in the network for the given task. This eliminates the need for both pretraining and the complex pruning schedule while making it robust to architecture variations. After pruning, the sparse network is trained in the standard way. Our method obtains extremely sparse networks with virtually the same accuracy as the reference network on the MNIST, CIFAR-10, and Tiny-ImageNet classification tasks and is broadly applicable to various architectures including convolutional, residual and recurrent networks. Unlike existing methods, our approach enables us to demonstrate that the retained connections are indeed relevant to the given task.

...read moreread less

Journal Article•DOI•

Supervised Learning Based on Temporal Coding in Spiking Neural Networks

[...]

Hesham Mostafa¹•Institutions (1)

University of California, San Diego¹

01 Jul 2018-IEEE Transactions on Neural Networks

TL;DR: This work shows that in a feedforward spiking network that uses a temporal coding scheme where information is encoded in spike times instead of spike rates, the network input–output relation is differentiable almost everywhere and this relation is piecewise linear after a transformation of variables.

...read moreread less

Abstract: Gradient descent training techniques are remarkably successful in training analog-valued artificial neural networks (ANNs). Such training techniques, however, do not transfer easily to spiking networks due to the spike generation hard nonlinearity and the discrete nature of spike communication. We show that in a feedforward spiking network that uses a temporal coding scheme where information is encoded in spike times instead of spike rates, the network input–output relation is differentiable almost everywhere. Moreover, this relation is piecewise linear after a transformation of variables. Methods for training ANNs thus carry directly to the training of such spiking networks as we show when training on the permutation invariant MNIST task. In contrast to rate-based spiking networks that are often used to approximate the behavior of ANNs, the networks we present spike much more sparsely and their behavior cannot be directly approximated by conventional ANNs. Our results highlight a new approach for controlling the behavior of spiking networks with realistic temporal dynamics, opening up the potential for using these networks to process spike patterns with complex temporal information.

...read moreread less

Proceedings Article•DOI•

DeepMutation: Mutation Testing of Deep Learning Systems

[...]

Lei Ma¹, Lei Ma², Fuyuan Zhang¹, Jiyuan Sun³, Minhui Xue¹, Bo Li⁴, Felix Juefei-Xu⁵, Chao Xie³, Li Li⁶, Yang Liu¹, Jianjun Zhao³, Yadong Wang² - Show less +8 more•Institutions (6)

Nanyang Technological University¹, Harbin Institute of Technology², Kyushu University³, University of Illinois at Urbana–Champaign⁴, Carnegie Mellon University⁵, Monash University⁶

16 Nov 2018

TL;DR: This paper proposes a mutation testing framework specialized for DL systems to measure the quality of test data, and designs a set of model-level mutation operators that directly inject faults into DL models without a training process.

...read moreread less

Abstract: Deep learning (DL) defines a new data-driven programming paradigm where the internal system logic is largely shaped by the training data. The standard way of evaluating DL models is to examine their performance on a test dataset. The quality of the test dataset is of great importance to gain confidence of the trained models. Using an inadequate test dataset, DL models that have achieved high test accuracy may still lack generality and robustness. In traditional software testing, mutation testing is a well-established technique for quality evaluation of test suites, which analyzes to what extent a test suite detects the injected faults. However, due to the fundamental difference between traditional software and deep learning-based software, traditional mutation testing techniques cannot be directly applied to DL systems. In this paper, we propose a mutation testing framework specialized for DL systems to measure the quality of test data. To do this, by sharing the same spirit of mutation testing in traditional software, we first define a set of source-level mutation operators to inject faults to the source of DL (i.e., training data and training programs). Then we design a set of model-level mutation operators that directly inject faults into DL models without a training process. Eventually, the quality of test data could be evaluated from the analysis on to what extent the injected faults could be detected. The usefulness of the proposed mutation testing techniques is demonstrated on two public datasets, namely MNIST and CIFAR-10, with three DL models.

...read moreread less