Showing papers on "Artificial neural network published in 2015"

PDF

Open Access

Journal Article•DOI•

Human-level control through deep reinforcement learning

[...]

Volodymyr Mnih¹, Koray Kavukcuoglu¹, David Silver¹, Andrei Rusu¹, Joel Veness¹, Marc G. Bellemare¹, Alex Graves¹, Martin Riedmiller¹, Andreas K. Fidjeland¹, Georg Ostrovski¹, Stig Petersen¹, Charles Beattie¹, Amir Sadik¹, Ioannis Antonoglou¹, Helen King¹, Dharshan Kumaran¹, Daan Wierstra¹, Shane Legg¹, Demis Hassabis¹ - Show less +15 more•Institutions (1)

Google¹

26 Feb 2015-Nature

TL;DR: This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

...read moreread less

Abstract: The theory of reinforcement learning provides a normative account, deeply rooted in psychological and neuroscientific perspectives on animal behaviour, of how agents may optimize their control of an environment. To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations. Remarkably, humans and other animals seem to solve this problem through a harmonious combination of reinforcement learning and hierarchical sensory processing systems, the former evidenced by a wealth of neural data revealing notable parallels between the phasic signals emitted by dopaminergic neurons and temporal difference reinforcement learning algorithms. While reinforcement learning agents have achieved some successes in a variety of domains, their applicability has previously been limited to domains in which useful features can be handcrafted, or to domains with fully observed, low-dimensional state spaces. Here we use recent advances in training deep neural networks to develop a novel artificial agent, termed a deep Q-network, that can learn successful policies directly from high-dimensional sensory inputs using end-to-end reinforcement learning. We tested this agent on the challenging domain of classic Atari 2600 games. We demonstrate that the deep Q-network agent, receiving only the pixels and the game score as inputs, was able to surpass the performance of all previous algorithms and achieve a level comparable to that of a professional human games tester across a set of 49 games, using the same algorithm, network architecture and hyperparameters. This work bridges the divide between high-dimensional sensory inputs and actions, resulting in the first artificial agent that is capable of learning to excel at a diverse array of challenging tasks.

...read moreread less

23,074 citations

Proceedings Article•

Neural Machine Translation by Jointly Learning to Align and Translate

[...]

Dzmitry Bahdanau¹, Kyunghyun Cho², Yoshua Bengio²•Institutions (2)

Jacobs University Bremen¹, Université de Montréal²

01 Jan 2015

TL;DR: It is conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and it is proposed to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

...read moreread less

Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

...read moreread less

20,027 citations

Journal Article•DOI•

Deep learning in neural networks

[...]

Jürgen Schmidhuber¹•Institutions (1)

University of Lugano¹

01 Jan 2015-Neural Networks

TL;DR: This historical survey compactly summarizes relevant work, much of it from the previous millennium, review deep supervised learning, unsupervised learning, reinforcement learning & evolutionary computation, and indirect search for short programs encoding deep and large networks.

...read moreread less

14,635 citations

Posted Content•

Distilling the Knowledge in a Neural Network

[...]

Geoffrey E. Hinton, Oriol Vinyals, Jeffrey Dean

09 Mar 2015-arXiv: Machine Learning

TL;DR: This work shows that it can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model and introduces a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse.

...read moreread less

Abstract: A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions. Unfortunately, making predictions using a whole ensemble of models is cumbersome and may be too computationally expensive to allow deployment to a large number of users, especially if the individual models are large neural nets. Caruana and his collaborators have shown that it is possible to compress the knowledge in an ensemble into a single model which is much easier to deploy and we develop this approach further using a different compression technique. We achieve some surprising results on MNIST and we show that we can significantly improve the acoustic model of a heavily used commercial system by distilling the knowledge in an ensemble of models into a single model. We also introduce a new type of ensemble composed of one or more full models and many specialist models which learn to distinguish fine-grained classes that the full models confuse. Unlike a mixture of experts, these specialist models can be trained rapidly and in parallel.

...read moreread less

12,857 citations

Posted Content•

TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

[...]

01 Jan 2015-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.

...read moreread less

Abstract: TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at www.tensorflow.org.

...read moreread less

10,447 citations

Proceedings Article•

Spatial transformer networks

[...]

Max Jaderberg¹, Karen Simonyan¹, Andrew Zisserman¹, Koray Kavukcuoglu¹•Institutions (1)

Google¹

07 Dec 2015

TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.

...read moreread less

Abstract: Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.

...read moreread less

6,150 citations

Proceedings Article•

Learning both weights and connections for efficient neural networks

[...]

Song Han¹, Jeff Pool², John Tran², William J. Dally¹•Institutions (2)

Stanford University¹, Nvidia²

07 Dec 2015

TL;DR: In this paper, the authors proposed a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections using a three-step method.

...read moreread less

Abstract: Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the total number of parameters can be reduced by 13x, from 138 million to 10.3 million, again with no loss of accuracy.

...read moreread less

3,967 citations

Book•

Understanding Machine Learning: From Theory To Algorithms

[...]

Shai Shalev-Shwartz¹, Shai Ben-David²•Institutions (2)

Hebrew University of Jerusalem¹, University of Waterloo²

01 Jan 2015

TL;DR: The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way in an advanced undergraduate or beginning graduate course.

...read moreread less

Abstract: Machine learning is one of the fastest growing areas of computer science, with far-reaching applications. The aim of this textbook is to introduce machine learning, and the algorithmic paradigms it offers, in a principled way. The book provides an extensive theoretical account of the fundamental ideas underlying machine learning and the mathematical derivations that transform these principles into practical algorithms. Following a presentation of the basics of the field, the book covers a wide array of central topics that have not been addressed by previous textbooks. These include a discussion of the computational complexity of learning and the concepts of convexity and stability; important algorithmic paradigms including stochastic gradient descent, neural networks, and structured output learning; and emerging theoretical concepts such as the PAC-Bayes approach and compression-based bounds. Designed for an advanced undergraduate or beginning graduate course, the text makes the fundamentals and algorithms of machine learning accessible to students and non-expert readers in statistics, computer science, mathematics, and engineering.

...read moreread less

3,857 citations

Proceedings Article•DOI•

FlowNet: Learning Optical Flow with Convolutional Networks

[...]

Alexey Dosovitskiy¹, Philipp Fischery, Eddy Ilg¹, Philip Häusser², Caner Hazirbas², Vladimir Golkov², Patrick van der Smagt², Daniel Cremers², Thomas Brox¹ - Show less +5 more•Institutions (2)

University of Freiburg¹, Technische Universität München²

07 Dec 2015

TL;DR: In this paper, the authors propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations, and show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI.

...read moreread less

Abstract: Convolutional neural networks (CNNs) have recently been very successful in a variety of computer vision tasks, especially on those linked to recognition. Optical flow estimation has not been among the tasks CNNs succeeded at. In this paper we construct CNNs which are capable of solving the optical flow estimation problem as a supervised learning task. We propose and compare two architectures: a generic architecture and another one including a layer that correlates feature vectors at different image locations. Since existing ground truth data sets are not sufficiently large to train a CNN, we generate a large synthetic Flying Chairs dataset. We show that networks trained on this unrealistic data still generalize very well to existing datasets such as Sintel and KITTI, achieving competitive accuracy at frame rates of 5 to 10 fps.

...read moreread less

3,833 citations

Posted Content•

Learning Transferable Features with Deep Adaptation Networks

[...]

Mingsheng Long¹, Mingsheng Long², Yue Cao², Jianmin Wang², Michael I. Jordan¹ - Show less +1 more•Institutions (2)

University of California, Berkeley¹, Tsinghua University²

10 Feb 2015-arXiv: Learning

TL;DR: A new Deep Adaptation Network (DAN) architecture is proposed, which generalizes deep convolutional neural network to the domain adaptation scenario and can learn transferable features with statistical guarantees, and can scale linearly by unbiased estimate of kernel embedding.

...read moreread less

Abstract: Recent studies reveal that a deep neural network can learn transferable features which generalize well to novel tasks for domain adaptation. However, as deep features eventually transition from general to specific along the network, the feature transferability drops significantly in higher layers with increasing domain discrepancy. Hence, it is important to formally reduce the dataset bias and enhance the transferability in task-specific layers. In this paper, we propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural network to the domain adaptation scenario. In DAN, hidden representations of all task-specific layers are embedded in a reproducing kernel Hilbert space where the mean embeddings of different domain distributions can be explicitly matched. The domain discrepancy is further reduced using an optimal multi-kernel selection method for mean embedding matching. DAN can learn transferable features with statistical guarantees, and can scale linearly by unbiased estimate of kernel embedding. Extensive empirical evidence shows that the proposed architecture yields state-of-the-art image classification error rates on standard domain adaptation benchmarks.

...read moreread less

3,351 citations

Journal Article•DOI•

On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation.

[...]

Sebastian Bach¹, Alexander Binder², Grégoire Montavon¹, Frederick Klauschen³, Klaus-Robert Müller¹, Wojciech Samek¹ - Show less +2 more•Institutions (3)

Technical University of Berlin¹, Singapore University of Technology and Design², Charité³

10 Jul 2015-PLOS ONE

TL;DR: This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers by introducing a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks.

...read moreread less

Abstract: Understanding and interpreting classification decisions of automated image classification systems is of high value in many applications, as it allows to verify the reasoning of the system and provides additional information to the human expert. Although machine learning methods are solving very successfully a plethora of tasks, they have in most cases the disadvantage of acting as a black box, not providing any information about what made them arrive at a particular decision. This work proposes a general solution to the problem of understanding classification decisions by pixel-wise decomposition of nonlinear classifiers. We introduce a methodology that allows to visualize the contributions of single pixels to predictions for kernel-based classifiers over Bag of Words features and for multilayered neural networks. These pixel contributions can be visualized as heatmaps and are provided to a human expert who can intuitively not only verify the validity of the classification decision, but also focus further analysis on regions of potential interest. We evaluate our method for classifiers trained on PASCAL VOC 2009 images, synthetic image data containing geometric shapes, the MNIST handwritten digits data set and for the pre-trained ImageNet model available as part of the Caffe open source package.

...read moreread less

Posted Content•

Trust Region Policy Optimization

[...]

John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel - Show less +1 more

19 Feb 2015-arXiv: Learning

TL;DR: Trust Region Policy Optimization (TRPO) as mentioned in this paper is an iterative procedure for optimizing policies, with guaranteed monotonic improvement, which is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks.

...read moreread less

Abstract: We describe an iterative procedure for optimizing policies, with guaranteed monotonic improvement. By making several approximations to the theoretically-justified procedure, we develop a practical algorithm, called Trust Region Policy Optimization (TRPO). This algorithm is similar to natural policy gradient methods and is effective for optimizing large nonlinear policies such as neural networks. Our experiments demonstrate its robust performance on a wide variety of tasks: learning simulated robotic swimming, hopping, and walking gaits; and playing Atari games using images of the screen as input. Despite its approximations that deviate from the theory, TRPO tends to give monotonic improvement, with little tuning of hyperparameters.

...read moreread less

Proceedings Article•DOI•

MatConvNet: Convolutional Neural Networks for MATLAB

[...]

Andrea Vedaldi¹, Karel Lenc¹•Institutions (1)

University of Oxford¹

13 Oct 2015

TL;DR: MatConvNet exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing convolutions with filter banks, feature pooling, normalisation, and much more.

...read moreread less

Abstract: MatConvNet is an open source implementation of Convolutional Neural Networks (CNNs) with a deep integration in the MATLAB environment. The toolbox is designed with an emphasis on simplicity and flexibility. It exposes the building blocks of CNNs as easy-to-use MATLAB functions, providing routines for computing convolutions with filter banks, feature pooling, normalisation, and much more. MatConvNet can be easily extended, often using only MATLAB code, allowing fast prototyping of new CNN architectures. At the same time, it supports efficient computation on CPU and GPU, allowing to train complex models on large datasets such as ImageNet ILSVRC containing millions of training examples

...read moreread less

Siamese Neural Networks for One-shot Image Recognition

[...]

Gregory Koch¹, Richard S. Zemel¹, Ruslan Salakhutdinov¹•Institutions (1)

University of Toronto¹

01 Jan 2015

TL;DR: A method for learning siamese neural networks which employ a unique structure to naturally rank similarity between inputs and is able to achieve strong results which exceed those of other deep learning models with near state-of-the-art performance on one-shot classification tasks.

...read moreread less

Abstract: The process of learning good features for machine learning applications can be very computationally expensive and may prove difficult in cases where little data is available. A prototypical example of this is the one-shot learning setting, in which we must correctly make predictions given only a single example of each new class. In this paper, we explore a method for learning siamese neural networks which employ a unique structure to naturally rank similarity between inputs. Once a network has been tuned, we can then capitalize on powerful discriminative features to generalize the predictive power of the network not just to new data, but to entirely new classes from unknown distributions. Using a convolutional architecture, we are able to achieve strong results which exceed those of other deep learning models with near state-of-the-art performance on one-shot classification tasks.

...read moreread less

Proceedings Article•

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

[...]

Liang-Chieh Chen¹, George Papandreou², Iasonas Kokkinos³, Kevin Murphy², Alan L. Yuille¹ - Show less +1 more•Institutions (3)

University of California, Los Angeles¹, Google², CentraleSupélec³

07 May 2015

TL;DR: DeepLab as mentioned in this paper combines the responses at the final layer with a fully connected CRF to localize segment boundaries at a level of accuracy beyond previous methods, achieving 71.6% IOU accuracy in the test set.

...read moreread less

Abstract: Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy which is beyond previous methods. Quantitatively, our method sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy in the test set. We show how these results can be obtained efficiently: Careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.

...read moreread less

Proceedings Article•

Learning structured output representation using deep conditional generative models

[...]

Kihyuk Sohn¹, Xinchen Yan¹, Honglak Lee¹•Institutions (1)

University of Michigan¹

07 Dec 2015

TL;DR: A deep conditional generative model for structured output prediction using Gaussian latent variables is developed, trained efficiently in the framework of stochastic gradient variational Bayes, and allows for fast prediction using Stochastic feed-forward inference.

...read moreread less

Abstract: Supervised deep learning has been successfully applied to many recognition problems. Although it can approximate a complex many-to-one function well when a large amount of training data is provided, it is still challenging to model complex structured output representations that effectively perform probabilistic inference and make diverse predictions. In this work, we develop a deep conditional generative model for structured output prediction using Gaussian latent variables. The model is trained efficiently in the framework of stochastic gradient variational Bayes, and allows for fast prediction using stochastic feed-forward inference. In addition, we provide novel strategies to build robust structured prediction algorithms, such as input noise-injection and multi-scale prediction objective at training. In experiments, we demonstrate the effectiveness of our proposed algorithm in comparison to the deterministic deep neural network counterparts in generating diverse but realistic structured output predictions using stochastic inference. Furthermore, the proposed training methods are complimentary, which leads to strong pixel-level object segmentation and semantic labeling performance on Caltech-UCSD Birds 200 and the subset of Labeled Faces in the Wild dataset.

...read moreread less

Posted Content•

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

[...]

Tianqi Chen, Mu Li¹, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang - Show less +6 more•Institutions (1)

Carnegie Mellon University¹

03 Dec 2015-arXiv: Distributed, Parallel, and Cluster Computing

TL;DR: The API design and the system implementation of MXNet are described, and it is explained how embedding of both symbolic expression and tensor operation is handled in a unified fashion.

...read moreread less

Abstract: MXNet is a multi-language machine learning (ML) library to ease the development of ML algorithms, especially for deep neural networks. Embedded in the host language, it blends declarative symbolic expression with imperative tensor computation. It offers auto differentiation to derive gradients. MXNet is computation and memory efficient and runs on various heterogeneous systems, ranging from mobile devices to distributed GPU clusters. This paper describes both the API design and the system implementation of MXNet, and explains how embedding of both symbolic expression and tensor operation is handled in a unified fashion. Our preliminary experiments reveal promising results on large scale deep neural network applications using multiple GPU machines.

...read moreread less

Proceedings Article•DOI•

Beyond short snippets: Deep networks for video classification

[...]

Joe Yue-Hei Ng¹, Matthew Hausknecht², Sudheendra Vijayanarasimhan³, Oriol Vinyals³, Rajat Monga³, George Toderici³ - Show less +2 more•Institutions (3)

University of Maryland, College Park¹, University of Texas at Austin², Google³

07 Jun 2015

TL;DR: In this article, a recurrent neural network that uses Long Short-Term Memory (LSTM) cells which are connected to the output of the underlying CNN was proposed to model the video as an ordered sequence of frames.

...read moreread less

Abstract: Convolutional neural networks (CNNs) have been extensively applied for image recognition problems giving state-of-the-art results on recognition, detection, segmentation and retrieval. In this work we propose and evaluate several deep neural network architectures to combine image information across a video over longer time periods than previously attempted. We propose two methods capable of handling full length videos. The first method explores various convolutional temporal feature pooling architectures, examining the various design choices which need to be made when adapting a CNN for this task. The second proposed method explicitly models the video as an ordered sequence of frames. For this purpose we employ a recurrent neural network that uses Long Short-Term Memory (LSTM) cells which are connected to the output of the underlying CNN. Our best networks exhibit significant performance improvements over previously published results on the Sports 1 million dataset (73.1% vs. 60.9%) and the UCF-101 datasets with (88.6% vs. 88.0%) and without additional optical flow information (82.6% vs. 73.0%).

...read moreread less

Proceedings Article•

Recurrent convolutional neural networks for text classification

[...]

Siwei Lai¹, Liheng Xu¹, Kang Liu¹, Jun Zhao¹•Institutions (1)

Chinese Academy of Sciences¹

25 Jan 2015

TL;DR: A recurrent convolutional neural network is introduced for text classification without human-designed features to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks.

...read moreread less

Abstract: Text classification is a foundational task in many NLP applications. Traditional text classifiers often rely on many human-designed features, such as dictionaries, knowledge bases and special tree kernels. In contrast to traditional methods, we introduce a recurrent convolutional neural network for text classification without human-designed features. In our model, we apply a recurrent structure to capture contextual information as far as possible when learning word representations, which may introduce considerably less noise compared to traditional window-based neural networks. We also employ a max-pooling layer that automatically judges which words play key roles in text classification to capture the key components in texts. We conduct experiments on four commonly used datasets. The experimental results show that the proposed method outperforms the state-of-the-art methods on several datasets, particularly on document-level datasets.

...read moreread less

Proceedings Article•DOI•

Privacy-Preserving Deep Learning

[...]

Reza Shokri¹, Vitaly Shmatikov²•Institutions (2)

University of Texas at Austin¹, Cornell University²

12 Oct 2015

TL;DR: This paper presents a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets, and exploits the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously.

...read moreread less

Abstract: Deep learning based on artificial neural networks is a very popular approach to modeling, classifying, and recognizing complex data such as images, speech, and text The unprecedented accuracy of deep learning methods has turned them into the foundation of new AI-based services on the Internet Commercial companies that collect user data on a large scale have been the main beneficiaries of this trend since the success of deep learning techniques is directly proportional to the amount of data available for training Massive data collection required for deep learning presents obvious privacy issues Users' personal, highly sensitive data such as photos and voice recordings is kept indefinitely by the companies that collect it Users can neither delete it, nor restrict the purposes for which it is used Furthermore, centrally kept data is subject to legal subpoenas and extra-judicial surveillance Many data owners--for example, medical institutions that may want to apply deep learning methods to clinical records--are prevented by privacy and confidentiality concerns from sharing the data and thus benefitting from large-scale deep learning In this paper, we design, implement, and evaluate a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets We exploit the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously Our system lets participants train independently on their own datasets and selectively share small subsets of their models' key parameters during training This offers an attractive point in the utility/privacy tradeoff space: participants preserve the privacy of their respective data while still benefitting from other participants' models and thus boosting their learning accuracy beyond what is achievable solely on their own inputs We demonstrate the accuracy of our privacy-preserving deep learning on benchmark datasets

...read moreread less

Proceedings Article•

End-to-end memory networks

[...]

Sainbayar Sukhbaatar¹, Arthur Szlam², Jason Weston², Rob Fergus²•Institutions (2)

New York University¹, Facebook²

07 Dec 2015

TL;DR: This paper proposed an end-to-end memory network with a recurrent attention model over a possibly large external memory, which can be seen as an extension of RNNsearch to the case where multiple computational steps (hops) are performed per output symbol.

...read moreread less

Abstract: We introduce a neural network with a recurrent attention model over a possibly large external memory. The architecture is a form of Memory Network [23] but unlike the model in that work, it is trained end-to-end, and hence requires significantly less supervision during training, making it more generally applicable in realistic settings. It can also be seen as an extension of RNNsearch [2] to the case where multiple computational steps (hops) are performed per output symbol. The flexibility of the model allows us to apply it to tasks as diverse as (synthetic) question answering [22] and to language modeling. For the former our approach is competitive with Memory Networks, but with less supervision. For the latter, on the Penn TreeBank and Text8 datasets our approach demonstrates comparable performance to RNNs and LSTMs. In both cases we show that the key concept of multiple computational hops yields improved results.

...read moreread less

Posted Content•

Learning both Weights and Connections for Efficient Neural Networks

[...]

Song Han¹, Jeff Pool², John Tran², William J. Dally¹•Institutions (2)

Stanford University¹, Nvidia²

08 Jun 2015-arXiv: Neural and Evolutionary Computing

TL;DR: A method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections, and prunes redundant connections using a three-step method.

...read moreread less

Abstract: Neural networks are both computationally intensive and memory intensive, making them difficult to deploy on embedded systems. Also, conventional networks fix the architecture before training starts; as a result, training cannot improve the architecture. To address these limitations, we describe a method to reduce the storage and computation required by neural networks by an order of magnitude without affecting their accuracy by learning only the important connections. Our method prunes redundant connections using a three-step method. First, we train the network to learn which connections are important. Next, we prune the unimportant connections. Finally, we retrain the network to fine tune the weights of the remaining connections. On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss. Similar experiments with VGG-16 found that the number of parameters can be reduced by 13x, from 138 million to 10.3 million, again with no loss of accuracy.

...read moreread less

Posted Content•

Weight Uncertainty in Neural Networks

[...]

Charles Blundell, Julien Cornebise, Koray Kavukcuoglu, Daan Wierstra

20 May 2015-arXiv: Machine Learning

TL;DR: This work introduces a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop, and shows how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems.

...read moreread less

Abstract: We introduce a new, efficient, principled and backpropagation-compatible algorithm for learning a probability distribution on the weights of a neural network, called Bayes by Backprop. It regularises the weights by minimising a compression cost, known as the variational free energy or the expected lower bound on the marginal likelihood. We show that this principled kind of regularisation yields comparable performance to dropout on MNIST classification. We then demonstrate how the learnt uncertainty in the weights can be used to improve generalisation in non-linear regression problems, and how this weight uncertainty can be used to drive the exploration-exploitation trade-off in reinforcement learning.

...read moreread less

Journal Article•DOI•

Long short-term memory neural network for traffic speed prediction using remote microwave sensor data

[...]

Xiaolei Ma¹, Zhimin Tao¹, Yinhai Wang², Haiyang Yu¹, Yunpeng Wang¹ - Show less +1 more•Institutions (2)

Beihang University¹, University of Washington²

01 May 2015-Transportation Research Part C-emerging Technologies

TL;DR: A comparison with different topologies of dynamic neural networks as well as other prevailing parametric and nonparametric algorithms suggests that LSTM NN can achieve the best prediction performance in terms of both accuracy and stability.

...read moreread less

Abstract: Neural networks have been extensively applied to short-term traffic prediction in the past years. This study proposes a novel architecture of neural networks, Long Short-Term Neural Network (LSTM NN), to capture nonlinear traffic dynamic in an effective manner. The LSTM NN can overcome the issue of back-propagated error decay through memory blocks, and thus exhibits the superior capability for time series prediction with long temporal dependency. In addition, the LSTM NN can automatically determine the optimal time lags. To validate the effectiveness of LSTM NN, travel speed data from traffic microwave detectors in Beijing are used for model training and testing. A comparison with different topologies of dynamic neural networks as well as other prevailing parametric and nonparametric algorithms suggests that LSTM NN can achieve the best prediction performance in terms of both accuracy and stability.

...read moreread less

Proceedings Article•DOI•

Convolutional, Long Short-Term Memory, fully connected Deep Neural Networks

[...]

Tara N. Sainath¹, Oriol Vinyals¹, Andrew W. Senior¹, Hasim Sak¹•Institutions (1)

Google¹

08 Sep 2015

TL;DR: This paper takes advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture, and finds that the CLDNN provides a 4-6% relative improvement in WER over an LSTM, the strongest of the three individual models.

...read moreread less

Abstract: Both Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) have shown improvements over Deep Neural Networks (DNNs) across a wide variety of speech recognition tasks. CNNs, LSTMs and DNNs are complementary in their modeling capabilities, as CNNs are good at reducing frequency variations, LSTMs are good at temporal modeling, and DNNs are appropriate for mapping features to a more separable space. In this paper, we take advantage of the complementarity of CNNs, LSTMs and DNNs by combining them into one unified architecture. We explore the proposed architecture, which we call CLDNN, on a variety of large vocabulary tasks, varying from 200 to 2,000 hours. We find that the CLDNN provides a 4–6% relative improvement in WER over an LSTM, the strongest of the three individual models.

...read moreread less

Proceedings Article•DOI•

Document Modeling with Gated Recurrent Neural Network for Sentiment Classification

[...]

Duyu Tang¹, Bing Qin¹, Ting Liu¹•Institutions (1)

Harbin Institute of Technology¹

01 Sep 2015

TL;DR: A neural network model is introduced to learn vector-based document representation in a unified, bottom-up fashion and dramatically outperforms standard recurrent neural network in document modeling for sentiment classification.

...read moreread less

Abstract: Document level sentiment classification remains a challenge: encoding the intrinsic relations between sentences in the semantic meaning of a document. To address this, we introduce a neural network model to learn vector-based document representation in a unified, bottom-up fashion. The model first learns sentence representation with convolutional neural network or long short-term memory. Afterwards, semantics of sentences and their relations are adaptively encoded in document representation with gated recurrent neural network. We conduct document level sentiment classification on four large-scale review datasets from IMDB and Yelp Dataset Challenge. Experimental results show that: (1) our neural model shows superior performances over several state-of-the-art algorithms; (2) gated recurrent neural network dramatically outperforms standard recurrent neural network in document modeling for sentiment classification. 1

...read moreread less

Proceedings Article•DOI•

Learning to compare image patches via convolutional neural networks

[...]

Sergey Zagoruyko¹, Nikos Komodakis¹•Institutions (1)

École des ponts ParisTech¹

07 Jun 2015

TL;DR: This paper shows how to learn directly from image data a general similarity function for comparing image patches, which is a task of fundamental importance for many computer vision problems, and opts for a CNN-based model that is trained to account for a wide variety of changes in image appearance.

...read moreread less

Abstract: In this paper we show how to learn directly from image data (i.e., without resorting to manually-designed features) a general similarity function for comparing image patches, which is a task of fundamental importance for many computer vision problems. To encode such a function, we opt for a CNN-based model that is trained to account for a wide variety of changes in image appearance. To that end, we explore and study multiple neural network architectures, which are specifically adapted to this task. We show that such an approach can significantly outperform the state-of-the-art on several problems and benchmark datasets.

...read moreread less

Posted Content•

Cyclical Learning Rates for Training Neural Networks

[...]

Leslie N. Smith¹•Institutions (1)

United States Naval Research Laboratory¹

03 Jun 2015-arXiv: Computer Vision and Pattern Recognition

TL;DR: Cyclical learning rates as discussed by the authors allows the learning rate cyclically vary between reasonable boundary values, which has been shown to improve classification accuracy without a need to tune and often in fewer iterations.

...read moreread less

Abstract: It is known that the learning rate is the most important hyper-parameter to tune for training deep neural networks This paper describes a new method for setting the learning rate, named cyclical learning rates, which practically eliminates the need to experimentally find the best values and schedule for the global learning rates Instead of monotonically decreasing the learning rate, this method lets the learning rate cyclically vary between reasonable boundary values Training with cyclical learning rates instead of fixed values achieves improved classification accuracy without a need to tune and often in fewer iterations This paper also describes a simple way to estimate "reasonable bounds" -- linearly increasing the learning rate of the network for a few epochs In addition, cyclical learning rates are demonstrated on the CIFAR-10 and CIFAR-100 datasets with ResNets, Stochastic Depth networks, and DenseNets, and the ImageNet dataset with the AlexNet and GoogLeNet architectures These are practical tools for everyone who trains neural networks

...read moreread less

Posted Content•

An Introduction to Convolutional Neural Networks

[...]

Keiron O'Shea¹, Ryan Nash²•Institutions (2)

Aberystwyth University¹, Lancaster University²

26 Nov 2015-arXiv: Neural and Evolutionary Computing

TL;DR: This document provides a brief introduction to CNNs, discussing recently published papers and newly formed techniques in developing these brilliantly fantastic image recognition models.

...read moreread less

Abstract: The field of machine learning has taken a dramatic twist in recent times, with the rise of the Artificial Neural Network (ANN). These biologically inspired computational models are able to far exceed the performance of previous forms of artificial intelligence in common machine learning tasks. One of the most impressive forms of ANN architecture is that of the Convolutional Neural Network (CNN). CNNs are primarily used to solve difficult image-driven pattern recognition tasks and with their precise yet simple architecture, offers a simplified method of getting started with ANNs. This document provides a brief introduction to CNNs, discussing recently published papers and newly formed techniques in developing these brilliantly fantastic image recognition models. This introduction assumes you are familiar with the fundamentals of ANNs and machine learning.

...read moreread less

Proceedings Article•

BinaryConnect: training deep neural networks with binary weights during propagations

[...]

Matthieu Courbariaux¹, Yoshua Bengio², Jean-Pierre David¹•Institutions (2)

École Polytechnique de Montréal¹, Université de Montréal²

07 Dec 2015

TL;DR: BinaryConnect is introduced, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated, and near state-of-the-art results with BinaryConnect are obtained on the permutation-invariant MNIST, CIFAR-10 and SVHN.

...read moreread less

Abstract: Deep Neural Networks (DNN) have achieved state-of-the-art results in a wide range of tasks, with the best results obtained with large training sets and large models. In the past, GPUs enabled these breakthroughs because of their greater computational speed. In the future, faster computation at both training and test time is likely to be crucial for further progress and for consumer applications on low-power devices. As a result, there is much interest in research and development of dedicated hardware for Deep Learning (DL). Binary weights, i.e., weights which are constrained to only two possible values (e.g. -1 or 1), would bring great benefits to specialized DL hardware by replacing many multiply-accumulate operations by simple accumulations, as multipliers are the most space and power-hungry components of the digital implementation of neural networks. We introduce BinaryConnect, a method which consists in training a DNN with binary weights during the forward and backward propagations, while retaining precision of the stored weights in which gradients are accumulated. Like other dropout schemes, we show that BinaryConnect acts as regularizer and we obtain near state-of-the-art results with BinaryConnect on the permutation-invariant MNIST, CIFAR-10 and SVHN.

...read moreread less

Collapse