Papers on "Recurrent neural network" published in 2016

Posted Content

Layer Normalization

Jimmy Ba, Jamie Ryan Kiros, Geoffrey E. Hinton

21 Jul 2016-arXiv: Machine Learning

TL;DR: In this paper, layer normalization is applied to recurrent neural networks by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case.

Abstract: Training state-of-the-art, deep neural networks is computationally expensive. One way to reduce the training time is to normalize the activities of the neurons. A recently introduced technique called batch normalization uses the distribution of the summed input to a neuron over a mini-batch of training cases to compute a mean and variance which are then used to normalize the summed input to that neuron on each training case. This significantly reduces the training time in feed-forward neural networks. However, the effect of batch normalization is dependent on the mini-batch size and it is not obvious how to apply it to recurrent neural networks. In this paper, we transpose batch normalization into layer normalization by computing the mean and variance used for normalization from all of the summed inputs to the neurons in a layer on a single training case. Like batch normalization, we also give each neuron its own adaptive bias and gain which are applied after the normalization but before the non-linearity. Unlike batch normalization, layer normalization performs exactly the same computation at training and test times. It is also straightforward to apply to recurrent neural networks by computing the normalization statistics separately at each time step. Layer normalization is very effective at stabilizing the hidden state dynamics in recurrent networks. Empirically, we show that layer normalization can substantially reduce the training time compared with previously published techniques.
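A minimal sketch of the normalization described above, assuming plain Python lists holding one layer's summed inputs for a single training case. The small constant `eps` is an assumption added for numerical stability; it is not mentioned in the abstract.

```python
import math

def layer_norm(summed_inputs, gain, bias, eps=1e-5):
    """Normalize one layer's summed inputs for a single training case.

    The mean and standard deviation are taken over the units of this layer
    only, so no mini-batch statistics are needed and training and test
    behave identically. `eps` is an assumed stability constant.
    """
    h = len(summed_inputs)
    mu = sum(summed_inputs) / h
    var = sum((a - mu) ** 2 for a in summed_inputs) / h
    sigma = math.sqrt(var + eps)
    # Per-unit adaptive gain and bias, applied after normalization
    # and before the non-linearity.
    return [g * (a - mu) / sigma + b for a, g, b in zip(summed_inputs, gain, bias)]
```

For a recurrent layer the same function would be called at every time step on that step's summed inputs, which is what computing the statistics separately at each time step amounts to.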

3,780 citations

Proceedings Article

Listen, attend and spell: A neural network for large vocabulary conversational speech recognition

William Chan¹, Navdeep Jaitly², Quoc V. Le², Oriol Vinyals²

Carnegie Mellon University¹, Google²

20 Mar 2016

TL;DR: Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional speech recognizers is presented.

Abstract: We present Listen, Attend and Spell (LAS), a neural speech recognizer that transcribes speech utterances directly to characters without pronunciation models, HMMs or other components of traditional speech recognizers. In LAS, the neural network architecture subsumes the acoustic, pronunciation and language models making it not only an end-to-end trained system but an end-to-end model. In contrast to DNN-HMM, CTC and most other models, LAS makes no independence assumptions about the probability distribution of the output character sequences given the acoustic sequence. Our system has two components: a listener and a speller. The listener is a pyramidal recurrent network encoder that accepts filter bank spectra as inputs. The speller is an attention-based recurrent network decoder that emits each character conditioned on all previous characters, and the entire acoustic sequence. On a Google voice search task, LAS achieves a WER of 14.1% without a dictionary or an external language model and 10.3% with language model rescoring over the top 32 beams. In comparison, the state-of-the-art CLDNN-HMM model achieves a WER of 8.0% on the same set.
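The listener's pyramidal structure can be illustrated by the step that shrinks the time axis: each layer of the pyramid sees consecutive outputs of the layer below concatenated in pairs, so three such layers cut the sequence length by a factor of 8. The sketch below shows only this reshaping (the recurrent layer itself is omitted); the function name is hypothetical and dropping an odd final frame is an assumption.

```python
def pyramid_concat(frames):
    """Concatenate each pair of consecutive frames, halving the sequence length.

    frames: list of feature vectors (lists of floats), one per time step.
    A trailing odd frame is dropped here (an assumption; padding is another option).
    """
    return [frames[2 * i] + frames[2 * i + 1] for i in range(len(frames) // 2)]
```

Stacking the reduction shortens the sequence the speller has to attend over, which is the motivation for the pyramid.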

2,279 citations

Journal Article

Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition

Fco. Javier Ordóñez¹, Daniel Roggen¹

University of Sussex¹

18 Jan 2016-Sensors

TL;DR: A generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which is suitable for multimodal wearable sensors, does not require expert knowledge in designing features, and explicitly models the temporal dynamics of feature activations is proposed.

Abstract: Human activity recognition (HAR) tasks have traditionally been solved using engineered features obtained by heuristic processes. Current research suggests that deep convolutional neural networks are suited to automate feature extraction from raw sensor inputs. However, human activities are made of complex sequences of motor movements, and capturing these temporal dynamics is fundamental for successful HAR. Based on the recent success of recurrent neural networks for time series domains, we propose a generic deep framework for activity recognition based on convolutional and LSTM recurrent units, which: (i) is suitable for multimodal wearable sensors; (ii) can perform sensor fusion naturally; (iii) does not require expert knowledge in designing features; and (iv) explicitly models the temporal dynamics of feature activations. We evaluate our framework on two datasets, one of which has been used in a public activity recognition challenge. Our results show that our framework outperforms competing deep non-recurrent networks on the challenge dataset by 4% on average, outperforming some previously reported results by up to 9%. Our results show that the framework can be applied to homogeneous sensor modalities, but can also fuse multimodal sensors to improve performance. We characterise key architectural hyperparameters' influence on performance to provide insights about their optimisation.

1,896 citations

Proceedings Article

Pixel recurrent neural networks

Aaron van den Oord¹, Nal Kalchbrenner¹, Koray Kavukcuoglu¹

Google¹

19 Jun 2016

TL;DR: A deep neural network is presented that sequentially predicts the pixels in an image along the two spatial dimensions and encodes the complete set of dependencies in the image to achieve log-likelihood scores on natural images that are considerably better than the previous state of the art.

Abstract: Modeling the distribution of natural images is a landmark problem in unsupervised learning. This task requires an image model that is at once expressive, tractable and scalable. We present a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions. Our method models the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image. Architectural novelties include fast two-dimensional recurrent layers and an effective use of residual connections in deep recurrent networks. We achieve log-likelihood scores on natural images that are considerably better than the previous state of the art. Our main results also provide benchmarks on the diverse ImageNet dataset. Samples generated from the model appear crisp, varied and globally coherent.
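Predicting pixels sequentially along the two spatial dimensions corresponds to an autoregressive factorization: an n × n image x is read row by row and pixel by pixel, and within each pixel the color channels are conditioned on the channels already generated for it.

```latex
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid \mathbf{x}_{<i}),
\qquad \mathbf{x}_{<i} = (x_1, \ldots, x_{i-1}),
\\
p(x_i \mid \mathbf{x}_{<i}) = p(x_{i,R} \mid \mathbf{x}_{<i})\,
  p(x_{i,G} \mid \mathbf{x}_{<i}, x_{i,R})\,
  p(x_{i,B} \mid \mathbf{x}_{<i}, x_{i,R}, x_{i,G})
```

Modeling the discrete probability of the raw pixel values means each factor is a categorical distribution (a softmax over the 256 intensity values) rather than a continuous density.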

1,801 citations

Proceedings Article

A theoretically grounded application of dropout in recurrent neural networks

Yarin Gal¹, Zoubin Ghahramani¹

University of Cambridge¹

05 Dec 2016

TL;DR: The authors apply this variational inference based dropout technique in LSTM and GRU models, assessing it on language modelling and sentiment analysis tasks, and to the best of their knowledge improve on the single model state-of-the-art in language modelling with the Penn Treebank (73.4 test perplexity).

Abstract: Recurrent neural networks (RNNs) stand at the forefront of many recent developments in deep learning. Yet a major difficulty with these models is their tendency to overfit, with dropout shown to fail when applied to recurrent layers. Recent results at the intersection of Bayesian modelling and deep learning offer a Bayesian interpretation of common deep learning techniques such as dropout. This grounding of dropout in approximate Bayesian inference suggests an extension of the theoretical results, offering insights into the use of dropout with RNN models. We apply this new variational inference based dropout technique in LSTM and GRU models, assessing it on language modelling and sentiment analysis tasks. The new approach outperforms existing techniques, and to the best of our knowledge improves on the single model state-of-the-art in language modelling with the Penn Treebank (73.4 test perplexity). This extends our arsenal of variational tools in deep learning.
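The key practical difference of the variational approach is that one dropout mask is sampled per sequence and reused at every time step, including on the recurrent connections, instead of drawing a fresh mask at each step. A minimal sketch of just that mask-reuse idea, assuming list-based hidden states and inverted-dropout scaling by 1/(1 − p) (a common convention, not stated in the abstract); the helper names are hypothetical.

```python
import random

def variational_dropout_mask(hidden_size, p, rng):
    """Sample ONE dropout mask for a whole sequence (inverted-dropout scaling)."""
    keep = 1.0 - p
    return [(1.0 / keep) if rng.random() < keep else 0.0 for _ in range(hidden_size)]

def apply_over_time(hidden_states, mask):
    """Apply the same mask at every time step of the sequence."""
    return [[h * m for h, m in zip(step, mask)] for step in hidden_states]
```

In a real RNN the masked state h_{t-1} would feed the computation of h_t; the sketch only shows that the same units are dropped at every step.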

1,557 citations

Proceedings Article

Character-aware neural language models

Yoon Kim¹, Yacine Jernite², David Sontag², Alexander M. Rush¹

Harvard University¹, Courant Institute of Mathematical Sciences²

12 Feb 2016

TL;DR: A simple neural language model that relies only on character-level inputs is described; it encodes, from characters only, both semantic and orthographic information, and the results suggest that on many languages, character inputs are sufficient for language modeling.

Abstract: We describe a simple neural language model that relies only on character-level inputs. Predictions are still made at the word-level. Our model employs a convolutional neural network (CNN) and a highway network over characters, whose output is given to a long short-term memory (LSTM) recurrent neural network language model (RNN-LM). On the English Penn Treebank the model is on par with the existing state-of-the-art despite having 60% fewer parameters. On languages with rich morphology (Arabic, Czech, French, German, Spanish, Russian), the model outperforms word-level/morpheme-level LSTM baselines, again with fewer parameters. The results suggest that on many languages, character inputs are sufficient for language modeling. Analysis of word representations obtained from the character composition part of the model reveals that the model is able to encode, from characters only, both semantic and orthographic information.
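The highway network between the character CNN and the LSTM computes z = t ⊙ g(W_H y + b_H) + (1 − t) ⊙ y with a transform gate t = σ(W_T y + b_T), so part of the CNN output y can be carried through unchanged. A small list-based sketch, assuming g is ReLU; the function names and the row-major matrix layout are illustrative only.

```python
import math

def matvec(w, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def highway(y, w_h, b_h, w_t, b_t):
    """One highway layer: z = t * g(W_H y + b_H) + (1 - t) * y, with g = ReLU."""
    g = [max(0.0, a + b) for a, b in zip(matvec(w_h, y), b_h)]
    t = [sigmoid(a + b) for a, b in zip(matvec(w_t, y), b_t)]  # transform gate
    return [ti * gi + (1.0 - ti) * yi for ti, gi, yi in zip(t, g, y)]
```

With a strongly negative transform bias the layer passes y through almost unchanged; with a strongly positive one it behaves like an ordinary ReLU layer.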

1,499 citations

Journal Article

Hybrid computing using a neural network with dynamic external memory

Alex Graves¹, Greg Wayne¹, Malcolm Reynolds¹, Tim Harley¹, Ivo Danihelka¹, Agnieszka Grabska-Barwinska¹, Sergio Gomez Colmenarejo¹, Edward Grefenstette¹, Tiago Ramalho¹, John P. Agapiou¹, Adrià Puigdomènech Badia¹, Karl Moritz Hermann¹, Yori Zwols¹, Georg Ostrovski¹, Adam Cain¹, Helen King¹, Christopher Summerfield¹, Phil Blunsom¹, Koray Kavukcuoglu¹, Demis Hassabis¹

Google¹

27 Oct 2016-Nature

TL;DR: A machine learning model called a differentiable neural computer (DNC) is introduced; it consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer.

Abstract: Artificial neural networks are remarkably adept at sensory processing, sequence learning and reinforcement learning, but are limited in their ability to represent variables and data structures and to store data over long timescales, owing to the lack of an external memory. Here we introduce a machine learning model called a differentiable neural computer (DNC), which consists of a neural network that can read from and write to an external memory matrix, analogous to the random-access memory in a conventional computer. Like a conventional computer, it can use its memory to represent and manipulate complex data structures, but, like a neural network, it can learn to do so from data. When trained with supervised learning, we demonstrate that a DNC can successfully answer synthetic questions designed to emulate reasoning and inference problems in natural language. We show that it can learn tasks such as finding the shortest path between specified points and inferring the missing links in randomly generated graphs, and then generalize these tasks to specific graphs such as transport networks and family trees. When trained with reinforcement learning, a DNC can complete a moving blocks puzzle in which changing goals are specified by sequences of symbols. Taken together, our results demonstrate that DNCs have the capacity to solve complex, structured tasks that are inaccessible to neural networks without external read-write memory.
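Reading from the external memory can be driven by content: the controller emits a key, each memory row is scored by cosine similarity to that key, and a key strength β sharpens a softmax over rows into read weights. The sketch below shows only this content-based lookup; the DNC's other addressing mechanisms (dynamic allocation and temporal links) are omitted, the `eps` guard is an assumption, and the function names are hypothetical.

```python
import math

def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps (assumed) avoids division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + eps)

def content_weights(memory, key, beta):
    """Softmax over memory rows of beta * cosine(key, row)."""
    scores = [beta * cosine(key, row) for row in memory]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Read vector: weighted sum of memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
```

Because every step is differentiable, the key and β can be learned from data by gradient descent, which is what lets the network learn to use its memory.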

1,413 citations

Proceedings Article

Abstractive Text Summarization using Sequence-to-sequence RNNs and Beyond

Ramesh Nallapati¹, Bowen Zhou¹, Cicero Nogueira dos Santos¹, Caglar Gulcehre², Bing Xiang¹

IBM¹, Université de Montréal²

19 Feb 2016

TL;DR: This paper proposes several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling keywords, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time.

Abstract: In this work, we model abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks, and show that they achieve state-of-the-art performance on two different corpora. We propose several novel models that address critical problems in summarization that are not adequately modeled by the basic architecture, such as modeling key-words, capturing the hierarchy of sentence-to-word structure, and emitting words that are rare or unseen at training time. Our work shows that many of our proposed models contribute to further improvement in performance. We also propose a new dataset consisting of multi-sentence summaries, and establish performance benchmarks for further research.

1,405 citations

Proceedings Article

Deep multi-scale video prediction beyond mean square error

Michael Mathieu¹,², Camille Couprie¹, Yann LeCun¹,²

Facebook¹, New York University²

01 Jan 2016

TL;DR: This work trains a convolutional network to generate future frames given an input sequence and proposes three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function.

Abstract: Learning to predict future images from a video sequence involves the construction of an internal representation that models the image evolution accurately, and therefore, to some degree, its content and dynamics. This is why pixel-space video prediction may be viewed as a promising avenue for unsupervised feature learning. In addition, while optical flow has been a well-studied problem in computer vision for a long time, future frame prediction is rarely approached. Still, many vision applications could benefit from knowledge of the next frames of videos, which does not require the complexity of tracking every pixel trajectory. In this work, we train a convolutional network to generate future frames given an input sequence. To deal with the inherently blurry predictions obtained from the standard Mean Squared Error (MSE) loss function, we propose three different and complementary feature learning strategies: a multi-scale architecture, an adversarial training method, and an image gradient difference loss function. We compare our predictions to different published results based on recurrent neural networks on the UCF101 dataset.
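The image gradient difference loss compares the absolute differences between neighboring pixels in the prediction and in the ground truth, so a blurry prediction that lacks the target's sharp edges is penalized. A single-channel sketch with exponent α; the function name and the list-of-lists image layout are assumptions.

```python
def gradient_difference_loss(pred, target, alpha=1.0):
    """Penalize differences between the image gradients of `pred` and `target`.

    pred, target: 2D lists (H x W) of pixel intensities for one channel.
    """
    h, w = len(target), len(target[0])
    loss = 0.0
    for i in range(h):
        for j in range(w):
            if i > 0:  # vertical neighbor
                gt = abs(target[i][j] - target[i - 1][j])
                gp = abs(pred[i][j] - pred[i - 1][j])
                loss += abs(gt - gp) ** alpha
            if j > 0:  # horizontal neighbor
                gt = abs(target[i][j - 1] - target[i][j])
                gp = abs(pred[i][j - 1] - pred[i][j])
                loss += abs(gt - gp) ** alpha
    return loss
```

In the paper this term is combined with a per-pixel reconstruction loss and the adversarial loss rather than used on its own; note that it alone ignores a constant intensity offset, which the reconstruction term covers.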

1,369 citations

Book Chapter

3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction

Christopher Choy¹, Danfei Xu¹, JunYoung Gwak¹, Kevin Chen¹, Silvio Savarese¹

Stanford University¹

08 Oct 2016

TL;DR: The authors propose the 3D Recurrent Reconstruction Neural Network (3D-R2N2), which learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data.

Abstract: Inspired by the recent success of methods that employ shape priors to achieve robust 3D reconstructions, we propose a novel recurrent neural network architecture that we call the 3D Recurrent Reconstruction Neural Network (3D-R2N2). The network learns a mapping from images of objects to their underlying 3D shapes from a large collection of synthetic data [13]. Our network takes in one or more images of an object instance from arbitrary viewpoints and outputs a reconstruction of the object in the form of a 3D occupancy grid. Unlike most previous work, our network does not require any image annotations or object class labels for training or testing. Our extensive experimental analysis shows that our reconstruction framework (i) outperforms the state-of-the-art methods for single view reconstruction, and (ii) enables the 3D reconstruction of objects in situations where traditional SFM/SLAM methods fail (because of lack of texture and/or wide baseline).

1,336 citations

Posted Content

Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations

Itay Hubara¹, Matthieu Courbariaux², Daniel Soudry³, Ran El-Yaniv¹, Yoshua Bengio²

Technion – Israel Institute of Technology¹, Université de Montréal², Columbia University³

22 Sep 2016-arXiv: Neural and Evolutionary Computing

TL;DR: A binary matrix multiplication GPU kernel is programmed with which it is possible to run the MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.

Abstract: We introduce a method to train Quantized Neural Networks (QNNs), neural networks with extremely low precision (e.g., 1-bit) weights and activations at run-time. At train-time the quantized weights and activations are used for computing the parameter gradients. During the forward pass, QNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations. As a result, power consumption is expected to be drastically reduced. We trained QNNs over the MNIST, CIFAR-10, SVHN and ImageNet datasets. The resulting QNNs achieve prediction accuracy comparable to their 32-bit counterparts. For example, our quantized version of AlexNet with 1-bit weights and 2-bit activations achieves 51% top-1 accuracy. Moreover, we quantize the parameter gradients to 6 bits as well, which enables gradient computation using only bit-wise operations. Quantized recurrent neural networks were tested over the Penn Treebank dataset, and achieved accuracy comparable to their 32-bit counterparts using only 4 bits. Last but not least, we programmed a binary matrix multiplication GPU kernel with which it is possible to run our MNIST QNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The QNN code is available online.
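With weights and activations constrained to ±1, a dot product needs no multiplications: packing the signs into bits, the number of agreeing positions is the popcount of an XNOR, and the dot product equals matches minus mismatches, i.e. 2·matches − n. This is the standard identity binary matrix-multiplication kernels exploit; the sketch uses deterministic sign binarization and Python integers as bit vectors, and all names are illustrative rather than the paper's code.

```python
def binarize(values):
    """Deterministic binarization: sign(x), mapping 0 to +1."""
    return [1 if v >= 0 else -1 for v in values]

def pack(signs):
    """Pack a +/-1 vector into an int: bit k is 1 where signs[k] == +1."""
    bits = 0
    for k, s in enumerate(signs):
        if s == 1:
            bits |= 1 << k
    return bits

def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two +/-1 vectors of length n from their packed bits."""
    mask = (1 << n) - 1
    agree = ~(a_bits ^ b_bits) & mask  # XNOR: 1 where the signs match
    matches = bin(agree).count("1")    # popcount
    return 2 * matches - n             # matches - mismatches
```

On hardware, one XNOR and one popcount instruction replace up to a word's worth of multiply-accumulates, which is where the speedup of a binary kernel comes from.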

Book Chapter

Spatio-Temporal LSTM with Trust Gates for 3D Human Action Recognition

Jun Liu¹, Amir Shahroudy¹, Dong Xu², Gang Wang¹

Nanyang Technological University¹, University of Sydney²

08 Oct 2016

TL;DR: This paper introduces a new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell, and proposes a more powerful tree-structure based traversal method.

Abstract: 3D action recognition, the analysis of human actions based on 3D skeleton data, has recently become popular due to its succinctness, robustness, and view-invariant representation. Recent attempts on this problem suggested developing RNN-based learning methods to model the contextual dependency in the temporal domain. In this paper, we extend this idea to spatio-temporal domains to analyze the hidden sources of action-related information within the input data over both domains concurrently. Inspired by the graphical structure of the human skeleton, we further propose a more powerful tree-structure based traversal method. To handle the noise and occlusion in 3D skeleton data, we introduce a new gating mechanism within LSTM to learn the reliability of the sequential input data and accordingly adjust its effect on updating the long-term context information stored in the memory cell. Our method achieves state-of-the-art performance on 4 challenging benchmark datasets for 3D human action analysis.

Proceedings Article

End-to-end attention-based large vocabulary speech recognition

Dzmitry Bahdanau¹, Jan Chorowski², Dmitriy Serdyuk¹, Philemon Brakel¹, Yoshua Bengio¹

Université de Montréal¹, University of Wrocław²

20 Mar 2016

TL;DR: This work investigates an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels.

Abstract: Many state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) Systems are hybrids of neural networks and Hidden Markov Models (HMMs). Recently, more direct end-to-end methods have been investigated, in which neural architectures were trained to model sequences of characters [1,2]. To our knowledge, all these approaches relied on Connectionist Temporal Classification [3] modules. We investigate an alternative method for sequence modelling based on an attention mechanism that allows a Recurrent Neural Network (RNN) to learn alignments between sequences of input frames and output labels. We show how this setup can be applied to LVCSR by integrating the decoding RNN with an n-gram language model and by speeding up its operation by constraining selections made by the attention mechanism and by reducing the source sequence lengths by pooling information over time. Recognition accuracies similar to other HMM-free RNN-based approaches are reported for the Wall Street Journal corpus.
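An attention-based decoder computes, at every output step, a distribution over encoder frames and reads a weighted average of them; constraining the mechanism can be as simple as zeroing the weights outside a window of frames. A minimal sketch, assuming the relevance scores are already computed (the scoring network is not shown) and that the caller supplies the window, for example centered on the previous step's alignment; the function names are hypothetical.

```python
import math

def attend(scores, window=None):
    """Softmax attention weights over encoder frames, optionally restricted to a window.

    scores: one relevance score per encoder frame.
    window: optional (start, end) pair; frames outside [start, end) get zero weight.
    """
    n = len(scores)
    start, end = window if window is not None else (0, n)
    m = max(scores[start:end])  # subtract the max for numerical stability
    exps = [math.exp(s - m) if start <= i < end else 0.0 for i, s in enumerate(scores)]
    total = sum(exps)
    return [e / total for e in exps]

def context(weights, frames):
    """Context vector: attention-weighted sum of encoder frame vectors."""
    dim = len(frames[0])
    return [sum(w * f[d] for w, f in zip(weights, frames)) for d in range(dim)]
```

Restricting the window reduces the work per output step from the full utterance length to the window size, in the spirit of the speed-ups the abstract describes.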

Posted Content

Exploring the limits of language modeling

Rafal Jozefowicz¹, Oriol Vinyals¹, Mike Schuster¹, Noam Shazeer¹, Yonghui Wu¹

Google¹

07 Feb 2016-arXiv: Computation and Language

TL;DR: This work explores recent advances in Recurrent Neural Networks for large scale Language Modeling, and extends current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language.

Abstract: In this work we explore recent advances in Recurrent Neural Networks for large scale Language Modeling, a task central to language understanding. We extend current models to deal with two key challenges present in this task: corpora and vocabulary sizes, and complex, long term structure of language. We perform an exhaustive study on techniques such as character Convolutional Neural Networks or Long Short-Term Memory, on the One Billion Word Benchmark. Our best single model significantly improves state-of-the-art perplexity from 51.3 down to 30.0 (whilst reducing the number of parameters by a factor of 20), while an ensemble of models sets a new record by improving perplexity from 41.0 down to 23.7. We also release these models for the NLP and ML community to study and improve upon.
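The perplexity figures quoted above are the exponential of the average per-token negative log-likelihood on the test set; lower is better, and a perplexity of 30 means the model is on average as uncertain as a uniform choice among 30 words. A minimal sketch, assuming the model's probability for each observed token is already available (the function name is illustrative):

```python
import math

def perplexity(token_probs):
    """exp of the average negative log-likelihood per token."""
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)
```

Equivalently, perplexity is the inverse geometric mean of the per-token probabilities, so a drop from 51.3 to 30.0 means the model assigns noticeably higher probability to the words that actually occur.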

Proceedings Article

End-to-End Relation Extraction using LSTMs on Sequences and Tree Structures

Makoto Miwa¹, Mohit Bansal²

Toyota Technological Institute¹, Toyota Technological Institute at Chicago²

05 Jan 2016

TL;DR: A novel end-to-end neural model extracts entities and the relations between them, and compares favorably to the state-of-the-art CNN based model (in F1-score) on nominal relation classification (SemEval-2010 Task 8).

Abstract: We present a novel end-to-end neural model to extract entities and relations between them. Our recurrent neural network based model captures both word sequence and dependency tree substructure information by stacking bidirectional tree-structured LSTM-RNNs on bidirectional sequential LSTM-RNNs. This allows our model to jointly represent both entities and relations with shared parameters in a single model. We further encourage detection of entities during training and use of entity information in relation extraction via entity pretraining and scheduled sampling. Our model improves over the state-of-the-art feature-based model on end-to-end relation extraction, achieving 12.1% and 5.7% relative error reductions in F1-score on ACE2005 and ACE2004, respectively. We also show that our LSTM-RNN based model compares favorably to the state-of-the-art CNN based model (in F1-score) on nominal relation classification (SemEval-2010 Task 8). Finally, we present an extensive ablation analysis of several model components.

Proceedings Article

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

Justin Johnson¹, Andrej Karpathy¹, Li Fei-Fei¹

Stanford University¹

01 Jun 2016

TL;DR: In this paper, a Fully Convolutional Localization Network (FCLN) is proposed to address the localization and description task jointly, which can be trained end-to-end with a single round of optimization.

Abstract: We introduce the dense captioning task, which requires a computer vision system to both localize and describe salient regions in images in natural language. The dense captioning task generalizes object detection when the descriptions consist of a single word, and image captioning when one predicted region covers the full image. To address the localization and description task jointly we propose a Fully Convolutional Localization Network (FCLN) architecture that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization. The architecture is composed of a Convolutional Network, a novel dense localization layer, and a Recurrent Neural Network language model that generates the label sequences. We evaluate our network on the Visual Genome dataset, which comprises 94,000 images and 4,100,000 region-grounded captions. We observe both speed and accuracy improvements over baselines based on current state-of-the-art approaches in both generation and retrieval settings.

Proceedings Article

Session-based Recommendations with Recurrent Neural Networks

Balázs Hidasi, Alexandros Karatzoglou¹, Linas Baltrunas², Domonkos Tikk

Telefónica¹, Netflix²

01 Jan 2016

TL;DR: In this article, the authors apply recurrent neural networks (RNN) on a new domain, namely recommender systems, and propose an RNN-based approach for session-based recommendations.

Abstract: We apply recurrent neural networks (RNN) on a new domain, namely recommender systems. Real-life recommender systems often face the problem of having to base recommendations only on short session-based data (e.g. a small sportswear website) instead of long user histories (as in the case of Netflix). In this situation the frequently praised matrix factorization approaches are not accurate. This problem is usually overcome in practice by resorting to item-to-item recommendations, i.e. recommending similar items. We argue that by modeling the whole session, more accurate recommendations can be provided. We therefore propose an RNN-based approach for session-based recommendations. Our approach also considers practical aspects of the task and introduces several modifications to classic RNNs such as a ranking loss function that make it more viable for this specific problem. Experimental results on two data-sets show marked improvements over widely used approaches.
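A ranking loss scores the item actually clicked next (the positive) against other items (negatives); in the paper the negatives are the other items in the same mini-batch. A sketch of two pairwise losses of this kind, BPR and TOP1, for one positive score r_i and a list of negative scores r_j; the function names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def bpr_loss(pos_score, neg_scores):
    """Bayesian Personalized Ranking: push the positive score above each negative."""
    return -sum(math.log(sigmoid(pos_score - r)) for r in neg_scores) / len(neg_scores)

def top1_loss(pos_score, neg_scores):
    """TOP1: relative-rank term plus a regularizer pushing negative scores toward 0."""
    return sum(sigmoid(r - pos_score) + sigmoid(r * r) for r in neg_scores) / len(neg_scores)
```

Both losses depend only on score differences for ranking, which suits the task: the goal is to put the next item near the top of the list, not to predict an exact rating.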

Posted Content

Pixel Recurrent Neural Networks

Aaron van den Oord¹, Nal Kalchbrenner¹, Koray Kavukcuoglu¹

Google¹

25 Jan 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, a deep neural network that sequentially predicts the pixels in an image along the two spatial dimensions is presented. The model captures the discrete probability of the raw pixel values and encodes the complete set of dependencies in the image.

Proceedings Article

CNN-RNN: A Unified Framework for Multi-label Image Classification

Jiang Wang¹, Yi Yang¹, Junhua Mao², Zhiheng Huang³, Chang Huang, Wei Xu¹

Baidu¹, University of California, Berkeley², Facebook³

27 Jun 2016

TL;DR: In this article, a CNN-RNN framework is proposed to learn a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework.

Abstract: While deep convolutional neural networks (CNNs) have shown great success in single-label image classification, it is important to note that real world images generally contain multiple labels, which could correspond to different objects, scenes, actions and attributes in an image. Traditional approaches to multi-label image classification learn independent classifiers for each category and employ ranking or thresholding on the classification results. These techniques, although they work well, fail to explicitly exploit the label dependencies in an image. In this paper, we utilize recurrent neural networks (RNNs) to address this problem. Combined with CNNs, the proposed CNN-RNN framework learns a joint image-label embedding to characterize the semantic label dependency as well as the image-label relevance, and it can be trained end-to-end from scratch to integrate both kinds of information in a unified framework. Experimental results on public benchmark datasets demonstrate that the proposed architecture achieves better performance than the state-of-the-art multi-label classification models.

Proceedings Article

Abstractive Sentence Summarization with Attentive Recurrent Neural Networks

Sumit Chopra¹, Michael Auli¹, Alexander M. Rush²

Facebook¹, Harvard University²

01 Jun 2016

TL;DR: A conditional recurrent neural network (RNN) which generates a summary of an input sentence which significantly outperforms the recently proposed state-of-the-art method on the Gigaword corpus while performing competitively on the DUC-2004 shared task.

...read moreread less

Abstract: Abstractive Sentence Summarization generates a shorter version of a given sentence while attempting to preserve its meaning. We introduce a conditional recurrent neural network (RNN) which generates a summary of an input sentence. The conditioning is provided by a novel convolutional attention-based encoder which ensures that the decoder focuses on the appropriate input words at each step of generation. Our model relies only on learned features and is easy to train in an end-to-end fashion on large data sets. Our experiments show that the model significantly outperforms the recently proposed state-of-the-art method on the Gigaword corpus while performing competitively on the DUC-2004 shared task.ive Sentence Summarization generates a shorter version of a given sentence while attempting to preserve its meaning. We introduce a conditional recurrent neural network (RNN) which generates a summary of an input sentence. The conditioning is provided by a novel convolutional attention-based encoder which ensures that the decoder focuses on the appropriate input words at each step of generation. Our model relies only on learned features and is easy to train in an end-to-end fashion on large data sets. Our experiments show that the model significantly outperforms the recently proposed state-of-the-art method on the Gigaword corpus while performing competitively on the DUC-2004 shared task.

Proceedings Article•DOI•

Using LSTM and GRU neural network methods for traffic flow prediction

Rui Fu¹, Zuo Zhang¹, Li Li¹•Institutions (1)

Tsinghua University¹

01 Nov 2016

TL;DR: This paper uses Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural network methods to predict short-term traffic flow, and experiments demonstrate that Recurrent Neural Network (RNN) based deep learning methods such as LSTM and GRU perform better than the autoregressive integrated moving average (ARIMA) model.

Abstract: Accurate and real-time traffic flow prediction is important in Intelligent Transportation Systems (ITS), especially for traffic control. Existing models such as ARMA and ARIMA are mainly linear models and cannot describe the stochastic and nonlinear nature of traffic flow. In recent years, deep-learning-based methods have been applied as novel alternatives for traffic flow prediction. However, which kind of deep neural network is most appropriate for traffic flow prediction remains an open question. In this paper, we use Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) neural network (NN) methods to predict short-term traffic flow, and experiments demonstrate that Recurrent Neural Network (RNN) based deep learning methods such as LSTM and GRU perform better than the autoregressive integrated moving average (ARIMA) model. To the best of our knowledge, this is the first time that GRU has been applied to traffic flow prediction.
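For readers unfamiliar with the GRU mentioned above, a single update step can be sketched as follows. This is a minimal pure-Python illustration of a standard GRU cell, not the authors' code; the parameter names (`Wz`, `Uz`, `bz`, and so on) are hypothetical, and libraries differ on whether the update gate `z` weights the previous state or the candidate state.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(m, v):
    # Multiply a matrix (list of rows) by a vector (list of floats).
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gru_step(x, h, params):
    """One GRU update. `params` is a hypothetical dict of weights and biases."""
    Wz, Uz, bz = params["Wz"], params["Uz"], params["bz"]
    Wr, Ur, br = params["Wr"], params["Ur"], params["br"]
    Wh, Uh, bh = params["Wh"], params["Uh"], params["bh"]
    # Update gate z and reset gate r, each computed from the input and previous state.
    z = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wz, x), matvec(Uz, h), bz)]
    r = [sigmoid(a + b + c) for a, b, c in zip(matvec(Wr, x), matvec(Ur, h), br)]
    # Candidate state sees the previous state only through the reset gate.
    rh = [ri * hi for ri, hi in zip(r, h)]
    h_tilde = [math.tanh(a + b + c) for a, b, c in zip(matvec(Wh, x), matvec(Uh, rh), bh)]
    # New state interpolates between the previous state and the candidate.
    return [(1.0 - zi) * hi + zi * hti for zi, hi, hti in zip(z, h, h_tilde)]
```

To forecast traffic with such a cell, one would feed a sequence of (normalized) flow measurements through `gru_step` one time step at a time and map the final hidden state to a prediction with a learned output layer; that output layer and the training loop are omitted here.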

Posted Content•

Language Modeling with Gated Convolutional Networks

Yann N. Dauphin¹, Angela Fan¹, Michael Auli¹, David Grangier¹•Institutions (1)

Facebook¹

23 Dec 2016-arXiv: Computation and Language

TL;DR: The authors proposed a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens and achieved state-of-the-art results on the WikiText-103 benchmark.

Abstract: The predominant approach to language modeling to date is based on recurrent neural networks. Their success on this task is often linked to their ability to capture unbounded context. In this paper we develop a finite context approach through stacked convolutions, which can be more efficient since they allow parallelization over sequential tokens. We propose a novel simplified gating mechanism that outperforms Oord et al. (2016) and investigate the impact of key architectural decisions. The proposed approach achieves state-of-the-art results on the WikiText-103 benchmark, even though it features long-term dependencies, as well as competitive results on the Google Billion Words benchmark. Our model reduces the latency to score a sentence by an order of magnitude compared to a recurrent baseline. To our knowledge, this is the first time a non-recurrent approach is competitive with strong recurrent models on these large-scale language tasks.
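The gating mechanism proposed here is the gated linear unit (GLU), which multiplies a linear causal convolution elementwise by a sigmoid-gated one: h(X) = (X*W + b) ⊗ σ(X*V + c). The sketch below shows one single-channel layer in pure Python; real models use many channels, residual connections and stacked layers, and the function names are illustrative.

```python
import math


def causal_conv1d(xs, kernel, bias):
    """Causal 1-D convolution: output t sees only inputs t-k+1..t (left zero-padding)."""
    k = len(kernel)
    padded = [0.0] * (k - 1) + list(xs)
    return [bias + sum(kernel[j] * padded[t + j] for j in range(k)) for t in range(len(xs))]


def glu_layer(xs, w, b, v, c):
    """Gated linear unit: (X*W + b) multiplied elementwise by sigmoid(X*V + c)."""
    linear = causal_conv1d(xs, w, b)
    gate = causal_conv1d(xs, v, c)
    return [a * (1.0 / (1.0 + math.exp(-g))) for a, g in zip(linear, gate)]
```

Because the linear path has no squashing non-linearity, gradients can flow through it largely unattenuated, while the sigmoid gate decides how much of each position's signal passes on. Causality (no output depends on future tokens) is what makes the layer usable for language modeling.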

Proceedings Article•

Sequence Level Training with Recurrent Neural Networks

Marc'Aurelio Ranzato¹, Sumit Chopra¹, Michael Auli¹, Wojciech Zaremba¹•Institutions (1)

Facebook¹

01 Jan 2016

TL;DR: This paper proposes a sequence-level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE, instead of only training the model to predict the next word in a sequence.

Abstract: Many natural language processing applications use language models to generate text. These models are typically trained to predict the next word in a sequence, given the previous words and some context such as an image. However, at test time the model is expected to generate the entire sequence from scratch. This discrepancy makes generation brittle, as errors may accumulate along the way. We address this issue by proposing a novel sequence-level training algorithm that directly optimizes the metric used at test time, such as BLEU or ROUGE. On three different tasks, our approach outperforms several strong baselines for greedy generation. The method is also competitive when these baselines employ beam search, while being several times faster.
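At the heart of sequence-level training is a policy-gradient (REINFORCE) update: the model samples a whole sequence, the sequence is scored with the test metric, and each sampled token's log-probability is pushed up or down according to how that score compares with a baseline. The sketch below shows only the per-token gradient for a softmax output layer, assuming a scalar reward and baseline; the paper's MIXER schedule, which gradually moves from cross-entropy to REINFORCE training, is not shown.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_logit_grad(logits, sampled, reward, baseline):
    """Gradient of the loss -(reward - baseline) * log p(sampled) w.r.t. the logits.

    Descending this gradient raises the probability of a sampled token when the
    sequence-level reward (e.g. a BLEU or ROUGE score) beats the baseline, and
    lowers it otherwise.
    """
    probs = softmax(logits)
    advantage = reward - baseline
    return [advantage * (p - (1.0 if i == sampled else 0.0)) for i, p in enumerate(probs)]
```

Subtracting a baseline does not change the expected gradient but reduces its variance, which matters because a single scalar reward must be credited to every token in the sequence.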

Posted Content•

Temporal Convolutional Networks for Action Segmentation and Detection

Colin Lea¹, Michael D. Flynn¹, René Vidal¹, Austin Reiter¹, Gregory D. Hager¹•Institutions (1)

Johns Hopkins University¹

16 Nov 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: Temporal Convolutional Networks (TCNs) use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection and can capture action compositions, segment durations, and long-range dependencies.

Abstract: The ability to identify and temporally segment fine-grained human actions throughout a video is crucial for robotics, surveillance, education, and beyond. Typical approaches decouple this problem by first extracting local spatiotemporal features from video frames and then feeding them into a temporal classifier that captures high-level temporal patterns. We introduce a new class of temporal models, which we call Temporal Convolutional Networks (TCNs), that use a hierarchy of temporal convolutions to perform fine-grained action segmentation or detection. Our Encoder-Decoder TCN uses pooling and upsampling to efficiently capture long-range temporal patterns, whereas our Dilated TCN uses dilated convolutions. We show that TCNs are capable of capturing action compositions, segment durations, and long-range dependencies, and are over an order of magnitude faster to train than competing LSTM-based Recurrent Neural Networks. We apply these models to three challenging fine-grained datasets and show large improvements over the state of the art.
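The Dilated TCN idea can be illustrated with a one-dimensional dilated convolution, where each output combines inputs spaced `dilation` steps apart, so stacking layers with dilations 1, 2, 4, ... grows the receptive field exponentially. This is a minimal causal, single-channel sketch; the paper's actual layers (filter sizes, causality, non-linearities, skip connections) may differ, and the function names are illustrative.

```python
def dilated_causal_conv1d(xs, kernel, dilation):
    """Output t combines x[t], x[t - d], x[t - 2d], ... (treating x[<0] as zero)."""
    out = []
    for t in range(len(xs)):
        acc = 0.0
        for j, weight in enumerate(kernel):
            idx = t - j * dilation
            if idx >= 0:
                acc += weight * xs[idx]
        out.append(acc)
    return out


def dilated_stack(xs, kernel, dilations):
    """Apply one dilated convolution per entry in `dilations`, in order."""
    for d in dilations:
        xs = dilated_causal_conv1d(xs, kernel, d)
    return xs


def receptive_field(kernel_size, dilations):
    """Number of input steps that can influence one output of the stack."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, with kernel size 2 and dilations 1, 2, 4, 8, `receptive_field` gives 16 input steps, whereas four undilated layers of the same kernel size would see only 5.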

Proceedings Article•DOI•

Structural-RNN: Deep Learning on Spatio-Temporal Graphs

Ashesh Jain¹, Amir Roshan Zamir², Silvio Savarese², Ashutosh Saxena•Institutions (2)

Cornell University¹, Stanford University²

27 Jun 2016

TL;DR: In this article, a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable is proposed.

Abstract: Deep Recurrent Neural Network architectures, though remarkably capable of modeling sequences, lack an intuitive high-level spatio-temporal structure, whereas many problems in computer vision inherently have an underlying high-level structure and can benefit from it. Spatiotemporal graphs are a popular tool for imposing such high-level intuitions in the formulation of real-world problems. In this paper, we propose an approach for combining the power of high-level spatio-temporal graphs with the sequence learning success of Recurrent Neural Networks (RNNs). We develop a scalable method for casting an arbitrary spatio-temporal graph as a rich RNN mixture that is feedforward, fully differentiable, and jointly trainable. The proposed method is generic and principled, as it can be used to transform any spatio-temporal graph by following a set of well-defined steps. Evaluations of the proposed approach on a diverse set of problems, ranging from modeling human motion to object interactions, show improvement over the state of the art by a large margin. We expect this method to empower new approaches to problem formulation through high-level spatio-temporal graphs and Recurrent Neural Networks.

Posted Content•

Co-occurrence Feature Learning for Skeleton based Action Recognition using Regularized Deep LSTM Networks

Wentao Zhu¹, Cuiling Lan², Junliang Xing³, Wenjun Zeng², Yanghao Li⁴, Li Shen³, Xiaohui Xie¹•Institutions (4)

University of California, Irvine¹, Microsoft², Chinese Academy of Sciences³, Peking University⁴

24 Mar 2016-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work takes the skeleton as the input at each time slot and introduces a novel regularization scheme to learn the co-occurrence features of skeleton joints, and proposes a new dropout algorithm which simultaneously operates on the gates, cells, and output responses of the LSTM neurons.

Abstract: Skeleton based action recognition distinguishes human actions using the trajectories of skeleton joints, which provide a very good representation for describing actions. Considering that recurrent neural networks (RNNs) with Long Short-Term Memory (LSTM) can learn feature representations and model long-term temporal dependencies automatically, we propose an end-to-end fully connected deep LSTM network for skeleton based action recognition. Inspired by the observation that the co-occurrences of the joints intrinsically characterize human actions, we take the skeleton as the input at each time slot and introduce a novel regularization scheme to learn the co-occurrence features of skeleton joints. To train the deep LSTM network effectively, we propose a new dropout algorithm which simultaneously operates on the gates, cells, and output responses of the LSTM neurons. Experimental results on three human action recognition datasets consistently demonstrate the effectiveness of the proposed model.

Proceedings Article•

Predicting the next location: a recurrent model with spatial and temporal contexts

Qiang Liu¹, Shu Wu¹, Liang Wang¹, Tieniu Tan¹•Institutions (1)

Chinese Academy of Sciences¹

12 Feb 2016

TL;DR: RNN is extended into a novel method called Spatial Temporal Recurrent Neural Networks (ST-RNN), which models local temporal and spatial contexts in each layer with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances.

Abstract: Spatial and temporal contextual information plays a key role in analyzing user behaviors and is helpful for predicting where a user will go next. With the growing ability to collect information, more and more temporal and spatial contextual information is collected in systems, and the location prediction problem becomes crucial and feasible. Some works have been proposed to address this problem, but they all have their limitations. Factorizing Personalized Markov Chain (FPMC) is constructed based on a strong independence assumption among different factors, which limits its performance. Tensor Factorization (TF) faces the cold start problem in predicting future actions. Recurrent Neural Networks (RNN) show promising performance compared with FPMC and TF, but all these methods have difficulty modeling continuous time intervals and geographical distances. In this paper, we extend RNN and propose a novel method called Spatial Temporal Recurrent Neural Networks (ST-RNN). ST-RNN can model local temporal and spatial contexts in each layer with time-specific transition matrices for different time intervals and distance-specific transition matrices for different geographical distances. Experimental results show that the proposed ST-RNN model yields significant improvements over competitive baseline methods on two typical datasets, i.e., the Global Terrorism Database (GTD) and the Gowalla dataset.
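ST-RNN has to map a continuous time interval or geographical distance to a transition matrix. One simple way to do this, assumed here rather than taken from the abstract, is to learn matrices only at the edges of fixed-width bins and linearly interpolate between the two edges surrounding a given value. The bin layout, the clamping rule, the function names and the order in which the distance matrix `S` and time matrix `T` are applied are all illustrative assumptions.

```python
def lerp_matrix(lower_m, upper_m, lower, upper, value):
    """Linearly interpolate between matrices defined at bin edges lower <= value <= upper."""
    w = (value - lower) / (upper - lower)
    return [[(1.0 - w) * a + w * b for a, b in zip(ra, rb)] for ra, rb in zip(lower_m, upper_m)]


def transition_for(value, bin_width, edge_matrices):
    """edge_matrices[i] is a (hypothetical) learned matrix for value i * bin_width."""
    top = (len(edge_matrices) - 1) * bin_width
    value = max(0.0, min(value, top))  # clamp to the range covered by the bins
    i = min(int(value // bin_width), len(edge_matrices) - 2)
    return lerp_matrix(edge_matrices[i], edge_matrices[i + 1], i * bin_width, (i + 1) * bin_width, value)


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def st_term(location_vec, time_gap, distance, time_edges, dist_edges, bin_t, bin_d):
    """Hypothetical summand of the hidden-state update: S(distance) T(time_gap) q."""
    T = transition_for(time_gap, bin_t, time_edges)
    S = transition_for(distance, bin_d, dist_edges)
    return matvec(S, matvec(T, location_vec))
```

Interpolating between a small number of learned edge matrices keeps the parameter count fixed while still giving every real-valued interval or distance its own transition.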

Posted Content•

Training Deep Nets with Sublinear Memory Cost

Tianqi Chen, Bing Xu, Chiyuan Zhang, Carlos Guestrin

21 Apr 2016-arXiv: Learning

TL;DR: This work designs an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch, and shows that it is possible to trade computation for memory, giving a more memory-efficient training algorithm at a small extra computation cost.

Abstract: We propose a systematic approach to reduce the memory consumption of deep neural network training. Specifically, we design an algorithm that costs O(√n) memory to train an n-layer network, with only the computational cost of an extra forward pass per mini-batch. As many of the state-of-the-art models hit the upper bound of GPU memory, our algorithm allows deeper and more complex models to be explored, and helps advance the innovations in deep learning research. We focus on reducing the memory cost to store the intermediate feature maps and gradients during training. Computation graph analysis is used for automatic in-place operation and memory sharing optimizations. We show that it is possible to trade computation for memory, giving a more memory-efficient training algorithm with a little extra computation cost. In the extreme case, our analysis also shows that the memory consumption can be reduced to O(log n) with as little as O(n log n) extra cost for forward computation. Our experiments show that we can reduce the memory cost of a 1,000-layer deep residual network from 48 GB to 7 GB with only 30 percent additional running time on ImageNet problems. Similarly, significant memory cost reduction is observed in training complex recurrent neural networks on very long sequences.
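The compute-for-memory trade described above (commonly called gradient checkpointing) can be illustrated on a toy chain of n scalar tanh layers. The sketch keeps only every ⌊√n⌋-th activation during the forward pass, then recomputes each segment from its checkpoint during the backward pass, which costs roughly one extra forward pass. It is a simplified illustration with hypothetical function names, not the authors' implementation, which plans memory over general computation graphs.

```python
import math


def forward_layer(w, x):
    return math.tanh(w * x)


def local_grad(w, x):
    # Derivative of tanh(w * x) with respect to x.
    y = math.tanh(w * x)
    return w * (1.0 - y * y)


def grad_full_storage(weights, x0):
    """Baseline: keep all n + 1 activations, then backpropagate d(output)/d(x0)."""
    acts = [x0]
    for w in weights:
        acts.append(forward_layer(w, acts[-1]))
    g = 1.0
    for i in reversed(range(len(weights))):
        g *= local_grad(weights[i], acts[i])
    return g, len(acts)


def grad_checkpointed(weights, x0):
    """Keep every k-th activation (k = isqrt(n)); recompute each segment on the way back."""
    n = len(weights)
    k = max(1, math.isqrt(n))
    checkpoints = {0: x0}
    x = x0
    for i, w in enumerate(weights):
        x = forward_layer(w, x)
        if (i + 1) % k == 0:
            checkpoints[i + 1] = x
    g, peak = 1.0, 0
    for start in reversed(range(0, n, k)):
        end = min(start + k, n)
        # Extra forward pass over this segment only, starting from its checkpoint.
        seg = [checkpoints[start]]
        for i in range(start, end - 1):
            seg.append(forward_layer(weights[i], seg[-1]))
        # Activations held right now: all checkpoints plus one recomputed segment.
        peak = max(peak, len(checkpoints) + len(seg))
        for i in reversed(range(start, end)):
            g *= local_grad(weights[i], seg[i - start])
    return g, peak
```

With 100 layers, the full-storage version holds 101 activations while the checkpointed version holds at most 21 at any time, yet both return the same gradient.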

Posted Content•

RL²: Fast Reinforcement Learning via Slow Reinforcement Learning

Yan Duan, John Schulman, Xi Chen, Peter L. Bartlett, Ilya Sutskever, Pieter Abbeel

04 Nov 2016-arXiv: Artificial Intelligence

TL;DR: This paper proposes to represent a "fast" reinforcement learning algorithm as a recurrent neural network (RNN) and learn it from data, encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm.

Abstract: Deep reinforcement learning (deep RL) has been successful in learning sophisticated behaviors automatically; however, the learning process requires a huge number of trials. In contrast, animals can learn new tasks in just a few trials, benefiting from their prior knowledge about the world. This paper seeks to bridge this gap. Rather than designing a "fast" reinforcement learning algorithm, we propose to represent it as a recurrent neural network (RNN) and learn it from data. In our proposed method, RL², the algorithm is encoded in the weights of the RNN, which are learned slowly through a general-purpose ("slow") RL algorithm. The RNN receives all information a typical RL algorithm would receive, including observations, actions, rewards, and termination flags; and it retains its state across episodes in a given Markov Decision Process (MDP). The activations of the RNN store the state of the "fast" RL algorithm on the current (previously unseen) MDP. We evaluate RL² experimentally on both small-scale and large-scale problems. On the small-scale side, we train it to solve randomly generated multi-armed bandit problems and finite MDPs. After RL² is trained, its performance on new MDPs is close to human-designed algorithms with optimality guarantees. On the large-scale side, we test RL² on a vision-based navigation task and show that it scales up to high-dimensional problems.
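The interface the abstract describes (observation, previous action, reward and termination flag as input, with the hidden state kept across episodes of the same MDP) can be sketched as below. The bandit environment, the `policy` callable and the function names are hypothetical stand-ins; a real RL² agent would use a trained recurrent network optimized by a slow RL algorithm.

```python
import random


def rl2_input(obs, prev_action, prev_reward, done, num_actions):
    """Concatenate observation, one-hot previous action, previous reward and done flag."""
    one_hot = [0.0] * num_actions
    if prev_action is not None:
        one_hot[prev_action] = 1.0
    return list(obs) + one_hot + [float(prev_reward), 1.0 if done else 0.0]


def run_trial(policy, bandit_probs, episodes, steps_per_episode, rng=None):
    """One trial: a single (bandit) MDP, several episodes, one persistent hidden state.

    `policy(hidden, x) -> (action, hidden)` stands in for the recurrent policy.
    """
    rng = rng or random.Random(0)
    hidden = None  # reset only at the start of a trial, i.e. for a new MDP
    prev_action, prev_reward, done = None, 0.0, False
    total = 0.0
    for _ in range(episodes):
        for step in range(steps_per_episode):
            x = rl2_input([], prev_action, prev_reward, done, len(bandit_probs))
            action, hidden = policy(hidden, x)
            prev_reward = 1.0 if rng.random() < bandit_probs[action] else 0.0
            prev_action = action
            done = step == steps_per_episode - 1
            total += prev_reward
        # hidden is deliberately NOT reset here, so experience carries across episodes.
    return total, hidden
```

Because the hidden state survives episode boundaries, whatever the recurrent policy learns about the current bandit in early episodes is available in later ones; this is where the "fast" learning happens.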

Book Chapter•DOI•

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

Huijuan Xu¹, Kate Saenko¹•Institutions (1)

Boston University¹

08 Oct 2016

TL;DR: The Spatial Memory Network, a novel spatial attention architecture that aligns words with image patches in the first hop, is proposed and improved results are obtained compared to a strong deep baseline model which concatenates image and question features to predict the answer.

Abstract: We address the problem of Visual Question Answering (VQA), which requires joint image and language understanding to answer a question about a given photograph. Recent approaches have applied deep image captioning methods based on convolutional-recurrent networks to this problem, but have failed to model spatial inference. To remedy this, we propose a model we call the Spatial Memory Network and apply it to the VQA task. Memory networks are recurrent neural networks with an explicit attention mechanism that selects certain parts of the information stored in memory. Our Spatial Memory Network stores neuron activations from different spatial regions of the image in its memory, and uses attention to choose regions relevant for computing the answer. We propose a novel question-guided spatial attention architecture that looks for regions relevant to either individual words or the entire question, repeating the process over multiple recurrent steps, or “hops”. To better understand the inference process learned by the network, we design synthetic questions that specifically require spatial inference and visualize the network’s attention. We evaluate our model on two available visual question answering datasets and obtain improved results.
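Question-guided spatial attention can be sketched as a softmax over similarity scores between a question vector and per-region image features, with the attention-weighted features then used to refine the query for the next hop. This is a generic dot-product attention sketch; the Spatial Memory Network's actual embeddings, scoring function and hop update differ in detail, and the additive query update in `multi_hop` is an assumption.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(region_feats, query):
    """Weight each image region by its dot-product similarity to the query."""
    scores = [sum(r * q for r, q in zip(feat, query)) for feat in region_feats]
    weights = softmax(scores)
    dim = len(region_feats[0])
    context = [sum(w * feat[d] for w, feat in zip(weights, region_feats)) for d in range(dim)]
    return weights, context


def multi_hop(region_feats, question_vec, hops=2):
    """Repeat attention, adding each hop's attended evidence to the query (assumed rule)."""
    query = list(question_vec)
    for _ in range(hops):
        _, context = attend(region_feats, query)
        query = [q + c for q, c in zip(query, context)]
    return query
```

Visualizing the `weights` returned by `attend` at each hop is the kind of inspection the abstract describes for understanding where the network looks.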
