Showing papers on "Artificial neural network published in 2014"

PDF

Open Access

Journal Article•

Dropout: a simple way to prevent neural networks from overfitting

[...]

Nitish Srivastava¹, Geoffrey E. Hinton¹, Alex Krizhevsky¹, Ilya Sutskever¹, Ruslan Salakhutdinov¹ - Show less +1 more•Institutions (1)

University of Toronto¹

01 Jan 2014-Journal of Machine Learning Research

TL;DR: It is shown that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

...read moreread less

Abstract: Deep neural nets with a large number of parameters are very powerful machine learning systems. However, overfitting is a serious problem in such networks. Large networks are also slow to use, making it difficult to deal with overfitting by combining the predictions of many different large neural nets at test time. Dropout is a technique for addressing this problem. The key idea is to randomly drop units (along with their connections) from the neural network during training. This prevents units from co-adapting too much. During training, dropout samples from an exponential number of different "thinned" networks. At test time, it is easy to approximate the effect of averaging the predictions of all these thinned networks by simply using a single unthinned network that has smaller weights. This significantly reduces overfitting and gives major improvements over other regularization methods. We show that dropout improves the performance of neural networks on supervised learning tasks in vision, speech recognition, document classification and computational biology, obtaining state-of-the-art results on many benchmark data sets.

...read moreread less

33,597 citations

Proceedings Article•DOI•

Learning Phrase Representations using RNN Encoder--Decoder for Statistical Machine Translation

[...]

Kyunghyun Cho¹, Bart van Merriënboer², Caglar Gulcehre², Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, Yoshua Bengio³, Yoshua Bengio⁴, Yoshua Bengio⁵ - Show less +5 more•Institutions (5)

Aalto University¹, Université de Montréal², Alcatel-Lucent³, AT&T⁴, École Polytechnique de Montréal⁵

01 Jan 2014

TL;DR: In this paper, the encoder and decoder of the RNN Encoder-Decoder model are jointly trained to maximize the conditional probability of a target sequence given a source sequence.

...read moreread less

Abstract: In this paper, we propose a novel neural network model called RNN Encoder‐ Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixedlength vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder‐Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.

...read moreread less

19,998 citations

Posted Content•

Neural Machine Translation by Jointly Learning to Align and Translate

[...]

Dzmitry Bahdanau¹, Kyunghyun Cho², Yoshua Bengio²•Institutions (2)

Jacobs University Bremen¹, Université de Montréal²

01 Sep 2014-arXiv: Computation and Language

TL;DR: In this paper, the authors propose to use a soft-searching model to find the parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly.

...read moreread less

Abstract: Neural machine translation is a recently proposed approach to machine translation. Unlike the traditional statistical machine translation, the neural machine translation aims at building a single neural network that can be jointly tuned to maximize the translation performance. The models proposed recently for neural machine translation often belong to a family of encoder-decoders and consists of an encoder that encodes a source sentence into a fixed-length vector from which a decoder generates a translation. In this paper, we conjecture that the use of a fixed-length vector is a bottleneck in improving the performance of this basic encoder-decoder architecture, and propose to extend this by allowing a model to automatically (soft-)search for parts of a source sentence that are relevant to predicting a target word, without having to form these parts as a hard segment explicitly. With this new approach, we achieve a translation performance comparable to the existing state-of-the-art phrase-based system on the task of English-to-French translation. Furthermore, qualitative analysis reveals that the (soft-)alignments found by the model agree well with our intuition.

...read moreread less

14,077 citations

Proceedings Article•

Intriguing properties of neural networks

[...]

Christian Szegedy¹, Wojciech Zaremba², Ilya Sutskever¹, Joan Bruna², Dumitru Erhan¹, Ian Goodfellow³, Rob Fergus², Rob Fergus⁴ - Show less +4 more•Institutions (4)

Google¹, New York University², Université de Montréal³, Facebook⁴

01 Jan 2014

TL;DR: It is found that there is no distinction between individual highlevel units and random linear combinations of high level units, according to various methods of unit analysis, and it is suggested that it is the space, rather than the individual units, that contains of the semantic information in the high layers of neural networks.

...read moreread less

Abstract: Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains of the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extend. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

...read moreread less

9,561 citations

Proceedings Article•DOI•

On the Properties of Neural Machine Translation: Encoder--Decoder Approaches

[...]

Kyunghyun Cho¹, Bart van Merriënboer¹, Dzmitry Bahdanau², Yoshua Bengio³, Yoshua Bengio⁴, Yoshua Bengio⁵ - Show less +2 more•Institutions (5)

Université de Montréal¹, Jacobs University Bremen², Alcatel-Lucent³, AT&T⁴, École Polytechnique de Montréal⁵

03 Sep 2014

TL;DR: In this paper, a gated recursive convolutional neural network (GRNN) was proposed to learn a grammatical structure of a sentence automatically, which performed well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase.

...read moreread less

Abstract: Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of the neural machine translation using two models; RNN Encoder‐Decoder and a newly proposed gated recursive convolutional neural network. We show that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns a grammatical structure of a sentence automatically.

...read moreread less

4,702 citations

Posted Content•

Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs

[...]

Liang-Chieh Chen¹, George Papandreou², Iasonas Kokkinos³, Kevin Murphy², Alan L. Yuille¹ - Show less +1 more•Institutions (3)

University of California, Los Angeles¹, Google², CentraleSupélec³

22 Dec 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF).

...read moreread less

Abstract: Deep Convolutional Neural Networks (DCNNs) have recently shown state of the art performance in high level vision tasks, such as image classification and object detection. This work brings together methods from DCNNs and probabilistic graphical models for addressing the task of pixel-level classification (also called "semantic image segmentation"). We show that responses at the final layer of DCNNs are not sufficiently localized for accurate object segmentation. This is due to the very invariance properties that make DCNNs good for high level tasks. We overcome this poor localization property of deep networks by combining the responses at the final DCNN layer with a fully connected Conditional Random Field (CRF). Qualitatively, our "DeepLab" system is able to localize segment boundaries at a level of accuracy which is beyond previous methods. Quantitatively, our method sets the new state-of-art at the PASCAL VOC-2012 semantic image segmentation task, reaching 71.6% IOU accuracy in the test set. We show how these results can be obtained efficiently: Careful network re-purposing and a novel application of the 'hole' algorithm from the wavelet community allow dense computation of neural net responses at 8 frames per second on a modern GPU.

...read moreread less

3,389 citations

Book•

Deep Learning: Methods and Applications

[...]

Li Deng¹, Dong Yu¹•Institutions (1)

Microsoft¹

12 Jun 2014

TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

...read moreread less

Abstract: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks. The application areas are chosen with the following three criteria in mind: (1) expertise or knowledge of the authors; (2) the application areas that have already been transformed by the successful use of deep learning technology, such as speech recognition and computer vision; and (3) the application areas that have the potential to be impacted significantly by deep learning and that have been experiencing research growth, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.

...read moreread less

2,817 citations

Posted Content•

Generative Adversarial Networks

[...]

Ian Goodfellow¹, Jean Pouget-Abadie², Mehdi Mirza², Bing Xu², David Warde-Farley², Sherjil Ozair², Aaron Courville², Yoshua Bengio² - Show less +4 more•Institutions (2)

Google¹, Université de Montréal²

10 Jun 2014-arXiv: Machine Learning

TL;DR: In this article, a generative adversarial network (GAN) is proposed to estimate generative models via an adversarial process, in which two models are simultaneously trained: a generator G and a discriminator D that estimates the probability that a sample came from the training data rather than G.

...read moreread less

Abstract: We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples.

...read moreread less

2,657 citations

Posted Content•

Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation

[...]

Aalto University¹, Université de Montréal², École Polytechnique de Montréal³, AT&T⁴, Alcatel-Lucent⁵

03 Jun 2014-arXiv: Computation and Language

TL;DR: Qualitatively, the proposed RNN Encoder‐Decoder model learns a semantically and syntactically meaningful representation of linguistic phrases.

...read moreread less

Abstract: In this paper, we propose a novel neural network model called RNN Encoder-Decoder that consists of two recurrent neural networks (RNN). One RNN encodes a sequence of symbols into a fixed-length vector representation, and the other decodes the representation into another sequence of symbols. The encoder and decoder of the proposed model are jointly trained to maximize the conditional probability of a target sequence given a source sequence. The performance of a statistical machine translation system is empirically found to improve by using the conditional probabilities of phrase pairs computed by the RNN Encoder-Decoder as an additional feature in the existing log-linear model. Qualitatively, we show that the proposed model learns a semantically and syntactically meaningful representation of linguistic phrases.

...read moreread less

2,510 citations

Posted Content•

Recurrent Neural Network Regularization

[...]

Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals

08 Sep 2014-arXiv: Neural and Evolutionary Computing

TL;DR: This paper shows how to correctly apply dropout to LSTMs, and shows that it substantially reduces overfitting on a variety of tasks.

...read moreread less

Abstract: We present a simple regularization technique for Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units Dropout, the most successful technique for regularizing neural networks, does not work well with RNNs and LSTMs In this paper, we show how to correctly apply dropout to LSTMs, and show that it substantially reduces overfitting on a variety of tasks These tasks include language modeling, speech recognition, image caption generation, and machine translation

...read moreread less

2,364 citations

Posted Content•

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

[...]

Kyunghyun Cho¹, Bart van Merriënboer¹, Dzmitry Bahdanau², Yoshua Bengio³, Yoshua Bengio⁴, Yoshua Bengio⁵ - Show less +2 more•Institutions (5)

Université de Montréal¹, Jacobs University Bremen², École Polytechnique de Montréal³, AT&T⁴, Alcatel-Lucent⁵

03 Sep 2014-arXiv: Computation and Language

TL;DR: It is shown that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase.

...read moreread less

Abstract: Neural machine translation is a relatively new approach to statistical machine translation based purely on neural networks. The neural machine translation models often consist of an encoder and a decoder. The encoder extracts a fixed-length representation from a variable-length input sentence, and the decoder generates a correct translation from this representation. In this paper, we focus on analyzing the properties of the neural machine translation using two models; RNN Encoder--Decoder and a newly proposed gated recursive convolutional neural network. We show that the neural machine translation performs relatively well on short sentences without unknown words, but its performance degrades rapidly as the length of the sentence and the number of unknown words increase. Furthermore, we find that the proposed gated recursive convolutional network learns a grammatical structure of a sentence automatically.

...read moreread less

Journal Article•DOI•

Deep Learning-Based Classification of Hyperspectral Data

[...]

Yushi Chen¹, Zhouhan Lin¹, Xing Zhao¹, Gang Wang², Yanfeng Gu¹ - Show less +1 more•Institutions (2)

Harbin Institute of Technology¹, Nanyang Technological University²

26 Jun 2014-IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing

TL;DR: The concept of deep learning is introduced into hyperspectral data classification for the first time, and a new way of classifying with spatial-dominated information is proposed, which is a hybrid of principle component analysis (PCA), deep learning architecture, and logistic regression.

...read moreread less

Abstract: Classification is one of the most popular topics in hyperspectral remote sensing. In the last two decades, a huge number of methods were proposed to deal with the hyperspectral data classification problem. However, most of them do not hierarchically extract deep features. In this paper, the concept of deep learning is introduced into hyperspectral data classification for the first time. First, we verify the eligibility of stacked autoencoders by following classical spectral information-based classification. Second, a new way of classifying with spatial-dominated information is proposed. We then propose a novel deep learning framework to merge the two features, from which we can get the highest classification accuracy. The framework is a hybrid of principle component analysis (PCA), deep learning architecture, and logistic regression. Specifically, as a deep learning architecture, stacked autoencoders are aimed to get useful high-level features. Experimental results with widely-used hyperspectral data indicate that classifiers built in this deep learning-based framework provide competitive performance. In addition, the proposed joint spectral-spatial deep neural network opens a new window for future research, showcasing the deep learning-based methods' huge potential for accurate hyperspectral data classification.

...read moreread less

Proceedings Article•

Network In Network

[...]

Min Lin¹, Qiang Chen¹, Shuicheng Yan¹•Institutions (1)

National University of Singapore¹

04 Mar 2014

TL;DR: In this paper, a Network in Network (NIN) architecture is proposed to enhance model discriminability for local patches within the receptive field, where the feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN, and then fed into the next layer.

...read moreread less

Abstract: We propose a novel deep network structure called "Network In Network" (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking mutiple of the above described structure. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated the state-of-the-art classification performances with NIN on CIFAR-10 and CIFAR-100, and reasonable performances on SVHN and MNIST datasets.

...read moreread less

Journal Article•DOI•

Convolutional neural networks for speech recognition

[...]

Ossama Abdel-Hamid¹, Abdelrahman Mohamed², Hui Jiang¹, Li Deng³, Gerald Penn², Dong Yu³ - Show less +2 more•Institutions (3)

York University¹, University of Toronto², Microsoft³

01 Oct 2014-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: It is shown that further error rate reduction can be obtained by using convolutional neural networks (CNNs), and a limited-weight-sharing scheme is proposed that can better model speech features.

...read moreread less

Abstract: Recently, the hybrid deep neural network (DNN)- hidden Markov model (HMM) has been shown to significantly improve speech recognition performance over the conventional Gaussian mixture model (GMM)-HMM. The performance improvement is partially attributed to the ability of the DNN to model complex correlations in speech features. In this paper, we show that further error rate reduction can be obtained by using convolutional neural networks (CNNs). We first present a concise description of the basic CNN and explain how it can be used for speech recognition. We further propose a limited-weight-sharing scheme that can better model speech features. The special structure such as local connectivity, weight sharing, and pooling in CNNs exhibits some degree of invariance to small shifts of speech features along the frequency axis, which is important to deal with speaker and environment variations. Experimental results show that CNNs reduce the error rate by 6%-10% compared with DNNs on the TIMIT phone recognition and the voice search large vocabulary speech recognition tasks.

...read moreread less

Proceedings Article•DOI•

A Fast and Accurate Dependency Parser using Neural Networks

[...]

Danqi Chen¹, Christopher D. Manning¹•Institutions (1)

Stanford University¹

01 Jan 2014

TL;DR: This work proposes a novel way of learning a neural network classifier for use in a greedy, transition-based dependency parser that can work very fast, while achieving an about 2% improvement in unlabeled and labeled attachment scores on both English and Chinese datasets.

...read moreread less

Abstract: Almost all current dependency parsers classify based on millions of sparse indicator features. Not only do these features generalize poorly, but the cost of feature computation restricts parsing speed significantly. In this work, we propose a novel way of learning a neural network classifier for use in a greedy, transition-based dependency parser. Because this classifier learns and uses just a small number of dense features, it can work very fast, while achieving an about 2% improvement in unlabeled and labeled attachment scores on both English and Chinese datasets. Concretely, our parser is able to parse more than 1000 sentences per second at 92.2% unlabeled attachment score on the English Penn Treebank.

...read moreread less

Journal Article•DOI•

Performance-optimized hierarchical models predict neural responses in higher visual cortex

[...]

Daniel L. K. Yamins¹, Ha Hong¹, Charles F. Cadieu¹, Ethan A. Solomon¹, Darren Seibert¹, James J. DiCarlo¹ - Show less +2 more•Institutions (1)

McGovern Institute for Brain Research¹

10 Jun 2014-Proceedings of the National Academy of Sciences of the United States of America

TL;DR: This work uses computational techniques to identify a high-performing neural network model that matches human performance on challenging object categorization tasks and shows that performance optimization—applied in a biologically appropriate model class—can be used to build quantitative predictive models of neural processing.

...read moreread less

Abstract: The ventral visual stream underlies key human visual object recognition abilities. However, neural encoding in the higher areas of the ventral stream remains poorly understood. Here, we describe a modeling approach that yields a quantitatively accurate model of inferior temporal (IT) cortex, the highest ventral cortical area. Using high-throughput computational techniques, we discovered that, within a class of biologically plausible hierarchical neural network models, there is a strong correlation between a model’s categorization performance and its ability to predict individual IT neural unit response data. To pursue this idea, we then identified a high-performing neural network that matches human performance on a range of recognition tasks. Critically, even though we did not constrain this model to match neural data, its top output layer turns out to be highly predictive of IT spiking responses to complex naturalistic images at both the single site and population levels. Moreover, the model’s intermediate layers are highly predictive of neural responses in the V4 cortex, a midlevel visual area that provides the dominant cortical input to IT. These results show that performance optimization—applied in a biologically appropriate model class—can be used to build quantitative predictive models of neural processing.

...read moreread less

Proceedings Article•

Relation Classification via Convolutional Deep Neural Network

[...]

Daojian Zeng¹, Kang Liu¹, Siwei Lai¹, Guangyou Zhou¹, Jun Zhao¹ - Show less +1 more•Institutions (1)

Chinese Academy of Sciences¹

01 Aug 2014

TL;DR: This paper exploits a convolutional deep neural network (DNN) to extract lexical and sentence level features from the output of pre-existing natural language processing systems and significantly outperforms the state-of-the-art methods.

...read moreread less

Abstract: The state-of-the-art methods used for relation classification are primarily based on statistical machine learning, and their performance strongly depends on the quality of the extracted features. The extracted features are often derived from the output of pre-existing natural language processing (NLP) systems, which leads to the propagation of the errors in the existing tools and hinders the performance of these systems. In this paper, we exploit a convolutional deep neural network (DNN) to extract lexical and sentence level features. Our method takes all of the word tokens as input without complicated pre-processing. First, the word tokens are transformed to vectors by looking up word embeddings 1 . Then, lexical level features are extracted according to the given nouns. Meanwhile, sentence level features are learned using a convolutional approach. These two level features are concatenated to form the final extracted feature vector. Finally, the features are fed into a softmax classifier to predict the relationship between two marked nouns. The experimental results demonstrate that our approach significantly outperforms the state-of-the-art methods.

...read moreread less

Proceedings Article•DOI•

DaDianNao: A Machine-Learning Supercomputer

[...]

Yunji Chen¹, Luo Tao¹, Liu Shaoli¹, Zhang Shijin¹, Liqiang He², Jia Wang¹, Ling Li¹, Tianshi Chen¹, Zhiwei Xu¹, Ninghui Sun¹, Olivier Temam³ - Show less +7 more•Institutions (3)

Chinese Academy of Sciences¹, Inner Mongolia University², French Institute for Research in Computer Science and Automation³

13 Dec 2014

TL;DR: This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

...read moreread less

Abstract: Many companies are deploying services, either for consumers or industry, which are largely based on machine-learning algorithms for sophisticated processing of large amounts of data. The state-of-the-art and most popular such machine-learning algorithms are Convolutional and Deep Neural Networks (CNNs and DNNs), which are known to be both computationally and memory intensive. A number of neural network accelerators have been recently proposed which can offer high computational capacity/area ratio, but which remain hampered by memory accesses. However, unlike the memory wall faced by processors on general-purpose workloads, the CNNs and DNNs memory footprint, while large, is not beyond the capability of the on chip storage of a multi-chip system. This property, combined with the CNN/DNN algorithmic characteristics, can lead to high internal bandwidth and low external communications, which can in turn enable high-degree parallelism at a reasonable area cost. In this article, we introduce a custom multi-chip machine-learning architecture along those lines. We show that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system. We implement the node down to the place and route at 28nm, containing a combination of custom storage and computational units, with industry-grade interconnects.

...read moreread less

Posted Content•

Neural Turing Machines

[...]

Alex Graves¹, Greg Wayne¹, Ivo Danihelka¹•Institutions (1)

Google¹

20 Oct 2014-arXiv: Neural and Evolutionary Computing

TL;DR: Neural Turing Machines as discussed by the authors extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes, analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end.

...read moreread less

Abstract: We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples.

...read moreread less

Journal Article•DOI•

Searching for exotic particles in high-energy physics with deep learning

[...]

Pierre Baldi¹, Peter Sadowski¹, Daniel Whiteson¹•Institutions (1)

University of California, Irvine¹

02 Jul 2014-Nature Communications

TL;DR: It is shown that deep-learning methods need no manually constructed inputs and yet improve the classification metric by as much as 8% over the best current approaches, demonstrating that deep learning approaches can improve the power of collider searches for exotic particles.

...read moreread less

Abstract: Collisions at high-energy particle colliders are a traditionally fruitful source of exotic particle discoveries. Finding these rare particles requires solving difficult signal-versus-background classification problems, hence machine-learning approaches are often used. Standard approaches have relied on 'shallow' machine-learning models that have a limited capacity to learn complex nonlinear functions of the inputs, and rely on a painstaking search through manually constructed nonlinear features. Progress on this problem has slowed, as a variety of techniques have shown equivalent performance. Recent advances in the field of deep learning make it possible to learn more complex functions and better discriminate between signal and background classes. Here, using benchmark data sets, we show that deep-learning methods need no manually constructed inputs and yet improve the classification metric by as much as 8% over the best current approaches. This demonstrates that deep-learning approaches can improve the power of collider searches for exotic particles.

...read moreread less

Proceedings Article•DOI•

Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification

[...]

Duyu Tang¹, Furu Wei², Nan Yang², Ming Zhou³, Ting Liu¹, Bing Qin¹ - Show less +2 more•Institutions (3)

Harbin Institute of Technology¹, Microsoft², North China Electric Power University³

01 Jun 2014

TL;DR: Three neural networks are developed to effectively incorporate the supervision from sentiment polarity of text (e.g. sentences or tweets) in their loss functions and the performance of SSWE is improved by concatenating SSWE with existing feature set.

...read moreread less

Abstract: We present a method that learns word embedding for Twitter sentiment classification in this paper. Most existing algorithms for learning continuous word representations typically only model the syntactic context of words but ignore the sentiment of text. This is problematic for sentiment analysis as they usually map words with similar syntactic context but opposite sentiment polarity, such as good and bad, to neighboring word vectors. We address this issue by learning sentimentspecific word embedding (SSWE), which encodes sentiment information in the continuous representation of words. Specifically, we develop three neural networks to effectively incorporate the supervision from sentiment polarity of text (e.g. sentences or tweets) in their loss functions. To obtain large scale training corpora, we learn the sentiment-specific word embedding from massive distant-supervised tweets collected by positive and negative emoticons. Experiments on applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with hand-crafted features in the top-performed system; (2) the performance is further improved by concatenating SSWE with existing feature set.

...read moreread less

Proceedings Article•DOI•

Scalable Object Detection Using Deep Neural Networks

[...]

Dumitru Erhan¹, Christian Szegedy¹, Alexander Toshev¹, Dragomir Anguelov¹•Institutions (1)

Google¹

23 Jun 2014

TL;DR: This work proposes a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest.

...read moreread less

Abstract: Deep convolutional neural networks have recently achieved state-of-the-art performance on a number of image recognition benchmarks, including the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC-2012). The winning model on the localization sub-task was a network that predicts a single bounding box and a confidence score for each object category in the image. Such a model captures the whole-image context around the objects but cannot handle multiple instances of the same object in the image without naively replicating the number of outputs for each instance. In this work, we propose a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest. The model naturally handles a variable number of instances for each class and allows for cross-class generalization at the highest levels of the network. We are able to obtain competitive recognition performance on VOC2007 and ILSVRC2012, while using only the top few predicted locations in each image and a small number of neural network evaluations.

...read moreread less

Journal Article•DOI•

On training targets for supervised speech separation

[...]

Yuxuan Wang¹, Arun Narayanan¹, DeLiang Wang¹•Institutions (1)

Ohio State University¹

01 Dec 2014-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: Results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics, and that masking based targets, in general, are significantly better than spectral envelope based targets.

...read moreread less

Abstract: Formulation of speech separation as a supervised learning problem has shown considerable promise. In its simplest form, a supervised learning algorithm, typically a deep neural network, is trained to learn a mapping from noisy features to a time-frequency representation of the target of interest. Traditionally, the ideal binary mask (IBM) is used as the target because of its simplicity and large speech intelligibility gains. The supervised learning framework, however, is not restricted to the use of binary targets. In this study, we evaluate and compare separation results by using different training targets, including the IBM, the target binary mask, the ideal ratio mask (IRM), the short-time Fourier transform spectral magnitude and its corresponding mask (FFT-MASK), and the Gammatone frequency power spectrum. Our results in various test conditions reveal that the two ratio mask targets, the IRM and the FFT-MASK, outperform the other targets in terms of objective intelligibility and quality metrics. In addition, we find that masking based targets, in general, are significantly better than spectral envelope based targets. We also present comparisons with recent methods in non-negative matrix factorization and speech enhancement, which show clear performance advantages of supervised speech separation.

...read moreread less

Journal Article•DOI•

Neurogrid: A Mixed-Analog-Digital Multichip System for Large-Scale Neural Simulations

[...]

Ben Varkey Benjamin¹, Peiran Gao¹, Emmett McQuinn, Swadesh Choudhary², Anand R. Chandrasekaran, Jean-Marie Bussat³, Rodrigo Alvarez-Icaza⁴, John V. Arthur⁴, Paul A. Merolla⁴, Kwabena Boahen¹ - Show less +6 more•Institutions (4)

Stanford University¹, Intel², Apple Inc.³, IBM⁴

24 Apr 2014

TL;DR: Neurogrid as discussed by the authors is a real-time neuromorphic system for simulating large-scale neural models in real time using 16 Neurocores, including axonal arbor, synapse, dendritic tree, and soma.

...read moreread less

Abstract: In this paper, we describe the design of Neurogrid, a neuromorphic system for simulating large-scale neural models in real time. Neuromorphic systems realize the function of biological neural systems by emulating their structure. Designers of such systems face three major design choices: 1) whether to emulate the four neural elements-axonal arbor, synapse, dendritic tree, and soma-with dedicated or shared electronic circuits; 2) whether to implement these electronic circuits in an analog or digital manner; and 3) whether to interconnect arrays of these silicon neurons with a mesh or a tree network. The choices we made were: 1) we emulated all neural elements except the soma with shared electronic circuits; this choice maximized the number of synaptic connections; 2) we realized all electronic circuits except those for axonal arbors in an analog manner; this choice maximized energy efficiency; and 3) we interconnected neural arrays in a tree network; this choice maximized throughput. These three choices made it possible to simulate a million neurons with billions of synaptic connections in real time-for the first time-using 16 Neurocores integrated on a board that consumes three watts.

...read moreread less

Proceedings Article•DOI•

Convolutional Neural Networks for No-Reference Image Quality Assessment

[...]

Le Kang¹, Peng Ye¹, Yi Li², David Doermann¹•Institutions (2)

University of Maryland, College Park¹, NICTA²

23 Jun 2014

TL;DR: A Convolutional Neural Network is described to accurately predict image quality without a reference image to achieve state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments.

...read moreread less

Abstract: In this work we describe a Convolutional Neural Network (CNN) to accurately predict image quality without a reference image. Taking image patches as input, the CNN works in the spatial domain without using hand-crafted features that are employed by most previous methods. The network consists of one convolutional layer with max and min pooling, two fully connected layers and an output node. Within the network structure, feature learning and regression are integrated into one optimization process, which leads to a more effective model for estimating image quality. This approach achieves state of the art performance on the LIVE dataset and shows excellent generalization ability in cross dataset experiments. Further experiments on images with local distortions demonstrate the local quality estimation ability of our CNN, which is rarely reported in previous literature.

...read moreread less

Journal Article•DOI•

The SpiNNaker Project

[...]

Steve Furber¹, Francesco Galluppi¹, Steve Temple¹, Luis A. Plana¹•Institutions (1)

University of Manchester¹

27 Feb 2014

TL;DR: SpiNNaker as discussed by the authors is a massively parallel million-core computer whose interconnect architecture is inspired by the connectivity characteristics of the mammalian brain, and which is suited to the modeling of large-scale spiking neural networks in biological real time.

...read moreread less

Abstract: The spiking neural network architecture (SpiNNaker) project aims to deliver a massively parallel million-core computer whose interconnect architecture is inspired by the connectivity characteristics of the mammalian brain, and which is suited to the modeling of large-scale spiking neural networks in biological real time. Specifically, the interconnect allows the transmission of a very large number of very small data packets, each conveying explicitly the source, and implicitly the time, of a single neural action potential or “spike.” In this paper, we review the current state of the project, which has already delivered systems with up to 2500 processors, and present the real-time event-driven programming model that supports flexible access to the resources of the machine and has enabled its use by a wide range of collaborators around the world.

...read moreread less

Proceedings Article•DOI•

Learning Fine-Grained Image Similarity with Deep Ranking

[...]

Jiang Wang¹, Yang Song, Thomas Leung, Charles J. Rosenberg, Jingbin Wang, James Philbin, Bo Chen², Ying Wu¹ - Show less +4 more•Institutions (2)

Northwestern University¹, California Institute of Technology²

23 Jun 2014

TL;DR: Zhang et al. as mentioned in this paper proposed a deep ranking model that employs deep learning techniques to learn similarity metric directly from images, which has higher learning capability than models based on hand-crafted features.

...read moreread less

Abstract: Learning fine-grained image similarity is a challenging task. It needs to capture between-class and within-class image differences. This paper proposes a deep ranking model that employs deep learning techniques to learn similarity metric directly from images. It has higher learning capability than models based on hand-crafted features. A novel multiscale network structure has been developed to describe the images effectively. An efficient triplet sampling algorithm is also proposed to learn the model with distributed asynchronized stochastic gradient. Extensive experiments show that the proposed algorithm outperforms models based on hand-crafted visual features and deep classification models.

...read moreread less

Proceedings Article•DOI•

Deep Metric Learning for Person Re-identification

[...]

Dong Yi, Zhen Lei, Shengcai Liao, Stan Z. Li

24 Aug 2014

TL;DR: A more general way that can learn a similarity metric from image pixels directly by using a "siamese" deep neural network that can jointly learn the color feature, texture feature and metric in a unified framework is proposed.

...read moreread less

Abstract: Various hand-crafted features and metric learning methods prevail in the field of person re-identification Compared to these methods, this paper proposes a more general way that can learn a similarity metric from image pixels directly By using a "siamese" deep neural network, the proposed method can jointly learn the color feature, texture feature and metric in a unified framework The network has a symmetry structure with two sub-networks which are connected by a cosine layer Each sub network includes two convolutional layers and a full connected layer To deal with the big variations of person images, binomial deviance is used to evaluate the cost between similarities and labels, which is proved to be robust to outliers Experiments on VIPeR illustrate the superior performance of our method and a cross database experiment also shows its good generalization

...read moreread less

Posted Content•

On the Number of Linear Regions of Deep Neural Networks

[...]

Guido Montúfar¹, Razvan Pascanu², Kyunghyun Cho², Yoshua Bengio²•Institutions (2)

Max Planck Society¹, Université de Montréal²

08 Feb 2014-arXiv: Machine Learning

TL;DR: The complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have is investigated.

...read moreread less

Abstract: We study the complexity of functions computable by deep feedforward neural networks with piecewise linear activations in terms of the symmetries and the number of linear regions that they have. Deep networks are able to sequentially map portions of each layer's input-space to the same output. In this way, deep models compute functions that react equally to complicated patterns of different inputs. The compositional structure of these functions enables them to re-use pieces of computation exponentially often in terms of the network's depth. This paper investigates the complexity of such compositional maps and contributes new theoretical results regarding the advantage of depth for neural networks with piecewise linear activation functions. In particular, our analysis is not specific to a single family of models, and as an example, we employ it for rectifier and maxout networks. We improve complexity bounds from pre-existing work and investigate the behavior of units in higher layers.

...read moreread less

Posted Content•

Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition

[...]

Max Jaderberg¹, Karen Simonyan¹, Andrea Vedaldi¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

09 Jun 2014-arXiv: Computer Vision and Pattern Recognition

TL;DR: This work presents a framework for the recognition of natural scene text that does not require any human-labelled data, and performs word recognition on the whole image holistically, departing from the character based recognition systems of the past.

...read moreread less

Abstract: In this work we present a framework for the recognition of natural scene text. Our framework does not require any human-labelled data, and performs word recognition on the whole image holistically, departing from the character based recognition systems of the past. The deep neural network models at the centre of this framework are trained solely on data produced by a synthetic text generation engine -- synthetic data that is highly realistic and sufficient to replace real data, giving us infinite amounts of training data. This excess of data exposes new possibilities for word recognition models, and here we consider three models, each one "reading" words in a different way: via 90k-way dictionary encoding, character sequence encoding, and bag-of-N-grams encoding. In the scenarios of language based and completely unconstrained text recognition we greatly improve upon state-of-the-art performance on standard datasets, using our fast, simple machinery and requiring zero data-acquisition costs.

...read moreread less

Collapse