Showing papers on "Artificial neural network" published in 2013

Journal Article•DOI•

Representation Learning: A Review and New Perspectives

Yoshua Bengio, Aaron Courville, Pascal Vincent

01 Aug 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Recent work in the area of unsupervised feature learning and deep learning is reviewed, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks.

Abstract: The success of machine learning algorithms generally depends on data representation, and we hypothesize that this is because different representations can entangle and hide more or less the different explanatory factors of variation behind the data. Although specific domain knowledge can be used to help design representations, learning with generic priors can also be used, and the quest for AI is motivating the design of more powerful representation-learning algorithms implementing such priors. This paper reviews recent work in the area of unsupervised feature learning and deep learning, covering advances in probabilistic models, autoencoders, manifold learning, and deep networks. This motivates longer term unanswered questions about the appropriate objectives for learning good representations, for computing representations (i.e., inference), and the geometrical connections between representation learning, density estimation, and manifold learning.

11,201 citations

Journal Article•DOI•

3D Convolutional Neural Networks for Human Action Recognition

Shuiwang Ji¹, Wei Xu², Ming Yang, Kai Yu³•Institutions (3)

Old Dominion University¹, Facebook², Baidu³

01 Jan 2013-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This paper develops a novel 3D CNN model for action recognition, which extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames.

Abstract: We consider the automated recognition of human actions in surveillance videos. Most current methods build classifiers based on complex handcrafted features computed from the raw inputs. Convolutional neural networks (CNNs) are a type of deep model that can act directly on the raw inputs. However, such models are currently limited to handling 2D inputs. In this paper, we develop a novel 3D CNN model for action recognition. This model extracts features from both the spatial and the temporal dimensions by performing 3D convolutions, thereby capturing the motion information encoded in multiple adjacent frames. The developed model generates multiple channels of information from the input frames, and the final feature representation combines information from all channels. To further boost the performance, we propose regularizing the outputs with high-level features and combining the predictions of a variety of different models. We apply the developed models to recognize human actions in the real-world environment of airport surveillance videos, and they achieve superior performance in comparison to baseline methods.
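
To illustrate what "performing 3D convolutions" over adjacent frames means, here is a minimal pure-Python sketch of a valid 3D cross-correlation on a single-channel clip. The function name `conv3d_valid`, the toy clip, and the temporal-difference kernel are assumptions made for illustration; they are not taken from the paper, whose actual architecture is not described in this listing.

```python
def conv3d_valid(volume, kernel):
    """Valid 3D cross-correlation of a (T, H, W) volume with a (kt, kh, kw) kernel."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for i in range(H - kh + 1):
            row = []
            for j in range(W - kw + 1):
                acc = 0.0
                # Sum over the temporal extent as well as the spatial window.
                for dt in range(kt):
                    for di in range(kh):
                        for dj in range(kw):
                            acc += volume[t + dt][i + di][j + dj] * kernel[dt][di][dj]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Hypothetical clip: three 2x2 frames whose brightness changes over time.
clip = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 1.0], [1.0, 1.0]],
    [[3.0, 3.0], [3.0, 3.0]],
]
# Hypothetical 2x1x1 kernel that subtracts the previous frame from the next one.
temporal_diff = [[[-1.0]], [[1.0]]]
motion = conv3d_valid(clip, temporal_diff)
```

Because the kernel spans two frames, each output plane depends on a pair of adjacent frames, which is how a 3D filter can respond to motion that a per-frame 2D filter cannot see.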

4,545 citations

Posted Content•

Network In Network

Min Lin¹, Qiang Chen¹, Shuicheng Yan¹•Institutions (1)

National University of Singapore¹

16 Dec 2013-arXiv: Neural and Evolutionary Computing

TL;DR: With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.

Abstract: We propose a novel deep network structure called "Network In Network" (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking multiple copies of the structure described above. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated the state-of-the-art classification performances with NIN on CIFAR-10 and CIFAR-100, and reasonable performances on SVHN and MNIST datasets.
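
The global average pooling step mentioned in the abstract can be sketched in a few lines of pure Python. This assumes one feature map per class in the last layer, so that each map's mean serves directly as that class's score; the function names and the toy feature maps are hypothetical.

```python
import math


def global_average_pool(feature_maps):
    """Reduce each (H, W) feature map to one scalar by averaging all its entries."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical final-layer output: two 2x2 feature maps, one per class.
maps = [
    [[1.0, 3.0], [5.0, 7.0]],
    [[0.0, 0.0], [0.0, 4.0]],
]
scores = global_average_pool(maps)
probs = softmax(scores)
```

The pooling step has no trainable weights, which is consistent with the abstract's claim that it is less prone to overfitting than a fully connected classification layer.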

3,905 citations

Posted Content•

Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation

Yoshua Bengio, Nicholas Léonard, Aaron Courville

15 Aug 2013-arXiv: Learning

TL;DR: This work considers a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off, in combinatorially many ways, large chunks of the computation performed in the rest of the neural network.

Abstract: Stochastic neurons and hard non-linearities can be useful for a number of reasons in deep learning models, but in many cases they pose a challenging problem: how to estimate the gradient of a loss function with respect to the input of such stochastic or non-smooth neurons? I.e., can we "back-propagate" through these stochastic neurons? We examine this question, existing approaches, and compare four families of solutions, applicable in different settings. One of them is the minimum variance unbiased gradient estimator for stochastic binary neurons (a special case of the REINFORCE algorithm). A second approach, introduced here, decomposes the operation of a binary stochastic neuron into a stochastic binary part and a smooth differentiable part, which approximates the expected effect of the pure stochastic binary neuron to first order. A third approach involves the injection of additive or multiplicative noise in a computational graph that is otherwise differentiable. A fourth approach heuristically copies the gradient with respect to the stochastic output directly as an estimator of the gradient with respect to the sigmoid argument (we call this the straight-through estimator). To explore a context where these estimators are useful, we consider a small-scale version of conditional computation, where sparse stochastic units form a distributed representation of gaters that can turn off in combinatorially many ways large chunks of the computation performed in the rest of the neural network. In this case, it is important that the gating units produce an actual 0 most of the time. The resulting sparsity can potentially be exploited to greatly reduce the computational cost of large deep networks for which conditional computation would be useful.
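
The fourth approach, the straight-through estimator, is simple enough to sketch for a single unit. The code below assumes a binary neuron that fires with probability sigmoid(a) and a squared-error loss chosen purely for illustration; all names are hypothetical and the sketch does not cover the other three estimator families.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def sample_binary_neuron(a, rng):
    """Forward pass: h ~ Bernoulli(sigmoid(a)), so h is exactly 0.0 or 1.0."""
    return 1.0 if rng.random() < sigmoid(a) else 0.0


def straight_through_grad(grad_wrt_h):
    """Backward pass: copy dL/dh unchanged as the estimate of dL/da."""
    return grad_wrt_h


rng = random.Random(0)
a = 0.3                                   # pre-activation (the sigmoid argument)
h = sample_binary_neuron(a, rng)          # stochastic binary output
target = 1.0                              # illustrative target for the toy loss
grad_wrt_h = 2.0 * (h - target)           # dL/dh for L = (h - target)^2
grad_wrt_a = straight_through_grad(grad_wrt_h)
```

The sampling step has zero derivative almost everywhere, so ordinary back-propagation would pass no signal to `a`; the estimator sidesteps this by treating the sampling step as the identity in the backward pass, at the cost of a biased gradient.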

2,178 citations

Proceedings Article•DOI•

Improving deep neural networks for LVCSR using rectified linear units and dropout

George E. Dahl¹, Tara N. Sainath², Geoffrey E. Hinton¹•Institutions (2)

University of Toronto¹, IBM²

26 May 2013

TL;DR: On a 50-hour English Broadcast News task, deep neural networks using rectified linear units (ReLUs) trained with dropout give a 4.2% relative improvement over a DNN trained with sigmoid units and a 14.4% relative improvement over a strong GMM/HMM system, with minimal human hyper-parameter tuning.

Abstract: Recently, pre-trained deep neural networks (DNNs) have outperformed traditional acoustic models based on Gaussian mixture models (GMMs) on a variety of large vocabulary speech recognition benchmarks. Deep neural nets have also achieved excellent results on various computer vision tasks using a random “dropout” procedure that drastically improves generalization error by randomly omitting a fraction of the hidden units in all layers. Since dropout helps avoid over-fitting, it has also been successful on a small-scale phone recognition task using larger neural nets. However, training deep neural net acoustic models for large vocabulary speech recognition takes a very long time and dropout is likely to only increase training time. Neural networks with rectified linear unit (ReLU) non-linearities have been highly successful for computer vision tasks and proved faster to train than standard sigmoid units, sometimes also improving discriminative performance. In this work, we show on a 50-hour English Broadcast News task that modified deep neural networks using ReLUs trained with dropout during frame level training provide a 4.2% relative improvement over a DNN trained with sigmoid units, and a 14.4% relative improvement over a strong GMM/HMM system. We were able to obtain our results with minimal human hyper-parameter tuning using publicly available Bayesian optimization code.

1,342 citations

Posted Content•

Intriguing properties of neural networks

Christian Szegedy¹, Wojciech Zaremba², Ilya Sutskever¹, Joan Bruna², Dumitru Erhan¹, Ian Goodfellow³, Rob Fergus²,⁴•Institutions (4)

Google¹, New York University², Université de Montréal³, Facebook⁴

21 Dec 2013-arXiv: Computer Vision and Pattern Recognition

TL;DR: This article showed that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent, and suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks.

Abstract: Deep neural networks are highly expressive models that have recently achieved state of the art performance on speech and visual recognition tasks. While their expressiveness is the reason they succeed, it also causes them to learn uninterpretable solutions that could have counter-intuitive properties. In this paper we report two such properties. First, we find that there is no distinction between individual high level units and random linear combinations of high level units, according to various methods of unit analysis. It suggests that it is the space, rather than the individual units, that contains the semantic information in the high layers of neural networks. Second, we find that deep neural networks learn input-output mappings that are fairly discontinuous to a significant extent. We can cause the network to misclassify an image by applying a certain imperceptible perturbation, which is found by maximizing the network's prediction error. In addition, the specific nature of these perturbations is not a random artifact of learning: the same perturbation can cause a different network, that was trained on a different subset of the dataset, to misclassify the same input.

1,313 citations

Proceedings Article•DOI•

New types of deep neural network learning for speech recognition and related applications: an overview

Li Deng¹, Geoffrey E. Hinton², Brian Kingsbury³•Institutions (3)

Microsoft¹, University of Toronto², IBM³

26 May 2013

TL;DR: An overview of the invited and contributed papers presented at the special session at ICASSP-2013, entitled “New Types of Deep Neural Network Learning for Speech Recognition and Related Applications,” as organized by the authors is provided.

Abstract: In this paper, we provide an overview of the invited and contributed papers presented at the special session at ICASSP-2013, entitled “New Types of Deep Neural Network Learning for Speech Recognition and Related Applications,” as organized by the authors. We also describe the historical context in which acoustic models based on deep neural networks have been developed. The technical overview of the papers presented in our special session is organized into five ways of improving deep learning methods: (1) better optimization; (2) better types of neural activation function and better network architectures; (3) better ways to determine the myriad hyper-parameters of deep neural networks; (4) more appropriate ways to preprocess speech for deep neural networks; and (5) ways of leveraging multiple languages or dialects that are more easily achieved with deep neural networks than with Gaussian mixture models.

1,098 citations

Proceedings Article•

Learning a Deep Compact Image Representation for Visual Tracking

Naiyan Wang¹, Dit-Yan Yeung¹•Institutions (1)

Hong Kong University of Science and Technology¹

05 Dec 2013

TL;DR: Comparison with the state-of-the-art trackers on some challenging benchmark video sequences shows that the deep learning tracker is more accurate while maintaining low computational cost with real-time performance when the MATLAB implementation of the tracker is used with a modest graphics processing unit (GPU).

Abstract: In this paper, we study the challenging problem of tracking the trajectory of a moving object in a video with possibly very complex background. In contrast to most existing trackers which only learn the appearance of the tracked object online, we take a different approach, inspired by recent advances in deep learning architectures, by putting more emphasis on the (unsupervised) feature learning problem. Specifically, by using auxiliary natural images, we train a stacked de-noising autoencoder offline to learn generic image features that are more robust against variations. This is then followed by knowledge transfer from offline training to the online tracking process. Online tracking involves a classification neural network which is constructed from the encoder part of the trained autoencoder as a feature extractor and an additional classification layer. Both the feature extractor and the classifier can be further tuned to adapt to appearance changes of the moving object. Comparison with the state-of-the-art trackers on some challenging benchmark video sequences shows that our deep learning tracker is more accurate while maintaining low computational cost with real-time performance when our MATLAB implementation of the tracker is used with a modest graphics processing unit (GPU).

926 citations

Proceedings Article•DOI•

Statistical parametric speech synthesis using deep neural networks

Heiga Ze¹, Andrew W. Senior¹, Mike Schuster¹•Institutions (1)

Google¹

26 May 2013

TL;DR: This paper examines an alternative scheme in which the relationship between input texts and their acoustic realizations is modeled by a deep neural network (DNN); experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.

Abstract: Conventional approaches to statistical parametric speech synthesis typically use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech parameters given texts. Speech parameters are generated from the probability densities to maximize their output probabilities, then a speech waveform is reconstructed from the generated parameters. This approach is reasonably effective but has a couple of limitations, e.g. decision trees are inefficient to model complex context dependencies. This paper examines an alternative scheme that is based on a deep neural network (DNN). The relationship between input texts and their acoustic realizations is modeled by a DNN. The use of the DNN can address some limitations of the conventional approach. Experimental results show that the DNN-based systems outperformed the HMM-based systems with similar numbers of parameters.

880 citations

Proceedings Article•

Guided Policy Search

Sergey Levine¹, Vladlen Koltun¹•Institutions (1)

Stanford University¹

16 Jun 2013

TL;DR: This work presents a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima, and shows how differential dynamic programming can be used to generate suitable guiding samples, and describes a regularized importance sampled policy optimization that incorporates these samples into the policy search.

Abstract: Direct policy search can effectively scale to high-dimensional systems, but complex policies with hundreds of parameters often present a challenge for such methods, requiring numerous samples and often falling into poor local optima. We present a guided policy search algorithm that uses trajectory optimization to direct policy learning and avoid poor local optima. We show how differential dynamic programming can be used to generate suitable guiding samples, and describe a regularized importance sampled policy optimization that incorporates these samples into the policy search. We evaluate the method by learning neural network controllers for planar swimming, hopping, and walking, as well as simulated 3D humanoid running.

773 citations

Posted Content•

Deep Learning using Linear Support Vector Machines

Yichuan Tang¹•Institutions (1)

University of Toronto¹

02 Jun 2013-arXiv: Learning

TL;DR: The results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.

Abstract: Recently, fully-connected and convolutional neural networks have been trained to achieve state-of-the-art performance on a wide variety of tasks such as speech recognition, image classification, natural language processing, and bioinformatics. For classification tasks, most of these "deep learning" models employ the softmax activation function for prediction and minimize cross-entropy loss. In this paper, we demonstrate a small but consistent advantage of replacing the softmax layer with a linear support vector machine. Learning minimizes a margin-based loss instead of the cross-entropy loss. While there have been various combinations of neural nets and SVMs in prior art, our results using L2-SVMs show that simply replacing softmax with linear SVMs gives significant gains on popular deep learning datasets MNIST, CIFAR-10, and the ICML 2013 Representation Learning Workshop's face expression recognition challenge.
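
As a rough illustration of a margin-based loss replacing cross-entropy, the sketch below computes a one-vs-rest squared hinge (L2-SVM) loss on the top-layer scores, together with its gradient with respect to those scores. It assumes targets of +1 for the true class and -1 for the others and omits the weight-regularization term; the function names and example scores are hypothetical, and the paper's exact formulation is not reproduced in this listing.

```python
def l2_svm_loss(scores, label):
    """One-vs-rest squared hinge loss on class scores (regularizer omitted)."""
    total = 0.0
    for k, s in enumerate(scores):
        t = 1.0 if k == label else -1.0        # +1 for the true class, -1 otherwise
        margin = max(0.0, 1.0 - t * s)         # zero once the margin is satisfied
        total += margin ** 2
    return total


def l2_svm_grad(scores, label):
    """Gradient of the loss above with respect to each class score."""
    grads = []
    for k, s in enumerate(scores):
        t = 1.0 if k == label else -1.0
        margin = max(0.0, 1.0 - t * s)
        grads.append(-2.0 * t * margin)        # d/ds of max(0, 1 - t*s)^2
    return grads


# Hypothetical top-layer scores for a 3-class problem; class 0 is correct.
example_scores = [2.0, -0.5, 0.5]
loss = l2_svm_loss(example_scores, 0)
grads = l2_svm_grad(example_scores, 0)
```

Unlike the plain hinge loss, the squared hinge is differentiable at the margin boundary, so its gradient can be back-propagated into the lower layers in the same way a softmax gradient would be.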

Posted Content•

An Empirical Investigation of Catastrophic Forgetting in Gradient-Based Neural Networks

Ian Goodfellow¹, Mehdi Mirza¹, Da Xiao², Aaron Courville¹, Yoshua Bengio¹•Institutions (2)

Université de Montréal¹, Beijing University of Posts and Telecommunications²

21 Dec 2013-arXiv: Machine Learning

TL;DR: In this article, the authors investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions and find that the dropout algorithm is consistently best at adapting to the new task, remembering the old task and has the best tradeoff curve between these two extremes.

Abstract: Catastrophic forgetting is a problem faced by many machine learning models and algorithms. When trained on one task, then trained on a second task, many machine learning models "forget" how to perform the first task. This is widely believed to be a serious problem for neural networks. Here, we investigate the extent to which the catastrophic forgetting problem occurs for modern neural networks, comparing both established and recent gradient-based training algorithms and activation functions. We also examine the effect of the relationship between the first task and the second task on catastrophic forgetting. We find that it is always best to train using the dropout algorithm--the dropout algorithm is consistently best at adapting to the new task, remembering the old task, and has the best tradeoff curve between these two extremes. We find that different tasks and relationships between tasks result in very different rankings of activation function performance. This suggests the choice of activation function should always be cross-validated.

Journal Article•DOI•

Review on Methods to Fix Number of Hidden Neurons in Neural Networks

K. Gnana Sheela, S. N. Deepa

20 Jun 2013-Mathematical Problems in Engineering

TL;DR: The experimental results show that with minimum errors the proposed approach can be used for wind speed prediction in renewable energy systems and the perfect design of the neural network based on the selection criteria is substantiated using convergence theorem.

Abstract: This paper reviews methods proposed over the past 20 years to fix the number of hidden neurons in neural networks. It also proposes a new method to fix the number of hidden neurons in Elman networks for wind speed prediction in renewable energy systems. The random selection of a number of hidden neurons might cause either overfitting or underfitting problems. This paper proposes a solution to these problems. To fix hidden neurons, 101 various criteria are tested based on the statistical errors. The results show that the proposed model improves accuracy with minimal error. The perfect design of the neural network based on the selection criteria is substantiated using convergence theorem. To verify the effectiveness of the model, simulations were conducted on real-time wind data. The experimental results show that with minimum errors the proposed approach can be used for wind speed prediction. The survey has been made for the fixation of hidden neurons in neural networks. The proposed model is simple, with minimal error, and efficient for fixation of hidden neurons in Elman networks.

Proceedings Article•DOI•

Sequence-discriminative training of deep neural networks

Karel Veselý¹, Arnab Ghoshal², Lukas Burget¹, Daniel Povey³•Institutions (3)

Brno University of Technology¹, University of Edinburgh², Johns Hopkins University³

01 Aug 2013

TL;DR: Different sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on a standard 300 hour American conversational telephone speech task.

Abstract: Sequence-discriminative training of deep neural networks (DNNs) is investigated on a standard 300 hour American English conversational telephone speech task. Different sequence-discriminative criteria are compared: maximum mutual information (MMI), minimum phone error (MPE), state-level minimum Bayes risk (sMBR), and boosted MMI. Two different heuristics are investigated to improve the performance of the DNNs trained using sequence-based criteria: lattices are regenerated after the first iteration of training; and, for MMI and BMMI, the frames where the numerator and denominator hypotheses are disjoint are removed from the gradient computation. Starting from a competitive DNN baseline trained using cross-entropy, different sequence-discriminative criteria are shown to lower word error rates by 7-9% relative, on average. Little difference is noticed between the different sequence-based criteria that are investigated. The experiments are done using the open-source Kaldi toolkit, which makes it possible for the wider community to reproduce these results. Index Terms: speech recognition, deep learning, sequence-criterion training, neural networks, reproducible research

Journal Article•DOI•

Parallel photonic information processing at gigabyte per second data rates using transient states

Daniel Brunner¹, Miguel C. Soriano¹, Claudio R. Mirasso¹, Ingo Fischer¹•Institutions (1)

Spanish National Research Council¹

15 Jan 2013-Nature Communications

TL;DR: The potential of a simple photonic architecture to process information at unprecedented data rates is demonstrated by implementing a learning-based approach: all digits are identified with very low classification errors, and chaotic time-series prediction is performed with 10% error.

Abstract: Inspired by neural networks, reservoir computing uses nonlinear transient states to perform computations, offering faster parallel information processing. Brunner et al. show a photonic approach to reservoir computing capable of simultaneous spoken digit and speaker recognition at high data rates.

Posted Content•

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

Andrew M. Saxe¹, James L. McClelland¹, Surya Ganguli¹•Institutions (1)

Stanford University¹

20 Dec 2013-arXiv: Neural and Evolutionary Computing

TL;DR: In this paper, the authors show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions.

Abstract: Despite the widespread practical success of deep learning methods, our theoretical understanding of the dynamics of learning in deep neural networks remains quite sparse. We attempt to bridge the gap between the theory and practice of deep learning by systematically analyzing learning dynamics for the restricted case of deep linear neural networks. Despite the linearity of their input-output map, such networks have nonlinear gradient descent dynamics on weights that change with the addition of each new hidden layer. We show that deep linear networks exhibit nonlinear learning phenomena similar to those seen in simulations of nonlinear networks, including long plateaus followed by rapid transitions to lower error solutions, and faster convergence from greedy unsupervised pretraining initial conditions than from random initial conditions. We provide an analytical description of these phenomena by finding new exact solutions to the nonlinear dynamics of deep learning. Our theoretical analysis also reveals the surprising finding that as the depth of a network approaches infinity, learning speed can nevertheless remain finite: for a special class of initial conditions on the weights, very deep networks incur only a finite, depth independent, delay in learning speed relative to shallow networks. We show that, under certain conditions on the training data, unsupervised pretraining can find this special class of initial conditions, while scaled random Gaussian initializations cannot. We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times. We further show that these initial conditions also lead to faithful propagation of gradients even in deep nonlinear networks, as long as they operate in a special regime known as the edge of chaos.
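
For readers unfamiliar with random orthogonal initial conditions, the following pure-Python sketch draws a square weight matrix with orthonormal rows by applying Gram-Schmidt orthogonalization to Gaussian random vectors. This is one common construction and is an assumption here; the listing does not say how the authors generate their matrices, and the function name and seed are hypothetical.

```python
import math
import random


def random_orthogonal(n, seed=0):
    """Return an n x n matrix (list of rows) whose rows are orthonormal."""
    rng = random.Random(seed)
    rows = []
    while len(rows) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components of v along every row accepted so far.
        for u in rows:
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-8:                      # skip (near-)degenerate draws
            rows.append([a / norm for a in v])
    return rows


W = random_orthogonal(4)
# Gram matrix W W^T, which should be close to the identity.
gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in W] for r1 in W]
```

An orthogonal matrix preserves vector norms, so a product of such matrices neither shrinks nor amplifies a signal passed through a deep linear stack, which is the intuition behind the depth-independent behaviour the abstract describes.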

Proceedings Article•DOI•

An investigation of deep neural networks for noise robust speech recognition

Michael L. Seltzer¹, Dong Yu¹, Yongqiang Wang²•Institutions (2)

Microsoft¹, University of Cambridge²

26 May 2013

TL;DR: The noise robustness of DNN-based acoustic models can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation and can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training.

Abstract: Recently, a new acoustic model based on deep neural networks (DNN) has been introduced. While the DNN has generated significant improvements over GMM-based systems on several tasks, there has been no evaluation of the robustness of such systems to environmental distortion. In this paper, we investigate the noise robustness of DNN-based acoustic models and find that they can match state-of-the-art performance on the Aurora 4 task without any explicit noise compensation. This performance can be further improved by incorporating information about the environment into DNN training using a new method called noise-aware training. When combined with the recently proposed dropout training technique, a 7.5% relative improvement over the previously best published result on this task is achieved using only a single decoding pass and no additional decoding complexity compared to a standard DNN.

Journal Article•DOI•

Artificial neural networks in medical diagnosis

Filippo Amato¹, Alberto López¹, Eladia María Peña-Méndez², Petr Vaňhara¹, Aleš Hampl¹, Josef Havel¹•Institutions (2)

Masaryk University¹, University of La Laguna²

01 Jan 2013-Journal of Applied Biomedicine

TL;DR: The philosophy, capabilities, and limitations of artificial neural networks in medical diagnosis through selected examples are reviewed and discussed.

Journal Article•DOI•

Representational learning with ELMs for big data

Liyanaarachchi Lekamalage Chamara Kasun, Hongming Zhou, Guang-Bin Huang, Chi-Man Vong

01 Jan 2013-IEEE Intelligent Systems

TL;DR: This paper presents ELM-AE, an auto-encoder that can be seen as a special case of ELM in which the input is equal to the output and the randomly generated weights are chosen to be orthogonal.

Abstract: Geoffrey Hinton and Pascal Vincent showed that a restricted Boltzmann machine (RBM) and auto-encoders (AE) could be used for feature engineering. These engineered features then could be used to train multiple-layer neural networks, or deep networks. Two types of deep networks based on RBM exist: the deep belief network (DBN) and the deep Boltzmann machine (DBM). Guang-Bin Huang and colleagues introduced the extreme learning machine (ELM) as a single-layer feed-forward neural network (SLFN) with a fast learning speed and good generalization capability. The ELM for SLFNs shows that hidden nodes can be randomly generated. ELM-AE output weights can be determined analytically, unlike RBMs and traditional auto-encoders, which require iterative algorithms. ELM-AE can be seen as a special case of ELM, where the input is equal to output, and the randomly generated weights are chosen to be orthogonal.

Proceedings Article•DOI•

Low-rank matrix factorization for Deep Neural Network training with high-dimensional output targets

Tara N. Sainath¹, Brian Kingsbury¹, Vikas Sindhwani¹, Ebru Arisoy¹, Bhuvana Ramabhadran¹•Institutions (1)

IBM¹

26 Jun 2013

TL;DR: A low-rank matrix factorization of the final weight layer is proposed and applied to DNNs for both acoustic modeling and language modeling, reducing the number of network parameters by 30-50% and giving a roughly equivalent reduction in training time without a significant loss in final recognition accuracy compared to a full-rank representation.

Abstract: While Deep Neural Networks (DNNs) have achieved tremendous success for large vocabulary continuous speech recognition (LVCSR) tasks, training of these networks is slow. One reason is that DNNs are trained with a large number of training parameters (i.e., 10-50 million). Because networks are trained with a large number of output targets to achieve good performance, the majority of these parameters are in the final weight layer. In this paper, we propose a low-rank matrix factorization of the final weight layer. We apply this low-rank technique to DNNs for both acoustic modeling and language modeling. We show on three different LVCSR tasks ranging between 50-400 hrs, that a low-rank factorization reduces the number of parameters of the network by 30-50%. This results in roughly an equivalent reduction in training time, without a significant loss in final recognition accuracy, compared to a full-rank representation.
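
The parameter saving from factorizing the final weight layer follows from simple counting: a `hidden x outputs` matrix is replaced by a `hidden x rank` matrix followed by a `rank x outputs` matrix. The sketch below works through that arithmetic; the layer sizes and rank are hypothetical values chosen for illustration, biases are ignored, and the result is the saving within that one layer rather than the network-wide 30-50% figure quoted in the abstract.

```python
def full_rank_params(hidden, outputs):
    """Weights in a single dense hidden -> outputs layer (biases ignored)."""
    return hidden * outputs


def low_rank_params(hidden, outputs, rank):
    """Weights after factorizing the layer as (hidden x rank) @ (rank x outputs)."""
    return rank * (hidden + outputs)


# Hypothetical sizes: 1024 hidden units, 2220 output targets, rank 128.
hidden, outputs, rank = 1024, 2220, 128
full = full_rank_params(hidden, outputs)
low = low_rank_params(hidden, outputs, rank)
layer_reduction = 1.0 - low / full   # fraction of this layer's weights removed
```

The factorization only saves parameters when `rank` is smaller than `hidden * outputs / (hidden + outputs)`, which is why it pays off mainly for layers with a very large number of output targets.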

Book Chapter•DOI•

Deep Feature Learning for Knee Cartilage Segmentation Using a Triplanar Convolutional Neural Network

Adhish Prasoon¹, Kersten Petersen¹, Christian Igel¹, François Lauze¹, Erik B. Dam, Mads Nielsen¹•Institutions (1)

University of Copenhagen¹

22 Sep 2013

TL;DR: A novel system for voxel classification integrates three 2D CNNs, each associated with one of the xy, yz and zx planes of a 3D image, and performs better than a state-of-the-art method using 3D multi-scale features.

Abstract: Segmentation of anatomical structures in medical images is often based on a voxel/pixel classification approach. Deep learning systems, such as convolutional neural networks (CNNs), can infer a hierarchical representation of images that fosters categorization. We propose a novel system for voxel classification integrating three 2D CNNs, which have a one-to-one association with the xy, yz and zx planes of a 3D image, respectively. We applied our method to the segmentation of tibial cartilage in low field knee MRI scans and tested it on 114 unseen scans. Although our method uses only 2D features at a single scale, it performs better than a state-of-the-art method using 3D multi-scale features. In the latter approach, the features and the classifier have been carefully adapted to the problem at hand. That we were able to get better results by a deep learning architecture that autonomously learns the features from the images is the main insight of this study.
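
The triplanar input described above can be sketched as three orthogonal 2D patches cut around one voxel. This is only an illustration of the data layout, under the assumption that the volume is indexed as `volume[z][y][x]`; the function name, patch size and toy volume are hypothetical, and the three CNNs that would consume the patches are not shown.

```python
def triplanar_patches(volume, z, y, x, half):
    """Return the xy, yz and zx patches of side 2*half+1 centred on voxel (z, y, x).

    `volume` is a nested list indexed as volume[z][y][x]. The caller must keep
    the patch inside the volume; no padding is done here.
    """
    span = range(-half, half + 1)
    xy = [[volume[z][y + dy][x + dx] for dx in span] for dy in span]
    yz = [[volume[z + dz][y + dy][x] for dy in span] for dz in span]
    zx = [[volume[z + dz][y][x + dx] for dx in span] for dz in span]
    return xy, yz, zx


# Toy 5x5x5 volume whose voxel value encodes its own coordinates as 100*z + 10*y + x.
size = 5
volume = [[[100 * z + 10 * y + x for x in range(size)]
           for y in range(size)]
          for z in range(size)]

xy, yz, zx = triplanar_patches(volume, z=2, y=2, x=2, half=1)
print(xy[1][1], yz[1][1], zx[1][1])  # all three patches share the centre voxel
```

Each patch would be fed to its own 2D network, so the classifier sees context from all three planes without paying for full 3D convolutions.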

Journal Article•DOI•

Stochastic Synchronization of Markovian Jump Neural Networks With Time-Varying Delay Using Sampled Data

Zheng-Guang Wu¹, Peng Shi², Hongye Su¹, Jian Chu¹•Institutions (2)

Zhejiang University¹, Victoria University, Australia²

09 Jan 2013-IEEE Transactions on Systems, Man, and Cybernetics

TL;DR: Two delay-dependent criteria are derived to ensure the stochastic stability of the error systems, and thus, the master systems stochastically synchronize with the slave systems.

Abstract: In this paper, the problem of sampled-data synchronization for Markovian jump neural networks with time-varying delay and variable samplings is considered. In the framework of the input delay approach and the linear matrix inequality technique, two delay-dependent criteria are derived to ensure the stochastic stability of the error systems, and thus, the master systems stochastically synchronize with the slave systems. The desired mode-independent controller is designed, which depends upon the maximum sampling interval. The effectiveness and potential of the obtained results are verified by two simulation examples.

Proceedings Article•DOI•

Ideal ratio mask estimation using deep neural networks for robust speech recognition

Arun Narayanan¹, DeLiang Wang¹•Institutions (1)

Ohio State University¹

26 May 2013

TL;DR: The proposed feature enhancement algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask.

Abstract: We propose a feature enhancement algorithm to improve robust automatic speech recognition (ASR). The algorithm estimates a smoothed ideal ratio mask (IRM) in the Mel frequency domain using deep neural networks and a set of time-frequency unit level features that has previously been used to estimate the ideal binary mask. The estimated IRM is used to filter out noise from a noisy Mel spectrogram before performing cepstral feature extraction for ASR. On the noisy subset of the Aurora-4 robust ASR corpus, the proposed enhancement obtains a relative improvement of over 38% in terms of word error rates using ASR models trained in clean conditions, and an improvement of over 14% when the models are trained using the multi-condition training data. In terms of instantaneous SNR estimation performance, the proposed system obtains a mean absolute error of less than 4 dB in most frequency channels.

Proceedings Article•DOI•

On rectified linear units for speech processing

Matthew D. Zeiler¹, Marc'Aurelio Ranzato², Rajat Monga², Mark Z. Mao², Ke Yang², Quoc V. Le², Patrick Nguyen², Andrew W. Senior², Vincent Vanhoucke², Jeffrey Dean², Geoffrey E. Hinton³•Institutions (3)

New York University¹, Google², University of Toronto³

26 May 2013

TL;DR: This work shows that it can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units.

Abstract: Deep neural networks have recently become the gold standard for acoustic modeling in speech recognition systems. The key computational unit of a deep network is a linear projection followed by a point-wise non-linearity, which is typically a logistic function. In this work, we show that we can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units. These units are linear when their input is positive and zero otherwise. In a supervised setting, we can successfully train very deep nets from random initialization on a large vocabulary speech recognition task, achieving lower word error rates than using a logistic network with the same topology. Similarly, in an unsupervised setting, we show how we can learn sparse features that can be useful for discriminative tasks. All our experiments are executed in a distributed environment using several hundred machines and several hundred hours of speech data.
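
The computational unit described above (a linear projection followed by a point-wise non-linearity) can be sketched for a single unit as follows. The weights, bias and inputs are hypothetical values chosen for illustration; only the two activation functions follow directly from the abstract.

```python
import math


def logistic(x: float) -> float:
    """Logistic (sigmoid) non-linearity."""
    return 1.0 / (1.0 + math.exp(-x))


def relu(x: float) -> float:
    """Rectified linear unit: linear for positive input, zero otherwise."""
    return x if x > 0 else 0.0


def dense_unit(weights, bias, inputs, activation):
    """Linear projection followed by a point-wise non-linearity."""
    pre_activation = sum(w * v for w, v in zip(weights, inputs)) + bias
    return activation(pre_activation)


# Hypothetical parameters; the pre-activation is 0.5 - 2.0 + 0.25 = -1.25.
weights, bias = [0.5, -1.0], 0.25
inputs = [1.0, 2.0]

relu_out = dense_unit(weights, bias, inputs, relu)
logistic_out = dense_unit(weights, bias, inputs, logistic)
print(relu_out, round(logistic_out, 4))
```

For a negative pre-activation the rectified unit outputs exactly zero, which is where the sparsity mentioned in the abstract comes from, while the logistic unit still emits a small positive value.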

Book•

Neural networks, fuzzy logic, and genetic algorithms : synthesis and applications

Sanguthevar Rajasekaran, G. A. Vijayalakshmi Pai

16 Jun 2013

TL;DR: A study of Adaptive Neural Network Control System based on Differential Evolution Algorithm.

Abstract: A Study of Adaptive Neural Network Control System. Zhong, Heng. Design of Fuzzy Logic Controller Based on Differential Evolution Algorithm. Shuai, Li (et al.). Neural Networks, Fuzzy Logic and Genetic Algorithms: Synthesis. Fuzzy Logic and Neural Networks: Basic Concepts and Applications.

Journal Article•DOI•

Pattern classification by memristive crossbar circuits using ex situ and in situ training

Fabien Alibart¹, Elham Zamanidoost¹, Dmitri B. Strukov¹•Institutions (1)

University of California, Santa Barbara¹

25 Jun 2013-Nature Communications

TL;DR: In this article, a single-layer perceptron network implemented with a memristive crossbar circuit and trained using the perceptron learning rule by ex situ and in situ methods is presented.

Abstract: Memristors are memory resistors that promise the efficient implementation of synaptic weights in artificial neural networks. Whereas demonstrations of the synaptic operation of memristors already exist, the implementation of even simple networks is more challenging and has yet to be reported. Here we demonstrate pattern classification using a single-layer perceptron network implemented with a memristive crossbar circuit and trained using the perceptron learning rule by ex situ and in situ methods. In the first case, synaptic weights, which are realized as conductances of titanium dioxide memristors, are calculated on a precursor software-based network and then imported sequentially into the crossbar circuit. In the second case, training is implemented in situ, so the weights are adjusted in parallel. Both methods work satisfactorily despite significant variations in the switching behaviour of the memristors. These results give hope for the anticipated efficient implementation of artificial neuromorphic networks and pave the way for dense, high-performance information processing systems.
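
The perceptron learning rule named in this abstract can be sketched in software, which corresponds loosely to the precursor network of the ex situ case. This is a textbook version of the rule, not the authors' procedure: the AND-gate data, learning rate and epoch count are assumptions, and nothing here models memristor conductances or device variation.

```python
def predict(weights, bias, features):
    """Threshold unit: output 1 when the weighted sum is positive, else 0."""
    activation = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, epochs=25, learning_rate=1.0):
    """Perceptron learning rule: move the weights by error * input after each sample."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, target in samples:
            error = target - predict(weights, bias, features)
            weights = [w + learning_rate * error * f
                       for w, f in zip(weights, features)]
            bias += learning_rate * error
    return weights, bias


# Hypothetical toy task: a linearly separable AND gate.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, features) for features, _ in and_gate]
print(predictions)
```

In the ex situ setting described above, weights obtained this way would then be written into the crossbar; in the in situ setting the same update would be applied to the devices directly.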

Proceedings Article•DOI•

Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding.

Grégoire Mesnil¹, Xiaodong He¹, Li Deng², Yoshua Bengio²•Institutions (2)

Université de Montréal¹, Microsoft²

25 Aug 2013

TL;DR: The results show that on this task, both types of recurrent networks outperform the CRF baseline substantially, and a bi-directional Jordan-type network that takes into account both past and future dependencies among slots works best, outperforming a CRF-based baseline by 14% in relative error reduction.

Abstract: One of the key problems in spoken language understanding (SLU) is the task of slot filling. In light of the recent success of applying deep neural network technologies in domain detection and intent identification, we carried out an in-depth investigation on the use of recurrent neural networks for the more difficult task of slot filling involving sequence discrimination. In this work, we implemented and compared several important recurrent-neural-network architectures, including the Elman-type and Jordan-type recurrent networks and their variants. To make the results easy to reproduce and compare, we implemented these networks on the common Theano neural network toolkit, and evaluated them on the ATIS benchmark. We also compared our results to a conditional random fields (CRF) baseline. Our results show that on this task, both types of recurrent networks outperform the CRF baseline substantially, and a bi-directional Jordan-type network that takes into account both past and future dependencies among slots works best, outperforming a CRF-based baseline by 14% in relative error reduction.
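
The difference between the two recurrent architectures compared above lies in what is fed back: an Elman-type network recycles its previous hidden state, while a Jordan-type network recycles its previous output. The scalar sketch below illustrates only that distinction under the usual textbook definitions; the weights and the input sequence are hypothetical, and it is not the Theano implementation or the bi-directional variant from the paper.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def elman_step(x_t, h_prev, w_in, w_rec, w_out):
    """Elman-type step: the previous hidden state is the recurrent input."""
    h_t = math.tanh(w_in * x_t + w_rec * h_prev)
    return h_t, sigmoid(w_out * h_t)


def jordan_step(x_t, y_prev, w_in, w_rec, w_out):
    """Jordan-type step: the previous output is the recurrent input."""
    h_t = math.tanh(w_in * x_t + w_rec * y_prev)
    return h_t, sigmoid(w_out * h_t)


def run(step, inputs, w_in=0.8, w_rec=0.5, w_out=1.2):
    """Unroll one scalar recurrent unit over a sequence and collect its outputs."""
    h, y, outputs = 0.0, 0.0, []
    for x_t in inputs:
        feedback = h if step is elman_step else y
        h, y = step(x_t, feedback, w_in, w_rec, w_out)
        outputs.append(y)
    return outputs


sequence = [1.0, -0.5, 0.25]  # hypothetical input sequence
elman_out = run(elman_step, sequence)
jordan_out = run(jordan_step, sequence)
print(elman_out, jordan_out)
```

Both networks agree on the first step, where the feedback is zero, and diverge afterwards because they carry different information forward.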

Journal Article•DOI•

Machine Learning of Molecular Electronic Properties in Chemical Compound Space

Grégoire Montavon¹, Matthias Rupp², Vivekanand V. Gobre³, Álvaro Vázquez-Mayagoitia⁴, Katja Hansen³, Alexandre Tkatchenko⁵, Alexandre Tkatchenko³, Klaus-Robert Müller¹, Klaus-Robert Müller⁶, O. Anatole von Lilienfeld⁴•Institutions (6)

Technical University of Berlin¹, ETH Zurich², Max Planck Society³, Argonne National Laboratory⁴, Pohang University of Science and Technology⁵, Korea University⁶

04 Sep 2013-New Journal of Physics

TL;DR: In this article, a deep multi-task artificial neural network is used to predict multiple electronic ground and excited-state properties, such as atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies.

Abstract: The combination of modern scientific computing with electronic structure theory can lead to an unprecedented amount of data amenable to intelligent data analysis for the identification of meaningful, novel and predictive structure–property relationships. Such relationships enable high-throughput screening for relevant properties in an exponentially growing pool of virtual compounds that are synthetically accessible. Here, we present a machine learning model, trained on a database of ab initio calculation results for thousands of organic molecules, that simultaneously predicts multiple electronic ground- and excited-state properties. The properties include atomization energy, polarizability, frontier orbital eigenvalues, ionization potential, electron affinity and excitation energies. The machine learning model is based on a deep multi-task artificial neural network, exploiting the underlying correlations between various molecular properties. The input is identical to ab initio methods, i.e. nuclear charges and Cartesian coordinates of all atoms. For small organic molecules, the accuracy of such a 'quantum machine' is similar, and sometimes superior, to modern quantum-chemical methods, at negligible computational cost.

Proceedings Article•DOI•

KL-divergence regularized deep neural network adaptation for improved large vocabulary speech recognition

Dong Yu¹, Kaisheng Yao¹, Hang Su¹, Gang Li¹, Frank Seide¹•Institutions (1)

Microsoft¹

26 May 2013

TL;DR: Experiments demonstrate that the proposed adaptation technique can provide 2%-30% relative error reduction against the already very strong speaker independent CD-DNN-HMM systems using different adaptation sets under both supervised and unsupervised adaptation setups.

Abstract: We propose a novel regularized adaptation technique for context dependent deep neural network hidden Markov models (CD-DNN-HMMs). The CD-DNN-HMM has a large output layer and many large hidden layers, each with thousands of neurons. The huge number of parameters in the CD-DNN-HMM makes adaptation a challenging task, especially when the adaptation set is small. The technique developed in this paper adapts the model conservatively by forcing the senone distribution estimated from the adapted model to be close to that from the unadapted model. This constraint is realized by adding Kullback-Leibler divergence (KLD) regularization to the adaptation criterion. We show that applying this regularization is equivalent to changing the target distribution in the conventional backpropagation algorithm. Experiments on Xbox voice search, short message dictation, and Switchboard and lecture speech transcription tasks demonstrate that the proposed adaptation technique can provide 2%-30% relative error reduction against the already very strong speaker independent CD-DNN-HMM systems using different adaptation sets under both supervised and unsupervised adaptation setups.
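
The equivalence stated in the abstract, that KLD regularization amounts to changing the training target, can be sketched as an interpolation between the label distribution and the unadapted (speaker-independent) model's posterior, followed by ordinary cross-entropy. The weight `rho`, the three-class distributions and the function names below are hypothetical illustrations, not values or code from the paper.

```python
import math


def interpolate_targets(label_dist, si_posterior, rho):
    """Blend the label distribution with the unadapted model's posterior.

    rho = 0 trusts the adaptation labels only; rho = 1 keeps the unadapted model.
    """
    return [(1 - rho) * p + rho * q for p, q in zip(label_dist, si_posterior)]


def cross_entropy(target, predicted):
    """Cross-entropy between a target distribution and the model's prediction."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# Hypothetical three-class example.
one_hot = [0.0, 1.0, 0.0]            # label from the adaptation set
si_posterior = [0.2, 0.7, 0.1]       # unadapted model's output
adapted_posterior = [0.05, 0.9, 0.05]  # adapted model's output
rho = 0.5

target = interpolate_targets(one_hot, si_posterior, rho)
loss = cross_entropy(target, adapted_posterior)
print([round(t, 3) for t in target], round(loss, 4))
```

Because the blended target is still a probability distribution, standard backpropagation can be used unchanged; only the targets differ from plain adaptation.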

Proceedings Article•

Understanding Dropout

Pierre Baldi¹, Peter Sadowski¹•Institutions (1)

University of California, Irvine¹

05 Dec 2013

TL;DR: A general formalism for studying dropout on either units or connections, with arbitrary probability values, is introduced and used to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks.

Abstract: Dropout is a relatively new algorithm for training neural networks which relies on stochastically "dropping out" neurons during training in order to avoid the co-adaptation of feature detectors. We introduce a general formalism for studying dropout on either units or connections, with arbitrary probability values, and use it to analyze the averaging and regularizing properties of dropout in both linear and non-linear networks. For deep neural networks, the averaging properties of dropout are characterized by three recursive equations, including the approximation of expectations by normalized weighted geometric means. We provide estimates and bounds for these approximations and corroborate the results with simulations. Among other results, we also show how dropout performs stochastic gradient descent on a regularized error function.
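
The approximation of expectations by normalized weighted geometric means (NWGM) mentioned above can be checked numerically for a single logistic unit. The sketch enumerates every dropout mask over the unit's inputs and compares the true expected output, the NWGM of the outputs, and the logistic function applied to the expected input. The weights, inputs and keep probability are hypothetical, and this covers one unit only, not the recursive equations for deep networks.

```python
import itertools
import math


def sigmoid(s: float) -> float:
    return 1.0 / (1.0 + math.exp(-s))


def dropout_statistics(weights, inputs, keep_prob):
    """Enumerate all dropout masks on the inputs of one logistic unit.

    Returns (true mean output, NWGM of outputs, sigmoid of the mean input).
    """
    mean_out = mean_in = log_g = log_g_comp = 0.0
    for mask in itertools.product([0, 1], repeat=len(weights)):
        kept = sum(mask)
        prob = keep_prob ** kept * (1 - keep_prob) ** (len(mask) - kept)
        s = sum(m * w * x for m, w, x in zip(mask, weights, inputs))
        out = sigmoid(s)
        mean_out += prob * out
        mean_in += prob * s
        log_g += prob * math.log(out)           # weighted geometric mean of outputs
        log_g_comp += prob * math.log(1 - out)  # same for the complements
    g, g_comp = math.exp(log_g), math.exp(log_g_comp)
    return mean_out, g / (g + g_comp), sigmoid(mean_in)


# Hypothetical unit: three inputs, each kept with probability 0.5.
mean_out, nwgm, propagated = dropout_statistics(
    weights=[0.5, -1.0, 2.0], inputs=[1.0, 2.0, 0.5], keep_prob=0.5)
print(mean_out, nwgm, propagated)
```

For the logistic function the NWGM coincides with the logistic of the expected input, so the last two values agree up to rounding, while the true mean output is only approximated by them.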
