Showing papers on "Artificial neural network" published in 2012

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

03 Dec 2012

TL;DR: A deep convolutional neural network with five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax achieved state-of-the-art ImageNet classification performance.

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations
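
The top-1 and top-5 error rates quoted in the abstract above can be illustrated with a minimal sketch. The function name `top_k_error` and the toy scores are assumptions made for illustration; they are not taken from the paper, which used 1000 classes rather than the six shown here.

```python
def top_k_error(scores, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to the top k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Hypothetical softmax outputs for 3 examples over 6 classes.
scores = [
    [0.10, 0.50, 0.20, 0.05, 0.09, 0.06],  # true class 1: ranked first
    [0.30, 0.10, 0.25, 0.20, 0.09, 0.06],  # true class 2: ranked second
    [0.40, 0.30, 0.12, 0.10, 0.06, 0.02],  # true class 5: ranked last
]
labels = [1, 2, 5]

print("top-1 error:", top_k_error(scores, labels, 1))
print("top-5 error:", top_k_error(scores, labels, 5))
```

Top-5 error is never larger than top-1 error, because any prediction counted as correct under top-1 is also correct under top-5.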

Journal Article•DOI•

Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups

Geoffrey E. Hinton¹, Li Deng², Dong Yu², George E. Dahl¹, Abdelrahman Mohamed¹, Navdeep Jaitly¹, Andrew W. Senior³, Vincent Vanhoucke³, Patrick Nguyen³, Tara N. Sainath⁴, Brian Kingsbury⁴•Institutions (4)

University of Toronto¹, Microsoft², Google³, IBM⁴

18 Oct 2012-IEEE Signal Processing Magazine

TL;DR: This article provides an overview of progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

Abstract: Most current speech recognition systems use hidden Markov models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame or a short window of frames of coefficients that represents the acoustic input. An alternative way to evaluate the fit is to use a feed-forward neural network that takes several frames of coefficients as input and produces posterior probabilities over HMM states as output. Deep neural networks (DNNs) that have many hidden layers and are trained using new methods have been shown to outperform GMMs on a variety of speech recognition benchmarks, sometimes by a large margin. This article provides an overview of this progress and represents the shared views of four research groups that have had recent successes in using DNNs for acoustic modeling in speech recognition.

9,091 citations

Posted Content•

Improving neural networks by preventing co-adaptation of feature detectors

Geoffrey E. Hinton¹, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, Ruslan Salakhutdinov¹•Institutions (1)

University of Toronto¹

03 Jul 2012-arXiv: Neural and Evolutionary Computing

TL;DR: The authors randomly omit half of the feature detectors on each training case to prevent complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors.

Abstract: When a large feedforward neural network is trained on a small training set, it typically performs poorly on held-out test data. This "overfitting" is greatly reduced by randomly omitting half of the feature detectors on each training case. This prevents complex co-adaptations in which a feature detector is only helpful in the context of several other specific feature detectors. Instead, each neuron learns to detect a feature that is generally helpful for producing the correct answer given the combinatorially large variety of internal contexts in which it must operate. Random "dropout" gives big improvements on many benchmark tasks and sets new records for speech and object recognition.

6,899 citations
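
The idea of "randomly omitting half of the feature detectors on each training case" can be sketched roughly as follows. This is an illustrative sketch, not the authors' code: the function name `dropout`, its parameters, and the choice to scale activations by the keep probability at test time (equivalent to halving outgoing weights when half the units are dropped) are assumptions made here.

```python
import random


def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Apply dropout to a list of hidden-unit activations (illustrative helper)."""
    if not training:
        # Test time: keep every unit but scale by the keep probability so the
        # expected input to the next layer matches what it saw during training.
        return [a * (1.0 - p_drop) for a in activations]
    # Training time: each unit is independently zeroed with probability p_drop.
    return [0.0 if rng.random() < p_drop else a for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [1.0] * 10
print("train:", dropout(hidden, rng=rng))
print("test: ", dropout(hidden, training=False))
```

Because a different random subset of units is active for every training case, no unit can rely on specific other units being present, which is the co-adaptation the abstract describes.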

Proceedings Article•DOI•

Multi-column deep neural networks for image classification

Dan Ciresan¹, Ueli Meier¹, Jürgen Schmidhuber¹•Institutions (1)

Dalle Molle Institute for Artificial Intelligence Research¹

16 Jun 2012

TL;DR: This paper proposes biologically plausible, wide and deep artificial neural network architectures that match human performance on tasks such as the recognition of handwritten digits or traffic signs, and that are the first to achieve near-human performance on the MNIST benchmark.

Abstract: Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.

3,717 citations

Proceedings Article•

Large Scale Distributed Deep Networks

Jeffrey Dean¹, Greg S. Corrado¹, Rajat Monga¹, Kai Chen¹, Matthieu Devin¹, Mark Z. Mao¹, Marc'Aurelio Ranzato¹, Andrew W. Senior¹, Paul A. Tucker¹, Ke Yang¹, Quoc V. Le¹, Andrew Y. Ng¹•Institutions (1)

Google¹

03 Dec 2012

TL;DR: This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.

Abstract: Recent work in unsupervised feature learning and deep learning has shown that being able to train large models can dramatically improve performance. In this paper, we consider the problem of training a deep network with billions of parameters using tens of thousands of CPU cores. We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS. Downpour SGD and Sandblaster L-BFGS both increase the scale and speed of deep network training. We have successfully used our system to train a deep network 30x larger than previously reported in the literature, and it achieves state-of-the-art performance on ImageNet, a visual object recognition task with 16 million images and 21k categories. We show that these same techniques dramatically accelerate the training of a more modestly-sized deep network for a commercial speech recognition service. Although we focus on and report performance of these methods as applied to training large neural networks, the underlying algorithms are applicable to any gradient-based machine learning algorithm.

3,475 citations

Journal Article•DOI•

Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition

George E. Dahl¹, Dong Yu², Li Deng², Alex Acero²•Institutions (2)

University of Toronto¹, Microsoft²

01 Jan 2012-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: A pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture trains the DNN to produce a distribution over senones (tied triphone states) as its output and can significantly outperform conventional context-dependent Gaussian mixture model (GMM)-HMMs.

Abstract: We propose a novel context-dependent (CD) model for large-vocabulary speech recognition (LVSR) that leverages recent advances in using deep belief networks for phone recognition. We describe a pre-trained deep neural network hidden Markov model (DNN-HMM) hybrid architecture that trains the DNN to produce a distribution over senones (tied triphone states) as its output. The deep belief network pre-training algorithm is a robust and often helpful way to initialize deep neural networks generatively that can aid in optimization and reduce generalization error. We illustrate the key components of our model, describe the procedure for applying CD-DNN-HMMs to LVSR, and analyze the effects of various modeling choices on performance. Experiments on a challenging business search dataset demonstrate that CD-DNN-HMMs can significantly outperform the conventional context-dependent Gaussian mixture model (GMM)-HMMs, with an absolute sentence accuracy improvement of 5.8% and 9.2% (or relative error reduction of 16.0% and 23.2%) over the CD-GMM-HMMs trained using the minimum phone error rate (MPE) and maximum-likelihood (ML) criteria, respectively.

3,120 citations
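
The abstract above reports each gain both as an absolute sentence accuracy improvement and as a relative error reduction. As a rough worked example, the two figures together imply a baseline error rate, since relative reduction = absolute gain / baseline error. The helper name `implied_baseline_error` is an assumption for illustration, and the derived baselines are back-of-the-envelope values computed from the rounded numbers in the abstract, not figures quoted from the paper.

```python
def implied_baseline_error(abs_gain_pct, rel_reduction_pct):
    """Baseline error rate (%) implied by an absolute gain and a relative error reduction."""
    return abs_gain_pct / (rel_reduction_pct / 100.0)


# (training criterion of the CD-GMM-HMM baseline, absolute gain %, relative reduction %)
reported = [("MPE", 5.8, 16.0), ("ML", 9.2, 23.2)]

for name, gain, rel in reported:
    err = implied_baseline_error(gain, rel)
    baseline_acc = 100.0 - err
    print(f"{name}: implied baseline error {err:.2f}%, "
          f"baseline accuracy {baseline_acc:.2f}%, "
          f"implied CD-DNN-HMM accuracy {baseline_acc + gain:.2f}%")
```

Both rows should imply roughly the same CD-DNN-HMM accuracy, which serves as a quick consistency check on the reported pairs of numbers.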

Journal Article•

Deep Neural Networks for Acoustic Modeling in Speech Recognition

Geoffrey E. Hinton, Li Deng, Dong Yu, George E. Dahl, Abdelrahman Mohamed, Navdeep Jaitly, Andrew W. Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N. Sainath, Brian Kingsbury

01 Nov 2012-IEEE Signal Processing Magazine

TL;DR: This paper provides an overview of this progress and represents the shared views of four research groups who have had recent successes in using deep neural networks for acoustic modeling in speech recognition.

2,527 citations

Artificial neural networks

Andrea Roli

09 Mar 2012

TL;DR: Artificial neural networks (ANNs) constitute a class of flexible nonlinear models designed to mimic biological neural systems; this entry introduces ANNs using familiar econometric terminology and gives an overview of the modeling approach and its implementation methods.

Abstract: Artificial neural networks (ANNs) constitute a class of flexible nonlinear models designed to mimic biological neural systems. In this entry, we introduce ANN using familiar econometric terminology and provide an overview of ANN modeling approach and its implementation methods. † Correspondence: Chung-Ming Kuan, Institute of Economics, Academia Sinica, 128 Academia Road, Sec. 2, Taipei 115, Taiwan; ckuan@econ.sinica.edu.tw. †† I would like to express my sincere gratitude to the editor, Professor Steven Durlauf, for his patience and constructive comments on early drafts of this entry. I also thank Shih-Hsun Hsu and Yu-Lieh Huang for very helpful suggestions. The remaining errors are all mine.

2,069 citations

Proceedings Article•DOI•

LSTM Neural Networks for Language Modeling.

Martin Sundermeyer¹, Ralf Schlüter¹, Hermann Ney¹•Institutions (1)

RWTH Aachen University¹

01 Jan 2012

TL;DR: This work analyzes the Long Short-Term Memory neural network architecture on an English and a large French language modeling task and gains considerable improvements in WER on top of a state-of-the-art speech recognition system.

Abstract: Neural networks have become increasingly popular for the task of language modeling. Whereas feed-forward networks only exploit a fixed context length to predict the next word of a sequence, conceptually, standard recurrent neural networks can take into account all of the predecessor words. On the other hand, it is well known that recurrent networks are difficult to train and therefore are unlikely to show the full potential of recurrent models. These problems are addressed by the Long Short-Term Memory neural network architecture. In this work, we analyze this type of network on an English and a large French language modeling task. Experiments show improvements of about 8% relative in perplexity over standard recurrent neural network LMs. In addition, we gain considerable improvements in WER on top of a state-of-the-art speech recognition system.

1,966 citations
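
The abstract above measures language models by perplexity and reports a relative improvement. A minimal sketch of both quantities follows, using the standard definition of perplexity as the exponential of the average negative log-probability per word. The function names and the toy probabilities are assumptions for illustration and do not come from the paper.

```python
import math


def perplexity(word_probs):
    """Perplexity of a model that assigned these probabilities to the words of a text."""
    avg_neg_log = -sum(math.log(p) for p in word_probs) / len(word_probs)
    return math.exp(avg_neg_log)


def relative_improvement(baseline_ppl, new_ppl):
    """Relative perplexity reduction, as a fraction of the baseline."""
    return (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical per-word probabilities from two models on the same 4-word text.
baseline_probs = [0.25, 0.25, 0.25, 0.25]
improved_probs = [0.30, 0.25, 0.28, 0.25]

base = perplexity(baseline_probs)
new = perplexity(improved_probs)
print(f"baseline {base:.3f}, improved {new:.3f}, "
      f"relative improvement {100 * relative_improvement(base, new):.1f}%")
```

A model that assigns probability 1/4 to every word has perplexity 4, so perplexity can be read as the effective number of equally likely choices per word.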

Journal Article•DOI•

Acoustic Modeling Using Deep Belief Networks

Abdelrahman Mohamed¹, George E. Dahl¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

01 Jan 2012-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: It is shown that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters.

Abstract: Gaussian mixture models are currently the dominant technique for modeling the emission distribution of hidden Markov models for speech recognition. We show that better phone recognition on the TIMIT dataset can be achieved by replacing Gaussian mixture models by deep neural networks that contain many layers of features and a very large number of parameters. These networks are first pre-trained as a multi-layer generative model of a window of spectral feature vectors without making use of any discriminative information. Once the generative pre-training has designed the features, we perform discriminative fine-tuning using backpropagation to adjust the features slightly to make them better at predicting a probability distribution over the states of monophone hidden Markov models.

1,767 citations

Book Chapter•DOI•

Stochastic Gradient Descent Tricks

Léon Bottou¹•Institutions (1)

Microsoft¹

01 Jan 2012

TL;DR: This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations.

Abstract: Chapter 1 strongly advocates the stochastic back-propagation method to train neural networks. This is in fact an instance of a more general technique called stochastic gradient descent (SGD). This chapter provides background material, explains why SGD is a good learning algorithm when the training set is large, and provides useful recommendations.
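
As a rough illustration of stochastic gradient descent as described in the abstract above, the sketch below fits a one-dimensional linear model by updating the parameters after every single example rather than after a full pass over the data. The function name `sgd_linear`, the learning rate, the epoch count, and the synthetic data are all assumptions chosen for illustration; they are not recommendations taken from the chapter.

```python
import random


def sgd_linear(data, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w * x + b by SGD on the squared error, one example per update."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a fresh random order each epoch
        for x, y in data:
            err = (w * x + b) - y  # derivative of 0.5 * err**2 with respect to the prediction
            w -= lr * err * x      # gradient step for the slope
            b -= lr * err          # gradient step for the intercept
    return w, b


# Noise-free synthetic data drawn from y = 2x + 1 on [-1, 1].
data = [(x / 10.0, 2.0 * (x / 10.0) + 1.0) for x in range(-10, 11)]
w, b = sgd_linear(data)
print(f"w = {w:.4f}, b = {b:.4f}")
```

Each update costs the same regardless of how many examples exist, which is one reason the per-example scheme remains practical when the training set is large.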

Posted Content•

Practical recommendations for gradient-based training of deep architectures

Yoshua Bengio¹•Institutions (1)

Université de Montréal¹

24 Jun 2012-arXiv: Learning

TL;DR: Overall, this chapter describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks and closes with open questions about the training difficulties observed with deeper architectures.

Abstract: Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyper-parameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.

Proceedings Article•

Deep Neural Networks Segment Neuronal Membranes in Electron Microscopy Images

Dan Ciresan¹, Alessandro Giusti¹, Luca Maria Gambardella¹, Jürgen Schmidhuber¹•Institutions (1)

Dalle Molle Institute for Artificial Intelligence Research¹

03 Dec 2012

TL;DR: This work addresses a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy images, using a special type of deep artificial neural network as a pixel classifier to segment biological neuron membranes.

Abstract: We address a central problem of neuroanatomy, namely, the automatic segmentation of neuronal structures depicted in stacks of electron microscopy (EM) images. This is necessary to efficiently map 3D brain structure and connectivity. To segment biological neuron membranes, we use a special type of deep artificial neural network as a pixel classifier. The label of each pixel (membrane or non-membrane) is predicted from raw pixel values in a square window centered on it. The input layer maps each window pixel to a neuron. It is followed by a succession of convolutional and max-pooling layers which preserve 2D information and extract features with increasing levels of abstraction. The output layer produces a calibrated probability for each class. The classifier is trained by plain gradient descent on a 512 × 512 × 30 stack with known ground truth, and tested on a stack of the same size (ground truth unknown to the authors) by the organizers of the ISBI 2012 EM Segmentation Challenge. Even without problem-specific postprocessing, our approach outperforms competing techniques by a large margin in all three considered metrics, i.e. rand error, warping error and pixel error. For pixel error, our approach is the only one outperforming a second human observer.

Book Chapter•DOI•

Practical recommendations for gradient-based training of deep architectures

Yoshua Bengio¹•Institutions (1)

Université de Montréal¹

24 Jun 2012

TL;DR: In this article, the authors present a practical guide with recommendations for some of the most commonly used hyperparameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization.

Abstract: Learning algorithms related to artificial neural networks and in particular for Deep Learning may seem to involve many bells and whistles, called hyper-parameters. This chapter is meant as a practical guide with recommendations for some of the most commonly used hyperparameters, in particular in the context of learning algorithms based on back-propagated gradient and gradient-based optimization. It also discusses how to deal with the fact that more interesting results can be obtained when allowing one to adjust many hyper-parameters. Overall, it describes elements of the practice used to successfully and efficiently train and debug large-scale and often deep multi-layer neural networks. It closes with open questions about the training difficulties observed with deeper architectures.

Book•DOI•

Neural Networks: Tricks of the Trade

Grégoire Montavon, Geneviève Orr, Klaus-Robert Müller

06 Nov 2012

TL;DR: The second edition of the book augments the first edition with more tricks, which have resulted from 14 years of theory and experimentation by some of the world's most prominent neural network researchers.

Abstract: The last twenty years have been marked by an increase in available data and computing power. In parallel to this trend, the focus of neural network research and the practice of training neural networks have undergone a number of important changes, for example, the use of deep learning machines. The second edition of the book augments the first edition with more tricks, which have resulted from 14 years of theory and experimentation by some of the world's most prominent neural network researchers. These tricks can make a substantial difference (in terms of speed, ease of implementation, and accuracy) when it comes to putting algorithms to work on real problems.

Journal Article•DOI•

2012 Special Issue: Multi-column deep neural network for traffic sign classification

Dan Ciresan¹, Ueli Meier¹, Jonathan Masci¹, Jürgen Schmidhuber¹•Institutions (1)

Dalle Molle Institute for Artificial Intelligence Research¹

01 Aug 2012-Neural Networks

TL;DR: This work uses a fast, fully parameterizable GPU implementation of a Deep Neural Network (DNN) that does not require careful design of pre-wired feature extractors, which are rather learned in a supervised way.

Proceedings Article•DOI•

Applying Convolutional Neural Networks concepts to hybrid NN-HMM model for speech recognition

Ossama Abdel-Hamid¹, Abdelrahman Mohamed², Hui Jiang¹, Gerald Penn²•Institutions (2)

York University¹, University of Toronto²

25 Mar 2012

TL;DR: The proposed CNN architecture is applied to speech recognition within the hybrid NN-HMM framework, using local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance.

Abstract: Convolutional Neural Networks (CNN) have shown success in achieving translation invariance for many image processing tasks. The success is largely attributed to the use of local filtering and max-pooling in the CNN architecture. In this paper, we propose to apply CNN to speech recognition within the framework of the hybrid NN-HMM model. We propose to use local filtering and max-pooling in the frequency domain to normalize speaker variance and achieve higher multi-speaker speech recognition performance. In our method, a pair of local filtering and max-pooling layers is added at the lowest end of the neural network (NN) to normalize spectral variations of speech signals. In our experiments, the proposed CNN architecture is evaluated in a speaker-independent speech recognition task using the standard TIMIT data sets. Experimental results show that the proposed CNN method can achieve over 10% relative error reduction on the core TIMIT test sets compared with a regular NN using the same number of hidden layers and weights. Our results also show that the best result of the proposed CNN model is better than previously published results on the same TIMIT test sets that use a pre-trained deep NN model.
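
The role of max-pooling along the frequency axis can be illustrated with a toy sketch: a spectral peak that shifts by one frequency bin, as it might between speakers, yields the same pooled output as long as it stays inside the same pooling window. The function `max_pool_1d`, the pool size, and the toy filter responses are assumptions for illustration and are not the configuration used in the paper.

```python
def max_pool_1d(values, pool_size, stride):
    """Max over windows of length pool_size, taken every stride positions."""
    last_start = len(values) - pool_size
    return [max(values[i:i + pool_size]) for i in range(0, last_start + 1, stride)]


# Hypothetical filter responses over 6 adjacent frequency bands.
speaker_a = [0.0, 0.9, 0.0, 0.0, 0.0, 0.0]  # peak in band 1
speaker_b = [0.0, 0.0, 0.9, 0.0, 0.0, 0.0]  # same peak, shifted up to band 2

pooled_a = max_pool_1d(speaker_a, pool_size=3, stride=3)
pooled_b = max_pool_1d(speaker_b, pool_size=3, stride=3)
print(pooled_a, pooled_b)
```

The invariance only holds for shifts that stay within one pooling window; a peak that crosses a window boundary changes the pooled output.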

Proceedings Article•

End-to-end text recognition with convolutional neural networks

Tao Wang¹, David J. Wu¹, Adam Coates¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

01 Nov 2012

TL;DR: This paper combines the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows them to use a common framework to train highly-accurate text detector and character recognizer modules.

Abstract: Full end-to-end text recognition in natural images is a challenging problem that has received much attention recently. Traditional systems in this area have relied on elaborate models incorporating carefully hand-engineered features or large amounts of prior knowledge. In this paper, we take a different route and combine the representational power of large, multilayer neural networks together with recent developments in unsupervised feature learning, which allows us to use a common framework to train highly-accurate text detector and character recognizer modules. Then, using only simple off-the-shelf methods, we integrate these two modules into a full end-to-end, lexicon-driven, scene text recognition system that achieves state-of-the-art performance on standard benchmarks, namely Street View Text and ICDAR 2003.

Journal Article•DOI•

Coupled Dictionary Training for Image Super-Resolution

Jianchao Yang¹, Zhaowen Wang¹, Zhe Lin², Scott Cohen², Thomas S. Huang¹•Institutions (2)

University of Illinois at Urbana–Champaign¹, Adobe Systems²

01 Aug 2012-IEEE Transactions on Image Processing

TL;DR: This paper demonstrates that the coupled dictionary learning method can outperform the existing joint dictionary training method both quantitatively and qualitatively and speed up the algorithm approximately 10 times by learning a neural network model for fast sparse inference and selectively processing only those visually salient regions.

Abstract: In this paper, we propose a novel coupled dictionary training method for single-image super-resolution (SR) based on patchwise sparse recovery, where the learned couple dictionaries relate the low- and high-resolution (HR) image patch spaces via sparse representation. The learning process enforces that the sparse representation of a low-resolution (LR) image patch in terms of the LR dictionary can well reconstruct its underlying HR image patch with the dictionary in the high-resolution image patch space. We model the learning problem as a bilevel optimization problem, where the optimization includes an l1-norm minimization problem in its constraints. Implicit differentiation is employed to calculate the desired gradient for stochastic gradient descent. We demonstrate that our coupled dictionary learning method can outperform the existing joint dictionary training method both quantitatively and qualitatively. Furthermore, for real applications, we speed up the algorithm approximately 10 times by learning a neural network model for fast sparse inference and selectively processing only those visually salient regions. Extensive experimental comparisons with state-of-the-art SR algorithms validate the effectiveness of our proposed approach.

Proceedings Article•

Convolutional-Recursive Deep Learning for 3D Object Classification

Richard Socher¹, Brody Huval¹, Bharath Bhat¹, Christopher D. Manning¹, Andrew Y. Ng¹•Institutions (1)

Stanford University¹

03 Dec 2012

TL;DR: This work introduces a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images, which obtains state-of-the-art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs.

Abstract: Recent advances in 3D sensing technologies make it possible to easily record color and depth images which together can improve object recognition. Most current methods rely on very well-designed features for this new 3D modality. We introduce a model based on a combination of convolutional and recursive neural networks (CNN and RNN) for learning features and classifying RGB-D images. The CNN layer learns low-level translationally invariant features which are then given as inputs to multiple, fixed-tree RNNs in order to compose higher order features. RNNs can be seen as combining convolution and pooling into one efficient, hierarchical operation. Our main result is that even RNNs with random weights compose powerful features. Our model obtains state of the art performance on a standard RGB-D object dataset while being more accurate and faster during training and testing than comparable architectures such as two-layer CNNs.

Proceedings Article•DOI•

Neural Acceleration for General-Purpose Approximate Programs

Hadi Esmaeilzadeh¹, Adrian Sampson¹, Luis Ceze¹, Doug Burger²•Institutions (2)

University of Washington¹, Microsoft²

01 Dec 2012

TL;DR: A programming model is defined that allows programmers to identify approximable code regions (code that can produce imprecise but acceptable results); offloading these regions to neural processing units is faster and more energy efficient than executing the original code.

Abstract: This paper describes a learning-based approach to the acceleration of approximate programs. We describe the Parrot transformation, a program transformation that selects and trains a neural network to mimic a region of imperative code. After the learning phase, the compiler replaces the original code with an invocation of a low-power accelerator called a neural processing unit (NPU). The NPU is tightly coupled to the processor pipeline to accelerate small code regions. Since neural networks produce inherently approximate results, we define a programming model that allows programmers to identify approximable code regions -- code that can produce imprecise but acceptable results. Offloading approximable code regions to NPUs is faster and more energy efficient than executing the original code. For a set of diverse applications, NPU acceleration provides whole-application speedup of 2.3x and energy savings of 3.0x on average with quality loss of at most 9.6%.

Journal Article•DOI•

An efficient learning procedure for deep Boltzmann machines

Ruslan Salakhutdinov¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

01 Aug 2012-Neural Computation

TL;DR: A new learning algorithm for Boltzmann machines that contain many layers of hidden variables is presented, and results on the MNIST and NORB data sets show that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects.

Abstract: We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer pretraining phase that initializes the weights sensibly. The pretraining also allows the variational inference to be initialized sensibly with a single bottom-up pass. We present results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects. We also show that the features discovered by deep Boltzmann machines are a very effective way to initialize the hidden layers of feedforward neural nets, which are then discriminatively fine-tuned.

Journal Article•DOI•

Training feedforward neural networks using hybrid particle swarm optimization and gravitational search algorithm

Seyedali Mirjalili¹, Siti Zaiton Mohd Hashim¹, Hossein Moradian Sardroudi¹•Institutions (1)

Universiti Teknologi Malaysia¹

15 Jul 2012-Applied Mathematics and Computation

TL;DR: The experimental results show that PSOGSA outperforms both PSO and GSA for training FNNs in terms of convergence speed and avoiding local minima, and that an FNN trained with PSOGSA has better accuracy than one trained with GSA.

Book Chapter•DOI•

Brief Introduction of Back Propagation (BP) Neural Network Algorithm and Its Improvement

Jing Li, Ji-hang Cheng, Jing-yuan Shi, Fei Huang

01 Jan 2012

TL;DR: This paper focuses on the analysis of the characteristics and mathematical theory of BP neural network and also points out the shortcomings of BP algorithm as well as several methods for improvement.

Abstract: The back propagation (BP) neural network is a multi-layer feedforward network trained with the error back propagation algorithm and is one of the most widely applied neural network models. A BP network can learn and store a large number of input-output mappings without requiring the mathematical equations that describe these mappings to be specified in advance. Its learning rule is steepest descent: back propagation is used to adjust the weights and thresholds of the network so as to minimize the sum of squared errors. This paper analyzes the characteristics and mathematical theory of the BP neural network, points out the shortcomings of the BP algorithm, and describes several methods for improvement.
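
The learning rule described in this abstract can be illustrated with a minimal sketch, assuming a single hidden layer of sigmoid units, full-batch steepest descent, and the XOR mapping as toy data. The names `train`, `forward`, and `sse`, the layer size, the learning rate, and the epoch count are all illustrative choices and are not taken from the chapter.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Return hidden activations and the scalar network output for input x."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(W1, b1)]
    y = sigmoid(sum(w * hi for w, hi in zip(W2, h)) + b2)
    return h, y


def sse(data, W1, b1, W2, b2):
    """Half the sum of squared errors over the whole data set."""
    return 0.5 * sum((forward(x, W1, b1, W2, b2)[1] - t) ** 2 for x, t in data)


def train(data, n_hidden=3, lr=0.5, epochs=2000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # Weights and thresholds (biases) start at small random values.
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = rng.uniform(-1, 1)
    initial = sse(data, W1, b1, W2, b2)
    for _ in range(epochs):
        gW1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gW2 = [0.0] * n_hidden
        gb2 = 0.0
        for x, t in data:
            h, y = forward(x, W1, b1, W2, b2)
            # Output error signal: dE/dz for the output unit.
            delta_out = (y - t) * y * (1.0 - y)
            gb2 += delta_out
            for j in range(n_hidden):
                # Error propagated back to hidden unit j.
                delta_h = delta_out * W2[j] * h[j] * (1.0 - h[j])
                gW2[j] += delta_out * h[j]
                gb1[j] += delta_h
                for i in range(n_in):
                    gW1[j][i] += delta_h * x[i]
        # Steepest-descent step on every weight and threshold.
        for j in range(n_hidden):
            W2[j] -= lr * gW2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                W1[j][i] -= lr * gW1[j][i]
        b2 -= lr * gb2
    return initial, sse(data, W1, b1, W2, b2)


XOR = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 0.0)]
initial_loss, final_loss = train(XOR)
print(initial_loss, final_loss)
```

The sketch only tracks whether the squared error falls. Plain steepest descent can still stall in flat regions or poor local minima, which is one of the shortcomings the chapter says it discusses.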

Journal Article•DOI•

Selection of Proper Neural Network Sizes and Architectures—A Comparative Study

David Hunter¹, Hao Yu¹, Michael S. Pukish¹, Janusz Kolbusz, Bogdan M. Wilamowski¹•Institutions (1)

Auburn University¹

14 Feb 2012-IEEE Transactions on Industrial Informatics

TL;DR: Different learning algorithms, including the Error Back Propagation algorithm, the Levenberg Marquardt (LM) algorithm, and the recently developed Neuron-by-Neuron (NBN) algorithm are discussed and compared based on several benchmark problems.

Abstract: One of the major difficulties facing researchers using neural networks is the selection of the proper size and topology of the networks. The problem is even more complex because a neural network trained to very small errors often does not respond properly to patterns not used in the training process. A partial solution proposed for this problem is to use the smallest possible number of neurons along with a large number of training patterns. The discussion consists of three main parts: first, different learning algorithms, including the Error Back Propagation (EBP) algorithm, the Levenberg Marquardt (LM) algorithm, and the recently developed Neuron-by-Neuron (NBN) algorithm, are discussed and compared based on several benchmark problems; second, the efficiency of different network topologies, including traditional Multilayer Perceptron (MLP) networks, Bridged Multilayer Perceptron (BMLP) networks, and Fully Connected Cascade (FCC) networks, is evaluated by both theoretical analysis and experimental results; third, the generalization issue is discussed to illustrate the importance of choosing the proper size of neural networks.

Journal Article•DOI•

Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming

Ding Wang¹, Derong Liu¹, Qinglai Wei¹, Dongbin Zhao¹, Ning Jin²•Institutions (2)

Chinese Academy of Sciences¹, University of Illinois at Chicago²

01 Aug 2012-Automatica

TL;DR: An intelligent-optimal control scheme for unknown nonaffine nonlinear discrete-time systems with a discount factor in the cost function is developed and implemented via the globalized dual heuristic programming technique.

Proceedings Article•

Recurrent Neural Networks for Noise Reduction in Robust ASR

Andrew L. Maas¹, Quoc V. Le¹, Tyler M. O'Neil¹, Oriol Vinyals², Patrick Nguyen³, Andrew Y. Ng¹•Institutions (3)

Stanford University¹, University of California, Berkeley², Google³

01 Jan 2012

TL;DR: This work introduces a model which uses a deep recurrent auto encoder neural network to denoise input features for robust ASR, and demonstrates the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.

Abstract: Recent work on deep neural networks as acoustic models for automatic speech recognition (ASR) has demonstrated substantial performance improvements. We introduce a model which uses a deep recurrent auto encoder neural network to denoise input features for robust ASR. The model is trained on stereo (noisy and clean) audio features to predict clean features given noisy input. The model makes no assumptions about how noise affects the signal, nor about the existence of distinct noise environments. Instead, the model can learn to model any type of distortion or additive noise given sufficient training data. We demonstrate that the model is competitive with existing feature denoising approaches on the Aurora2 task, and outperforms a tandem approach where deep networks are used to predict phoneme posteriors directly.
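
The training setup described in this abstract, predicting clean features from noisy input, can be sketched as follows. This is a simplified illustration: it uses one tanh recurrent layer instead of the paper's deep architecture, synthetic sinusoidal "features" instead of Aurora2 data, and random untrained weights. The names `rnn_denoise`, `mean_squared_error`, and `rand_matrix` are hypothetical and not from the paper.

```python
import math
import random


def rnn_denoise(noisy, W_in, W_rec, W_out):
    """Map a sequence of noisy feature frames to estimated clean frames."""
    n_hidden = len(W_rec)
    h = [0.0] * n_hidden  # hidden state carried across time steps
    estimates = []
    for x in noisy:
        # New hidden state depends on the current frame and the previous state.
        h = [
            math.tanh(
                sum(W_in[j][i] * x[i] for i in range(len(x)))
                + sum(W_rec[j][k] * h[k] for k in range(n_hidden))
            )
            for j in range(n_hidden)
        ]
        # Linear read-out back to the feature dimension.
        estimates.append(
            [sum(W_out[d][j] * h[j] for j in range(n_hidden)) for d in range(len(W_out))]
        )
    return estimates


def mean_squared_error(estimates, clean):
    """Reconstruction loss between estimated and clean frames."""
    total = 0.0
    count = 0
    for est, ref in zip(estimates, clean):
        for e, r in zip(est, ref):
            total += (e - r) ** 2
            count += 1
    return total / count


rng = random.Random(0)
n_frames, n_features, n_hidden = 20, 4, 8

# Synthetic "stereo" pair: clean frames and the same frames with additive noise.
clean = [[math.sin(0.3 * t + d) for d in range(n_features)] for t in range(n_frames)]
noisy = [[c + rng.gauss(0.0, 0.2) for c in frame] for frame in clean]


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


W_in = rand_matrix(n_hidden, n_features)
W_rec = rand_matrix(n_hidden, n_hidden)
W_out = rand_matrix(n_features, n_hidden)

estimates = rnn_denoise(noisy, W_in, W_rec, W_out)
loss = mean_squared_error(estimates, clean)
print(loss)
```

Training would adjust the three weight matrices to reduce `loss` over many stereo pairs. Since the target is simply the clean signal, the network needs no explicit noise model, which matches the abstract's statement that it makes no assumptions about how noise affects the signal.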

Book Chapter•DOI•

Computing with Spiking Neuron Networks

Hélène Paugam-Moisy, Sander M. Bohte¹•Institutions (1)

Centrum Wiskunde & Informatica¹

01 Jan 2012

TL;DR: This chapter relates the history of the “spiking neuron” in Section 1, summarizes the most commonly used models of neurons and synaptic plasticity in Section 2, and addresses the computational power of, and the problem of learning in, networks of spiking neurons.

Abstract: Spiking Neuron Networks (SNNs) are often referred to as the 3rd generation of neural networks. Strongly inspired by natural computing in the brain and by recent advances in neuroscience, they derive their strength and interest from an accurate modeling of synaptic interactions between neurons, taking into account the time of spike firing. SNNs exceed the computational power of neural networks made of threshold or sigmoidal units. Based on dynamic event-driven processing, they open up new horizons for developing models with an exponential memorization capacity and a strong ability to adapt quickly. Today, the main challenge is to discover efficient learning rules that might take advantage of the specific features of SNNs while keeping the nice properties (general-purpose, easy-to-use, available simulators, etc.) of traditional connectionist models. This chapter relates the history of the “spiking neuron” in Section 1 and summarizes the most commonly used models of neurons and synaptic plasticity in Section 2. The computational power of SNNs is addressed in Section 3 and the problem of learning in networks of spiking neurons is tackled in Section 4, with insights into the tracks currently explored for solving it. Finally, Section 5 discusses application domains and implementation issues, and proposes several simulation frameworks.

Journal Article•DOI•

Global Stability of Complex-Valued Recurrent Neural Networks With Time-Delays

Jin Hu¹, Jun Wang¹•Institutions (1)

The Chinese University of Hong Kong¹

03 May 2012-IEEE Transactions on Neural Networks

TL;DR: Several sufficient conditions are presented to ascertain the existence of a unique equilibrium, global asymptotic stability, and global exponential stability of delayed complex-valued recurrent neural networks with two classes of complex-valued activation functions.

Abstract: Over the last decade, several complex-valued neural networks have been developed and applied in various research areas. As an extension of real-valued recurrent neural networks, complex-valued recurrent neural networks use complex-valued states, connection weights, or activation functions with much more complicated properties than real-valued ones. This paper presents several sufficient conditions to ascertain the existence of a unique equilibrium, global asymptotic stability, and global exponential stability of delayed complex-valued recurrent neural networks with two classes of complex-valued activation functions. Simulation results of three numerical examples are also presented to substantiate the effectiveness of the theoretical results.

Journal Article•DOI•

Memristor Bridge Synapse-Based Neural Network and Its Learning

Shyam Prasad Adhikari¹, Changju Yang¹, Hyongsuk Kim¹, Leon O. Chua²•Institutions (2)

Chonbuk National University¹, University of California, Berkeley²

05 Jul 2012-IEEE Transactions on Neural Networks

TL;DR: The use of the memristor bridge synapse in the proposed architecture solves one of the major problems in analog neural network implementations, nonvolatile weight storage, and a modified chip-in-the-loop learning scheme suited to the architecture is also proposed.

Abstract: An analog hardware architecture of a memristor bridge synapse-based multilayer neural network and its learning scheme are proposed. The use of the memristor bridge synapse in the proposed architecture solves one of the major problems in analog neural network implementations, nonvolatile weight storage. To compensate for the spatial nonuniformity and nonideal response of the memristor bridge synapse, a modified chip-in-the-loop learning scheme suitable for the proposed neural network architecture is also proposed. In the proposed method, the initial learning is conducted in software, and the behavior of the software-trained network is learned by the hardware network by learning each of the single-layered neurons of the network independently. The forward calculation of the single-layered neuron learning is implemented on circuit hardware and is followed by a weight updating phase assisted by a host computer. Unlike conventional chip-in-the-loop learning, the need to read out synaptic weights for calculating weight updates in each epoch is eliminated by virtue of the memristor bridge synapse and the proposed learning scheme. The hardware architecture is presented along with successful implementations of the proposed learning scheme on a three-bit parity network and on a car detection network.
