Journal ArticleDOI

A fast learning algorithm for deep belief nets

01 Jul 2006-Neural Computation (MIT Press)-Vol. 18, Iss: 7, pp 1527-1554
TL;DR: A fast, greedy algorithm is derived that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory.
Abstract: We show how to use "complementary priors" to eliminate the explaining-away effects that make inference difficult in densely connected belief nets that have many hidden layers. Using complementary priors, we derive a fast, greedy algorithm that can learn deep, directed belief networks one layer at a time, provided the top two layers form an undirected associative memory. The fast, greedy algorithm is used to initialize a slower learning procedure that fine-tunes the weights using a contrastive version of the wake-sleep algorithm. After fine-tuning, a network with three hidden layers forms a very good generative model of the joint distribution of handwritten digit images and their labels. This generative model gives better digit classification than the best discriminative learning algorithms. The low-dimensional manifolds on which the digits lie are modeled by long ravines in the free-energy landscape of the top-level associative memory, and it is easy to explore these ravines by using the directed connections to display what the associative memory has in mind.
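The layer-by-layer scheme the abstract describes can be sketched in miniature. The following is an illustrative numpy sketch, not the authors' code: each layer is a binary restricted Boltzmann machine trained with one-step contrastive divergence (CD-1), and the hidden activities of one trained layer become the training data for the next. All sizes and hyperparameters are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b, c, v0, lr=0.1):
    """One contrastive-divergence (CD-1) update for a binary RBM.
    W: (n_vis, n_hid) weights, b: visible biases, c: hidden biases."""
    # Positive phase: sample hidden units given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of alternating Gibbs sampling.
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # Approximate gradient of the log-likelihood.
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

def train_rbm(data, n_hid, epochs=50):
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hid))
    b = np.zeros(n_vis)
    c = np.zeros(n_hid)
    for _ in range(epochs):
        W, b, c = cd1_step(W, b, c, data)
    return W, b, c

# Greedy layer-by-layer training: the hidden activities of one trained
# RBM become the "data" for the next RBM in the stack.
v = (rng.random((100, 16)) < 0.5).astype(float)
W1, b1, c1 = train_rbm(v, 8)
h1 = sigmoid(v @ W1 + c1)        # activities of the first hidden layer
W2, b2, c2 = train_rbm(h1, 4)    # second layer trained on those activities
```

The full procedure in the paper adds an undirected associative memory at the top and a contrastive wake-sleep fine-tuning pass; this sketch covers only the greedy stacking step.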


Citations
Book ChapterDOI
01 Jan 2019
TL;DR: This chapter covers the crucial machine learning techniques required to understand the remainder of the book: namely, neural networks.
Abstract: This chapter covers the crucial machine learning techniques required to understand the remainder of the book, namely neural networks. Readers already familiar with neural networks can freely skip this chapter. Readers interested in a more comprehensive coverage of all aspects of machine learning are referred to the many textbooks on this subject matter.

3 citations

Journal ArticleDOI
TL;DR: It is concluded that integrating the ML technique for dimension reduction and the DL technique for feature extraction can improve multi-parameter manufacturing quality predictions.
Abstract: Manufacturing quality prediction can be used to design better parameters at an earlier production stage. However, in complex manufacturing processes, prediction performance is affected by multi-parameter inputs. To address this issue, a deep regression framework based on manifold learning (MDRN) is proposed in this paper. The multi-parameter inputs (i.e., high-dimensional information) were firstly analyzed using manifold learning (ML), which is an effective nonlinear technique for low-dimensional feature extraction that can enhance the representation of multi-parameter inputs and reduce calculation burdens. The features obtained through the ML were then learned by a deep learning architecture (DL). It can learn sufficient features of the relationship between manufacturing quality and the low-dimensional information in an unsupervised framework, which has been proven to be effective in many fields. Finally, the learned features were inputted into the regression network, and manufacturing quality predictions were made. One type (two cases) of machinery parts manufacturing system was investigated in order to evaluate the performance of the proposed MDRN with three comparisons. The experiments showed that the MDRN outperformed all the peer methods in terms of mean absolute percentage error, root-mean-square error, and threshold statistics. Based on these results, we conclude that integrating the ML technique for dimension reduction and the DL technique for feature extraction can improve multi-parameter manufacturing quality predictions.
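The three-stage pipeline described above (reduce dimension, extract features, regress) can be illustrated with simple stand-ins. In the sketch below, linear PCA substitutes for the paper's nonlinear manifold learning and a least-squares model substitutes for the deep regression network; the data are synthetic and every size is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the multi-parameter inputs: 3 latent process
# factors drive 20 observed parameters, and quality depends on the factors.
T = rng.standard_normal((200, 3))
A = rng.standard_normal((3, 20))
X = T @ A + 0.05 * rng.standard_normal((200, 20))
y = 2.0 * T[:, 0] - T[:, 1] + 0.1 * rng.standard_normal(200)

# Step 1: dimension reduction.  PCA stands in for the nonlinear
# manifold-learning step; it maps the 20 inputs down to 3 features.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[:3].T

# Step 2: regression on the extracted features (a linear least-squares
# model stands in for the deep regression network).
Zb = np.c_[Z, np.ones(len(Z))]
w, *_ = np.linalg.lstsq(Zb, y, rcond=None)
pred = Zb @ w

# Step 3: evaluate with one of the paper's metrics.
rmse = np.sqrt(np.mean((y - pred) ** 2))
```

The point of the structure is that the regressor never sees the 20 raw inputs, only the low-dimensional features, which is what keeps the calculation burden down.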

3 citations

Proceedings ArticleDOI
08 Jul 2018
TL;DR: An autonomous object tracking framework is explored that adopts the convolutional neural network and the stacked denoising autoencoder, in contrast to the most frequently used tracking algorithms, which only learn the appearance of the tracked object.
Abstract: Object tracking, which refers to automatic estimation of the trajectory, is a challenging problem. To track the object robustly and efficiently, we explored an autonomous object tracking methodological framework that adopts deep learning architectures, specifically the convolutional neural network (CNN) and the stacked denoising autoencoder (SDAE), as opposed to the most frequently used tracking algorithms that only learn the appearance of the tracked object. Moreover, we conduct a comparative study of both approaches in terms of tracking accuracy and efficiency. The results show that the features learned by both the CNN and the SDAE are very supportive in the object tracking problem, and detailed comparisons are demonstrated in this work.
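A stacked denoising autoencoder is built from layers like the single denoising autoencoder sketched below: the input is corrupted with masking noise and the network is trained to reconstruct the clean version. This is a toy numpy illustration with tied weights, synthetic data, and arbitrary sizes, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recon_loss(X, W, b, c):
    """Cross-entropy reconstruction loss on the clean input."""
    R = sigmoid(sigmoid(X @ W + b) @ W.T + c)
    return -np.mean(X * np.log(R + 1e-9) + (1 - X) * np.log(1 - R + 1e-9))

# Toy binary data with structure the denoiser can exploit: 16 visible
# units arranged as 8 pairs of identical bits, so a masked bit can be
# recovered from its twin.
base = (rng.random((200, 8)) < 0.5).astype(float)
X = np.repeat(base, 2, axis=1)

n_hid = 8
W = 0.1 * rng.standard_normal((16, n_hid))
b = np.zeros(n_hid)
c = np.zeros(16)
loss0 = recon_loss(X, W, b, c)

lr = 0.3
for _ in range(400):
    noisy = X * (rng.random(X.shape) > 0.3)   # masking corruption
    H = sigmoid(noisy @ W + b)                # encode the corrupted input
    R = sigmoid(H @ W.T + c)                  # decode (tied weights)
    dR = (R - X) / len(X)                     # grad of cross-entropy w.r.t. decoder pre-activations
    dH = (dR @ W) * H * (1 - H)               # backprop through the encoder
    W -= lr * (noisy.T @ dH + dR.T @ H)       # tied weights get both contributions
    c -= lr * dR.sum(axis=0)
    b -= lr * dH.sum(axis=0)

loss1 = recon_loss(X, W, b, c)
```

Stacking, as in the paper's SDAE, would feed the hidden activities `sigmoid(X @ W + b)` into a second denoising autoencoder trained the same way.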

3 citations


Cites background from "A fast learning algorithm for deep ..."

  • ...SDAE came from the work in [12] and was based on the deep belief network (DBN) [9]....


Journal ArticleDOI
TL;DR: Although sex can be distinguished using CFP even in elementary school students, the discrimination accuracy was relatively low, and some sex differences in the ocular fundus may begin after the age of 10 years.
Abstract: Recently, artificial intelligence has been used to determine sex using fundus photographs alone. We had earlier reported that sex can be distinguished using known factors obtained from color fundus photography (CFP) in adult eyes. However, it is not clear when the sex difference in fundus parameters begins. Therefore, we conducted this study to investigate sex determination based on fundus parameters using binomial logistic regression in elementary school students. This prospective observational cross-sectional study was conducted on 119 right eyes of elementary school students (aged 8 or 9 years, 59 boys and 60 girls). Through CFP, the tessellation fundus index was calculated as R/(R + G + B) using the mean value of red-green-blue intensity in the eight locations around the optic disc. Optic disc ovality ratio, papillomacular angle, retinal artery trajectory, and retinal vessel were quantified based on our earlier reports. Regularized binomial logistic regression was applied to these variables to select the decisive factors. Furthermore, its discriminative performance was evaluated using the leave-one-out cross-validation method. Sex difference in the parameters was assessed using the Mann–Whitney U test. The optimal model yielded by the Ridge binomial logistic regression suggested that the ovality ratio of girls was significantly smaller, whereas their nasal green and blue intensities were significantly higher, than those of boys. Using this approach, the area under the receiver-operating characteristic curve was 63.2%. Although sex can be distinguished using CFP even in elementary school students, the discrimination accuracy was relatively low. Some sex differences in the ocular fundus may begin after the age of 10 years.
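The evaluation protocol described above, Ridge-penalized binomial logistic regression with leave-one-out cross-validation, can be sketched on synthetic data. The features, effect size, and hyperparameters below are invented for illustration and bear no relation to the study's actual fundus parameters.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical stand-in for the fundus parameters: 119 samples with 5
# features and a binary label; feature 0 carries a weak invented signal.
n = 119
label = (np.arange(n) % 2).astype(float)
X = rng.standard_normal((n, 5))
X[:, 0] += 0.8 * label

def fit_ridge_logreg(X, y, lam=1.0, lr=0.1, steps=300):
    """Binomial logistic regression with an L2 (Ridge) penalty,
    fitted by plain gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * (X.T @ (p - y) / len(y) + lam * w)  # penalized gradient
        b -= lr * (p - y).mean()
    return w, b

# Leave-one-out cross-validation: refit with each sample held out and
# score the prediction on the held-out sample.
correct = 0
for i in range(n):
    mask = np.arange(n) != i
    w, b = fit_ridge_logreg(X[mask], label[mask])
    p = 1.0 / (1.0 + np.exp(-(X[i] @ w + b)))
    correct += int((p > 0.5) == bool(label[i]))
acc = correct / n
```

With only 119 samples, leave-one-out is the natural choice because it uses nearly all of the data for every fit while still giving an unbiased-feeling estimate of discrimination accuracy.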

3 citations


Cites background from "A fast learning algorithm for deep ..."

  • ...For instance, the development of deep learning [1] has made it possible to diagnose ocular diseases, such as diabetic retinopathy [2–8] and glaucoma, [2, 9–16] using a fundus photograph....


References
Journal ArticleDOI
01 Jan 1998
TL;DR: Reviews gradient-based learning methods for handwritten character recognition and proposes graph transformer networks (GTNs), which allow multi-module document recognition systems to be trained globally with gradient-based methods.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.

42,067 citations

Book
01 Jan 1988
TL;DR: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty, providing a coherent explication of probability as a language for reasoning with partial belief.
Abstract: From the Publisher: Probabilistic Reasoning in Intelligent Systems is a complete and accessible account of the theoretical foundations and computational methods that underlie plausible reasoning under uncertainty. The author provides a coherent explication of probability as a language for reasoning with partial belief and offers a unifying perspective on other AI approaches to uncertainty, such as the Dempster-Shafer formalism, truth maintenance systems, and nonmonotonic logic. The author distinguishes syntactic and semantic approaches to uncertainty, and offers techniques, based on belief networks, that provide a mechanism for making semantics-based systems operational. Specifically, network-propagation techniques serve as a mechanism for combining the theoretical coherence of probability theory with modern demands of reasoning-systems technology: modular declarative inputs, conceptually meaningful inferences, and parallel distributed computation. Application areas include diagnosis, forecasting, image interpretation, multi-sensor fusion, decision support systems, plan recognition, planning, speech recognition, in short, almost every task requiring that conclusions be drawn from uncertain clues and incomplete information. Probabilistic Reasoning in Intelligent Systems will be of special interest to scholars and researchers in AI, decision theory, statistics, logic, philosophy, cognitive psychology, and the management sciences. Professionals in the areas of knowledge-based systems, operations research, engineering, and statistics will find theoretical and computational tools of immediate practical use. The book can also be used as an excellent text for graduate-level courses in AI, operations research, or applied probability.

15,671 citations

Journal ArticleDOI
TL;DR: This paper presents work on computing shape models that are computationally fast and invariant to basic transformations like translation, scaling, and rotation, and proposes shape matching using a feature called the shape context, which is descriptive of the shape of the object.
Abstract: We present a novel approach to measuring similarity between shapes and exploit it for object recognition. In our framework, the measurement of similarity is preceded by: (1) solving for correspondences between points on the two shapes; (2) using the correspondences to estimate an aligning transform. In order to solve the correspondence problem, we attach a descriptor, the shape context, to each point. The shape context at a reference point captures the distribution of the remaining points relative to it, thus offering a globally discriminative characterization. Corresponding points on two similar shapes will have similar shape contexts, enabling us to solve for correspondences as an optimal assignment problem. Given the point correspondences, we estimate the transformation that best aligns the two shapes; regularized thin-plate splines provide a flexible class of transformation maps for this purpose. The dissimilarity between the two shapes is computed as a sum of matching errors between corresponding points, together with a term measuring the magnitude of the aligning transform. We treat recognition in a nearest-neighbor classification framework as the problem of finding the stored prototype shape that is maximally similar to that in the image. Results are presented for silhouettes, trademarks, handwritten digits, and the COIL data set.
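The core of the descriptor described above is a log-polar histogram of the other points' positions relative to a reference point. The following is a simplified numpy sketch (fixed bin edges, no tangent-angle normalization, no matching step), not the authors' implementation.

```python
import numpy as np

def shape_context(points, i, n_r=5, n_theta=12):
    """Log-polar histogram of the positions of all other points
    relative to point i: a simplified shape-context descriptor."""
    d = np.delete(points, i, axis=0) - points[i]
    r = np.hypot(d[:, 0], d[:, 1])
    theta = np.arctan2(d[:, 1], d[:, 0])
    # Normalize distances by their mean for scale invariance, then bin
    # them on a log scale so nearby points get finer resolution.
    r = r / r.mean()
    r_edges = np.logspace(np.log10(0.125), np.log10(2.0), n_r + 1)
    rb = np.clip(np.digitize(r, r_edges) - 1, 0, n_r - 1)
    tb = ((theta + np.pi) / (2 * np.pi) * n_theta).astype(int) % n_theta
    hist = np.zeros((n_r, n_theta))
    np.add.at(hist, (rb, tb), 1)      # count points per (radius, angle) bin
    return hist / hist.sum()          # normalize to a distribution

# Sample points on a circle: every point sees a similar distribution of
# the others, so corresponding descriptors should be similar.
t = np.linspace(0, 2 * np.pi, 40, endpoint=False)
circle = np.c_[np.cos(t), np.sin(t)]
h0 = shape_context(circle, 0)
```

In the paper, descriptors like `h0` at corresponding points are compared (e.g., with a chi-square-style histogram distance) and the correspondences are solved as an optimal assignment problem before the thin-plate-spline alignment.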

6,693 citations

Journal ArticleDOI
TL;DR: A product of experts (PoE) is an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary; because it is hard even to approximate the derivatives of the renormalization term in the combination rule, a PoE is trained with a different objective function called contrastive divergence.
Abstract: It is possible to combine multiple latent-variable models of the same data by multiplying their probability distributions together and then renormalizing. This way of combining individual "expert" models makes it hard to generate samples from the combined model but easy to infer the values of the latent variables of each expert, because the combination rule ensures that the latent variables of different experts are conditionally independent when given the data. A product of experts (PoE) is therefore an interesting candidate for a perceptual system in which rapid inference is vital and generation is unnecessary. Training a PoE by maximizing the likelihood of the data is difficult because it is hard even to approximate the derivatives of the renormalization term in the combination rule. Fortunately, a PoE can be trained using a different objective function called "contrastive divergence" whose derivatives with regard to the parameters can be approximated accurately and efficiently. Examples are presented of contrastive divergence learning using several types of expert on several types of data.
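The combination rule itself is easy to state. The sketch below multiplies two expert distributions over a four-state domain and renormalizes; with so few states the normalizer Z is a trivial sum, and the point of the paper is that Z (and hence sampling and maximum-likelihood gradients) becomes intractable when the domain is exponentially large.

```python
import numpy as np

# Two "expert" distributions over the same small discrete domain.
p1 = np.array([0.1, 0.4, 0.4, 0.1])
p2 = np.array([0.4, 0.4, 0.1, 0.1])

# The PoE combines them by multiplying pointwise and renormalizing.
unnorm = p1 * p2
Z = unnorm.sum()          # trivial here; intractable for large domains
poe = unnorm / Z

# The product sharpens agreement: state 1, the only state both experts
# assign high probability, dominates the combined distribution.
```

Note how multiplication differs from a mixture: a mixture of `p1` and `p2` would stay broad, while the product concentrates mass where the experts agree, which is what makes a PoE produce much sharper distributions than its individual experts.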

5,150 citations

Proceedings ArticleDOI
03 Aug 2003
TL;DR: A set of concrete best practices that document analysis researchers can use to get good results with neural networks, including a simple "do-it-yourself" implementation of convolution with a flexible architecture suitable for many visual document problems.
Abstract: Neural networks are a powerful technology for classification of visual inputs arising from documents. However, there is a confusing plethora of different neural network methods that are used in the literature and in industry. This paper describes a set of concrete best practices that document analysis researchers can use to get good results with neural networks. The most important practice is getting a training set as large as possible: we expand the training set by adding a new form of distorted data. The next most important practice is that convolutional neural networks are better suited for visual document tasks than fully connected networks. We propose that a simple "do-it-yourself" implementation of convolution with a flexible architecture is suitable for many visual document problems. This simple convolutional neural network does not require complex methods, such as momentum, weight decay, structure-dependent learning rates, averaging layers, tangent prop, or even fine-tuning the architecture. The end result is a very simple yet general architecture which can yield state-of-the-art performance for document analysis. We illustrate our claims on the MNIST set of English digit images.
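The data-expansion practice the abstract emphasizes, elastic distortion, can be sketched as follows: draw a random displacement field, smooth it so the deformation is locally coherent, and resample the image along the displaced coordinates. This simplified numpy version uses box-blur smoothing and nearest-neighbour sampling rather than the paper's Gaussian smoothing and bilinear interpolation, and its parameters are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)

def elastic_distort(img, alpha=3.0, blur=4):
    """Apply a random elastic deformation to a 2-D image (simplified:
    box-blur smoothing, nearest-neighbour sampling)."""
    h, w = img.shape
    # Random per-pixel displacements in [-1, 1] for each axis.
    dx = rng.uniform(-1, 1, (h, w))
    dy = rng.uniform(-1, 1, (h, w))
    # Smooth the displacement fields so the deformation is elastic
    # (locally coherent) rather than pure pixel noise.
    k = np.ones(blur) / blur
    for d in (dx, dy):
        for axis in (0, 1):
            d[:] = np.apply_along_axis(
                lambda v: np.convolve(v, k, mode="same"), axis, d)
    # Resample: each output pixel reads from a displaced source pixel.
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip((ys + alpha * dy).round().astype(int), 0, h - 1)
    src_x = np.clip((xs + alpha * dx).round().astype(int), 0, w - 1)
    return img[src_y, src_x]

digit = rng.random((28, 28))      # stand-in for a 28x28 MNIST digit
augmented = elastic_distort(digit)
```

Each call produces a new plausible variant of the same digit, so the effective training set can be expanded indefinitely, which is the "most important practice" the paper identifies.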

2,783 citations


"A fast learning algorithm for deep ..." refers methods in this paper

  • ...Using local elastic deformations in a convolutional neural network, Simard, Steinkraus, and Platt (2003) achieve 0.4%, which is slightly better than the 0.63% achieved by the best hand-coded recognition algorithm (Belongie, Malik, & Puzicha, 2002)....
