scispace - formally typeset
Search or ask a question
Topic

MNIST database

About: MNIST database is a research topic. Over the lifetime, 6212 publications have been published within this topic receiving 327921 citations. The topic is also known as: MNIST dataset.


Papers
More filters
Posted Content
TL;DR: InfoGAN as mentioned in this paper is a generative adversarial network that maximizes the mutual information between a small subset of the latent variables and the observation, which can be interpreted as a variation of the Wake-Sleep algorithm.
Abstract: This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.

2,409 citations

Posted Content
TL;DR: A binary matrix multiplication GPU kernel is written with which it is possible to run the MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy.
Abstract: We introduce a method to train Binarized Neural Networks (BNNs) - neural networks with binary weights and activations at run-time. At training-time the binary weights and activations are used for computing the parameters gradients. During the forward pass, BNNs drastically reduce memory size and accesses, and replace most arithmetic operations with bit-wise operations, which is expected to substantially improve power-efficiency. To validate the effectiveness of BNNs we conduct two sets of experiments on the Torch7 and Theano frameworks. On both, BNNs achieved nearly state-of-the-art results over the MNIST, CIFAR-10 and SVHN datasets. Last but not least, we wrote a binary matrix multiplication GPU kernel with which it is possible to run our MNIST BNN 7 times faster than with an unoptimized GPU kernel, without suffering any loss in classification accuracy. The code for training and running our BNNs is available on-line.

2,320 citations

Proceedings ArticleDOI
01 Sep 2009
TL;DR: It is shown that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks and that two stages of feature extraction yield better accuracy than one.
Abstract: In many recent object recognition systems, feature extraction stages are generally composed of a filter bank, a non-linear transformation, and some sort of feature pooling layer Most systems use only one stage of feature extraction in which the filters are hard-wired, or two stages where the filters in one or both stages are learned in supervised or unsupervised mode This paper addresses three questions: 1 How does the non-linearities that follow the filter banks influence the recognition accuracy? 2 does learning the filter banks in an unsupervised or supervised manner improve the performance over random filters or hardwired filters? 3 Is there any advantage to using an architecture with two stages of feature extraction, rather than one? We show that using non-linearities that include rectification and local contrast normalization is the single most important ingredient for good accuracy on object recognition benchmarks We show that two stages of feature extraction yield better accuracy than one Most surprisingly, we show that a two-stage system with random filters can yield almost 63% recognition rate on Caltech-101, provided that the proper non-linearities and pooling layers are used Finally, we show that with supervised refinement, the system achieves state-of-the-art performance on NORB dataset (56%) and unsupervised pre-training followed by supervised refinement produces good accuracy on Caltech-101 (≫ 65%), and the lowest known error rate on the undistorted, unprocessed MNIST dataset (053%)

2,317 citations

Proceedings Article
05 Dec 2016
TL;DR: InfoGAN as mentioned in this paper is an information-theoretic extension to the GAN that is able to learn disentangled representations in a completely unsupervised manner, and it also discovers visual concepts that include hair styles, presence of eyeglasses, and emotions on the CelebA face dataset.
Abstract: This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound of the mutual information objective that can be optimized efficiently. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing supervised methods. For an up-to-date version of this paper, please see https://arxiv.org/abs/1606.03657.

2,290 citations

Posted Content
TL;DR: In this article, a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes was developed, which mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy.
Abstract: Deep learning tools have gained tremendous attention in applied machine learning. However such tools for regression and classification do not capture model uncertainty. In comparison, Bayesian models offer a mathematically grounded framework to reason about model uncertainty, but usually come with a prohibitive computational cost. In this paper we develop a new theoretical framework casting dropout training in deep neural networks (NNs) as approximate Bayesian inference in deep Gaussian processes. A direct result of this theory gives us tools to model uncertainty with dropout NNs -- extracting information from existing models that has been thrown away so far. This mitigates the problem of representing uncertainty in deep learning without sacrificing either computational complexity or test accuracy. We perform an extensive study of the properties of dropout's uncertainty. Various network architectures and non-linearities are assessed on tasks of regression and classification, using MNIST as an example. We show a considerable improvement in predictive log-likelihood and RMSE compared to existing state-of-the-art methods, and finish by using dropout's uncertainty in deep reinforcement learning.

2,261 citations


Network Information
Related Topics (5)
Deep learning
79.8K papers, 2.1M citations
91% related
Convolutional neural network
74.7K papers, 2M citations
90% related
Artificial neural network
207K papers, 4.5M citations
86% related
Robustness (computer science)
94.7K papers, 1.6M citations
86% related
Feature extraction
111.8K papers, 2.1M citations
85% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
20241
2023812
20221,705
20211,128
20201,229
20191,178