Showing papers on "MNIST database published in 2011"


Book ChapterDOI
14 Jun 2011
TL;DR: A novel convolutional auto-encoder (CAE) for unsupervised feature learning; initializing a CNN with the filters of a trained CAE stack yields superior performance on a digit and an object recognition benchmark.
Abstract: We present a novel convolutional auto-encoder (CAE) for unsupervised feature learning. A stack of CAEs forms a convolutional neural network (CNN). Each CAE is trained using conventional on-line gradient descent without additional regularization terms. A max-pooling layer is essential to learn biologically plausible features consistent with those found by previous approaches. Initializing a CNN with filters of a trained CAE stack yields superior performance on a digit (MNIST) and an object recognition (CIFAR10) benchmark.
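Conceptually, a single CAE layer is a convolutional encoder followed by max-pooling and a deconvolutional decoder trained to reconstruct its input. The sketch below is a minimal PyTorch illustration of that idea, not the authors' implementation; the layer sizes, the sigmoid non-linearities and the unpooling-by-indices decoder are assumptions.

```python
# Minimal convolutional auto-encoder layer with max-pooling (illustrative only).
import torch
import torch.nn as nn

class ConvAutoEncoder(nn.Module):
    def __init__(self, in_ch=1, n_filters=8, k=5):
        super().__init__()
        self.enc = nn.Conv2d(in_ch, n_filters, k, padding=k // 2)   # encoder filters
        self.pool = nn.MaxPool2d(2, return_indices=True)            # max-pooling layer
        self.unpool = nn.MaxUnpool2d(2)
        self.dec = nn.ConvTranspose2d(n_filters, in_ch, k, padding=k // 2)  # decoder

    def forward(self, x):
        h = torch.sigmoid(self.enc(x))
        p, idx = self.pool(h)
        u = self.unpool(p, idx, output_size=h.shape)
        return torch.sigmoid(self.dec(u))

cae = ConvAutoEncoder()
x = torch.rand(16, 1, 28, 28)            # a batch of MNIST-sized images
loss = ((cae(x) - x) ** 2).mean()        # plain reconstruction loss, no extra regularizer
loss.backward()                          # trained by ordinary gradient descent
```

A stack of such layers, once trained, supplies the initial filters of a CNN that is then fine-tuned with labels.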

1,832 citations


Proceedings ArticleDOI
16 Jul 2011
TL;DR: A fast, fully parameterizable GPU implementation of Convolutional Neural Network variants whose feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way.
Abstract: We present a fast, fully parameterizable GPU implementation of Convolutional Neural Network variants. Our feature extractors are neither carefully designed nor pre-wired, but rather learned in a supervised way. Our deep hierarchical architectures achieve the best published results on benchmarks for object classification (NORB, CIFAR10) and handwritten digit recognition (MNIST), with error rates of 2.53%, 19.51%, 0.35%, respectively. Deep nets trained by simple back-propagation perform better than more shallow ones. Learning is surprisingly rapid. NORB is completely trained within five epochs. Test error rates on MNIST drop to 2.42%, 0.97% and 0.48% after 1, 3 and 17 epochs, respectively.

1,216 citations


Proceedings Article
28 Jun 2011
TL;DR: It is shown that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms.
Abstract: The predominant methodology for training deep learning models advocates the use of stochastic gradient descent methods (SGDs). Despite their ease of implementation, SGDs are difficult to tune and parallelize. These problems make it challenging to develop, debug and scale up deep learning algorithms with SGDs. In this paper, we show that more sophisticated off-the-shelf optimization methods such as Limited memory BFGS (L-BFGS) and Conjugate gradient (CG) with line search can significantly simplify and speed up the process of pretraining deep algorithms. In our experiments, the differences between L-BFGS/CG and SGDs are more pronounced if we consider algorithmic extensions (e.g., sparsity regularization) and hardware extensions (e.g., GPUs or computer clusters). Our experiments with distributed optimization support the use of L-BFGS with locally connected networks and convolutional neural networks. Using L-BFGS, our convolutional network model achieves 0.69% error on the standard MNIST dataset. This is a state-of-the-art result on MNIST among algorithms that do not use distortions or pretraining.
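As a concrete (and deliberately tiny) illustration of the comparison studied here, the sketch below trains a logistic-regression objective on synthetic data, once with full-batch L-BFGS via SciPy and once with a hand-rolled SGD loop; the data, step size and epoch counts are arbitrary choices, not the paper's setup.

```python
# Batch L-BFGS vs. plain SGD on a simple logistic-regression loss (synthetic data).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(float)

def loss_and_grad(w):
    p = 1.0 / (1.0 + np.exp(-X @ w))
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    grad = X.T @ (p - y) / len(y)
    return loss, grad

# Off-the-shelf optimizer: hand it the full-batch objective and gradient.
res = minimize(loss_and_grad, np.zeros(20), jac=True, method="L-BFGS-B")

# Stochastic gradient descent for comparison: small steps on single examples.
w = np.zeros(20)
for epoch in range(5):
    for i in rng.permutation(len(y)):
        p = 1.0 / (1.0 + np.exp(-X[i] @ w))
        w -= 0.1 * (p - y[i]) * X[i]
```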

908 citations


Proceedings ArticleDOI
18 Sep 2011
TL;DR: This work applies the same architecture to NIST SD 19, a more challenging dataset including lower and upper case letters, and obtains the best results published so far for both NIST digits and NIST letters.
Abstract: In 2010, after many years of stagnation, the MNIST handwriting recognition benchmark record dropped from 0.40% error rate to 0.35%. Here we report 0.27% for a committee of seven deep CNNs trained on graphics cards, narrowing the gap to human performance. We also apply the same architecture to NIST SD 19, a more challenging dataset including lower and upper case letters. A committee of seven CNNs obtains the best results published so far for both NIST digits and NIST letters. The robustness of our method is verified by analyzing 78125 different 7-net committees.
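The combination step of such a committee is typically just an average of the individual nets' class posteriors; the short sketch below shows only that step, with random mock posteriors standing in for the seven trained CNNs.

```python
# Committee prediction by averaging per-net class posteriors (illustrative only).
import numpy as np

def committee_predict(list_of_posteriors):
    """Each element: an (n_samples, 10) array of class probabilities from one net."""
    return np.mean(np.stack(list_of_posteriors), axis=0).argmax(axis=1)

# Example with seven mock nets on 100 test images.
posteriors = [np.random.dirichlet(np.ones(10), size=100) for _ in range(7)]
labels = committee_predict(posteriors)
```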

504 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: The algorithm gives excellent results for hand-written digit recognition on MNIST and object recognition on the Caltech101 benchmark, marking the first time that such accuracies have been achieved using automatically learned features from the pixel level, rather than using hand-designed descriptors.
Abstract: We present a method for learning image representations using a two-layer sparse coding scheme at the pixel level. The first layer encodes local patches of an image. After pooling within local regions, the first layer codes are then passed to the second layer, which jointly encodes signals from the region. Unlike traditional sparse coding methods that encode local patches independently, this approach accounts for high-order dependency among patterns in a local image neighborhood. We develop algorithms for data encoding and codebook learning, and show in experiments that the method leads to more invariant and discriminative image representations. The algorithm gives excellent results for hand-written digit recognition on MNIST and object recognition on the Caltech101 benchmark. This marks the first time that such accuracies have been achieved using automatically learned features from the pixel level, rather than using hand-designed descriptors.
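The first-layer encoding can be pictured as ordinary l1 sparse coding of each patch against a codebook. Below is a generic ISTA sketch of that sub-problem only; the joint second-layer encoding and the codebook learning described in the paper are not reproduced, and all sizes and constants are invented.

```python
# Generic ISTA sparse coding of a signal x against a fixed codebook D.
import numpy as np

def ista_encode(D, x, lam=0.1, n_iter=100):
    """Approximately solve min_a 0.5*||x - D a||^2 + lam*||a||_1 by iterative shrinkage."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the smooth part
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = D.T @ (D @ a - x)              # gradient step
        a = a - g / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)   # soft threshold
    return a

D = np.random.randn(64, 128)
D /= np.linalg.norm(D, axis=0)             # unit-norm codebook atoms
code = ista_encode(D, np.random.randn(64)) # sparse code for one flattened 8x8 patch
```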

240 citations


Proceedings ArticleDOI
Dong Yu, Li Deng
27 Aug 2011
TL;DR: Results on both MNIST and TIMIT tasks evaluated thus far demonstrate superior performance of DCN over the DBN (Deep Belief Network) counterpart that forms the basis of the DNN, reflected not only in training scalability and CPU-only computation, but more importantly in classification accuracy in both tasks.
Abstract: We recently developed the context-dependent DNN-HMM (Deep-Neural-Net/Hidden-Markov-Model) for large-vocabulary speech recognition. While achieving impressive reductions in recognition error rate, we face the insurmountable problem of scalability in dealing with the virtually unlimited amount of training data available nowadays. To overcome the scalability challenge, we have designed the deep convex network (DCN) architecture. The learning problem in DCN is convex within each module. Additional structure-exploiting fine tuning further improves the quality of DCN. The full learning in DCN is batch-mode based instead of stochastic, making it naturally amenable to parallel training that can be distributed over many machines. Experimental results on both MNIST and TIMIT tasks evaluated thus far demonstrate superior performance of DCN over the DBN (Deep Belief Network) counterpart that forms the basis of the DNN. The superiority is reflected not only in training scalability and CPU-only computation, but more importantly in classification accuracy on both tasks.
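On one reading of the abstract, each DCN module feeds its input through a nonlinear hidden layer and then learns the upper (output) weights by solving a convex least-squares problem; the sketch below illustrates only that convex step with a ridge-regression solution. The fixed random lower weights and the tanh unit are simplifying assumptions, not the paper's RBM-based initialization or its fine tuning.

```python
# One deep-convex-network-style module: convex (ridge) solve for the upper weights.
import numpy as np

def dcn_module(X, Y, n_hidden=200, reg=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(X.shape[1], n_hidden))    # lower weights (fixed here)
    H = np.tanh(X @ W)                                        # hidden representation
    # Convex step: closed-form ridge-regression solution for the upper weights.
    U = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ Y)
    return W, U, H @ U            # predictions are appended to the next module's input
```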

163 citations


02 Jul 2011
TL;DR: In this article, a hierarchical Bayesian model is proposed that transfers knowledge from previously learned categories to a novel category, in the form of a prior over category means and variances, and discovers how to group categories into meaningful super-categories that express different priors for new classes.
Abstract: We develop a hierarchical Bayesian model that learns categories from single training examples. The model transfers acquired knowledge from previously learned categories to a novel category, in the form of a prior over category means and variances. The model discovers how to group categories into meaningful super-categories that express different priors for new classes. Given a single example of a novel category, we can efficiently infer which super-category the novel category belongs to, and thereby estimate not only the new category's mean but also an appropriate similarity metric based on parameters inherited from the super-category. On MNIST and MSR Cambridge image datasets the model learns useful representations of novel categories based on just a single training example, and performs significantly better than simpler hierarchical Bayesian approaches. It can also discover new categories in a completely unsupervised fashion, given just one or a few examples.

111 citations


Proceedings ArticleDOI
18 Sep 2011
TL;DR: A new method to train the members of a committee of one-hidden-layer neural nets is presented, which obtains a recognition error rate of 0.39% on the MNIST digit recognition benchmark, on par with state-of-the-art recognition rates of more complicated systems.
Abstract: We present a new method to train the members of a committee of one-hidden-layer neural nets. Instead of training various nets on subsets of the training data, we preprocess the training data for each individual model such that the corresponding errors are decorrelated. On the MNIST digit recognition benchmark set we obtain a recognition error rate of 0.39%, using a committee of 25 one-hidden-layer neural nets, which is on par with state-of-the-art recognition rates of more complicated systems.

79 citations


Proceedings Article
28 Jun 2011
TL;DR: This work proposes an SSL algorithmic framework which can utilize unlabeled examples for learning classifiers from a predefined set of fast classifiers and proposes a novel quantitative measure of the so-called cluster assumption.
Abstract: Semi-supervised learning (SSL) addresses the problem of training a classifier using a small number of labeled examples and many unlabeled examples. Most previous work on SSL focused on how the availability of unlabeled data can improve the accuracy of the learned classifiers. In this work we study how unlabeled data can be beneficial for constructing faster classifiers. We propose an SSL algorithmic framework which can utilize unlabeled examples for learning classifiers from a predefined set of fast classifiers. We formally analyze conditions under which our algorithmic paradigm obtains significant improvements by the use of unlabeled data. As a side benefit of our analysis we propose a novel quantitative measure of the so-called cluster assumption. We demonstrate the potential merits of our approach by conducting experiments on the MNIST data set, showing that, when a sufficiently large unlabeled sample is available, a fast classifier can be learned from far fewer labeled examples than without such a sample.

46 citations


Proceedings Article
07 Aug 2011
TL;DR: In this paper, l1/l2 regularization on the activation probabilities of hidden units in restricted Boltzmann machines is used to capture the local dependencies among hidden units, and the proposed SGRBMs are applied to model patches of natural images, handwritten digits and OCR English letters.
Abstract: Since learning in Boltzmann machines is typically quite slow, there is a need to restrict connections within hidden layers. However, the resulting states of hidden units exhibit statistical dependencies. Based on this observation, we propose using l1/l2 regularization upon the activation probabilities of hidden units in restricted Boltzmann machines to capture the local dependencies among hidden units. This regularization not only encourages hidden units of many groups to be inactive given observed data but also makes hidden units within a group compete with each other for modeling observed data. Thus, the l1/l2 regularization on RBMs yields sparsity at both the group and the hidden-unit levels. We call RBMs trained with this regularizer sparse group RBMs (SGRBMs). The proposed SGRBMs are applied to model patches of natural images, handwritten digits and OCR English letters. Then, to emphasize that SGRBMs can learn more discriminative features, we applied SGRBMs to pretrain deep networks for classification tasks. Furthermore, we illustrate that the regularizer can also be applied to deep Boltzmann machines, which leads to sparse group deep Boltzmann machines. When applied to the MNIST data set, a two-layer sparse group Boltzmann machine achieves an error rate of 0.84%, which is, to our knowledge, the best published result on the permutation-invariant version of the MNIST task.
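The l1/l2 (group) regularizer can be written as a sum over groups of the l2 norm of each group's hidden activation probabilities. The numpy sketch below shows just that penalty term, which would be added to the RBM training objective; the group sizes are arbitrary.

```python
# Group (l1/l2) sparsity penalty on RBM hidden activation probabilities.
import numpy as np

def group_sparsity_penalty(hidden_probs, groups):
    """Sum over groups of the per-sample l2 norms of that group's activations."""
    return sum(np.linalg.norm(hidden_probs[:, g], axis=1).sum() for g in groups)

probs = np.random.rand(8, 12)                          # batch of activation probabilities
groups = [list(range(0, 4)), list(range(4, 8)), list(range(8, 12))]
penalty = group_sparsity_penalty(probs, groups)        # added to the training objective
```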

Proceedings ArticleDOI
18 Sep 2011
TL;DR: An efficient and low-cost semi-automatic labeling system for character datasets; the results show that labeling less than 0.5% of the training data is sufficient to achieve an 86.21% recognition rate for a brand new script and 94.81% for the MNIST benchmark dataset.
Abstract: One of the major issues in handwritten character recognition is the efficient creation of ground truth to train and test the different recognizers. The manual labeling of the data by a human expert is a tedious and costly procedure. In this paper we propose an efficient and low-cost semi-automatic labeling system for character datasets. First, the data is represented at different abstraction levels and then clustered in an unsupervised manner. The different clusters are labeled by the human experts and finally a unanimity vote is used to decide whether a label is accepted or not. The experimental results show that labeling less than 0.5% of the training data is sufficient to achieve an 86.21% recognition rate for a brand new script (Lampung) and 94.81% for the MNIST benchmark dataset, considering only a K-nearest neighbor classifier for recognition.
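The full pipeline (multi-level representation, clustering, expert labels per cluster, unanimity voting) is richer than this, but the basic cluster-then-label idea can be sketched as below with scikit-learn; ask_expert is a hypothetical callback standing in for the human labeling step, and the unanimity vote is omitted.

```python
# Cluster-then-label sketch: one expert query per cluster, then a 1-NN recognizer.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

def semi_automatic_labels(X, ask_expert, n_clusters=50):
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # One expert query per cluster labels every sample in that cluster.
    cluster_label = {c: ask_expert(X[km.labels_ == c][0]) for c in range(n_clusters)}
    y = np.array([cluster_label[c] for c in km.labels_])
    return KNeighborsClassifier(n_neighbors=1).fit(X, y)
```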

Posted Content
TL;DR: Another substantial improvement is reported: 0.31% error on MNIST, obtained using a committee of simple but deep MLPs, outperforming all previous, more complex methods.
Abstract: The competitive MNIST handwritten digit recognition benchmark has a long history of broken records since 1998. The most recent substantial improvement by others dates back 7 years (error rate 0.4%). Recently we were able to significantly improve this result, using graphics cards to greatly speed up training of simple but deep MLPs, which achieved 0.35%, outperforming all the previous more complex methods. Here we report another substantial improvement: 0.31% obtained using a committee of MLPs.

Proceedings ArticleDOI
18 Sep 2011
TL;DR: Three part-based methods for handwritten character recognition are introduced and the relative superiority of the class distance method and the robustness of the multiple voting method against the reduction of training set are shown.
Abstract: The purpose of this paper is to introduce three part-based methods for handwritten character recognition and then compare their performances experimentally. All of these methods decompose handwritten characters into "parts". Some recognition processes are then done in a part-wise manner and, finally, the recognition results at all the parts are combined via voting to obtain the recognition result for the entire character. Since part-based methods do not rely on the global structure of the character, we can expect robustness against various deformations. Three voting methods have been investigated for the combination: single voting, multiple voting, and class distance. All of them use different strategies for voting. Experimental results on the MNIST database showed the relative superiority of the class distance method and the robustness of the multiple voting method against the reduction of the training set.
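As a rough illustration of the single-voting variant: split the character image into parts, classify each part independently, and let every part cast one vote for a class. part_classifier below is a hypothetical stand-in for whatever part-level recognizer is used; the part size and stride are arbitrary.

```python
# Part-based recognition by voting (single-vote variant, illustrative only).
import numpy as np
from collections import Counter

def extract_parts(img, size=7, stride=7):
    h, w = img.shape
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, stride)
            for j in range(0, w - size + 1, stride)]

def vote_classify(img, part_classifier):
    # part_classifier is assumed to map a part to a digit label (0-9).
    votes = [part_classifier(p) for p in extract_parts(img)]
    return Counter(votes).most_common(1)[0][0]
```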

Journal ArticleDOI
Yaping Huang, Jiali Zhao, Yunhui Liu, Siwei Luo, Qi Zou, Mei Tian
TL;DR: A new Nonlinear Neighborhood Preserving (NNP) technique is developed, by utilizing the temporal coherence principle to find an optimal low dimensional representation from the original high dimensional data.

Journal ArticleDOI
TL;DR: The proposed discriminative method to select GMM structures for pattern classification behaves better than the manual method and the generative counterparts, including Bayesian Information Criterion, Minimum Description Length (MDL) and AutoClass.

Proceedings ArticleDOI
Brian Cheung, Carl Sable
18 Dec 2011
TL;DR: This paper applies a hybrid evolutionary search procedure to define the initialization and architectural parameters of convolutional networks, one of the first successful deep network models, and makes use of stochastic diagonal Levenberg-Marquardt to accelerate the convergence of training.
Abstract: With the increasing trend of neural network models towards larger structures with more layers, we expect a corresponding exponential increase in the number of possible architectures. In this paper, we apply a hybrid evolutionary search procedure to define the initialization and architectural parameters of convolutional networks, one of the first successful deep network models. We make use of stochastic diagonal Levenberg-Marquardt to accelerate the convergence of training, lowering the time cost of fitness evaluation. Using parameters found from the evolutionary search together with absolute value and local contrast normalization preprocessing between layers, we achieve the best known performance on several of the MNIST Variations, rectangles-image and convex image datasets.
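In spirit, the search keeps a small population of architecture/initialization genomes, evaluates each by briefly training a network, and breeds the fittest. Below is a deliberately tiny sketch in which fitness is a caller-supplied placeholder for "train and return validation error" (the step the paper accelerates with stochastic diagonal Levenberg-Marquardt); the parameter names and ranges are invented.

```python
# Toy evolutionary search over CNN hyper-parameters (illustrative only).
import random

def random_genome():
    return {"n_feature_maps": random.choice([5, 10, 20, 50]),
            "kernel_size": random.choice([3, 5, 7]),
            "init_scale": random.choice([0.01, 0.05, 0.1])}

def mutate(g):
    g = dict(g)
    key = random.choice(list(g))
    g[key] = random_genome()[key]          # resample one gene
    return g

def evolve(fitness, pop_size=10, n_gen=5):
    pop = [random_genome() for _ in range(pop_size)]
    for _ in range(n_gen):
        pop.sort(key=fitness)              # lower validation error is better
        parents = pop[: pop_size // 2]
        pop = parents + [mutate(random.choice(parents)) for _ in parents]
    return min(pop, key=fitness)
```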

Proceedings ArticleDOI
30 Nov 2011
TL;DR: This paper presents a case study on the impact of using reduced precision arithmetic on learning in Restricted Boltzmann Machine (RBM) deep belief networks and demonstrates that RBM can be trained successfully using resource-efficient fixed point formats commonly found in current FPGA devices.
Abstract: This paper presents a case study on the impact of using reduced precision arithmetic on learning in Restricted Boltzmann Machine (RBM) deep belief networks. FPGAs provide a hardware accelerator framework to speed up many algorithms, including the learning and recognition tasks of ever growing neural network topologies and problem complexities. Current FPGAs include DSP blocks - hard blocks that let designers implement in hardware what would otherwise require a significant quantity of reconfigurable logic (slices), and that increase the clock performance of arithmetic operations. Accelerators on FPGAs can take advantage of, in some products, thousands of DSP blocks on a single chip to scale up the parallelism of designs. Conversely, IEEE floating point representation cannot be fully implemented in single DSP slices and requires a significant amount of general logic, thus reducing the resources available for parallelism in an accelerator design. Reduced precision fixed point arithmetic can fit within a single DSP slice without external logic. It has been used successfully for training MLP-BP neural networks on small problems. The merit of reduced precision computation in RBM networks for sizable problems has not been evaluated. In this work, a three layer RBM network linked to one classification layer (1.6M weights) is used to learn the classic MNIST dataset over a set of common limited precisions used in FPGA designs. Issues of parameter saturation and a method to overcome the inherent training difficulties are discussed. The results demonstrate that RBMs can be trained successfully using resource-efficient fixed point formats commonly found in current FPGA devices.
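One software way to mimic reduced-precision DSP arithmetic is to snap weights (and, if desired, activations) to a fixed-point grid after every update and watch for saturation. The quantizer below is such a sketch in numpy; the integer/fraction bit widths are arbitrary choices, not the formats evaluated in the paper.

```python
# Fixed-point quantization of RBM weights to mimic limited-precision arithmetic.
import numpy as np

def to_fixed_point(x, int_bits=3, frac_bits=12):
    scale = 2.0 ** frac_bits
    lo, hi = -(2.0 ** int_bits), 2.0 ** int_bits - 1.0 / scale   # saturation limits
    return np.clip(np.round(x * scale) / scale, lo, hi)

w = np.random.randn(784, 500) * 0.01
w_q = to_fixed_point(w)      # use w_q in the CD-k updates to study saturation effects
```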

Proceedings ArticleDOI
06 Nov 2011
TL;DR: A directed bilinear model that learns higher-order groupings among features of natural images and achieves high log-likelihood (−94 nats), surpassing the current state of the art for natural images achievable with an mcRBM model.
Abstract: We describe a directed bilinear model that learns higher-order groupings among features of natural images. The model represents images in terms of two sets of latent variables: one set of variables represents which feature groups are active, while the other specifies the relative activity within groups. Such a factorized representation is beneficial because it is stable in response to small variations in the placement of features while still preserving information about relative spatial relationships. When trained on MNIST digits, the resulting representation provides state of the art performance in classification using a simple classifier. When trained on natural images, the model learns to group features according to proximity in position, orientation, and scale. The model achieves high log-likelihood (−94 nats), surpassing the current state of the art for natural images achievable with an mcRBM model.

Proceedings ArticleDOI
03 Oct 2011
TL;DR: The M-DBN is introduced, an unsupervised modular DBN that addresses the forgetting problem and retains learned features even after those features are removed from the training data, while monolithic DBNs of comparable size forget feature mappings learned before.
Abstract: Deep belief networks (DBNs) are popular for learning compact representations of high-dimensional data. However, most approaches so far rely on having a single, complete training set. If the distribution of relevant features changes during subsequent training stages, the features learned in earlier stages are gradually forgotten. Often it is desirable for learning algorithms to retain what they have previously learned, even if the input distribution temporarily changes. This paper introduces the M-DBN, an unsupervised modular DBN that addresses the forgetting problem. M-DBNs are composed of a number of modules that are trained only on samples they best reconstruct. While modularization by itself does not prevent forgetting, the M-DBN additionally uses a learning method that adjusts each module's learning rate proportionally to the fraction of best reconstructed samples. On the MNIST handwritten digit dataset, module specialization largely corresponds to the digits discerned by humans. Furthermore, in several learning tasks with changing MNIST digits, M-DBNs retain learned features even after those features are removed from the training data, while monolithic DBNs of comparable size forget feature mappings learned before.

Proceedings ArticleDOI
22 May 2011
TL;DR: Experimental results on the MNIST benchmark indicate that the proposed classifier outperforms current state-of-the-art techniques, especially when very few labeled patterns are available.
Abstract: We propose a novel semi-supervised classifier for handwritten digit recognition problems that is based on the assumption that any digit can be obtained as a slight transformation of another sufficiently close digit. Given a number of labeled and unlabeled images, it is possible to determine the class membership of each unlabeled image by creating a sequence of such image transformations that connect it, through other unlabeled images, to a labeled image. In order to measure the total transformation, a robust and reliable metric of the path length is proposed, which combines a local dissimilarity between consecutive images along the path with a global connectivity-based metric. For the local dissimilarity we use a symmetrized version of the zero-order image deformation model (IDM) proposed by Keysers et al. in [1]. For the global distance we use a connectivity-based metric proposed by Chapelle and Zien in [2]. Experimental results on the MNIST benchmark indicate that the proposed classifier outperforms current state-of-the-art techniques, especially when very few labeled patterns are available.

Journal ArticleDOI
TL;DR: This paper views the problem of embedding a set of relational structures into a metric space for purposes of matching and categorisation from a Riemannian perspective and makes use of the concepts of charts on the manifold to define the embedding as a mixture of class-specific submersions.

Proceedings ArticleDOI
22 May 2011
TL;DR: A new Bayesian model is proposed that integrates dictionary learning and topic modeling into a unified framework; it is applied to clustering multiple images, a subset of which may be annotated, and demonstrates state-of-the-art performance.
Abstract: A new Bayesian model is proposed, integrating dictionary learning and topic modeling into a unified framework. The model is applied to cluster multiple images, and a subset of the images may be annotated. Example results are presented on the MNIST digit data and on the Microsoft MSRC multi-scene image data. These results reveal the working mechanisms of the model and demonstrate state-of-the-art performance.

Proceedings ArticleDOI
01 Nov 2011
TL;DR: CoDLib, a combination of distributed and parallel computing methods, is proposed; it shows a great speed-up in training on the MNIST dataset, with training time significantly reduced compared with standard LIBSVM without affecting the quality of the SVM.
Abstract: The Support Vector Machine (SVM) is an efficient data mining approach for data classification. However, the SVM algorithm requires very large memory and computational time to deal with very large datasets. To reduce the computational time of training the SVM, a combination of distributed and parallel computing methods, CoDLib, has been proposed. Instead of using a single machine for parallel computing, multiple machines in a cluster are used. The Message Passing Interface (MPI) is used for communication between machines in the cluster. The original dataset is split and distributed to the respective machines. Experimental results show a great speed-up in training on the MNIST dataset, where training time has been significantly reduced compared with standard LIBSVM without affecting the quality of the SVM.
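Illustrative only: the real CoDLib distributes LIBSVM training across cluster machines with MPI, whereas the sketch below stays in a single process with scikit-learn and simply splits the data, trains one SVM per chunk, and combines predictions by majority vote (integer digit labels are assumed).

```python
# Split-train-combine SVM sketch (single process; the paper uses MPI across machines).
import numpy as np
from sklearn.svm import SVC

def train_split_svms(X, y, n_chunks=4):
    idx = np.array_split(np.random.permutation(len(y)), n_chunks)
    return [SVC(kernel="rbf", gamma="scale").fit(X[i], y[i]) for i in idx]

def vote_predict(models, X):
    preds = np.stack([m.predict(X) for m in models]).astype(int)
    # Majority vote over the per-chunk SVMs.
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, preds)
```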

Posted Content
TL;DR: A comparison between a multivariate and a probabilistic approach is shown, concluding that both methods provide similar results in terms of test-error rate.
Abstract: Pattern recognition is one of the major challenges in the statistics framework. Its goal is to extract features that classify patterns into categories. A well-known example in this field is handwritten digit recognition, where digits have to be assigned to one of the 10 classes using some classification method. Our purpose is to present alternative classification methods based on statistical techniques. We show a comparison between a multivariate and a probabilistic approach, concluding that both methods provide similar results in terms of test-error rate. Experiments are performed on the well-known MNIST and USPS databases using binary-level images. Then, as an additional contribution, we introduce a novel method to binarize images, based on statistical concepts associated with the written trace of the digit.

Journal ArticleDOI
TL;DR: A simple method based on some statistical measurements for Latin handwritten digit recognition is proposed in this paper; six categories are created based on the relation between the number of termination points and the possible digits.
Abstract: A simple method based on some statistical measurements for Latin handwritten digit recognition is proposed in this paper. First, a preprocessing step thresholds the gray-scale digit image into a binary image and then performs noise removal, spur removal and thinning. Second, to reduce the search space, the region of interest (ROI) is cropped from the preprocessed image, a Freeman chain code template is applied, and five feature sets are extracted from each digit image: the number of termination points, their coordinates relative to the center of the ROI, Euclidean distances, orientations in terms of angles, and other statistical properties such as the minor-to-major axis length ratio and area. Finally, six categories are created based on the relation between the number of termination points and the possible digits. The present method is applied and tested on the training set (60,000 images) and test set (10,000 images) of the MNIST handwritten digit database. Our experiments report a correct classification rate of 92.9041% for the testing set and 95.0953% for the training set.

Proceedings Article
14 Jun 2011
TL;DR: The Restricted Boltzmann Machine (RBM) is an undirected graphical model with latent variables, exact inference, rather simple sampling procedures (block Gibbs), and several successful learning algorithms based on approximations of the log-likelihood gradient.
Abstract: The Restricted Boltzmann Machine (Smolensky, 1986; Hinton et al., 2006) has inspired much research in recent years, in particular as a building block for deep architectures (see Bengio (2009) for a review). The Restricted Boltzmann Machine (RBM) is an undirected graphical model with latent variables, exact inference, rather simple sampling procedures (block Gibbs), and several successful learning algorithms based on approximations of the log-likelihood gradient. However, when it comes to actually computing the distribution or density function, it is intractable, except when either the number of inputs or latent variables is very small (about 25 binary hidden units with current computers and about an hour of computing, on MNIST).
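The intractability is easy to see in code: computing the partition function exactly requires summing over every hidden configuration (the visibles can be summed out analytically), which is feasible only for toy sizes. A small numpy sketch, with made-up dimensions:

```python
# Exact log partition function of a tiny binary RBM by brute-force enumeration.
import numpy as np
from itertools import product

def exact_log_Z(W, b_vis, b_hid):
    log_terms = []
    for h in product([0, 1], repeat=len(b_hid)):      # 2^(num hidden) configurations
        h = np.array(h)
        # log of: exp(b_hid.h) * prod_i (1 + exp((W h + b_vis)_i)), visibles summed out
        log_terms.append(b_hid @ h + np.sum(np.log1p(np.exp(W @ h + b_vis))))
    m = max(log_terms)
    return m + np.log(np.sum(np.exp(np.array(log_terms) - m)))   # stable log-sum-exp

rng = np.random.default_rng(0)
print(exact_log_Z(rng.normal(size=(6, 5)) * 0.1, np.zeros(6), np.zeros(5)))  # 32 terms
```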

Li Deng, Dong Yu
01 Jun 2011
TL;DR: Experimental results on handwriting image recognition task (MNIST) and on phone state classification (TIMIT) demonstrate superior performance of DCN over DBN not only in training efficiency but also in classification accuracy.
Abstract: To overcome the scalability challenge associated with the Deep Belief Network (DBN), we have designed a novel deep learning architecture, the deep convex network (DCN). The learning problem in DCN is convex within each layer. Additional structure-exploiting fine tuning further improves the quality of DCN. The full learning in DCN is batch-mode based instead of stochastic, making it naturally amenable to parallel training that can be distributed over many machines. Experimental results on a handwriting image recognition task (MNIST) and on phone state classification (TIMIT) demonstrate superior performance of DCN over DBN not only in training efficiency but also in classification accuracy. On MNIST, DCN gives an error rate of 0.83%, the lowest without the use of additional training data produced by elastic distortion. The corresponding error rate of the best DBN which we have carefully tuned is 1.06%. On the TIMIT task, DCN also outperforms DBN, but by a relatively smaller margin so far.

Proceedings Article
14 Jun 2011
TL;DR: It is shown that class-irrelevant features help class-relevant features focus on the recognition task and introduce useful regularization effects that reduce the norms of class-relevant features in this hybrid third-order Restricted Boltzmann Machine.
Abstract: Restricted Boltzmann Machines are commonly used in unsupervised learning to extract features from training data. Since these features are learned for regenerating training data, a classifier based on them has to be trained. If only a few of the learned features are discriminative, other non-discriminative features will distract the classifier during the training process and thus waste computing resources during testing. In this paper, we present a hybrid third-order Restricted Boltzmann Machine in which class-relevant features (for recognizing) and class-irrelevant features (for generating only) are learned simultaneously. As the classification task uses only the class-relevant features, the test itself becomes very fast. We show that class-irrelevant features help class-relevant features focus on the recognition task and introduce useful regularization effects that reduce the norms of class-relevant features. Thus there is no need to use weight decay for the parameters of this model. Experiments on the MNIST, NORB and Caltech101 Silhouettes datasets show very promising results.

Proceedings ArticleDOI
16 Jul 2011
TL;DR: This work proposes an algorithm that can adapt a preexisting MMB trained with extensive data to a new link from which very limited data is available, and shows it can learn accurate models from data traces of about 1 minute, about 10 times shorter than needed if training an MMB from scratch.
Abstract: The mixture of multivariate Bernoulli distributions (MMB) is a statistical model for high-dimensional binary data in widespread use. Recently, the MMB has been used to model the sequence of packet receptions and losses of wireless links in sensor networks. Given an MMB trained on long data traces recorded from links of a deployed network, one can then use samples from the MMB to test different routing algorithms for as long as desired. However, learning an accurate model for a new link requires collecting from it long traces over periods of hours, a costly process in practice (e.g. limited battery life). We propose an algorithm that can adapt a preexisting MMB trained with extensive data to a new link from which very limited data is available. Our approach constrains the new MMB's parameters through a nonlinear transformation of the existing MMB's parameters. The transformation has a small number of parameters that are estimated using a generalized EM algorithm with an inner loop of BFGS iterations. We demonstrate the efficacy of the approach using the MNIST dataset of handwritten digits, and wireless link data from a sensor network. We show we can learn accurate models from data traces of about 1 minute, about 10 times shorter than needed if training an MMB from scratch.
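For reference, the underlying MMB is a Bernoulli mixture fitted by EM; the generic sketch below (binary data in, mixing weights and per-component Bernoulli means out) is not the paper's adaptation algorithm, which instead constrains a pre-trained MMB through a learned nonlinear transformation estimated with generalized EM and BFGS.

```python
# Generic EM for a mixture of multivariate Bernoullis on binary data.
import numpy as np

def mmb_em(X, K=5, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(K, 1.0 / K)                          # mixing proportions
    mu = rng.uniform(0.25, 0.75, size=(K, d))         # per-component Bernoulli means
    for _ in range(n_iter):
        # E-step: responsibilities from Bernoulli log-likelihoods.
        log_p = X @ np.log(mu).T + (1 - X) @ np.log(1 - mu).T + np.log(pi)
        log_p -= log_p.max(axis=1, keepdims=True)
        r = np.exp(log_p)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: weighted means and mixing proportions.
        Nk = r.sum(axis=0)
        mu = np.clip((r.T @ X) / Nk[:, None], 1e-3, 1 - 1e-3)
        pi = Nk / n
    return pi, mu
```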