Showing papers on "MNIST database published in 2010"

PDF

Open Access

Journal Article•DOI•

Deep, big, simple neural nets for handwritten digit recognition

[...]

Dan Ciresan¹, Ueli Meier¹, Luca Maria Gambardella¹, Jürgen Schmidhuber¹•Institutions (1)

01 Dec 2010-Neural Computation

TL;DR: Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35 error rate on the MNIST handwritten digits benchmark.

...read moreread less

Abstract: Good old online backpropagation for plain multilayer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.

...read moreread less

1,016 citations

Journal Article•DOI•

Deep Big Simple Neural Nets Excel on Handwritten Digit Recognition

[...]

Dan Ciresan, Ueli Meier, Luca Maria Gambardella, Jürgen Schmidhuber

01 Mar 2010-arXiv: Neural and Evolutionary Computing

TL;DR: This paper achieved the state-of-the-art performance on the MNIST handwritten digits benchmark by using a large number of hidden layers, many neurons per layer, numerous deformed training images, and graphics cards.

...read moreread less

Abstract: Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the famous MNIST handwritten digits benchmark. All we need to achieve this best result so far are many hidden layers, many neurons per layer, numerous deformed training images, and graphics cards to greatly speed up learning.

...read moreread less

526 citations

Proceedings Article•

Efficient Learning of Deep Boltzmann Machines

[...]

Ruslan Salakhutdinov¹, Hugo Larochelle²•Institutions (2)

Massachusetts Institute of Technology¹, University of Toronto²

31 Mar 2010

TL;DR: A new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables, that learns a separate “recognition” model that is used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers.

...read moreread less

Abstract: We present a new approximate inference algorithm for Deep Boltzmann Machines (DBM’s), a generative model with many layers of hidden variables. The algorithm learns a separate “recognition” model that is used to quickly initialize, in a single bottom-up pass, the values of the latent variables in all hidden layers. We show that using such a recognition model, followed by a combined top-down and bottom-up pass, it is possible to efficiently learn a good generative model of high-dimensional highly-structured sensory input. We show that the additional computations required by incorporating a top-down feedback plays a critical role in the performance of a DBM, both as a generative and discriminative model. Moreover, inference is only at most three times slower compared to the approximate inference in a Deep Belief Network (DBN), making large-scale learning of DBM’s practical. Finally, we demonstrate that the DBM’s trained using the proposed approximate inference algorithm perform well compared to DBN’s and SVM’s on the MNIST handwritten digit, OCR English letters, and NORB visual object recognition tasks.

...read moreread less

374 citations

Journal Article•DOI•

Handwritten character recognition through two-stage foreground sub-sampling

[...]

Georgios Vamvakas, Basilis Gatos, Stavros Perantonis

01 Aug 2010-Pattern Recognition

TL;DR: The proposed methodology relies on a new feature extraction technique based on recursive subdivisions of the character image so that the resulting sub-images at each iteration have balanced (approximately equal) numbers of foreground pixels, as far as this is possible.

...read moreread less

109 citations

Proceedings Article•

A Stick-Breaking Construction of the Beta Process

[...]

John Paisley¹, Aimee K. Zaas¹, Christopher W. Woods¹, Geoffrey S. Ginsburg¹, Lawrence Carin¹ - Show less +1 more•Institutions (1)

Duke University¹

21 Jun 2010

TL;DR: An inference procedure that relies on Monte Carlo integration to reduce the number of parameters to be inferred is derived, and results on synthetic data, the MNIST handwritten digits data set and a time-evolving gene expression data set are presented.

...read moreread less

Abstract: We present and derive a new stick-breaking construction of the beta process The construction is closely related to a special case of the stick-breaking construction of the Dirichlet process (Sethuraman, 1994) applied to the beta distribution We derive an inference procedure that relies on Monte Carlo integration to reduce the number of parameters to be inferred, and present results on synthetic data, the MNIST handwritten digits data set and a time-evolving gene expression data set

...read moreread less

86 citations

Proceedings Article•DOI•

A tutorial on stochastic approximation algorithms for training Restricted Boltzmann Machines and Deep Belief Nets

[...]

Kevin Swersky¹, Bo Chen¹, Ben Marlin¹, Nando de Freitas¹•Institutions (1)

University of British Columbia¹

26 Apr 2010

TL;DR: It is demonstrated that Stochastic Maximum Likelihood is superior when using the Restricted Boltzmann Machine as a classifier, and that the algorithm can be greatly improved using the technique of iterate averaging from the field of stochastic approximation.

...read moreread less

Abstract: In this study, we provide a direct comparison of the Stochastic Maximum Likelihood algorithm and Contrastive Divergence for training Restricted Boltzmann Machines using the MNIST data set. We demonstrate that Stochastic Maximum Likelihood is superior when using the Restricted Boltzmann Machine as a classifier, and that the algorithm can be greatly improved using the technique of iterate averaging from the field of stochastic approximation. We further show that training with optimal parameters for classification does not necessarily lead to optimal results when Restricted Boltzmann Machines are stacked to form a Deep Belief Network. In our experiments we observe that fine tuning a Deep Belief Network significantly changes the distribution of the latent data, even though the parameter changes are negligible.

...read moreread less

81 citations

Journal Article•DOI•

Producing pattern examples from mental images

[...]

Bruno P. A. Grieco¹, Priscila M. V. Lima¹, Massimo De Gregorio, Felipe M. G. França¹•Institutions (1)

Federal University of Rio de Janeiro¹

01 Mar 2010-Neurocomputing

TL;DR: It is shown, through the exploration of the MNIST database of handwritten digits as benchmark, how the process of mental images formation can improve WiSARD's classification skills.

...read moreread less

53 citations

Handwritten Digit Recognition Using Multiple Feature Extraction Techniques and Classifier Ensemble

[...]

Rafael M. O. Cruz, George D. C. Cavalcanti, Tsang Ing Ren

01 Jan 2010

TL;DR: A handwritten digit recognition system which uses multiple feature extraction methods and classifier ensemble and it is shown that combining these feature sets is sufficient to achieve high recognition rates.

...read moreread less

Abstract: It is herein proposed a handwritten digit recognition system which uses multiple feature extraction methods and classifier ensemble. The combination of the feature extraction methods is motivated by the observation that different feature extraction algorithms have a better discriminative power for some types of digits. Six features sets were extracted, two proposed by the authors and four published in previous works. It is shown that combining these feature sets is sufficient to achieve high recognition rates. Several combination schemes were tested, showing good results. A scheme using neural networks as a combiner achieved a recognition rate of 99.68%, the highest one on the MNIST database.

...read moreread less

35 citations

Journal Article•DOI•

Classification and recognition of handwritten digits by using mathematical morphology

[...]

V. Vijaya Kumar, A. Srikrishna, B. Raveendra Babu, M. Radhika Mani

09 Oct 2010-Sadhana-academy Proceedings in Engineering Sciences

TL;DR: A novel algorithm for recognition of handwritten digits that completely eliminates the complex process of recognition of horizontal or vertical lines and the property called ‘concavities’ is proposed.

...read moreread less

Abstract: The present paper proposes a novel algorithm for recognition of handwritten digits. For this, the present paper classified the digits into two groups: one group consists of blobs with/without stems and the other digits with stems only. The blobs are identified based on a new concept called morphological region filling methods. This eliminates the problem of finding the size of blobs and their structuring elements. The digits with blobs and stems are identified by a new concept called ‘connected component’. This method completely eliminates the complex process of recognition of horizontal or vertical lines and the property called ‘concavities’. The digits with only stems are recognized, by extending stems into blobs by using connected component approach of morphology. The present method has been applied and tested with various handwritten digits from modified NIST (National Institute of Standards and Technology) handwritten digit database (MNIST), and the success rate has been given. The present method is also compared with various existing methods.

...read moreread less

25 citations

Proceedings Article•DOI•

Deep Spatiotemporal Feature Learning with Application to Image Classification

[...]

Thomas P. Karnowski¹, Itamar Arel², Derek C. Rose²•Institutions (2)

Oak Ridge National Laboratory¹, University of Tennessee²

12 Dec 2010

TL;DR: This work expands DeSTIN to a popular problem, the MNIST data set of handwritten digits, and demonstrates that information from the different layers of this hierarchical system can be extracted and utilized for the purpose of pattern classification.

...read moreread less

Abstract: Deep machine learning is an emerging framework for dealing with complex high-dimensionality data in a hierarchical fashion which draws some inspiration from biological sources. Despite the notable progress made in the field, there remains a need for an architecture that can represent temporal information with the same ease that spatial information is discovered. In this work, we present new results using a recently introduced deep learning architecture called Deep Spatio-Temporal Inference Network (DeSTIN). DeSTIN is a discriminative deep learning architecture that combines concepts from unsupervised learning for dynamic pattern representation together with Bayesian inference. In DeSTIN the spatiotemporal dependencies that exist within the observations are modeled inherently in an unguided manner. Each node models the inputs by means of clustering and simple dynamics modeling while it constructs a belief state over the distribution of sequences using Bayesian inference. We demonstrate that information from the different layers of this hierarchical system can be extracted and utilized for the purpose of pattern classification. Earlier simulation results indicated that the framework is highly promising, consequently in this work we expand DeSTIN to a popular problem, the MNIST data set of handwritten digits. The system as a preprocessor to a neural network achieves a recognition accuracy of 97.98% on this data set. We further show related experimental results pertaining to automatic cluster adaptation and termination.

...read moreread less

22 citations

Proceedings Article•DOI•

CUDA Implementation of Deformable Pattern Recognition and its Application to MNIST Handwritten Digit Database

[...]

Yoshiki Mizukami¹, Katsumi Tadamura¹, Jonathan Warrell², Peng Li³, Simon J. D. Prince³ - Show less +1 more•Institutions (3)

Yamaguchi University¹, Oxford Brookes University², University College London³

23 Aug 2010

TL;DR: A deformable pattern recognition method with CUDA implementation using the prototype-parallel displacement computation on CUDA and the gradual prototype elimination technique for reducing the computational time without sacrificing the accuracy.

...read moreread less

Abstract: In this study we propose a deformable pattern recognition method with CUDA implementation. In order to achieve the proper correspondence between foreground pixels of input and prototype images, a pair of distance maps are generated from input and prototype images, whose pixel values are given based on the distance to the nearest foreground pixel. Then a regularization technique computes the horizontal and vertical displacements based on these distance maps. The dissimilarity is measured based on the eight-directional derivative of input and prototype images in order to leverage characteristic information on the curvature of line segments that might be lost after the deformation. The prototype-parallel displacement computation on CUDA and the gradual prototype elimination technique are employed for reducing the computational time without sacrificing the accuracy. A simulation shows that the proposed method with the k-nearest neighbor classifier gives the error rate of 0.57% for the MNIST handwritten digit database.

...read moreread less

Proceedings Article•

Efficient online learning of a non-negative sparse autoencoder

[...]

Andre Lemme¹, R. Felix Reinhart¹, Jochen J. Steil¹•Institutions (1)

Bielefeld University¹

01 Jan 2010

TL;DR: It is shown that the efficient autoencoder yields to better sparseness and lower reconstruction errors than the batch algorithms on the MNIST benchmark dataset.

...read moreread less

Abstract: We introduce an efficient online learning mechanism for non- negative sparse coding in autoencoder neural networks. In this paper we compare the novel method to the batch algorithm non-negative matrix factorization with and without sparseness constraint. We show that the efficient autoencoder yields to better sparseness and lower reconstruction errors than the batch algorithms on the MNIST benchmark dataset.

...read moreread less

Journal Article•DOI•

Hybrid Linear Modeling via Local Best-fit Flats

[...]

Teng Zhang¹, Arthur Szlam², Yi Wang¹, Gilad Lerman¹•Institutions (2)

University of Minnesota¹, Courant Institute of Mathematical Sciences²

17 Oct 2010-arXiv: Computer Vision and Pattern Recognition

TL;DR: In this paper, a simple and fast geometric method for modeling data by a union of affine subspaces is presented, which can be used for tracking-based motion segmentation and clustering of faces under different illuminating conditions.

...read moreread less

Abstract: We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' $\beta_2$ numbers (we prove under certain geometric conditions that our method finds the optimal local neighborhoods). The collection of subspaces is further processed by a greedy selection procedure or a spectral method to generate the final model. We discuss applications to tracking-based motion segmentation and clustering of faces under different illuminating conditions. We give extensive experimental evidence demonstrating the state of the art accuracy and speed of the suggested algorithms on these problems and also on synthetic hybrid linear data as well as the MNIST handwritten digits data; and we demonstrate how to use our algorithms for fast determination of the number of affine subspaces.

...read moreread less

Journal Article•DOI•

Flexible Kernel Memory

[...]

Dimitri Nowicki¹, Dimitri Nowicki², Hava T. Siegelmann¹, Hava T. Siegelmann³•Institutions (3)

University of Massachusetts Amherst¹, National Academy of Sciences of Ukraine², Harvard University³

11 Jun 2010-PLOS ONE

TL;DR: A new model of associative memory, capable of both binary and continuous-valued inputs, based on kernel theory, and a generalization of Radial Basis Function networks and in feature space, analogous to a Hopfield network.

...read moreread less

Abstract: This paper introduces a new model of associative memory, capable of both binary and continuous-valued inputs. Based on kernel theory, the memory model is on one hand a generalization of Radial Basis Function networks and, on the other, is in feature space, analogous to a Hopfield network. Attractors can be added, deleted, and updated on-line simply, without harming existing memories, and the number of attractors is independent of input dimension. Input vectors do not have to adhere to a fixed or bounded dimensionality; they can increase and decrease it without relearning previous memories. A memory consolidation process enables the network to generalize concepts and form clusters of input data, which outperforms many unsupervised clustering techniques; this process is demonstrated on handwritten digits from MNIST. Another process, reminiscent of memory reconsolidation is introduced, in which existing memories are refreshed and tuned with new inputs; this process is demonstrated on series of morphed faces.

...read moreread less

Discovering Cortical Algorithms.

[...]

Atif Hashmi¹, Mikko H. Lipasti¹•Institutions (1)

University of Wisconsin-Madison¹

01 Jan 2010

TL;DR: In this paper, the authors describe a cortical architecture inspired by the structural and functional properties of the cortical columns distributed and hierarchically organized throughout the mammalian neocortex, which creates invariant representations for various similar patterns occurring within its receptive field.

...read moreread less

Abstract: We describe a cortical architecture inspired by the structural and functional properties of the cortical columns distributed and hierarchically organized throughout the mammalian neocortex. This results in a model which is both computationally efficient and biologically plausible. The strength and robustness of our cortical architecture is ascribed to its distributed and uniformly structured processing units and their local update rules. Since our architecture avoids complexities involved in modelling individual neurons and their synaptic connections, we can study other interesting neocortical properties like independent feature detection, feedback, plasticity, invariant representation, etc. with ease. Using feedback, plasticity, object permanence, and temporal associations, our architecture creates invariant representations for various similar patterns occurring within its receptive field. We trained and tested our cortical architecture using a subset of handwritten digit images obtained from the MNIST database. Our initial results show that our architecture uses unsupervised feedforward processing as well as supervised feedback processing to differentiate handwritten digits from one another and at the same time pools variations of the same digit together to generate invariant representations.

...read moreread less

Proceedings Article•DOI•

PASS-GP: Predictive active set selection for Gaussian processes

[...]

Ricardo Henao¹, Ole Winther¹•Institutions (1)

Technical University of Denmark¹

07 Oct 2010

TL;DR: A new approximation method for Gaussian process (GP) learning for large data sets that combines inline active set selection with hyperparameter optimization and the leave-one-out predictive probability of the label is used for ranking the data points.

...read moreread less

Abstract: We propose a new approximation method for Gaussian process (GP) learning for large data sets that combines inline active set selection with hyperparameter optimization. The predictive probability of the label is used for ranking the data points. We use the leave-one-out predictive probability available in GPs to make a common ranking for both active and inactive points, allowing points to be removed again from the active set. This is important for keeping the complexity down and at the same time focusing on points close to the decision boundary. We lend both theoretical and empirical support to the active set selection strategy and marginal likelihood optimization on the active set. We make extensive tests on the USPS and MNIST digit classification databases with and without incorporating invariances, demonstrating that we can get state-of-the-art results (e.g.0.86% error on MNIST) with reasonable time complexity.

...read moreread less

Proceedings Article•DOI•

Intensity-Based Congealing for Unsupervised Joint Image Alignment

[...]

Markus Storer, Martin Urschler, Horst Bischof

23 Aug 2010

TL;DR: This work presents an approach for unsupervised alignment of an ensemble of images called congealing, based on image registration using the mutual information measure as a cost function, and presents results on the MNIST handwritten digit database and on facial images obtained from the CVL database.

...read moreread less

Abstract: We present an approach for unsupervised alignment of an ensemble of images called congealing. Our algorithm is based on image registration using the mutual information measure as a cost function. The cost function is optimized by a standard gradient descent method in a multiresolution scheme. As opposed to other congealing methods, which use the SSD measure, the mutual information measure is better suited as a similarity measure for registering images since no prior assumptions on the relation of intensities between images are required. We present alignment results on the MNIST handwritten digit database and on facial images obtained from the CVL database.

...read moreread less

Proceedings Article•DOI•

Sparse Representations for Efficient Shape Matching

[...]

Alexandre Noma¹, Roberto M. Cesar¹•Institutions (1)

University of São Paulo¹

30 Aug 2010

TL;DR: The approximation technique based on graph matching and efficient belief propagation is extended by using sparse representations for efficient shape matching, and successful results of recognition of 3D objects and handwritten digits are illustrated.

...read moreread less

Abstract: Graph matching is a fundamental problem with many applications in computer vision. Patterns are represented by graphs and pattern recognition corresponds to finding a correspondence between vertices from different graphs. In many cases, the problem can be formulated as a quadratic assignment problem, where the cost function consists of two components: a linear term representing the vertex compatibility and a quadratic term encoding the edge compatibility. The quadratic assignment problem is NP-hard and the present paper extends the approximation technique based on graph matching and efficient belief propagation, described in a previous work, by using sparse representations for efficient shape matching. Successful results of recognition of 3D objects and handwritten digits are illustrated, using COIL and MNIST datasets, respectively.

...read moreread less

Proceedings Article•DOI•

Topological features in locally connected RBMs

[...]

Andreas Müller¹, Hannes Schulz¹, Sven Behnke¹•Institutions (1)

University of Bonn¹

18 Jul 2010

TL;DR: This work proposes a modification to the learning rule of locally connected RBMs, which ensures that topological image structure is preserved in the latent representation, and uses a Gaussian kernel to transfer topological properties of the image space to the feature space.

...read moreread less

Abstract: Unsupervised learning algorithms find ways to model latent structure present in the data. These latent structures can then serve as a basis for supervised classification methods. A common choice for unsupervised feature discovery is the Restricted Boltzmann Machine (RBM). Since the RBM is a general purpose learning machine, it is not particularly tailored for image data. Representations found by RBMs are consequently not image-like. Since it is essential to exploit the known topological structure for image analysis, it is desirable not to discard the topology property when learning new representations. Then, the same learning methods can be applied to the latent representation in a hierarchical manner. In this work, we propose a modification to the learning rule of locally connected RBMs, which ensures that topological image structure is preserved in the latent representation. To this end, we use a Gaussian kernel to transfer topological properties of the image space to the feature space. The learned model is then used as an initialization for a neural network trained to classify the images. We evaluate our approach on the MNIST and Caltech 101 datasets and demonstrate that we are able to learn topological feature maps.

...read moreread less

Posted Content•

Sparse Group Restricted Boltzmann Machines

[...]

Heng Luo, Ruimin Shen, Cahngyong Niu

30 Aug 2010-arXiv: Machine Learning

TL;DR: Sparse group RBMs as mentioned in this paper are trained with the regularizer \emph{sparse group} RBMs, which not only encourages hidden units of many groups to be inactive given observed data but also makes hidden units within a group compete with each other for modeling observed data.

...read moreread less

Abstract: Since learning is typically very slow in Boltzmann machines, there is a need to restrict connections within hidden layers. However, the resulting states of hidden units exhibit statistical dependencies. Based on this observation, we propose using $l_1/l_2$ regularization upon the activation possibilities of hidden units in restricted Boltzmann machines to capture the loacal dependencies among hidden units. This regularization not only encourages hidden units of many groups to be inactive given observed data but also makes hidden units within a group compete with each other for modeling observed data. Thus, the $l_1/l_2$ regularization on RBMs yields sparsity at both the group and the hidden unit levels. We call RBMs trained with the regularizer \emph{sparse group} RBMs. The proposed sparse group RBMs are applied to three tasks: modeling patches of natural images, modeling handwritten digits and pretaining a deep networks for a classification task. Furthermore, we illustrate the regularizer can also be applied to deep Boltzmann machines, which lead to sparse group deep Boltzmann machines. When adapted to the MNIST data set, a two-layer sparse group Boltzmann machine achieves an error rate of $0.84\%$, which is, to our knowledge, the best published result on the permutation-invariant version of the MNIST task.

...read moreread less

Proceedings Article•DOI•

Semi-supervised regression with temporal image sequences

[...]

Ling Xie¹, Miguel Á. Carreira-Perpiñán¹, Shawn Newsam¹•Institutions (1)

University of California, Merced¹

03 Dec 2010

TL;DR: A regression framework regularized with a graph Laplacian prior, where the graph is given by the sequential information, outperforms graphs learned in an unsupervised way for detecting the rotation of MNIST digits and estimating the time of day an image is captured, and provides modest improvement in the challenging visibility problem.

...read moreread less

Abstract: We consider a semi-supervised regression setting where we have temporal sequences of partially labeled data, under the assumption that the labels should vary slowly along a sequence, but that nearby points in input space may have drastically different labels. The setting ismotivated by problems such as determining the time of the day or the level of air visibility given an image of a landscape, which is hard because the time or visibility label is related in a complex way with the pixel values. We propose a regression framework regularized with a graph Laplacian prior, where the graph is given by the sequential information. We show this outperforms graphs learned in an unsupervised way for detecting the rotation of MNIST digits and estimating the time of day an image is captured, and provides modest improvement in the challenging visibility problem.

...read moreread less

Proceedings Article•DOI•

Exploring Pattern Selection Strategies for Fast Neural Network Training

[...]

Szilárd Vajda, Gernot A. Fink

23 Aug 2010

TL;DR: Three different strategies to select more efficiently the patterns for a fast learning in such a neural framework by reducing the number of available training patterns by dealing just with samples close to the decision boundaries of the classifiers are proposed.

...read moreread less

Abstract: Nowadays, the usage of neural network strategies in pattern recognition is a widely considered solution. In this paper we propose three different strategies to select more efficiently the patterns for a fast learning in such a neural framework by reducing the number of available training patterns. All the strategies rely on the idea of dealing just with samples close to the decision boundaries of the classifiers. The effectiveness (accuracy, speed) of these methods is confirmed through different experiments on the MNIST handwritten digit data [1], Bangla handwritten numerals [2] and the Shuttle data from the UCI machine learning repository [3].

...read moreread less

Proceedings Article•DOI•

Unsupervised Layer-Wise Model Selection in Deep Neural Networks

[...]

Ludovic Arnold¹, Hélène Paugam-Moisy², Michèle Sebag³•Institutions (3)

University of Paris-Sud¹, University of Lyon², French Institute for Research in Computer Science and Automation³

04 Aug 2010

TL;DR: The proposed approach, considering an unsupervised criterion, empirically examines whether model selection is a modular optimization problem, and can be tackled in a layer-wise manner, and preliminary results suggest the answer is positive.

...read moreread less

Abstract: Deep Neural Networks (DNN) propose a new and efficient ML architecture based on the layer-wise building of several representation layers. A critical issue for DNNs remains model selection, e.g. selecting the number of neurons in each DNN layer. The hyper-parameter search space exponentially increases with the number of layers, making the popular grid search-based approach used for finding good hyper-parameter values intractable. The question investigated in this paper is whether the unsupervised, layer-wise methodology used to train a DNN can be extended to model selection as well. The proposed approach, considering an unsupervised criterion, empirically examines whether model selection is a modular optimization problem, and can be tackled in a layer-wise manner. Preliminary results on the MNIST data set suggest the answer is positive. Further, some unexpected results regarding the optimal size of layers depending on the training process, are reported and discussed.

...read moreread less

Book Chapter•DOI•

Totally-corrective multi-class boosting

[...]

Zhihui Hao¹, Chunhua Shen², Nick Barnes², Bo Wang¹•Institutions (2)

Beijing Institute of Technology¹, Australian National University²

08 Nov 2010

TL;DR: This work proposes a column-generation based totally-corrective framework for multi-class boosting learning by looking at the Lagrange dual problems and proposes two boosting algorithms that have comparable generalization capability but converge much faster than their counterparts.

...read moreread less

Abstract: We proffer totally-corrective multi-class boosting algorithms in this work. First, we discuss the methods that extend two-class boosting to multi-class case by studying two existing boosting algorithms: AdaBoost.MO and SAMME, and formulate convex optimization problems that minimize their regularized cost functions. Then we propose a column-generation based totally-corrective framework for multi-class boosting learning by looking at the Lagrange dual problems. Experimental results on UCI datasets show that the new algorithms have comparable generalization capability but converge much faster than their counterparts. Experiments on MNIST handwriting digit classification also demonstrate the effectiveness of the proposed algorithms.

...read moreread less

Book Chapter•DOI•

Linear dimensionality reduction through eigenvector selection for object recognition

[...]

Fadi Dornaika¹, A. Assoum²•Institutions (2)

University of the Basque Country¹, Lebanese University²

29 Nov 2010

TL;DR: This paper addresses Linear Dimensionality Reduction through Eigenvector selection for object recognition with a unified framework for one transform based LDR and a framework for two transform based DLR.

...read moreread less

Abstract: Past work on Linear Dimensionality Reduction (LDR)has emphasized the issues of classification and dimension estimation. However, relatively less attention has been given to the critical issue of eigenvector selection. The main trend in feature extraction has been representing the data in a lower dimensional space, for example, using principal component analysis (PCA) without using an effective scheme to select an appropriate set of features/eigenvectors in this space. This paper addresses Linear Dimensionality Reduction through Eigenvector selection for object recognition. It has two main contributions. First, we propose a unified framework for one transform based LDR. Second, we propose a framework for two transform based DLR. As a case study, we consider PCA and Linear Discriminant Analysis (LDA) for the linear transforms. We have tested our proposed frameworks on several public benchmark data sets. Experiments on ORL, UMIST, and YALE Face Databases and MNIST Handwritten Digit Database show significant performance improvements in recognition that are based on eigenvector selection.

...read moreread less

Dissertation•

Robust Visual Recognition Using Multilayer Generative Neural Networks

[...]

Yichuan Tang

25 Aug 2010

TL;DR: This thesis develops a probabilistic denoising algorithm to determine a subset of the hidden layer nodes to unclamp and shows significantly better performance over the standard DBN implementations for various sources of noise on the standard and Variations MNIST databases.

...read moreread less

Abstract: Deep generative neural networks such as the Deep Belief Network and Deep Boltzmann Machines have been used successfully to model high dimensional visual data. However, they are not robust to common variations such as occlusion and random noise. In this thesis, we explore two strategies for improving the robustness of DBNs. First, we show that a DBN with sparse connections in the first layer is more robust to variations that are not in the training set. Second, we develop a probabilistic denoising algorithm to determine a subset of the hidden layer nodes to unclamp. We show that this can be applied to any feedforward network classifier with localized first layer connections. By utilizing the already available generative model for denoising prior to recognition, we show significantly better performance over the standard DBN implementations for various sources of noise on the standard and Variations MNIST databases.

...read moreread less

Shape representation in primate visual area 4 and inferotemporal cortex

[...]

Brian Litt¹, Thomas M. Murphy¹•Institutions (1)

University of Pennsylvania¹

01 Jan 2010

TL;DR: The results support the hypothesis that the response properties of V4 and IT cells, as well as the computer models of them, function as robust shape descriptors in the object recognition process.

...read moreread less

Abstract: The representation of contour shape is an essential component of object recognition, but the cortical mechanisms underlying it are incompletely understood, leaving it a fundamental open question in neuroscience. Such an understanding would be useful theoretically as well as in developing computer vision and Brain-Computer Interface applications. We ask two fundamental questions: “How is contour shape represented in cortex and how can neural models and computer vision algorithms more closely approximate this?” We begin by analyzing the statistics of contour curvature variation and develop a measure of salience based upon the arc length over which it remains within a constrained range. We create a population of V4-like cells – responsive to a particular local contour conformation located at a specific position on an object’s boundary – and demonstrate high recognition accuracies classifying handwritten digits in the MNIST database and objects in the MPEG-7 Shape Silhouette database. We compare the performance of the cells to the “shape-context” representation (Belongie et al., 2002) and achieve roughly comparable recognition accuracies using a small test set. We analyze the relative contributions of various feature sensitivities to recognition accuracy and robustness to noise. Local curvature appears to be the most informative for shape recognition. We create a population of IT-like cells, which integrate specific information about the 2-D boundary shapes of multiple contour fragments, and evaluate its performance on a set of real images as a function of the V4 cell inputs. We determine the sub-population of cells that are most effective at identifying a particular category. We classify based upon cell population response and obtain very good results. We use the Morris-Lecar neuronal model to more realistically illustrate the previously explored shape representation pathway in V4 – IT. We demonstrate recognition using spatiotemporal patterns within a winnerless competition network with FitzHugh-Nagumo model neurons. Finally, we use the Izhikevich neuronal model to produce an enhanced response in IT, correlated with recognition, via gamma synchronization in V4. Our results support the hypothesis that the response properties of V4 and IT cells, as well as our computer models of them, function as robust shape descriptors in the object recognition process.

...read moreread less

Proceedings Article•DOI•

An improved radial basis function neural network based on a cooperative coevolutionary algorithm for handwritten digits recognition

[...]

Salima Nebti¹, Abdellah Boukerram¹•Institutions (1)

Université de Sétif¹

29 Nov 2010

TL;DR: A cooperative co-evolutionary algorithm to improve the performance of a radial basis function neural network when it is applied to recognition of handwritten Arabic digits.

...read moreread less

Abstract: Co-evolutionary algorithms are a class of adaptive search meta-heuristics inspired from the mechanism of reciprocal benefits between species in nature. The present work proposes a cooperative co-evolutionary algorithm to improve the performance of a radial basis function neural network (RBFNN) when it is applied to recognition of handwritten Arabic digits. This work is in fact a combination of ten RBFNNs where each of them is considered as an expert classifier in distinguishing one digit from the others; each RBFNN classifier adapts its input features and its structure including the number of centres and their positions based on a symbiotic approach. The set of characteristic features and RBF centres have been considered as dissimilar species where each of them can benefit from the other, imitating in a simplified way the symbiotic interaction of species in nature. Co-evolution is founded on saving the best weights and centres that give the maximum improvement on the sum of squared error of each RBFNN after a number of learning iterations. The results quality has been estimated and compared to other experiments. Results on extracted handwritten digits from the MNIST database show that the co-evolutionary approach is the best.

...read moreread less

Proceedings Article•DOI•

Subspace Methods with Globally/Locally Weighted Correlation Matrix

[...]

Yukihiko Yamashita¹, Toru Wakahara²•Institutions (2)

Tokyo Institute of Technology¹, Hosei University²

23 Aug 2010

TL;DR: Two kinds of weighted correlation matrices for subspace methods that attach importance to training patterns that are far from the average and can reflect the distribution of patterns around the category boundary more precisely are proposed.

...read moreread less

Abstract: The discriminant function of a subspace method is provided by using correlation matrices that reflect the averaged feature of a category. As a result, it will not work well on unknown input patterns that are far from the average. To address this problem, we propose two kinds of weighted correlation matrices for subspace methods. The globally weighted correlation matrix (GWCM) attaches importance to training patterns that are far from the average. Then, it can reflect the distribution of patterns around the category boundary more precisely. The computational cost of a subspace method using GWCMs is almost the same as that using ordinary correlation matrices. The locally weighted correlation matrix (LWCM) attaches importance to training patterns that arenear to an input pattern to be classified. Then, it can reflect the distribution of training patterns around the input pattern in more detail. The computational cost of a subspace method with LWCM at the recognition stage does not depend on the number of training patterns, while those of the conventional adaptive local and the nonlinear subspace methods do. We show the advantages of the proposed methods by experiments made on the MNIST database of handwritten digits.

...read moreread less

Proceedings Article•DOI•

Cluster Preserving Embedding

[...]

Yubin Zhan¹, Jianping Yin•Institutions (1)

National University of Defense Technology¹

23 Aug 2010

TL;DR: A novel dimensionality reduction method called Cluster Preserving Embedding (CPE) is proposed, in which the cluster structure of original data is preserved via preserving the robust path-based similarity between pairwise points.

...read moreread less

Abstract: Most of existing dimensionality reduction methods obtain the low-dimensional embedding via preserving a certain property of the data, such as locality, neighborhood relationship. However, the intrinsic cluster structure of data, which plays a key role in analyzing and utilizing the data, has been ignored by the state-of-the-art dimensionality reduction methods. Hence, in this paper we propose a novel dimensionality reduction method called Cluster Preserving Embedding(CPE), in which the cluster structure of original data is preserved via preserving the robust path-based similarity between pairwise points. We present two different methods to preserve this similarity. One is the Multidimensional Scaling(MDS) way, which tries to preserve similarity matrix accurately, the other one is a Laplacian-style way, which preserves the topological partial order of the similarity rather than similarity itself. Encouraging experimental results on a toy data set and handwritten digits from MNIST database demonstrate the effectiveness of our Cluster Preserving Embedding method.

...read moreread less