
Showing papers on "MNIST database published in 2007"


Proceedings ArticleDOI
17 Jun 2007
TL;DR: An unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions that alleviates the over-parameterization problems that plague purely supervised learning procedures, and yields good performance with very few labeled training samples.
Abstract: We present an unsupervised method for learning a hierarchy of sparse feature detectors that are invariant to small shifts and distortions. The resulting feature extractor consists of multiple convolution filters, followed by a feature-pooling layer that computes the max of each filter output within adjacent windows, and a point-wise sigmoid non-linearity. A second level of larger and more invariant features is obtained by training the same algorithm on patches of features from the first level. Training a supervised classifier on these features yields 0.64% error on MNIST, and 54% average recognition rate on Caltech 101 with 30 training samples per category. While the resulting architecture is similar to convolutional networks, the layer-wise unsupervised training procedure alleviates the over-parameterization problems that plague purely supervised learning procedures, and yields good performance with very few labeled training samples.
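The feature-extraction stage described above (a bank of convolution filters, max-pooling over adjacent windows, then a point-wise sigmoid) can be sketched in plain numpy; the random image and filter values below are placeholders, not the learned features from the paper:

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Valid-mode 2D cross-correlation of one image with one filter."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.empty((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(fmap, window=2):
    """Max of the filter output within non-overlapping adjacent windows."""
    H, W = fmap.shape
    H2, W2 = H // window, W // window
    fmap = fmap[:H2 * window, :W2 * window]
    return fmap.reshape(H2, window, W2, window).max(axis=(1, 3))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def extract_features(image, filters, window=2):
    """One stage: convolution filters -> max-pooling -> point-wise sigmoid."""
    return [sigmoid(max_pool(conv2d_valid(image, f), window)) for f in filters]

rng = np.random.default_rng(0)
image = rng.random((28, 28))              # stand-in for an MNIST digit
filters = rng.standard_normal((4, 5, 5))  # 4 untrained 5x5 filters
feats = extract_features(image, filters)  # 4 pooled 12x12 feature maps
```

Stacking a second stage, as the paper does, would apply the same function again to patches of these pooled feature maps.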

1,232 citations


Journal ArticleDOI
TL;DR: A trainable feature extractor based on the LeNet5 convolutional neural network architecture is introduced to solve the first problem in a black-box scheme, without prior knowledge of the data; the results show that the system can outperform both SVMs and LeNet5, while providing performance comparable to the best reported on this database.

306 citations


Journal ArticleDOI
TL;DR: It is shown experimentally that the proposed nonlinear image deformation models perform very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity.
Abstract: We present the application of different nonlinear image deformation models to the task of image recognition. The deformation models are especially suited for local changes as they often occur in the presence of image object variability. We show that, among the discussed models, there is one approach that combines simplicity of implementation, low computational complexity, and highly competitive performance across various real-world image recognition tasks. We show experimentally that the model performs very well for four different handwritten digit recognition tasks and for the classification of medical images, thus showing high generalization capacity. In particular, an error rate of 0.54 percent on the MNIST benchmark is achieved, as well as the lowest reported error rate, specifically 1.26 percent, in the 2005 international ImageCLEF evaluation of medical image categorization.

257 citations


Journal ArticleDOI
TL;DR: The fast condensed nearest neighbor (FCNN) rule was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.
Abstract: This work has two main objectives, namely, to introduce a novel algorithm, called the fast condensed nearest neighbor (FCNN) rule, for computing a training-set-consistent subset for the nearest neighbor decision rule, and to show that condensation algorithms for the nearest neighbor rule can be applied to huge collections of data. The FCNN rule has some interesting properties: it is order independent, its worst-case time complexity is quadratic but often with a small constant prefactor, and it is likely to select points very close to the decision boundary. Furthermore, its structure allows the triangle inequality to be effectively exploited to reduce the computational effort. The FCNN rule outperformed even enhanced variants of existing competence preservation methods both in terms of learning speed and learning scaling behavior and, often, in terms of the size of the model, while it guaranteed the same prediction accuracy. Furthermore, it was three orders of magnitude faster than hybrid instance-based learning algorithms on the MNIST and Massachusetts Institute of Technology (MIT) Face databases and computed a model of accuracy comparable to that of methods incorporating a noise-filtering pass.
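The FCNN rule itself adds several refinements (order independence, triangle-inequality pruning); the classic Hart condensation loop below sketches only the core idea of growing a training-set-consistent subset for the 1-NN rule, with toy 1-D data standing in for MNIST:

```python
import numpy as np

def condense(X, y):
    """Hart-style condensation, a simpler relative of the FCNN rule:
    repeatedly add any point that the current subset's 1-NN misclassifies,
    until the subset is consistent with the whole training set."""
    keep = [0]                 # seed with the first training point
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            S = X[keep]
            nearest = keep[int(np.argmin(np.sum((S - X[i]) ** 2, axis=1)))]
            if y[nearest] != y[i]:      # subset 1-NN errs -> keep this point
                keep.append(i)
                changed = True
    return np.array(sorted(set(keep)))

# two well-separated 1-D classes: a tiny consistent subset suffices
X = np.array([[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]])
y = np.array([0, 0, 0, 1, 1, 1])
subset = condense(X, y)        # one representative per class
```

Points far from the decision boundary never get added, which is why condensed subsets stay small on easy regions of the data.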

180 citations


Journal ArticleDOI
TL;DR: The notion of the support associated to an instantiation is defined, and the combination of a deformable model with an efficient estimation procedure yields competitive results in a variety of applications with very small training sets, without need to train decision boundaries.
Abstract: We formulate a deformable template model for objects with an efficient mechanism for computation and parameter estimation. The data consists of binary oriented edge features, robust to photometric variation and small local deformations. The template is defined in terms of probability arrays for each edge type. A primary contribution of this paper is the definition of the instantiation of an object in terms of shifts of a moderate number of local submodels--parts--which are subsequently recombined using a patchwork operation to define a coherent statistical model of the data. Object classes are modeled as mixtures of patchwork of parts (POP) models that are discovered sequentially as more class data is observed. We define the notion of the support associated to an instantiation, and use this to formulate statistical models for multi-object configurations including possible occlusions. All decisions on the labeling of the objects in the image are based on comparing likelihoods. The combination of a deformable model with an efficient estimation procedure yields competitive results in a variety of applications with very small training sets, without the need to train decision boundaries--only data from the class being trained is used. Experiments are presented on the MNIST database, reading zipcodes, and face detection.

146 citations


01 Jan 2007
TL;DR: A novel unsupervised method for learning sparse, overcomplete features using a linear encoder, and a linear decoder preceded by a sparsifying non-linearity that turns a code vector into a quasi-binary sparse code vector.
Abstract: We describe a novel unsupervised method for learning sparse, overcomplete features. The model uses a linear encoder, and a linear decoder preceded by a sparsifying non-linearity that turns a code vector into a quasi-binary sparse code vector. Given an input, the optimal code minimizes the distance between the output of the decoder and the input patch while being as similar as possible to the encoder output. Learning proceeds in a two-phase EM-like fashion: (1) compute the minimum-energy code vector, (2) adjust the parameters of the encoder and decoder so as to decrease the energy. The model produces “stroke detectors” when trained on handwritten numerals, and Gabor-like filters when trained on natural image patches. Inference and learning are very fast, requiring no preprocessing, and no expensive sampling. Using the proposed unsupervised method to initialize the first layer of a convolutional network, we achieved an error rate slightly lower than the best reported result on the MNIST dataset. Finally, an extension of the method is described to learn topographical filter maps.
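A minimal sketch of phase 1 of this scheme (minimizing the energy over the code by gradient descent), with a shifted ReLU standing in for the paper's sparsifying non-linearity; dimensions, step sizes, and the threshold are arbitrary illustrative choices, and phase 2 (the encoder/decoder weight update) is omitted:

```python
import numpy as np

rng = np.random.default_rng(0)

def sparsify(z, theta=0.1):
    """Shifted-ReLU stand-in for the sparsifying non-linearity."""
    return np.maximum(z - theta, 0.0)

def energy(x, code, We, Wd, alpha=0.1):
    """Reconstruction error plus deviation of the code from the encoder output."""
    return np.sum((x - Wd @ sparsify(code)) ** 2) + alpha * np.sum((code - We @ x) ** 2)

n_in, n_code = 16, 32                             # 32-dim overcomplete code, 16-dim patch
We = 0.1 * rng.standard_normal((n_code, n_in))    # linear encoder
Wd = 0.1 * rng.standard_normal((n_in, n_code))    # linear decoder
x = rng.random(n_in)                              # toy input patch

# phase 1: gradient descent on the code to approach the minimum-energy code
code = We @ x
e0 = energy(x, code, We, Wd)
for _ in range(200):
    s = sparsify(code)
    mask = (s > 0).astype(float)                  # derivative of the shifted ReLU
    g = -2 * Wd.T @ (x - Wd @ s) * mask + 0.2 * (code - We @ x)
    code -= 0.05 * g
e1 = energy(x, code, We, Wd)                      # should not exceed e0
```

Phase 2 would then take a gradient step on `We` and `Wd` with the code held fixed, alternating the two phases over the training patches.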

110 citations


Journal ArticleDOI
TL;DR: This paper presents a novel cascade ensemble classifier system for the recognition of handwritten digits with encouraging results: a high reliability of 99.96% with minimal rejection, or a 99.59% correct recognition rate without rejection in the last cascade layer.

99 citations


Proceedings ArticleDOI
23 Sep 2007
TL;DR: This work studies random forest methods in a strictly pragmatic approach, in order to provide parameter-setting rules for practitioners, and draws some conclusions on global random forest behavior according to parameter tuning.
Abstract: In the pattern recognition field, growing interest has been shown in recent years in multiple classifier systems, particularly bagging, boosting and random subspaces. Those methods aim at inducing an ensemble of classifiers by producing diversity at different levels. Following this principle, Breiman introduced in 2001 another family of methods called random forests. Our work aims at studying those methods in a strictly pragmatic approach, in order to provide parameter-setting rules for practitioners. For that purpose we have experimented with the forest-RI algorithm, considered the reference random forest method, on the MNIST handwritten digits database. In this paper, we describe random forest principles and review some methods proposed in the literature. We then present our experimental protocol and results. We finally draw some conclusions on global random forest behavior according to parameter tuning.
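Forest-RI grows full trees, examining a random feature subset at every node; the sketch below caricatures this with bootstrap-trained depth-1 trees (stumps) so that the bagging and random-feature-selection ingredients stay visible in a few lines. All data and hyperparameters are toy placeholders:

```python
import numpy as np

rng = np.random.default_rng(1)

def train_stump(X, y, feat_ids):
    """Best axis-aligned split among a random feature subset
    (Forest-RI's per-node random selection, reduced to depth 1)."""
    best = None
    for f in feat_ids:
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            lmaj = np.bincount(left).argmax()      # majority label per side
            rmaj = np.bincount(right).argmax()
            err = np.sum(left != lmaj) + np.sum(right != rmaj)
            if best is None or err < best[0]:
                best = (err, f, t, lmaj, rmaj)
    if best is None:                               # degenerate bootstrap sample
        return (feat_ids[0], X[0, feat_ids[0]], y[0], y[0])
    return best[1:]

def forest_fit(X, y, n_trees=15, k=2):
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(X), len(X))              # bootstrap resample
        feats = rng.choice(X.shape[1], size=k, replace=False)
        trees.append(train_stump(X[idx], y[idx], feats))
    return trees

def forest_predict(trees, x):
    votes = [lm if x[f] <= t else rm for f, t, lm, rm in trees]
    return np.bincount(votes).argmax()

# toy two-class data, separable on feature 0
X = np.array([[0., 5.], [1., 3.], [2., 8.], [8., 1.], [9., 7.], [10., 2.]])
y = np.array([0, 0, 0, 1, 1, 1])
trees = forest_fit(X, y)
preds = [int(forest_predict(trees, x)) for x in X]
```

The two parameters the paper studies empirically, the number of trees and the size `k` of the random feature subset, appear directly as `n_trees` and `k` here.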

71 citations


Proceedings ArticleDOI
05 Nov 2007
TL;DR: A novel system based on radon transform for handwritten digit recognition is proposed which represents an image as a collection of projections along various directions and a nearest neighbor classifier is used for the subsequent recognition purpose.
Abstract: The performance of a character recognition system depends heavily on what features are being used. Though many kinds of features have been developed and their test performances on standard databases have been reported, there is still room to improve the recognition rate by developing improved features. In this paper, we propose a novel system based on the radon transform for handwritten digit recognition. We use the radon function, which represents an image as a collection of projections along various directions. The resultant feature vector obtained by applying this method is the input to the classification stage. A nearest neighbor classifier is used for the subsequent recognition purpose. Tests performed on the MNIST handwritten numeral database and on Kannada handwritten numerals demonstrate the effectiveness and feasibility of the proposed method.

43 citations


Proceedings ArticleDOI
26 Dec 2007
TL;DR: The "Sparse LDA" algorithm is extended with new sparsity bounds on 2-class separability and efficient partitioned matrix inverse techniques, leading to 1000-fold speed-ups; state-of-the-art recognition is obtained while discarding the majority of pixels in all experiments.
Abstract: We extend the "Sparse LDA" algorithm of [7] with new sparsity bounds on 2-class separability and efficient partitioned matrix inverse techniques leading to 1000-fold speed-ups. This mitigates the O(n^4) scaling that has limited this algorithm's applicability to vision problems and also prioritizes the less-myopic backward elimination stage by making it faster than forward selection. Experiments include "sparse eigenfaces" and gender classification on FERET data as well as pixel/part selection for OCR on MNIST data using Bayesian (GP) classification. Sparse-LDA is an attractive alternative to the more demanding Automatic Relevance Determination. State-of-the-art recognition is obtained while discarding the majority of pixels in all experiments. Our sparse models also show a better fit to the data in terms of the "evidence" or marginal likelihood.

40 citations


Posted Content
TL;DR: It is shown that despite the maturity of the field, different approaches still deliver results that vary enough to allow improvements by using their combination, by choosing four well-motivated state-of-the-art recognition systems for which results on the standard MNIST benchmark are available.
Abstract: Although the recognition of isolated handwritten digits has been a research topic for many years, it continues to be of interest for the research community and for commercial applications. We show that despite the maturity of the field, different approaches still deliver results that vary enough to allow improvements by using their combination. We do so by choosing four well-motivated state-of-the-art recognition systems for which results on the standard MNIST benchmark are available. When comparing the errors made, we observe that they differ between all four systems, suggesting the use of classifier combination. We then determine the error rate of a hypothetical system that combines the output of the four systems. The result obtained in this manner is an error rate of 0.35% on the MNIST data, the best result published so far. We furthermore discuss the statistical significance of the combined result and of the results of the individual classifiers.
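One way such a hypothetical combined system could work is plain plurality voting over the four classifiers' outputs; the tie-breaking rule below (defer to the first-listed classifier) is an assumption for illustration, not taken from the paper:

```python
from collections import Counter

def combine(votes):
    """Plurality vote over per-classifier predictions; ties go to the
    earliest-listed classifier's vote."""
    counts = Counter(votes)
    top = max(counts.values())
    for v in votes:              # first vote reaching the top count wins
        if counts[v] == top:
            return v

# four hypothetical digit classifiers voting on one test image
pred = combine([3, 3, 5, 3])     # clear majority -> 3
tie = combine([3, 5, 5, 3])      # 2-2 tie -> first classifier's vote, 3
```

Combination only helps when the individual systems' errors are weakly correlated, which is exactly the property the paper verifies before combining.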

Proceedings ArticleDOI
23 Sep 2007
TL;DR: The method is used to pre-train the first four layers of a deep convolutional network which achieves state-of-the-art performance on the MNIST dataset of handwritten digits.
Abstract: We describe an unsupervised learning algorithm for extracting sparse and locally shift-invariant features. We also devise a principled procedure for learning hierarchies of invariant features. Each feature detector is composed of a set of trainable convolutional filters followed by a max-pooling layer over non-overlapping windows, and a point-wise sigmoid non-linearity. A second stage of more invariant features is fed with patches provided by the first stage feature extractor, and is trained in the same way. The method is used to pre-train the first four layers of a deep convolutional network which achieves state-of-the-art performance on the MNIST dataset of handwritten digits. The final testing error rate is equal to 0.42%. Preliminary experiments on compression of bitonal document images show very promising results in terms of compression ratio and reconstruction error.

Proceedings ArticleDOI
20 Jun 2007
TL;DR: A representer theorem is proved for this transductive framework of distance metric learning, linking it with function estimation in an RKHS and making generalization to unseen test samples possible.
Abstract: Distance metric learning and nonlinear dimensionality reduction are two interesting and active topics in recent years. However, the connection between them has not been thoroughly studied yet. In this paper, a transductive framework of distance metric learning is proposed and its close connection with many nonlinear spectral dimensionality reduction methods is elaborated. Furthermore, we prove a representer theorem for our framework, linking it with function estimation in an RKHS and making generalization to unseen test samples possible. In our framework, it suffices to solve a sparse eigenvalue problem, thus datasets with 10^5 samples can be handled. Finally, experimental results on synthetic data, several UCI databases and the MNIST handwritten digit database are shown.

Journal ArticleDOI
TL;DR: The results support the hypothesis that V4 cells function as robust shape descriptors in the early stages of object recognition.

Proceedings ArticleDOI
24 Aug 2007
TL;DR: Experimental results indicate the proposed QNN recognition system achieves excellent performance in terms of recognition rates and recognition reliability, and at the same time show the superiority and potential of QNN in solving pattern recognition problems.
Abstract: In this paper, a handwritten digit recognition system based on multi-level transfer function quantum neural networks (QNN) and multi-layer classifiers is proposed. The proposed recognition system consists of two layers of sub-classifiers, namely a first-layer QNN coarse classifier and a second-layer QNN numeral-pairs classifier. Handwritten digit recognition experiments are performed using data from the MNIST database. Experimental results indicate the proposed QNN recognition system achieves excellent performance in terms of recognition rates and recognition reliability, and at the same time show the superiority and potential of QNN in solving pattern recognition problems.

Proceedings ArticleDOI
29 Oct 2007
TL;DR: This classifier develops the idea of the permutation coding neural classifier (PCNC), a multipurpose image recognition system based on random local descriptors (RLD); the main advantage of the pairwise PCNC is its ability to deal with large displacements of the object in the image, due to the utilization of pairs of RLDs instead of individual RLDs.
Abstract: In this paper we propose the pairwise permutation coding neural classifier (pairwise PCNC). This classifier develops the idea of the permutation coding neural classifier (PCNC), a multipurpose image recognition system based on random local descriptors (RLD). Previous tests of the PCNC demonstrated good results in different image recognition tasks including handwritten digit recognition, face recognition, and micro workpiece shape recognition. The main advantage of the pairwise PCNC is its ability to deal with large displacements of the object in the image, due to the utilization of pairs of RLDs instead of individual RLDs. The pairwise PCNC was tested on the MNIST database and comparative results suggest the potential of the proposed approach.

Proceedings ArticleDOI
15 Apr 2007
TL;DR: A handwritten digit recognition algorithm that uses 4×4 2D hidden Markov models to extract basic features from an unclassified image and achieves a 95.51 percent recognition rate with zero rejection on the MNIST database of handwritten digits.
Abstract: We propose a handwritten digit recognition algorithm that uses 4×4 2D hidden Markov models to extract basic features from an unclassified image. The novel idea given here is that we use powerful techniques from the emerging mathematical fields of tropical geometry and algebraic statistics to determine parameters for the model. The distance between the unclassified images and prototypes is calculated in stages, where estimates of the distance become finer as obviously distant prototypes are discarded from the pool of possible K-nearest neighbors. Our algorithm achieves a 95.51 percent recognition rate with zero rejection on the MNIST database of handwritten digits.
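The staged distance computation can be sketched as follows: partial squared distances over feature chunks are valid lower bounds on the full distance, so prototypes can be discarded as soon as their partial distance exceeds an exact distance already in hand. The chunking scheme and the bound used below are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def staged_knn(x, prototypes, k=1, n_stages=4):
    """k-NN with distances refined in stages; obviously distant prototypes
    are discarded from the candidate pool before their distance is finished."""
    n, d = prototypes.shape
    partial = np.zeros(n)               # lower bounds, refined per stage
    alive = np.arange(n)
    chunk = d // n_stages
    for s in range(n_stages):
        lo = s * chunk
        hi = d if s == n_stages - 1 else (s + 1) * chunk
        partial[alive] += np.sum((prototypes[alive, lo:hi] - x[lo:hi]) ** 2, axis=1)
        # finish the k best-so-far exactly; their worst full distance bounds the rest
        order = alive[np.argsort(partial[alive])]
        head = order[:k]
        full = partial[head] + np.sum((prototypes[head, hi:] - x[hi:]) ** 2, axis=1)
        bound = np.max(full)
        alive = alive[partial[alive] <= bound]   # prune provably distant prototypes
    order = alive[np.argsort(partial[alive])]
    return order[:k]

rng = np.random.default_rng(2)
P = rng.random((50, 16))                # toy prototype pool
x = rng.random(16)                      # unclassified sample
nearest = staged_knn(x, P, k=3)         # matches exhaustive 3-NN search
```

Because pruning only removes prototypes whose lower bound already exceeds an exact candidate distance, the result is identical to exhaustive k-NN, just cheaper.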

Proceedings ArticleDOI
29 Oct 2007
TL;DR: Simulation results on the MNIST database of handwritten digits show the proposed adaptive iterative learning mechanism based on feature selection and combination voting (AdaFSCV) can improve the classification accuracy and robustness with certain levels of trade-off of the computational cost.
Abstract: Feature selection is an active research area in machine learning for high-dimensional dataset analysis. The idea is to perform the learning process solely on the top-ranked feature spaces instead of the entire original feature space, and thereby to improve the understanding of the inherent characteristics of such datasets as well as reduce the computational cost. While most research efforts focus on how to select the proper features for machine learning, we study the following important problem in this paper: can the "unimportant" (low-rank) features also provide useful information to improve the overall learning capability? We propose an adaptive iterative learning mechanism based on feature selection and combination voting (AdaFSCV) to address this issue. Unlike the conventional way of discarding the unselected low-rank features, we iteratively build classifiers in those feature spaces as well. Such an iterative process adaptively learns information in different feature spaces, and automatically stops when a classifier cannot provide better information than a random guess. Finally, a probability voting algorithm is proposed to combine all the votes from the different classifiers into the final prediction. Simulation results on the MNIST database of handwritten digits show this method can improve classification accuracy and robustness with a certain trade-off in computational cost.

Proceedings ArticleDOI
29 Oct 2007
TL;DR: This paper proposes an auto-associative neural network system (AANNS) based on multiple Autoencoders that has the functions ofAuto-association, incremental learning and local update, which are the foundations of cognitive science.
Abstract: Recently, a nonlinear dimension reduction technique, called Autoencoder, had been proposed. It can efficiently carry out mappings in both directions between the original data and low-dimensional code space. However, a single Autoencoder commonly maps all data into a single subspace. If the original data set have remarkable different categories (for example, characters and handwritten digits), then only one Autoencoder will not be efficient. To deal with the data of remarkable different categories, this paper proposes an auto-associative neural network system (AANNS) based on multiple Autoencoders. The novel technique has the functions of auto-association, incremental learning and local update. Excitingly, these functions are the foundations of cognitive science. Experimental results on benchmark MNIST digit dataset and handwritten character-digit dataset show the advantages of the proposed model.

Proceedings ArticleDOI
TL;DR: This work discusses how the properties of different models of image coding, i.e. sparseness, decorrelation, and statistical independence are related to each other, and proposes to evaluate the different models by verifiable performance measures.
Abstract: The optimal coding hypothesis proposes that the human visual system has adapted to the statistical properties of the environment by the use of relatively simple optimality criteria. We here (i) discuss how the properties of different models of image coding, i.e. sparseness, decorrelation, and statistical independence, are related to each other, (ii) propose to evaluate the different models by verifiable performance measures, and (iii) analyse the classification performance on images of handwritten digits (MNIST database). We first employ the SPARSENET algorithm (Olshausen, 1998) to derive a local filter basis (on 13 × 13 pixel windows). We then filter the images in the database (28 × 28 pixel images of digits) and reduce the dimensionality of the resulting feature space by selecting the locally maximal filter responses. We then train a support vector machine on a training set to classify the digits and report results obtained on a separate test set. Currently, the best state-of-the-art result on the MNIST database has an error rate of 0.4%. This result, however, has been obtained by using explicit knowledge that is specific to the data (an elastic distortion model for digits). We here obtain an error rate of 0.55%, which is second best but does not use explicit data-specific knowledge. In particular, it outperforms by far all methods that do not use data-specific knowledge.

Proceedings ArticleDOI
28 Jan 2007
TL;DR: Experiments performed on the MNIST handwritten digit database show that coupled architectures yield better recognition performances than non-coupled ones and than discriminative methods such as SVMs.
Abstract: We investigate in this paper the application of dynamic Bayesian networks (DBNs) to the recognition of handwritten digits. The main idea is to couple two separate HMMs into various architectures. First, a vertical HMM and a horizontal HMM are built observing the evolving streams of image columns and image rows respectively. Then, two coupled architectures are proposed to model interactions between these two streams and to capture the 2D nature of character images. Experiments performed on the MNIST handwritten digit database show that coupled architectures yield better recognition performance than non-coupled ones. Additional experiments conducted on artificially degraded (broken) characters demonstrate that coupled architectures cope better with such degradation than non-coupled ones and than discriminative methods such as SVMs.

Proceedings ArticleDOI
09 Aug 2007
TL;DR: This paper examines the issue of reading the amount of money written on the checks and Genetic Programming (GP) technique is used for dealing with this problem, and a new type of input representation is proposed: histograms.
Abstract: In spite of the evolution of electronic techniques, a large number of applications continue to rely on paper as the dominant medium. Bank checks are a widely known example. When filled in by hand, processing the written information requires either a human or special software with intelligent abilities. This paper examines the issue of reading the amount of money written on checks. The Genetic Programming (GP) technique is used for dealing with this problem, and a new type of input representation is proposed: histograms. Several numerical experiments with GP are performed using large datasets taken from the MNIST benchmarking set. Preliminary results show good behavior of the method.

Book ChapterDOI
22 Aug 2007
TL;DR: A novel Bayesian classifier in which smaller eigenvalues are reset by a database-derived threshold is proposed; with the proposed method, the error rates of both handwritten numeral recognition on the MNIST database and Bengali handwritten digit recognition are small.
Abstract: A novel Bayesian classifier in which smaller eigenvalues are reset by a database-derived threshold is proposed in this paper. The threshold is used to substitute for eigenvalues of the scatter matrices that are smaller than the threshold, minimizing the classification error rate on a given database and thus improving the performance of the Bayesian classifier. Several experiments have shown its effectiveness. The error rates of both handwritten numeral recognition on the MNIST database and Bengali handwritten digit recognition are small when using the proposed method. Steganalysis of JPEG images using the proposed classifier also performs well.

Journal Article
Peng Li1
TL;DR: The experimental results indicate the method achieves excellent performance in terms of recognition rates and recognition reliability, and show the superiority and potential of QNN in solving pattern recognition problems.
Abstract: Handwritten digit recognition is an important problem in the pattern recognition field. An approach to handwritten digit recognition is presented based on multi-level transfer function quantum neural networks (QNN) and multi-layer classifiers. The QNN is trained and tested on the MNIST database. The experimental results indicate the method achieves excellent performance in terms of recognition rates and recognition reliability, and show the superiority and potential of QNN in solving pattern recognition problems.

Dissertation
01 Jan 2007
TL;DR: This thesis analyzes the modelling of visual cognitive representations based on extracting cognitive components from the MNIST dataset of handwritten digits using simple unsupervised linear and non-linear matrix factorizations both non-negative and unconstrained based on gradient descent learning.
Abstract: This thesis analyzes the modelling of visual cognitive representations based on extracting cognitive components from the MNIST dataset of handwritten digits, using simple unsupervised linear and non-linear matrix factorizations, both non-negative and unconstrained, based on gradient descent learning. We introduce two different classes of generative models for modelling the cognitive data: mixture models and deep network models. Mixture models based on K-means, Gaussian and factor analyzer kernel functions are presented as simple generative models in a general framework. From simulations we analyze the generative properties of these models and show how they prove insufficient to properly model the complex distribution of the visual cognitive data. Motivated by the introduction of deep belief nets by Hinton et al. [12], we propose a simpler generative deep network model based on cognitive components. A theoretical framework is presented as individual modules for building a generative hierarchical network model. We analyze the performance in terms of classification and generation of MNIST digits and show how our simplifications compared to Hinton et al. [12] lead to degraded performance. In this respect we outline the differences and conjecture obvious improvements.

Journal Article
TL;DR: A dynamic 3-dimensional neuro system consisting of a learning network based on a weightless neural network and a feedback module that accumulates characteristics.
Abstract: The back propagation algorithm took a long time to learn the input patterns and had difficulty with additional or repeated training patterns. Aleksander therefore proposed the binary neural network, which could overcome the disadvantages of BP networks. But it had limitations with repeated learning and could not extract a generalized pattern. In this paper, we propose a dynamic 3-dimensional neuro system consisting of a learning network based on a weightless neural network and a feedback module that accumulates characteristics. The proposed system is able to train on additional and repeated patterns. It can also produce a generalized pattern by putting a proper threshold into each learning net's discriminator, obtained from the learning procedure. We then reuse the generalized pattern to raise the recognition rate. In the last processing step, deciding the correct category, we use a maximum response detector. We experimented using the MNIST database of NIST and obtained a 99.3% correct recognition rate on the training data.

Proceedings ArticleDOI
Seiji Hotta1
23 Sep 2007
TL;DR: A classification method designed by combining a local averaging classifier and a tangent distance is proposed for handwritten digit pattern recognition, and the superior performance of the proposed method is verified with the experiments on benchmark datasets MNIST and USPS.
Abstract: In this paper, a classification method designed by combining a local averaging classifier and a tangent distance is proposed for handwritten digit recognition. In practice, first the k-nearest neighbors of an input sample are selected in each class by using a two-sided tangent distance. Next, the mean vectors of the selected transformed neighbor samples are computed in the individual classes. Finally, the input sample is classified to the class that minimizes the one-sided tangent distance between the input sample and the mean vector. The superior performance of the proposed method is verified with experiments on the benchmark datasets MNIST and USPS.
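A one-sided tangent distance can be sketched with translation tangents only (the image's x/y gradients); full tangent-distance implementations also include rotation, scaling, and line-thickness tangents, and the two-sided variant minimizes over tangents of both images. The blob images below are toy placeholders:

```python
import numpy as np

def tangent_vectors(img):
    """Translation tangents: horizontal and vertical gradients, flattened."""
    dx = np.gradient(img, axis=1).ravel()
    dy = np.gradient(img, axis=0).ravel()
    return np.stack([dx, dy], axis=1)        # (pixels, 2) tangent basis

def one_sided_tangent_distance(x_img, y_img):
    """min over a of ||(x + T a) - y||^2, with T the tangent basis at x:
    the residual after projecting (y - x) onto the tangent subspace."""
    T = tangent_vectors(x_img)
    r = (y_img - x_img).ravel()
    a, *_ = np.linalg.lstsq(T, r, rcond=None)
    return float(np.sum((T @ a - r) ** 2))

# a shifted blob is closer under tangent distance than under Euclidean distance
img = np.zeros((9, 9)); img[3:6, 3:6] = 1.0
shifted = np.roll(img, 1, axis=1)
td = one_sided_tangent_distance(img, shifted)
ed = float(np.sum((img - shifted) ** 2))     # plain squared Euclidean distance
```

Because a one-pixel shift is approximately a step along the horizontal gradient, the tangent projection absorbs much of the difference, which is exactly the invariance the classifier above exploits.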

Proceedings ArticleDOI
13 Dec 2007
TL;DR: A novel method of measuring the learning capability of a network parameter is proposed and is validated on MNIST handwritten numeral database using backpropagation learning algorithm.
Abstract: An artificial neural network is typically trained from an initial weight/bias position. As training progresses, the network parameters such as weights and biases are updated according to the learning algorithm to reduce the performance index. Not all network parameters learn the input-output mapping equally: some parameters hold more discriminating capability while others are less effective. We propose a novel method of measuring the learning capability of a network parameter. The learning capability of a parameter, which we call learnability, is that parameter's contribution to reducing the performance index as the network trains. The proposed measure of learnability is applied to network-parameter freezing in a feedforward neural network. Our method is validated on the MNIST handwritten numeral database using the backpropagation learning algorithm.

01 Jan 2007
TL;DR: A fully unsupervised algorithm for learning sparse and locally invariant features at all levels of the Hubel and Wiesel architecture, demonstrating good performance even with few labeled training samples.
Abstract: Understanding how the visual cortex builds invariant representations is one of the most challenging problems in visual neuroscience. The feed-forward, multi-stage Hubel and Wiesel architecture [1, 2, 3, 4, 5] stacks multiple levels of alternating layers of simple cells that perform feature extraction, and complex cells that pool together features of a given type within a local receptive field. These computational models have been successfully applied to handwriting recognition [1, 2], and generic object recognition [4, 5]. Learning features in existing models consists in handcrafting the first layers and training the upper layers by recording templates from the training set, which leads to inefficient representations [4, 5], or in training the entire architecture supervised, which requires large training sets [2, 3]. We propose a fully unsupervised algorithm for learning sparse and locally invariant features at all levels. Each simple-cell layer is composed of multiple convolution filters followed by a winner-take-all competition within a local area, and a sigmoid non-linearity. For training, each simple-cell layer is coupled with a feed-back layer whose role is to reconstruct the input of the simple-cell layer from its output. These coupled layers are trained simultaneously to minimize the average reconstruction error. The output of a simple-cell layer can be seen as a sparse overcomplete representation of its input. The complex cells add the simple cell activities of one filter within the area over which the winner-take-all operation is performed, yielding representations that are invariant to small displacements of the input stimulus. The training procedure is similar to [6], but the local winner-take-all competition ensures that the representation is spatially sparse (and the complex-cell representation locally invariant). 
The next stage of simple-cell and complex-cell layers is trained in an identical fashion on the outputs of the first layer of complex cells [7], resulting in higher level, more invariant representations, that are then fed to a supervised classifier. Such a procedure yields 0.64% error on MNIST dataset (handwritten digits), and 54% average recognition rate on the Caltech-101 dataset (101 object categories, 30 training samples per category), demonstrating good performance even with few labeled training samples.

01 Jan 2007
TL;DR: A new type of input representation for GP is proposed: histograms, which is very simple and can be adapted very easily for the GP requirements.
Abstract: Handwritten digit recognition is a popular problem which requires special Artificial Intelligence techniques to solve. In this paper we use Genetic Programming (GP) for addressing the off-line variant of handwritten digit recognition. We propose a new type of input representation for GP: histograms. This kind of representation is very simple and can be adapted very easily to the GP requirements. Several numerical experiments with GP are performed using several large datasets taken from the well-known MNIST benchmarking set. Numerical experiments show that GP performs very well on the considered test problems.