
Showing papers on "MNIST database published in 2008"


ReportDOI
01 Jun 2008
TL;DR: A framework for learning optimal dictionaries for simultaneous sparse signal representation and robust class classification is introduced, addressing for the first time the explicit incorporation of both reconstruction and discrimination terms in the non-parametric dictionary learning and sparse coding energy.
Abstract: A framework for learning optimal dictionaries for simultaneous sparse signal representation and robust class classification is introduced in this paper. The dictionary learning problem is solved by a class-dependent supervised simultaneous orthogonal matching pursuit, which learns the intra-class structure while increasing the inter-class discrimination, interleaved with an efficient dictionary update obtained via singular value decomposition. This framework addresses for the first time the explicit incorporation of both reconstruction and discrimination terms in the non-parametric dictionary learning and sparse coding energy. The work contributes to the understanding of the importance of learned sparse representations for signal classification, showing the relevance of learning dictionaries that are at once discriminative and reconstructive in order to achieve accurate and robust classification. The presentation of the underlying theory is complemented with examples on the standard MNIST and Caltech datasets, and with results on the use of the sparse representations obtained from the learned dictionaries as local patch descriptors, replacing commonly used experimental ones.

153 citations
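The paper's energy couples reconstruction and discrimination terms; that exact formulation is not reproduced here. As a rough illustration of the reconstructive half (sparse coding via orthogonal matching pursuit interleaved with an SVD-based dictionary update), here is a minimal K-SVD-style sketch; the function name and parameters are illustrative.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd_step(Y, D, n_nonzero=5):
    """One sparse-coding pass plus one SVD dictionary update.

    Y: (d, n) matrix of signals; D: (d, k) dictionary, unit-norm columns.
    Reconstructive part only; the paper adds class-supervised coding.
    """
    # Sparse coding: OMP solves min ||y - Dx||_2 s.t. ||x||_0 <= n_nonzero.
    X = orthogonal_mp(D, Y, n_nonzero_coefs=n_nonzero)        # (k, n)
    # Dictionary update: refit each atom by a rank-1 SVD of the residual
    # restricted to the signals that actually use that atom.
    for j in range(D.shape[1]):
        users = np.nonzero(X[j])[0]
        if users.size == 0:
            continue
        E = Y[:, users] - D @ X[:, users] + np.outer(D[:, j], X[j, users])
        U, s, Vt = np.linalg.svd(E, full_matrices=False)
        D[:, j] = U[:, 0]                  # new unit-norm atom
        X[j, users] = s[0] * Vt[0]         # matching coefficients
    return D, X
```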


Journal ArticleDOI
TL;DR: It is concluded that the learning of a sparse representation of local image patches combined with a local maximum operation for feature extraction can significantly improve recognition performance.
Abstract: In this brief paper, we propose a method of feature extraction for digit recognition that is inspired by vision research: a sparse-coding strategy and a local maximum operation. We show that our method, despite its simplicity, yields state-of-the-art classification results on a highly competitive digit-recognition benchmark. We first employ the unsupervised Sparsenet algorithm to learn a basis for representing patches of handwritten digit images. We then use this basis to extract local coefficients. In a second step, we apply a local maximum operation to implement local shift invariance. Finally, we train a support vector machine (SVM) on the resulting feature vectors and obtain state-of-the-art classification performance in the digit recognition task defined by the MNIST benchmark. We compare the different classification performances obtained with sparse coding, Gabor wavelets, and principal component analysis (PCA). We conclude that the learning of a sparse representation of local image patches combined with a local maximum operation for feature extraction can significantly improve recognition performance.

108 citations
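A rough sketch of the three-step pipeline, with stand-ins the paper does not use: scikit-learn's MiniBatchDictionaryLearning in place of Sparsenet, 8x8 digits in place of MNIST, and a max over patch positions for the local maximum operation.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d
from sklearn.svm import LinearSVC

digits = load_digits()                       # 8x8 stand-in for 28x28 MNIST
images, labels = digits.images, digits.target

# 1) Learn a sparse basis for image patches (Sparsenet stand-in).
patches = np.vstack([extract_patches_2d(im, (4, 4)) for im in images[:200]])
dico = MiniBatchDictionaryLearning(n_components=32, transform_algorithm="omp",
                                   transform_n_nonzero_coefs=3, random_state=0)
dico.fit(patches.reshape(len(patches), -1))

# 2) Encode patches, then take a local maximum over patch positions
#    to obtain a shift-tolerant feature vector per image.
def features(im):
    p = extract_patches_2d(im, (4, 4)).reshape(-1, 16)
    return np.abs(dico.transform(p)).max(axis=0)

X = np.array([features(im) for im in images])

# 3) Train an SVM on the pooled sparse-coding features.
clf = LinearSVC(dual=False).fit(X, labels)
print("train acc:", clf.score(X, labels))
```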


01 Oct 2008
TL;DR: A unified boosting framework for simultaneous learning and alignment is presented, along with a novel boosting algorithm for Multiple Pose Learning (mpl), where the goal is to simultaneously split data into groups and train classifiers for each.
Abstract: In object recognition in general and in face detection in particular, data alignment is necessary to achieve good classification results with certain statistical learning approaches such as Viola-Jones. Data can be aligned in one of two ways: (1) by separating the data into coherent groups and training separate classifiers for each; (2) by adjusting training samples so they lie in correspondence. If done manually, both procedures are labor intensive and can significantly add to the cost of labeling. In this paper we present a unified boosting framework for simultaneous learning and alignment. We present a novel boosting algorithm for Multiple Pose Learning (mpl), where the goal is to simultaneously split data into groups and train classifiers for each. We also review Multiple Instance Learning (mil), and in particular mil-boost, and describe how to use it to simultaneously train a classifier and bring data into correspondence. We show results on variations of LFW and MNIST, demonstrating the potential of these approaches.

101 citations
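mpl itself is a boosting algorithm whose derivation is not reproduced here; as a loose illustration of its goal (simultaneously splitting the positives into groups and training one classifier per group), here is a k-means-style alternation with logistic regressions standing in for boosted classifiers. Everything below is an assumption, not the authors' algorithm.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def pose_groups(X, y, n_groups=2, n_iter=10, seed=0):
    """Alternate between (a) training one classifier per group of positives
    (negatives shared by all groups) and (b) re-assigning each positive to
    the group whose classifier scores it highest."""
    rng = np.random.default_rng(seed)
    pos, neg = X[y == 1], X[y == 0]
    assign = rng.integers(n_groups, size=len(pos))
    for _ in range(n_iter):
        clfs = []
        for g in range(n_groups):
            if not np.any(assign == g):          # keep every group non-empty
                assign[rng.integers(len(pos))] = g
            Xg = np.vstack([pos[assign == g], neg])
            yg = np.r_[np.ones((assign == g).sum()), np.zeros(len(neg))]
            clfs.append(LogisticRegression(max_iter=1000).fit(Xg, yg))
        scores = np.column_stack([c.predict_proba(pos)[:, 1] for c in clfs])
        assign = scores.argmax(axis=1)
    return clfs, assign
```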


Proceedings ArticleDOI
23 Jun 2008
TL;DR: In this paper, an approach called "least squares congealing" is proposed to align an ensemble of images in an unsupervised manner.
Abstract: In this paper, we present an approach we refer to as "least squares congealing" which provides a solution to the problem of aligning an ensemble of images in an unsupervised manner. Our approach circumvents many of the limitations existing in the canonical "congealing" algorithm. Specifically, we present an algorithm that: (i) is able to simultaneously, rather than sequentially, estimate warp parameter updates, (ii) exhibits fast convergence and (iii) requires no pre-defined step size. We present alignment results which show an improvement in performance for the removal of unwanted spatial variation when compared with the related work of Learned-Miller on two datasets, the MNIST handwritten digit database and the MultiPIE face database.

84 citations
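A minimal sketch of the simultaneous Gauss-Newton idea for translation-only warps, assuming each image is aligned toward the ensemble mean rather than pairwise to every other image (a simplification of the paper's cost); scipy.ndimage.shift supplies the warp.

```python
import numpy as np
from scipy.ndimage import shift as nd_shift

def congeal_translations(images, n_iter=20):
    """Jointly estimate a 2-D shift per image; all updates are computed
    simultaneously within an iteration, and the Gauss-Newton solve fixes
    the step length (no pre-defined step size)."""
    t = np.zeros((len(images), 2))                       # per-image (dy, dx)
    for _ in range(n_iter):
        warped = [nd_shift(im, ti) for im, ti in zip(images, t)]
        mean = np.mean(warped, axis=0)
        for i, w in enumerate(warped):
            gy, gx = np.gradient(w)
            J = -np.stack([gy.ravel(), gx.ravel()], axis=1)  # d(warped)/dt
            r = (mean - w).ravel()                       # residual to the mean
            delta, *_ = np.linalg.lstsq(J, r, rcond=None)
            t[i] += delta
        t -= t.mean(axis=0)                              # pin the global shift
    return t
```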



Proceedings Article
01 Jan 2008
TL;DR: This paper proposes a graph based method to make use of the Universum data to help depict the prior information for possible classifiers and shows that the proposed method can obtain superior performances over conventional supervised and semi-supervised methods.
Abstract: The Universum data, defined as a collection of "non-examples" that do not belong to any class of interest, have been shown to encode prior knowledge by representing meaningful concepts in the same domain as the problem at hand. In this paper, we address a novel semi-supervised classification problem, called semi-supervised Universum, that can simultaneously utilize the labeled data, the unlabeled data and the Universum data to improve classification performance. We propose a graph-based method that uses the Universum data to help depict the prior information for possible classifiers. As in conventional graph-based semi-supervised methods, graph regularization is utilized to favor consistency between the labels. Furthermore, since the proposed method is graph based, it can be easily extended to the multiclass case. Empirical experiments on the USPS and MNIST datasets show that the proposed method obtains superior performance over conventional supervised and semi-supervised methods.

57 citations
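A minimal sketch of graph regularization with Universum anchoring, assuming a binary problem: labeled points are anchored at ±1, Universum points at 0 (non-examples), and the Laplacian term enforces smoothness over a kNN graph. The anchoring weights and the quadratic objective are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve
from sklearn.neighbors import kneighbors_graph

def propagate_with_universum(X, y, universum, k=10, lam=1.0, mu=1.0):
    """Solve min_f sum_i c_i (f_i - y_i)^2 + lam * f^T L f, where y is
    +/-1 on labeled rows and 0 elsewhere, and `universum` marks the
    non-example rows, which are pulled toward the label value 0."""
    W = kneighbors_graph(X, k, mode="connectivity", include_self=False)
    W = 0.5 * (W + W.T)                               # symmetrized kNN graph
    L = diags(np.asarray(W.sum(axis=1)).ravel()) - W  # graph Laplacian
    c = np.where(y != 0, 1.0, 0.0)                    # anchor labeled points
    c[universum] = mu                                 # anchor Universum at 0
    f = spsolve(diags(c) + lam * L, c * y.astype(float))
    return np.sign(f)
```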


Journal ArticleDOI
TL;DR: This paper focuses on the applicability of the features inspired by the visual ventral stream for handwritten character recognition, and an analysis is conducted to evaluate the robustness of this approach to orientation, scale and translation distortions.
Abstract: This paper focuses on the applicability of features inspired by the visual ventral stream to handwritten character recognition. A set of scale- and translation-invariant C2 features is first extracted from all images in the dataset. Three standard classifiers, kNN, ANN and SVM, are then trained on a training set and compared on a separate test set. In order to achieve a higher recognition rate, a two-stage classifier was designed with different preprocessing in the second stage. Experiments performed to validate the method on the well-known MNIST database and on standard Farsi digits and characters exhibit high recognition rates that compete with some of the best existing approaches. Moreover, an analysis is conducted to evaluate the robustness of the approach to orientation, scale and translation distortions.

40 citations


Proceedings ArticleDOI
01 Dec 2008
TL;DR: The combination of AdaBoost classifiers is used to evaluate an individual of the population, and the fitness function is defined by the error rate of this combination.
Abstract: This paper presents a fast method using simple genetic algorithms (GAs) for feature selection. Unlike traditional approaches using GAs, we use a combination of AdaBoost classifiers to evaluate an individual of the population, so the fitness function is defined by the error rate of this combination. This approach has been implemented and tested on the MNIST database, and the results confirm the effectiveness and robustness of the proposed approach.

38 citations
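A minimal sketch of the scheme, assuming individuals are feature bitmasks and fitness is the cross-validated error of scikit-learn's AdaBoostClassifier on the selected features; the GA operators and all parameters are illustrative, not the paper's.

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

def ga_feature_selection(X, y, pop=20, gens=15, p_mut=0.02, seed=0):
    rng = np.random.default_rng(seed)
    P = rng.random((pop, X.shape[1])) < 0.5             # bitmask population

    def error(mask):                                    # fitness = error rate
        if not mask.any():
            return 1.0
        clf = AdaBoostClassifier(n_estimators=50, random_state=0)
        return 1.0 - cross_val_score(clf, X[:, mask], y, cv=3).mean()

    for _ in range(gens):
        P = P[np.argsort([error(m) for m in P])]        # best individuals first
        children = []
        while len(children) < pop // 2:
            a, b = P[rng.integers(pop // 2, size=2)]    # parents from top half
            cut = rng.integers(1, X.shape[1])
            child = np.r_[a[:cut], b[cut:]]             # one-point crossover
            child ^= rng.random(X.shape[1]) < p_mut     # bit-flip mutation
            children.append(child)
        P[pop // 2:] = children                         # replace bottom half
    return P[np.argmin([error(m) for m in P])]          # best mask found
```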


Posted Content
TL;DR: A new online boosting algorithm for updating the weights of a boosted classifier is presented, which yields a closer approximation to the edges found by Freund and Schapire's AdaBoost algorithm than previous online boosting algorithms.
Abstract: We present a new online boosting algorithm for adapting the weights of a boosted classifier, which yields a closer approximation to Freund and Schapire's AdaBoost algorithm than previous online boosting algorithms. We also contribute a new way of deriving the online algorithm that ties together previous online boosting work. We assume that the weak hypotheses were selected beforehand, and only their weights are updated during online boosting. The update rule is derived by minimizing AdaBoost's loss when viewed in an incremental form. The equations show that optimization is computationally expensive. However, a fast online approximation is possible. We compare approximation error to batch AdaBoost on synthetic datasets and generalization error on face datasets and the MNIST dataset.

15 citations
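The paper's exact incremental minimization is not reproduced here; as a loose stand-in, the sketch below takes one stochastic-gradient step on AdaBoost's exponential loss with the weak hypotheses fixed, which matches the setting (only the weights are updated online) but not the derivation.

```python
import numpy as np

def online_alpha_update(alphas, h_outputs, y, lr=0.1):
    """One online update of the weights of M fixed weak hypotheses.

    alphas:    (M,) current combination weights.
    h_outputs: (M,) +/-1 predictions of the weak hypotheses on example x.
    y:         true label in {-1, +1}.
    Gradient step on exp(-y * F(x)), F(x) = sum_m alphas[m] * h_outputs[m].
    """
    margin = y * np.dot(alphas, h_outputs)
    grad = -y * h_outputs * np.exp(-margin)   # d loss / d alpha
    return alphas - lr * grad
```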


Proceedings ArticleDOI
31 Mar 2008
TL;DR: Two new algorithms for digit recognition are proposed, based on features extracted from the image's curvelet transform: the standard deviation and entropy of the curvelet coefficient matrix at different scales and various angles.
Abstract: This paper evaluates the performance of two new algorithms for digit recognition. The recognition systems are based on features extracted from the image's curvelet transform, namely the standard deviation and entropy of the curvelet coefficient matrix at different scales and various angles. In addition, the proposed systems use the information from different scales as the feature vector, clarifying which scales carry the most useful information. Finally, a kNN classifier assigns the feature vectors to predefined classes. The classifier was trained and tested on the MNIST handwritten numeral database. The results show a correct recognition rate of 93% for the "curvelet transform + standard deviation" algorithm and 82% for the "curvelet transform + entropy" algorithm.

13 citations
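Curvelet transforms have no standard Python implementation, so the sketch below substitutes a wavelet decomposition (PyWavelets) as a plainly-named stand-in, computing the standard deviation and an entropy (of normalized absolute coefficients, an assumed definition) per detail band and scale, then feeding a kNN classifier as in the paper.

```python
import numpy as np
import pywt
from sklearn.datasets import load_digits
from sklearn.neighbors import KNeighborsClassifier

def scale_features(image, wavelet="haar", levels=2):
    """Std-dev and entropy of detail-coefficient matrices at each scale."""
    coeffs = pywt.wavedec2(image, wavelet, level=levels)
    feats = []
    for scale in coeffs[1:]:                  # detail bands, coarse to fine
        for band in scale:                    # horizontal/vertical/diagonal
            p = np.abs(band).ravel() + 1e-12
            p /= p.sum()                      # normalize to a distribution
            feats += [band.std(), -(p * np.log2(p)).sum()]
    return np.array(feats)

digits = load_digits()                        # 8x8 stand-in for 28x28 MNIST
X = np.array([scale_features(im) for im in digits.images])
knn = KNeighborsClassifier(n_neighbors=5).fit(X, digits.target)
print("train acc:", knn.score(X, digits.target))
```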


Book ChapterDOI
09 Sep 2008
TL;DR: When applied to handwritten digit recognition, namely on the MNIST database, the αβ-Associative Model exhibits competitive results against some of the most widely known algorithms currently available in the scientific literature.
Abstract: In this paper we present a new model appropriate for pattern recognition tasks. This new model, called the αβ-Associative Model, arises from taking theoretical elements of the αβ associative memories and merging them with several new mathematical transforms. When applied to handwritten digit recognition, namely on the MNIST database, the αβ-Associative Model exhibits competitive results against some of the most widely known algorithms currently available in the scientific literature.

Proceedings Article
01 Jan 2008
TL;DR: The discriminative structure found by the new procedures significantly outperforms generatively produced structures, and achieves a classification accuracy on par with the best discriminative (naive greedy) Bayesian network learning approach, but does so with a factor of ∼10 speedup.
Abstract: We introduce a simple empirical order-based greedy heuristic for learning discriminative Bayesian network structures. We propose two metrics, based on conditional mutual information, for establishing the ordering of N features. Given an ordering, we can find the discriminative classifier structure with O(Nq) score evaluations (where the constant q is the maximum number of parents per node). We present classification results on the UCI repository (Merz, Murphy, & Aha 1997), for a phonetic classification task using the TIMIT database (Lamel, Kassel, & Seneff 1986), and for the MNIST handwritten digit recognition task (LeCun et al. 1998). The discriminative structure found by our new procedures significantly outperforms generatively produced structures, and achieves a classification accuracy on par with the best discriminative (naive greedy) Bayesian network learning approach, but does so with a factor of ∼10 speedup. We also show that the advantages of discriminatively structured Bayesian network classifiers still hold in the case of missing features.
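The two ordering metrics are built on conditional mutual information; their exact form is not reproduced here. Below is a minimal empirical estimator of I(Xi; Xj | Y) for discrete features, the quantity such an ordering would be computed from; once features are ordered, each node may choose at most q parents among earlier features, giving the O(Nq) score-evaluation budget mentioned above.

```python
import numpy as np

def cond_mutual_info(xi, xj, y):
    """Empirical I(Xi; Xj | Y) in nats, for small discrete arrays."""
    cmi = 0.0
    for yv in np.unique(y):
        sel = y == yv
        pY = sel.mean()
        a, b = xi[sel], xj[sel]
        for av in np.unique(a):
            pa = np.mean(a == av)
            for bv in np.unique(b):
                pab = np.mean((a == av) & (b == bv))
                pb = np.mean(b == bv)
                if pab > 0:
                    cmi += pY * pab * np.log(pab / (pa * pb))
    return cmi
```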

Proceedings ArticleDOI
23 Jun 2008
TL;DR: Experiments on the MNIST and USPS datasets of handwritten digits and on a subset of the Caltech256 dataset show that, given a suitable context, DCP can achieve good results even in situation where density-based clustering techniques fail.
Abstract: We propose a new method to partition an unlabeled dataset, called Discriminative Context Partitioning (DCP). It is motivated by the idea of splitting the dataset based only on how well the resulting parts can be separated from a context class of disjoint data points. This is in contrast to typical clustering techniques like K-means that are based on a generative model, implicitly or explicitly searching for modes in the distribution of samples. The discriminative criterion in DCP avoids the problems that density-based methods have when the a priori assumption of multimodality is violated, when the number of samples becomes small in relation to the dimensionality of the feature space, or when the cluster sizes are strongly unbalanced. We formulate DCP's separation property as a large-margin criterion, and show how the resulting optimization problem can be solved efficiently. Experiments on the MNIST and USPS datasets of handwritten digits and on a subset of the Caltech256 dataset show that, given a suitable context, DCP can achieve good results even in situations where density-based clustering techniques fail.
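DCP's joint large-margin optimization is not reproduced here; as a loose illustration of the criterion (split the data so each part separates well from the context class), the sketch below alternates between training one linear SVM per part against the context and re-assigning samples to the part whose SVM gives them the largest margin. All names and parameters are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def dcp_sketch(X, X_context, n_parts=2, n_iter=10, seed=0):
    """Partition X into parts that are each separable from X_context."""
    rng = np.random.default_rng(seed)
    assign = rng.integers(n_parts, size=len(X))
    for _ in range(n_iter):
        svms = []
        for p in range(n_parts):
            if not np.any(assign == p):          # keep every part non-empty
                assign[rng.integers(len(X))] = p
            Xp = np.vstack([X[assign == p], X_context])
            yp = np.r_[np.ones((assign == p).sum()), np.zeros(len(X_context))]
            svms.append(LinearSVC(dual=False).fit(Xp, yp))
        margins = np.column_stack([s.decision_function(X) for s in svms])
        assign = margins.argmax(axis=1)          # part with largest margin
    return assign
```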

Journal ArticleDOI
TL;DR: The distances between the samples in the two training datasets are first computed, and the samples that lie far from the hyperplane are then discarded in order to compress the training dataset.
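A minimal sketch of the compression idea as read from the TL;DR, assuming a binary problem: train a rough first-pass linear SVM, discard training samples far from its hyperplane (they rarely become support vectors), and retrain on the rest. The threshold and models are illustrative.

```python
import numpy as np
from sklearn.svm import LinearSVC

def compress_by_margin(X, y, keep_within=1.5):
    """Keep only training samples near the separating hyperplane (binary)."""
    pilot = LinearSVC(dual=False).fit(X, y)       # rough first-pass model
    dist = np.abs(pilot.decision_function(X))     # distance-like score
    mask = dist < keep_within                     # near-hyperplane samples
    final = LinearSVC(dual=False).fit(X[mask], y[mask])
    return final, mask
```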

Proceedings ArticleDOI
11 Dec 2008
TL;DR: This work presents a new unsupervised learning method based on fuzzy c-means to learn sub models of a class using background samples to guide cluster split and merge operations, which results in more accurate clusters and helps to escape locally minimum solutions.
Abstract: Many datasets contain an abundance of background data, or samples belonging to classes not currently under consideration. We present a new unsupervised learning method based on fuzzy c-means that learns sub-models of a class while using background samples to guide cluster split and merge operations. The proposed method demonstrates how background samples can be used to guide and improve the clustering process; it produces more accurate clusters and helps the algorithm escape locally minimal solutions. In addition, the number of clusters is determined for the class under consideration. The method demonstrates remarkable performance on both synthetic 2D data and real-world data from the MNIST dataset of handwritten digits.
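The background-guided split and merge operations are the paper's contribution and are not reproduced here; for reference, the fuzzy c-means core they build on looks like the following sketch (standard membership and center updates).

```python
import numpy as np

def fuzzy_cmeans(X, c=3, m=2.0, n_iter=100, seed=0):
    """Plain fuzzy c-means: soft memberships u and cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), c, replace=False)]
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2) + 1e-12
        u = d ** (-2.0 / (m - 1))                # closer -> higher membership
        u /= u.sum(axis=1, keepdims=True)        # memberships sum to 1
        w = u ** m
        centers = (w.T @ X) / w.sum(axis=0)[:, None]
    return centers, u
```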

Journal Article
TL;DR: This paper presents a localized linear manifold self-organizing map, which is able to learn a set of ordered low-dimensional linear manifolds in a high-dimensional vector space and performs much better than three other methods at separating clusters.

Book ChapterDOI
01 Nov 2008
TL;DR: The classical k-nearest neighbor (kNN) rule has been applied to many real-life problems because of its good performance and simple algorithm, but it does not perform well when the dimensionality of the feature vectors is large.
Abstract: In pattern recognition, a classical classifier called the k-nearest neighbor rule (kNN) has been applied to many real-life problems because of its good performance and simple algorithm. In kNN, a test sample is classified by a majority vote of its k closest training samples. This approach has the following advantages: (1) it was proved that the error rate of kNN approaches the Bayes error when both the number of training samples and the value of k are infinite (Duda et al., 2001); (2) kNN performs well even if different classes overlap each other; (3) kNN is easy to implement due to its simple algorithm. However, kNN does not perform well when the dimensionality of the feature vectors is large. As an example, Fig. 1 shows a test sample (belonging to class 5) of the MNIST dataset (LeCun et al., 1998) and its five closest training samples selected using Euclidean distance. Because the five selected training samples include three samples belonging to class 8, the test sample is misclassified into class 8. kNN often yields such misclassifications in high-dimensional pattern classification tasks such as character and face recognition. Moreover, kNN requires a large number of training samples for high accuracy because it is a memory-based classifier. Consequently, the classification cost and memory requirement of kNN tend to be high.
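For reference, the kNN rule described above fits in a few lines; this is a generic sketch, with labels assumed to be non-negative integers.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Classify x by a majority vote of its k closest training samples."""
    d = np.linalg.norm(X_train - x, axis=1)      # Euclidean distances
    nearest = np.argsort(d)[:k]                  # indices of the k closest
    votes = np.bincount(y_train[nearest])        # vote counts per label
    return votes.argmax()                        # majority label
```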

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A simplified polynomial network (SPN) classifier is proposed that reduces the complexity of polynomial networks with little deterioration in classification accuracy; in handwritten digit recognition experiments on USPS, SPN using features of 30.0 dimensions on average achieved higher classification accuracy and much faster classification than CFPC using features of 250 dimensions.
Abstract: The class-specific feature polynomial classifier (CFPC), a variant of the polynomial classifier (PC), yields high classification accuracy especially in high-dimensional feature spaces. However, the computational cost of classification in such a high-dimensional space is rather expensive. To overcome this difficulty, we propose a simplified polynomial network (SPN) classifier that reduces the complexity of polynomial networks with little deterioration in classification accuracy. In handwritten digit recognition experiments on USPS, SPN using features of 30.0 dimensions on average achieved higher classification accuracy and a classification speed about 12.8 times faster than CFPC using features of 250 dimensions. In experiments on MNIST, SPN using features of 40.0 dimensions on average achieved a classification speed about 2.0 times faster than CFPC using features of 100 dimensions with nearly the same classification accuracy.
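CFPC and SPN are not off-the-shelf components; the generic sketch below only illustrates why feature dimensionality dominates a polynomial classifier's cost (a degree-2 expansion grows quadratically with the input dimension), using a PCA-reduced feature space as a stand-in for SPN's smaller feature set.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

digits = load_digits()
# 30 PCA dims -> ~496 degree-2 terms; 100 dims would give ~5,151.
clf = make_pipeline(PCA(n_components=30),
                    PolynomialFeatures(degree=2),
                    LogisticRegression(max_iter=2000))
clf.fit(digits.data, digits.target)
print("train acc:", clf.score(digits.data, digits.target))
```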

Book ChapterDOI
15 Sep 2008
TL;DR: An improved particle swarm optimization algorithm is proposed to train the fuzzy support vector machine (FSVM) for pattern multi-classification and results on MNIST character recognition show that the improved algorithm is feasible and effective for FSVM training.
Abstract: In this paper, an improved particle swarm optimization algorithm is proposed to train the fuzzy support vector machine (FSVM) for pattern multi-classification. In the improved algorithm, each particle learns not only from itself and the best particle but also from the mean value of some other particles. In addition, adaptive mutation is introduced to reduce the rate of premature convergence. Experimental results on MNIST character recognition show that the improved algorithm is feasible and effective for FSVM training.
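A minimal PSO sketch under a loose reading of the abstract: each particle's velocity also tracks the mean of a random subset of personal bests, and a mutation rate that decays over iterations stands in for "adaptive mutation". The objective would be the FSVM training criterion; here it is an arbitrary function, and all coefficients are illustrative.

```python
import numpy as np

def pso(objective, dim, n_particles=30, n_iter=200, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1, 1, (n_particles, dim))       # particle positions
    v = np.zeros_like(x)                             # particle velocities
    pbest = x.copy()
    pbest_f = np.array([objective(p) for p in x])
    for it in range(n_iter):
        g = pbest[pbest_f.argmin()]                  # swarm-best position
        mean_others = pbest[rng.choice(n_particles, 5)].mean(axis=0)
        r1, r2, r3 = rng.random((3, n_particles, dim))
        v = (0.7 * v + 1.5 * r1 * (pbest - x)        # own best
             + 1.5 * r2 * (g - x)                    # swarm best
             + 0.5 * r3 * (mean_others - x))         # mean of other particles
        x += v
        mut_rate = 0.05 * (1 - it / n_iter)          # decaying mutation rate
        mut = rng.random(x.shape) < mut_rate
        x[mut] += rng.normal(0, 0.5, mut.sum())
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
    return pbest[pbest_f.argmin()]

# Usage sketch: pso(lambda w: np.sum(w**2), dim=10)
```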