Showing papers on "MNIST database" published in 2012

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

03 Dec 2012

TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers (some followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, achieved considerably better ImageNet classification error rates than the previous state of the art.

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
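
The top-1 and top-5 error rates quoted in this abstract can be illustrated with a minimal Python sketch. It assumes the usual definition (a sample counts as an error when its true label is not among the k highest-scoring classes); the function name `top_k_error` and the toy data layout are illustrative and not taken from the paper.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest scores.

    scores: list of per-class score lists, one list per sample (assumed layout).
    labels: list of true class indices, one per sample.
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample.
        top_k = sorted(range(len(row)), key=row.__getitem__, reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)
```

With k=1 this reduces to the ordinary classification error; with k=5 it gives the more lenient top-5 figure.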

73,978 citations

Posted Content•

ADADELTA: An Adaptive Learning Rate Method

Matthew D. Zeiler

22 Dec 2012-arXiv: Learning

TL;DR: A novel per-dimension learning rate method for gradient descent called ADADELTA that dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent is presented.

Abstract: We present a novel per-dimension learning rate method for gradient descent called ADADELTA. The method dynamically adapts over time using only first order information and has minimal computational overhead beyond vanilla stochastic gradient descent. The method requires no manual tuning of a learning rate and appears robust to noisy gradient information, different model architecture choices, various data modalities and selection of hyperparameters. We show promising results compared to other methods on the MNIST digit classification task using a single machine and on a large scale voice dataset in a distributed cluster environment.
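
The abstract does not spell out the update rule. As a rough sketch, assuming the rule as it is commonly stated for ADADELTA (exponentially decaying averages of squared gradients and of squared updates, per dimension), an optimization loop could look like the following in Python. The function name `adadelta_minimize` and the default values of `rho`, `eps`, and `steps` are illustrative assumptions, not values taken from this listing.

```python
import math

def adadelta_minimize(grad, x0, rho=0.95, eps=1e-6, steps=2000):
    """Run per-dimension ADADELTA-style updates, given a gradient function.

    grad: callable mapping a list of parameters to a list of gradients.
    rho, eps: decay rate and stabilizing constant (assumed defaults).
    """
    x = list(x0)
    avg_sq_grad = [0.0] * len(x)  # running average of squared gradients
    avg_sq_step = [0.0] * len(x)  # running average of squared updates
    for _ in range(steps):
        g = grad(x)
        for i in range(len(x)):
            avg_sq_grad[i] = rho * avg_sq_grad[i] + (1 - rho) * g[i] ** 2
            # Step size is the ratio of the two RMS values; no global
            # learning rate appears anywhere.
            step = -(math.sqrt(avg_sq_step[i] + eps)
                     / math.sqrt(avg_sq_grad[i] + eps)) * g[i]
            avg_sq_step[i] = rho * avg_sq_step[i] + (1 - rho) * step ** 2
            x[i] += step
    return x
```

Only first-order information (the gradient) is used, and the extra cost over plain gradient descent is two running averages per dimension, which matches the properties the abstract claims.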

6,189 citations

Proceedings Article•DOI•

Multi-column deep neural networks for image classification

Dan Ciresan¹, Ueli Meier¹, Jürgen Schmidhuber¹•Institutions (1)

Dalle Molle Institute for Artificial Intelligence Research¹

16 Jun 2012

TL;DR: This paper proposes biologically plausible, wide and deep artificial neural network architectures that match human performance on tasks such as the recognition of handwritten digits or traffic signs.

Abstract: Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible, wide and deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.

3,717 citations

Journal Article•DOI•

The MNIST Database of Handwritten Digit Images for Machine Learning Research [Best of the Web]

Li Deng¹•Institutions (1)

Microsoft¹

18 Oct 2012-IEEE Signal Processing Magazine

TL;DR: “Best of the Web” presents the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in optical character recognition and machine learning research.

1,626 citations

Journal Article•

The MNIST Database of Handwritten Digit Images for Machine Learning Research

Li Deng

01 Nov 2012-IEEE Signal Processing Magazine

TL;DR: In this article, the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in optical character recognition and machine learning research, are presented.

Abstract: In this issue, “Best of the Web” presents the modified National Institute of Standards and Technology (MNIST) resources, consisting of a collection of handwritten digit images used extensively in optical character recognition and machine learning research. Handwritten digit recognition is an important problem in optical character recognition, and it has been used as a test case for theories of pattern recognition and machine learning algorithms for many years. Historically, to promote machine learning and pattern recognition research, several standard databases have emerged in which the handwritten digits are preprocessed, including segmentation and normalization, so that researchers can compare recognition results of their techniques on a common basis. The freely available MNIST database of handwritten digits has become a standard for fast-testing machine learning algorithms for this purpose. The simplicity of this task is analogous to the TIDigit (a speech database created by Texas Instruments) task in speech recognition. Just like there is a long list for more complex speech recognition tasks, there are many more difficult and challenging tasks for image recognition and computer vision, which will not be addressed in this column.

1,466 citations

Posted Content•

Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms

Chris Thornton¹, Frank Hutter¹, Holger H. Hoos¹, Kevin Leyton-Brown¹•Institutions (1)

University of British Columbia¹

18 Aug 2012-arXiv: Learning

TL;DR: This work considers the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that attacks these issues separately and shows classification performance often much better than using standard selection and hyperparameter optimization methods.

Abstract: Many different machine learning algorithms exist; taking into account each algorithm's hyperparameters, there is a staggeringly large number of possible alternatives overall. We consider the problem of simultaneously selecting a learning algorithm and setting its hyperparameters, going beyond previous work that addresses these issues in isolation. We show that this problem can be addressed by a fully automated approach, leveraging recent innovations in Bayesian optimization. Specifically, we consider a wide range of feature selection techniques (combining 3 search and 8 evaluator methods) and all classification approaches implemented in WEKA, spanning 2 ensemble methods, 10 meta-methods, 27 base classifiers, and hyperparameter settings for each classifier. On each of 21 popular datasets from the UCI repository, the KDD Cup 09, variants of the MNIST dataset and CIFAR-10, we show classification performance often much better than using standard selection/hyperparameter optimization methods. We hope that our approach will help non-expert users to more effectively identify machine learning algorithms and hyperparameter settings appropriate to their applications, and hence to achieve improved performance.

1,004 citations

Journal Article•DOI•

A novel hybrid CNN-SVM classifier for recognizing handwritten digits

Xiao-Xiao Niu¹, Ching Y. Suen¹•Institutions (1)

Concordia University¹

01 Apr 2012-Pattern Recognition

TL;DR: A hybrid model is presented that integrates two classifiers with proven results in recognizing different types of patterns: the Convolutional Neural Network (CNN) and the Support Vector Machine (SVM).

585 citations

Proceedings Article•

Hamming Distance Metric Learning

Mohammad Norouzi¹, David J. Fleet¹, Ruslan Salakhutdinov¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: A new loss-augmented inference algorithm that is quadratic in the code length and inspired by latent structural SVMs is developed, showing strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes.

Abstract: Motivated by large-scale multimedia applications we propose to learn mappings from high-dimensional data to binary codes that preserve semantic similarity. Binary codes are well suited to large-scale applications as they are storage efficient and permit exact sub-linear kNN search. The framework is applicable to broad families of mappings, and uses a flexible form of triplet ranking loss. We overcome discontinuous optimization of the discrete mappings by minimizing a piecewise-smooth upper bound on empirical loss, inspired by latent structural SVMs. We develop a new loss-augmented inference algorithm that is quadratic in the code length. We show strong retrieval performance on CIFAR-10 and MNIST, with promising classification results using no more than kNN on the binary codes.
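
The kNN classification on binary codes mentioned at the end of this abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions: codes are stored as Python integers, the search is a plain linear scan (not the exact sub-linear search the abstract refers to), and the learned mapping that produces the codes is not shown. The names `hamming` and `knn_classify` are hypothetical.

```python
from collections import Counter

def hamming(a, b):
    """Hamming distance between two binary codes stored as integers."""
    return bin(a ^ b).count("1")

def knn_classify(query, codes, labels, k=3):
    """Majority label among the k database codes nearest to the query code."""
    # Linear scan over all codes; sorted() is stable, so ties keep index order.
    nearest = sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

Because the distance is an XOR followed by a bit count, each comparison is cheap, which is part of why binary codes suit large-scale retrieval.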

562 citations

Journal Article•DOI•

An efficient learning procedure for deep boltzmann machines

Ruslan Salakhutdinov¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

01 Aug 2012-Neural Computation

TL;DR: A new learning algorithm for Boltzmann machines that contain many layers of hidden variables is presented, with results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects.

Abstract: We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent statistics are estimated using a variational approximation that tends to focus on a single mode, and data-independent statistics are estimated using persistent Markov chains. The use of two quite different techniques for estimating the two types of statistic that enter into the gradient of the log likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer pretraining phase that initializes the weights sensibly. The pretraining also allows the variational inference to be initialized sensibly with a single bottom-up pass. We present results on the MNIST and NORB data sets showing that deep Boltzmann machines learn very good generative models of handwritten digits and 3D objects. We also show that the features discovered by deep Boltzmann machines are a very effective way to initialize the hidden layers of feedforward neural nets, which are then discriminatively fine-tuned.

463 citations

Proceedings Article•DOI•

Scalable stacking and learning for building deep architectures

Li Deng¹, Dong Yu¹, John Platt¹•Institutions (1)

Microsoft¹

25 Mar 2012

TL;DR: The Deep Stacking Network (DSN) is presented, which overcomes the problem of parallelizing learning algorithms for deep architectures and provides a method of stacking simple processing modules in building deep architectures, with a convex learning problem in each module.

Abstract: Deep Neural Networks (DNNs) have shown remarkable success in pattern recognition tasks. However, parallelizing DNN training across computers has been difficult. We present the Deep Stacking Network (DSN), which overcomes the problem of parallelizing learning algorithms for deep architectures. The DSN provides a method of stacking simple processing modules in building deep architectures, with a convex learning problem in each module. Additional fine tuning further improves the DSN, while introducing minor non-convexity. Full learning in the DSN is batch-mode, making it amenable to parallel training over many machines and thus scalable over the potentially huge size of the training data. Experimental results on both the MNIST (image) and TIMIT (speech) classification tasks demonstrate that the DSN learning algorithm developed in this work is not only parallelizable in implementation but also attains higher classification accuracy than the DNN.

208 citations

Journal Article•DOI•

Hybrid Linear Modeling via Local Best-Fit Flats

Teng Zhang¹, Arthur Szlam², Yi Wang¹, Gilad Lerman¹•Institutions (2)

University of Minnesota¹, Courant Institute of Mathematical Sciences²

01 Dec 2012-International Journal of Computer Vision

TL;DR: This work presents a simple and fast geometric method for modeling data by a union of affine subspaces, and gives extensive experimental evidence demonstrating the state of the art accuracy and speed of the suggested algorithms on these problems.

Abstract: We present a simple and fast geometric method for modeling data by a union of affine subspaces. The method begins by forming a collection of local best-fit affine subspaces, i.e., subspaces approximating the data in local neighborhoods. The correct sizes of the local neighborhoods are determined automatically by the Jones' β₂ numbers (we prove under certain geometric conditions that our method finds the optimal local neighborhoods). The collection of subspaces is further processed by a greedy selection procedure or a spectral method to generate the final model. We discuss applications to tracking-based motion segmentation and clustering of faces under different illuminating conditions. We give extensive experimental evidence demonstrating the state of the art accuracy and speed of the suggested algorithms on these problems and also on synthetic hybrid linear data as well as the MNIST handwritten digits data; and we demonstrate how to use our algorithms for fast determination of the number of affine subspaces.

Posted Content•

Learning Invariant Representations with Local Transformations

Kihyuk Sohn¹, Honglak Lee¹•Institutions (1)

University of Michigan¹

27 Jun 2012-arXiv: Learning

TL;DR: This paper presents the transformation-invariant restricted Boltzmann machine that compactly represents data by its weights and their transformations, which achieves invariance of the feature representation via probabilistic max pooling.

Abstract: Learning invariant representations is an important problem in machine learning and pattern recognition. In this paper, we present a novel framework of transformation-invariant feature learning by incorporating linear transformations into the feature learning algorithms. For example, we present the transformation-invariant restricted Boltzmann machine that compactly represents data by its weights and their transformations, which achieves invariance of the feature representation via probabilistic max pooling. In addition, we show that our transformation-invariant feature learning framework can also be extended to other unsupervised learning methods, such as autoencoders or sparse coding. We evaluate our method on several image classification benchmark datasets, such as MNIST variations, CIFAR-10, and STL-10, and show competitive or superior classification performance when compared to the state-of-the-art. Furthermore, our method achieves state-of-the-art performance on phone classification tasks with the TIMIT dataset, which demonstrates wide applicability of our proposed algorithms to other domains.

Proceedings Article•

A Better Way to Pretrain Deep Boltzmann Machines

Geoffrey E. Hinton¹, Ruslan Salakhutdinov¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: A different method of pretraining DBMs is developed that distributes the modelling work more evenly over the hidden layers and demonstrates that the new pretraining algorithm allows us to learn better generative models.

Abstract: We describe how the pretraining algorithm for Deep Boltzmann Machines (DBMs) is related to the pretraining algorithm for Deep Belief Networks and we show that under certain conditions, the pretraining procedure improves the variational lower bound of a two-hidden-layer DBM. Based on this analysis, we develop a different method of pretraining DBMs that distributes the modelling work more evenly over the hidden layers. Our results on the MNIST and NORB datasets demonstrate that the new pretraining algorithm allows us to learn better generative models.

Proceedings Article•

One-Shot Learning with a Hierarchical Nonparametric Bayesian Model

Ruslan Salakhutdinov¹, Josh Tenenbaum², Antonio Torralba²•Institutions (2)

University of Toronto¹, Massachusetts Institute of Technology²

27 Jun 2012

TL;DR: A hierarchical Bayesian model is developed that learns categories from single training examples and transfers acquired knowledge from previously learned categories to a novel category, in the form of a prior over category means and variances.

Abstract: We develop a hierarchical Bayesian model that learns categories from single training examples. The model transfers acquired knowledge from previously learned categories to a novel category, in the form of a prior over category means and variances. The model discovers how to group categories into meaningful super-categories that express different priors for new classes. Given a single example of a novel category, we can efficiently infer which super-category the novel category belongs to, and thereby estimate not only the new category's mean but also an appropriate similarity metric based on parameters inherited from the super-category. On MNIST and MSR Cambridge image datasets the model learns useful representations of novel categories based on just a single training example, and performs significantly better than simpler hierarchical Bayesian approaches. It can also discover new categories in a completely unsupervised fashion, given just one or a few examples.

Journal Article•DOI•

Efficient and effective algorithms for training single-hidden-layer neural networks

Dong Yu¹, Li Deng¹•Institutions (1)

Microsoft¹

01 Apr 2012-Pattern Recognition Letters

TL;DR: Experiments show that the algorithms proposed in this paper obtain significantly better classification accuracy than ELM when the same number of hidden units is used, at the expense of at most 5 times the training cost incurred by ELM training.

Posted Content•

Multi-column Deep Neural Networks for Image Classification

Dan Ciresan¹, Ueli Meier¹, Jürgen Schmidhuber¹•Institutions (1)

Dalle Molle Institute for Artificial Intelligence Research¹

13 Feb 2012-arXiv: Computer Vision and Pattern Recognition

TL;DR: On the very competitive MNIST handwriting benchmark, this method is the first to achieve near-human performance and improves the state-of-the-art on a plethora of common image classification benchmarks.

Abstract: Traditional methods of computer vision and machine learning cannot match human performance on tasks such as the recognition of handwritten digits or traffic signs. Our biologically plausible deep artificial neural network architectures can. Small (often minimal) receptive fields of convolutional winner-take-all neurons yield large network depth, resulting in roughly as many sparsely connected neural layers as found in mammals between retina and visual cortex. Only winner neurons are trained. Several deep neural columns become experts on inputs preprocessed in different ways; their predictions are averaged. Graphics cards allow for fast training. On the very competitive MNIST handwriting benchmark, our method is the first to achieve near-human performance. On a traffic sign recognition benchmark it outperforms humans by a factor of two. We also improve the state-of-the-art on a plethora of common image classification benchmarks.

Journal Article•DOI•

A Folded Neural Network Autoencoder for Dimensionality Reduction

Jing Wang¹, Haibo He¹, Danil V. Prokhorov²•Institutions (2)

University of Rhode Island¹, Toyota²

01 Jan 2012-Procedia Computer Science

TL;DR: A new structure for dimensionality reduction, the folded autoencoder, is proposed based on the symmetric structure of the conventional autoencoder; it reduces the number of weights to be tuned and thus the computational cost.

Proceedings Article•DOI•

An introduction to deep learning

Francis Quintal Lauzon¹•Institutions (1)

École Normale Supérieure¹

02 Jul 2012

TL;DR: It is demonstrated that a stacked denoising autoencoder representation coupled to an SVM improves classification error on MNIST over the usual deep learning approach where a logistic regression layer is added to the stack of denoising autoencoders.

Abstract: Deep learning allows automatically learning multiple levels of representations of the underlying distribution of the data to be modeled. In this work, a specific implementation called stacked denoising autoencoders is explored. We contribute by demonstrating that this kind of representation coupled to an SVM improves classification error on MNIST over the usual deep learning approach where a logistic regression layer is added to the stack of denoising autoencoders.

Book Chapter•DOI•

Deep Big Multilayer Perceptrons for Digit Recognition

Dan Ciresan¹ ², Ueli Meier¹ ², Luca Maria Gambardella¹ ², Jürgen Schmidhuber¹ ²•Institutions (2)

University of Lugano¹, Dalle Molle Institute for Artificial Intelligence Research²

01 Jan 2012

TL;DR: All that is needed to achieve this result, the best until 2011, are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.

Abstract: The competitive MNIST handwritten digit recognition benchmark has a long history of broken records since 1998. The most recent advancement by others dates back 8 years (error rate 0.4%). Good old on-line back-propagation for plain multi-layer perceptrons yields a very low 0.35% error rate on the MNIST handwritten digits benchmark with a single MLP and 0.31% with a committee of seven MLP. All we need to achieve this until 2011 best result are many hidden layers, many neurons per layer, numerous deformed training images to avoid overfitting, and graphics cards to greatly speed up learning.

Journal Article•DOI•

Investigation of efficient features for image recognition by neural networks

Alexander Goltsev¹, Vladimir Gritsenko¹•Institutions (1)

National Academy of Sciences of Ukraine¹

01 Apr 2012-Neural Networks

TL;DR: An experimental comparison between the LiRA perceptron and the modular assembly neural network shows that the recognition capability of the modular network is somewhat better.

Proceedings Article•DOI•

K-means implementation on FPGA for high-dimensional data using triangle inequality

Zhongduo Lin¹, Charles Lo¹, Paul Chow¹•Institutions (1)

University of Toronto¹

25 Oct 2012

TL;DR: This paper proposes a hardware architecture for K-means with triangle inequality optimization on FPGA. An optimal 8-bit square calculator for 6-LUT architectures is described to minimize the hardware cost, and an approximation is proposed to avoid the square root calculation in the original triangle inequality optimization.

Abstract: One of the challenges to data mining raised by technology development is that both data size and dimensionality are growing rapidly. K-means, one of the most popular clustering algorithms in data mining, suffers in computational time when used for large data sets and data with high dimensionality. In this paper, we propose a hardware architecture for K-means with triangle inequality optimization on FPGA. An optimal 8-bit square calculator for 6-LUT architectures is described to minimize the hardware cost and an approximation solution is proposed to avoid square root calculation in the original triangle inequality optimization. Our software and hardware experiments are tested with the MNIST benchmark and uniform random numbers of various size. This approximation results in 2% more distance calculations for MNIST and 5% for uniform random numbers than the original optimization. Compared to the baseline hardware system without optimization, our approach achieves up to 77% improvement in processing time with about 10% logic overhead. We demonstrate that the hardware can achieve 55-fold speedup compared to software for the 1024 MNIST.
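
The triangle inequality optimization named in this abstract can be illustrated in software. The Python sketch below shows the standard pruning test for the K-means assignment step: if the distance between the current best center and a candidate center is at least twice the distance from the point to the best center, the candidate cannot be closer and its distance need not be computed. This is the original form with square roots; the paper's square-root-free approximation and its FPGA architecture are not described in enough detail here to reproduce, so they are not shown. The names `dist` and `assign_with_pruning` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_with_pruning(points, centers):
    """Nearest-center assignment with triangle-inequality pruning.

    Returns (assignments, skipped), where skipped counts the point-to-center
    distance computations that the pruning test avoided.
    """
    k = len(centers)
    # Center-to-center distances are computed once per assignment pass.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    assignments, skipped = [], 0
    for p in points:
        best, best_d = 0, dist(p, centers[0])
        for j in range(1, k):
            # d(c_best, c_j) >= 2 * d(p, c_best) implies d(p, c_j) >= d(p, c_best).
            if cc[best][j] >= 2 * best_d:
                skipped += 1
                continue
            d = dist(p, centers[j])
            if d < best_d:
                best, best_d = j, d
        assignments.append(best)
    return assignments, skipped
```

The pruning never changes the result of the assignment step; it only removes distance computations, which is where the savings for high-dimensional data come from.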

Journal Article•DOI•

Latent Log-Linear Models for Handwritten Digit Classification

Thomas Deselaers¹, Tobias Gass², Georg Heigold¹, Hermann Ney³•Institutions (3)

Google¹, ETH Zurich², RWTH Aachen University³

01 Jun 2012-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: Latent log-linear models, an extension of log-linear models incorporating latent variables, are proposed with two applications: log-linear mixture models and image deformation-aware log-linear models. The resulting models are fully discriminative, can be trained efficiently, and have controllable model complexity.

Abstract: We present latent log-linear models, an extension of log-linear models incorporating latent variables, and we propose two applications thereof: log-linear mixture models and image deformation-aware log-linear models. The resulting models are fully discriminative, can be trained efficiently, and the model complexity can be controlled. Log-linear mixture models offer additional flexibility within the log-linear modeling framework. Unlike previous approaches, the image deformation-aware model directly considers image deformations and allows for a discriminative training of the deformation parameters. Both are trained using alternating optimization. For certain variants, convergence to a stationary point is guaranteed and, in practice, even variants without this guarantee converge and find models that perform well. We tune the methods on the USPS data set and evaluate on the MNIST data set, demonstrating the generalization capabilities of our proposed models. Our models, although using significantly fewer parameters, are able to obtain competitive results with models proposed in the literature.

Book Chapter•DOI•

Training restricted boltzmann machines with multi-tempering: harnessing parallelization

Philemon Brakel¹, Sander Dieleman¹, Benjamin Schrauwen¹•Institutions (1)

Ghent University¹

11 Sep 2012

TL;DR: Two extensions of the parallel tempering algorithm, a Markov Chain Monte Carlo method to approximate the likelihood gradient, are examined: the first is directed at a more effective exchange of information among the parallel sampling chains, and the second estimates gradients by averaging over chains from different temperatures.

Abstract: Restricted Boltzmann Machines (RBM's) are unsupervised probabilistic neural networks that can be stacked to form Deep Belief Networks. Given the recent popularity of RBM's and the increasing availability of parallel computing architectures, it becomes interesting to investigate learning algorithms for RBM's that benefit from parallel computations. In this paper, we look at two extensions of the parallel tempering algorithm, which is a Markov Chain Monte Carlo method to approximate the likelihood gradient. The first extension is directed at a more effective exchange of information among the parallel sampling chains. The second extension estimates gradients by averaging over chains from different temperatures. We investigate the efficiency of the proposed methods and demonstrate their usefulness on the MNIST dataset. Especially the weighted averaging seems to benefit Maximum Likelihood learning.

Journal Article•

Spaun: A Perception-Cognition-Action Model Using Spiking Neurons

Terrence C. Stewart, Feng-Xuan Choo, Chris Eliasmith

01 Jan 2012-Cognitive Science

TL;DR: An overview of the Semantic Pointer Architecture: Unified Network (Spaun) model is presented, and it is demonstrated that this biologically plausible spiking neuron model has several features, including task flexibility: no changes are made to the model between tasks.

Proceedings Article•DOI•

Handwritten Character Classification using the Hotspot Feature Extraction Technique

Olarik Surinta, Lambertus Schomaker, Marco A. Wiering

01 Jan 2012

TL;DR: This study investigates a novel feature extraction technique, the hotspot technique, for representing handwritten characters and digits; the results reveal that the hotspot technique provides the largest average classification rates.

Abstract: Feature extraction techniques can be important in character recognition, because they can enhance the efficacy of recognition in comparison to featureless or pixel-based approaches. This study aims to investigate the novel feature extraction technique called the hotspot technique in order to use it for representing handwritten characters and digits. In the hotspot technique, the distance values between the closest black pixels and the hotspots in each direction are used as representation for a character. The hotspot technique is applied to three data sets including Thai handwritten characters (65 classes), Bangla numeric (10 classes), and MNIST (10 classes). The hotspot technique consists of two parameters including the number of hotspots and the number of chain code directions. The data sets are then classified by the k-Nearest Neighbors algorithm using the Euclidean distance as function for computing distances between data points. In this study, the classification rates obtained from the hotspot, mark direction, and direction of chain code techniques are compared. The results revealed that the hotspot technique provides the largest average classification rates.

Proceedings Article•DOI•

Efficient discriminative learning of parametric nearest neighbor classifiers

Ziming Zhang¹, Paul Sturgess¹, Sunando Sengupta¹, Nigel T. Crook¹, Philip H. S. Torr¹•Institutions (1)

Oxford Brookes University¹

16 Jun 2012

TL;DR: A novel local classifier, Parametric Nearest Neighbor (P-NN), and its extension Ensemble of P-NN (EP-NN) are introduced; they parameterize the nearest neighbor algorithm based on the minimum weighted squared Euclidean distances between the data points and the prototypes.

Abstract: Linear SVMs are efficient in both training and testing; however, the data in real applications is rarely linearly separable. Non-linear kernel SVMs are too computationally intensive for applications with large-scale data sets. Recently locally linear classifiers have gained popularity due to their efficiency whilst remaining competitive with kernel methods. The vanilla nearest neighbor algorithm is one of the simplest locally linear classifiers, but it lacks robustness due to the noise often present in real-world data. In this paper, we introduce a novel local classifier, Parametric Nearest Neighbor (P-NN) and its extension Ensemble of P-NN (EP-NN). We parameterize the nearest neighbor algorithm based on the minimum weighted squared Euclidean distances between the data points and the prototypes, where a prototype is represented by a locally linear combination of some data points. Meanwhile, our method attempts to jointly learn both the prototypes and the classifier parameters discriminatively via max-margin. This makes our classifiers suitable to approximate the classification decision boundaries locally based on nonlinear functions. During testing, the computational complexity of both classifiers is linear in the product of the dimension of data and the number of prototypes. Our classification results on MNIST, USPS, LETTER, and Chars 74K are comparable to, and in some cases better than, many other methods such as the state-of-the-art locally linear classifiers.

Book Chapter•DOI•

Online handwriting recognition using multi convolution neural networks

Dũng Việt Phạm¹•Institutions (1)

Vietnam Maritime University¹

16 Dec 2012

TL;DR: This paper presents a library written in C# for an online handwriting recognition system trained on the UNIPEN online handwriting training set, together with a proposed handwriting segmentation algorithm that can extract sentences, words and characters from handwritten text.

Abstract: This paper presents a library written in C# for an online handwriting recognition system using the UNIPEN online handwriting training set. The recognition engine is based on convolutional neural networks and yields recognition rates of 99% on the MNIST training set, 97% on UNIPEN's digit training set (1a), 89% on a collection of 44022 capital letters and digits (1a, 1b), and 89% on lower-case letters (1c). These networks are combined to create a larger system that can recognize 62 English characters and digits. A proposed handwriting segmentation algorithm extracts sentences, words and characters from handwritten text. The characters are then given as input to the network.

Proceedings Article•

Cascaded heterogeneous convolutional neural networks for handwritten digit recognition

Chunpeng Wu¹, Wei Fan¹, Yuan He¹, Jun Sun¹, Satoshi Naoi¹•Institutions (1)

Fujitsu¹

01 Nov 2012

TL;DR: This paper presents a handwritten digit recognition method based on cascaded heterogeneous convolutional neural networks (CNNs) that achieves an error rate of 0.23% using only 5 CNNs, on par with the human vision system.

Abstract: This paper presents a handwritten digit recognition method based on cascaded heterogeneous convolutional neural networks (CNNs). The reliability and complementarity of heterogeneous CNNs are investigated in our method. Each CNN recognizes a proportion of the input samples with high confidence and feeds the rejected samples into the next CNN. The samples rejected by the last CNN are recognized by a voting committee of all CNNs. Experiments on the MNIST dataset show that our method achieves an error rate of 0.23% using only 5 CNNs, on par with the human vision system. Using heterogeneous networks can reduce the number of CNNs needed to reach a certain performance compared with networks built from the same type. Further improvements include fine-tuning the rejection threshold of each CNN and adding CNNs of more types.
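The reject-and-pass-on control flow described in the abstract can be sketched as below. This is only an outline under stated assumptions: `cascade_predict` is a hypothetical name, each classifier is assumed to be a callable returning class probabilities, confidence is assumed to be the maximum probability, and the committee is assumed to use a simple majority vote. The abstract does not specify any of these details.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Assumed interface: a classifier maps an input to a list of class probabilities.
Classifier = Callable[[Sequence[float]], List[float]]


def argmax(probs: Sequence[float]) -> int:
    """Index of the largest probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def cascade_predict(x: Sequence[float],
                    classifiers: List[Classifier],
                    thresholds: List[float]) -> int:
    """Run classifiers in order; accept the first confident prediction.

    If every stage rejects the sample (confidence below its threshold),
    fall back to a majority vote over all stages' predictions.
    """
    outputs = []
    for clf, threshold in zip(classifiers, thresholds):
        probs = clf(x)
        outputs.append(probs)
        if max(probs) >= threshold:  # confident enough: stop here
            return argmax(probs)
    # Rejected by the last stage: every classifier has been evaluated, so vote.
    votes = Counter(argmax(p) for p in outputs)
    return votes.most_common(1)[0][0]
```

Because most samples exit at an early stage, later networks only see the hard cases, which is why the per-stage rejection thresholds mentioned in the abstract are natural tuning knobs.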

Proceedings Article•DOI•

Pattern recognition computation in a spiking neural network with temporal encoding and learning

Qiang Yu¹, Kay Chen Tan¹, Huajin Tang²•Institutions (2)

National University of Singapore¹, Agency for Science, Technology and Research²

10 Jun 2012

TL;DR: A spiking neural network of integrate-and-fire neurons that performs pattern recognition is presented, and the synaptic dynamics is shown to be compatible with many experimental observations on the induction of long-term modifications, such as spike-timing-dependent plasticity (STDP).

Abstract: Many conventional methods have been widely studied to solve the pattern recognition task, but most of them lack biological plausibility. This paper presents a spiking neural network of integrate-and-fire neurons to perform pattern recognition. A biologically plausible supervised synaptic learning rule is used so that neurons can efficiently make a decision. The whole system contains encoding, learning and readout. It can classify complex patterns of activities stored in a vector, as well as real-world stimuli. We test the performance of the network with digital images from MNIST and images of alphabetic letters. It turns out to be able to classify these patterns correctly. In addition, the synaptic dynamics is shown to be compatible with many experimental observations on the induction of long-term modifications, such as spike-timing-dependent plasticity (STDP).
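For readers unfamiliar with integrate-and-fire neurons, the sketch below simulates a single leaky integrate-and-fire unit with forward Euler integration. It is a generic textbook model, not the paper's network: the function name `simulate_lif`, the leak term, and all parameter values are assumptions, and the paper's encoding scheme and supervised learning rule are not reproduced.

```python
from typing import List, Sequence


def simulate_lif(input_current: Sequence[float],
                 dt: float = 1.0,
                 tau: float = 10.0,
                 v_rest: float = 0.0,
                 v_thresh: float = 1.0,
                 v_reset: float = 0.0) -> List[int]:
    """Simulate a leaky integrate-and-fire neuron; return spike step indices.

    Membrane update (forward Euler): v += dt * (-(v - v_rest) + I) / tau.
    When v reaches v_thresh the neuron emits a spike and v is set to v_reset.
    """
    v = v_rest
    spikes = []
    for step, current in enumerate(input_current):
        v += dt * (-(v - v_rest) + current) / tau
        if v >= v_thresh:
            spikes.append(step)
            v = v_reset
    return spikes


weak = simulate_lif([2.0] * 50)    # moderate constant drive
strong = simulate_lif([5.0] * 10)  # stronger drive spikes earlier
```

In this toy model a stronger input reaches threshold sooner, so the time of the first spike carries information about input strength. That latency effect is one simple way to see how spike timing, rather than spike count alone, can encode a stimulus.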

Book Chapter•DOI•

Modified Chain Code Histogram Feature for Handwritten Character Recognition

Jitendra Jain¹, Soyuj Kumar Sahoo¹, S. R. Mahadeva Prasanna¹, G. Siva Reddy¹•Institutions (1)

Indian Institute of Technology Guwahati¹

02 Jan 2012

TL;DR: An improved recognition rate at the higher end is observed when both features are combined, which shows the effectiveness of the dynamic directional feature in the classification of handwritten character patterns.

Abstract: In this work, we propose a modified chain code histogram (CCH) based feature extraction method for handwritten character recognition (HCR) applications. This modified approach explores the dynamic nature of the directional information available in character patterns by introducing the differential CCH, termed Delta (Δ) CCH. A comparable or higher recognition rate is reported, which emphasizes that the dynamic nature of the directional information captured by the ΔCCH is as important as that of the CCH. All the experiments are conducted on the MNIST handwritten numeral database. Finally, an improved recognition rate at the higher end is observed when both features are combined, which shows the effectiveness of the dynamic directional feature in the classification of handwritten character patterns.
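A chain code histogram and a differential variant can be sketched as follows. The abstract does not define the ΔCCH precisely, so this is one plausible reading under stated assumptions: Freeman 8-direction codes, a contour given as an ordered list of 8-connected points, and the differential taken as the difference between consecutive codes modulo 8. The names `chain_code`, `histogram`, and `delta_codes` are hypothetical, and the paper's exact definition may differ.

```python
from typing import List, Sequence, Tuple

# Freeman 8-direction codes: list index is the code, value is the (dx, dy) step.
# The axis orientation is an assumption; any consistent convention works.
DIRECTIONS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]


def chain_code(contour: Sequence[Tuple[int, int]]) -> List[int]:
    """Convert an ordered list of 8-connected (x, y) points into Freeman codes."""
    return [DIRECTIONS.index((x1 - x0, y1 - y0))
            for (x0, y0), (x1, y1) in zip(contour, contour[1:])]


def histogram(codes: Sequence[int]) -> List[float]:
    """Normalized 8-bin histogram of direction codes."""
    counts = [0] * 8
    for code in codes:
        counts[code] += 1
    total = len(codes) or 1  # avoid division by zero on empty input
    return [n / total for n in counts]


def delta_codes(codes: Sequence[int]) -> List[int]:
    """Change in direction between consecutive steps, modulo 8 (assumed reading)."""
    return [(b - a) % 8 for a, b in zip(codes, codes[1:])]


# Unit square traced as a closed contour.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
codes = chain_code(square)
# Combined feature: CCH followed by the histogram of direction changes.
feature = histogram(codes) + histogram(delta_codes(codes))
```

For the square, the CCH spreads evenly over four directions while the differential histogram concentrates in a single bin (a constant turn), which illustrates how the two features capture different aspects of the same stroke.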
