Rotation Invariant Digit Recognition Using Convolutional Neural Network
01 Jan 2018, pp. 91-102
TL;DR: This work proposes using multiple instances of a CNN to enhance the overall rotation invariance of the architecture, even for larger rotations of the input image; the method requires fewer training images and therefore reduces training time.
Abstract: Deep learning architectures use a set of layers to learn hierarchical features from the input. The learnt features are discriminative, and thus can be used for classification tasks. Convolutional neural networks (CNNs) are among the most widely used deep learning architectures. A CNN extracts prominent features from the input by passing it through layers of convolution and nonlinear activation. These features are invariant to scaling and to small distortions in the input image, but they offer rotation invariance only for small angles of rotation. We propose using multiple instances of a CNN to enhance the overall rotation invariance of the architecture, even for larger rotations of the input image. The architecture is then applied to handwritten digit classification and captcha recognition. The proposed method requires fewer training images, and therefore reduces the training time. Moreover, our method offers the additional advantage of finding the approximate orientation of the object in an image without any additional computational complexity.
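The abstract does not spell out how the multiple CNN instances are combined, so the following is only a minimal sketch of one plausible reading: the same classifier is applied to several rotated copies of the input, and the most confident instance supplies both the class and an approximate orientation. Everything named here (`TEMPLATES`, `toy_predict_proba`, `classify_with_rotations`) is hypothetical, the template matcher merely stands in for a trained CNN, and rotations are restricted to multiples of 90° so that only NumPy is needed.

```python
import numpy as np

# Hypothetical 3x3 "digit" templates standing in for a trained CNN.
TEMPLATES = np.array([
    [[1, 0, 0], [1, 0, 0], [1, 1, 0]],   # class 0: an "L" shape
    [[1, 1, 1], [0, 1, 0], [0, 1, 0]],   # class 1: a "T" shape
], dtype=float)


def toy_predict_proba(image):
    """Stand-in classifier: softmax over negative squared distance to each template."""
    dists = ((TEMPLATES - image) ** 2).sum(axis=(1, 2))
    scores = np.exp(-(dists - dists.min()))  # shift by the minimum for numerical stability
    return scores / scores.sum()


def classify_with_rotations(image, predict_proba, ks=(0, 1, 2, 3)):
    """Apply one classifier instance per candidate rotation and keep the most confident.

    Assumption: candidate rotations are k * 90 degrees counter-clockwise.
    Returns (predicted_class, applied_rotation_degrees, confidence).
    """
    best = None
    for k in ks:
        probs = predict_proba(np.rot90(image, k))
        label = int(np.argmax(probs))
        conf = float(probs[label])
        if best is None or conf > best[2]:
            best = (label, 90 * k, conf)
    return best
```

In this sketch, the rotation that yields the most confident prediction is the one that brings the input back to its upright pose, so the input's approximate orientation is the inverse of that rotation and comes out of the same forward passes used for classification.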
Citations
Journal Article
[...]
TL;DR: The proposed networks have fewer free parameters and better generalization ability than the feedforward neural networks, and outperform the conventional convolutional neural networks.
Abstract: This article addresses the problem of rotation invariant face detection using convolutional neural networks. Recently, we developed a new class of convolutional neural networks for visual pattern recognition. These networks have a simple network architecture and use shunting inhibitory neurons as the basic computing elements for feature extraction. Three networks with different connection schemes have been developed for in-plane rotation invariant face detection: fully-connected, Toeplitz-connected, and binary-connected networks. The three networks are trained using a variant of the Levenberg-Marquardt algorithm and tested on a set of 40,000 rotated face patterns. As face/non-face classifiers, these networks achieve 97.3% classification accuracy for rotation angles in the range ±90° and 95.9% for full in-plane rotation. The proposed networks have fewer free parameters and better generalization ability than the feedforward neural networks, and outperform the conventional convolutional neural networks.
9 citations
[...]
TL;DR: Results reveal that the CNN-ELM-DL4J approach outperforms the conventional CNN models in terms of accuracy and computational time.
Abstract: Optical character recognition is gaining immense importance in the domain of deep learning. With each passing day, handwritten digit (0–9) data are increasing rapidly, and plenty of research has been conducted thus far. However, there is still a need to develop a robust model that can fetch useful information and investigate self-built handwritten digit data efficiently and effectively. Convolutional neural network (CNN) models incorporating a sigmoid activation function with a large number of derivatives have low efficiency in terms of feature extraction. Here, we designed a novel CNN model integrated with the extreme learning machine (ELM) algorithm. In this model, the sigmoid activation function is replaced by the rectified linear unit (ReLU) activation function, and the CNN unit along with the ReLU activation function is used as a feature extractor. The ELM unit works as the image classifier, which makes the perfect symmetry for handwritten digit recognition. A deeplearning4j (DL4J) framework-based CNN-ELM model was developed and trained using the Modified National Institute of Standards and Technology (MNIST) database. Validation of the model was performed on self-built handwritten digit and USPS test datasets. Furthermore, we observed the variation in accuracy as hidden layers were added to the architecture. Results reveal that the CNN-ELM-DL4J approach outperforms the conventional CNN models in terms of accuracy and computational time.
8 citations
[...]
TL;DR: This letter presents three types of rotation-invariance learning methods, applies them to five popular CNN architectures, and indicates that multi-task learning on ResNet-50 is the best combination.
2 citations
[...]
TL;DR: An accurate, scalable data-free approach based on eigenvectors and Convolutional Neural Networks for rotated object detection, which detects multiple objects at any angle in an image efficiently and determines actual image orientation without any prior information is proposed.
Abstract: In this paper, we propose an accurate, scalable data-free approach based on eigenvectors and Convolutional Neural Networks (CNNs) for rotated object detection. Detecting an arbitrarily diverted object poses a challenging problem, as features extracted by CNNs are variant to small changes in shift and scale. They lack performance for images at orientations different from the input data. Hence, we introduce a novel two-step architecture, which detects multiple objects at any angle in an image efficiently. We utilize eigenvector analysis on the input image based on bright pixel distribution. The vertical and horizontal vectors are used as a reference to detect the deviation of an image from the original orientation. This analysis gives four orientations of the input image, which are then passed through a pre-trained YOLOv3 with the proposed decision criteria. Our approach, referred to as "Eigen Vectors based Rotation Invariant Multi-Object Deep Detector" (EVRI-MODD), produces rotation invariant detection without any additional training on augmented data and also determines the actual image orientation without any prior information. The proposed network achieves high performance on the Pascal-VOC 2012 dataset. We evaluate our network's performance at three rotation angles, 90°, 180°, and 270°, and achieve significant accuracy gains of 48%, 50%, and 47%, respectively, over YOLOv3.
1 citation
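The eigenvector analysis on the bright-pixel distribution mentioned in the abstract above can be illustrated with a small sketch. This is not the EVRI-MODD procedure itself, whose details are not given here; it only shows the general idea of taking the principal eigenvector of the covariance of bright-pixel coordinates as a dominant axis. The function name `principal_axis_angle` and the `threshold` parameter are assumptions made for illustration.

```python
import numpy as np


def principal_axis_angle(image, threshold=0.5):
    """Angle (degrees, in [0, 180)) of the dominant axis of the bright pixels.

    Assumptions: `image` is a 2D array with at least two pixels above `threshold`;
    the angle is measured from the x-axis in image coordinates (y points down).
    """
    ys, xs = np.nonzero(image > threshold)
    coords = np.stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=1, keepdims=True)   # centre the point cloud
    cov = coords @ coords.T / coords.shape[1]      # 2x2 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    vx, vy = eigvecs[:, -1]                        # eigenvector of the largest eigenvalue
    return float(np.degrees(np.arctan2(vy, vx)) % 180.0)
```

Because an eigenvector's sign is arbitrary, the angle is only defined modulo 180°, which is one reason a detector built on this kind of analysis would still need to test several candidate orientations.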
Cites background or methods from "Rotation Invariant Digit Recognitio..."
[...]
TL;DR: In this paper, a deep learning-based technique, namely, EfficientDet-D4, was proposed to detect and categorize the numerals into their respective classes from zero to nine, achieving an average accuracy of 99.83%.
Abstract: Handwritten digit recognition (HDR) shows a significant application in the area of information processing. However, correct recognition of such characters from images is a complicated task due to immense variations in the writing style of people. Moreover, the occurrence of several image artifacts like the existence of intensity variations, blurring, and noise complicates this process. In the proposed method, we have tried to overcome the aforementioned limitations by introducing a deep learning- (DL-) based technique, namely, EfficientDet-D4, for numeral categorization. Initially, the input images are annotated to exactly show the region of interest (ROI). In the next phase, these images are used to train the EfficientNet-B4-based EfficientDet-D4 model to detect and categorize the numerals into their respective classes from zero to nine. We have tested the proposed model over the MNIST dataset to demonstrate its efficacy and attained an average accuracy value of 99.83%. Furthermore, we have accomplished the cross-dataset evaluation on the USPS database and achieved an accuracy value of 99.10%. Both the visual and reported experimental results show that our method can accurately classify the HDR from images even with the varying writing style and under the presence of various sample artifacts like noise, blurring, chrominance, position, and size variations of numerals. Moreover, the introduced approach is capable of generalizing well to unseen cases which confirms that the EfficientDet-D4 model is an effective solution to numeral recognition.
1 citation
References
[...]
TL;DR: In this article, a graph transformer network (GTN) is proposed for handwritten character recognition, which can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
34,930 citations
[...]
TL;DR: This work proposes a deep learning method for single image super-resolution (SR) that directly learns an end-to-end mapping between the low/high-resolution images.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.
4,680 citations
[...]
TL;DR: A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition where skeleton joint information, depth and RGB images, are the multimodal input observations.
Abstract: This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition where skeleton joint information, depth and RGB images, are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, therefore opening the door to the use of deep learning techniques in order to further explore multimodal time series data.
320 citations
[...]
TL;DR: Radiologists' reading procedure was modelled in order to instruct the artificial neural network to recognize the predefined image patterns and those of interest to experts and an unconventional method of using rotation and shift invariance is proposed to enhance the neural net performance.
Abstract: We have developed several training methods in conjunction with a convolution neural network for general medical image pattern recognition. An unconventional method of using rotation and shift invariance is also proposed to enhance the neural net performance. The structure of the artificial neural network is a simplified network structure of the neocognitron. Two-dimensional local connection as a group is the fundamental architecture for the signal propagation in the convolution neural network. Weighting coefficients of convolution kernels are formed by the neural network through backpropagated training for this artificial neural net. In addition, radiologists' reading procedure was modelled in order to instruct the artificial neural network to recognize the predefined image patterns and those of interest to experts. Our training techniques involve (a) radiologists' rating for each suspected image area, (b) backpropagation of generalized distribution, (c) trainer imposed functions, (d) shift and rotation invariance of diagnosis interpretation, and (e) consistency of clinical input data using appropriate background reduction functions. We have tested these methods for detecting lung nodules on chest radiographs and microcalcifications on mammograms. The performance studies have shown the potential use of this technique in a clinical environment. We also used a profile double-matching technique for initial nodule search and used a wavelet high pass filtering technique to enhance subtle clustered microcalcifications. We set searching parameters at a highly sensitive level to identify all potential disease areas. The artificial convolution neural network acts as a final detection classifier to determine whether a disease pattern is shown on the suspected image area.
238 citations
[...]
TL;DR: A deep neural network topology is presented that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING), which efficiently handles prior knowledge on nuisance variations in the data, such as rotation or scale changes.
Abstract: In this paper we present a deep neural network topology that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods make use of dataset augmentation to address this issue, but this requires a larger number of model parameters and more training data, and results in significantly increased training time and a larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all the possible transformations of the input. On the other hand, we formulate features in convolutional neural networks to be transformation-invariant. We achieve that using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the optimal "canonical" instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with a smaller number of parameters when compared to standard convolutional neural networks with dataset augmentation and to other baselines.
208 citations
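The TI-POOLING idea described in the abstract above (parallel siamese branches over a transformation set, pooled before the fully-connected layers) can be sketched as an element-wise maximum over the branch outputs. This is a minimal illustration under stated assumptions, not the authors' implementation: the pooling is assumed to be an element-wise max, `feature_fn` is a hypothetical shared-weight feature extractor (a fixed random linear map with ReLU), and the transformation set is the four 90° rotations.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))  # hypothetical shared weights for a 4x4 input


def feature_fn(image):
    """Stand-in for the shared (siamese) feature extractor: linear map + ReLU."""
    return np.maximum(W @ image.ravel(), 0.0)


def ti_pool(image, feature_fn, transforms):
    """Element-wise max over the features of every transformed copy of the input."""
    feats = np.stack([feature_fn(t(image)) for t in transforms])
    return feats.max(axis=0)


# Assumed transformation set: rotations by 0, 90, 180, and 270 degrees.
ROTATIONS = [lambda x, k=k: np.rot90(x, k) for k in range(4)]
```

Because this transformation set is closed under composition, rotating the input by 90° only permutes the branches, so the pooled feature vector is unchanged; the downstream fully-connected layers therefore see the same features for any of the four orientations.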