Book ChapterDOI

Rotation Invariant Digit Recognition Using Convolutional Neural Network

TL;DR: This work proposes using multiple instances of a CNN to enhance the overall rotation invariance of the architecture, even for larger rotations in the input image; the method requires fewer training images and therefore reduces training time.
Abstract: Deep learning architectures use a set of layers to learn hierarchical features from the input. The learnt features are discriminative and can therefore be used for classification tasks. Convolutional neural networks (CNNs) are among the most widely used deep learning architectures. A CNN extracts prominent features from the input by passing it through layers of convolution and nonlinear activation. These features are invariant to scaling and to small distortions in the input image, but they offer rotation invariance only for small angles of rotation. We propose using multiple instances of a CNN to enhance the overall rotation invariance of the architecture, even for larger rotations in the input image. The architecture is then applied to handwritten digit classification and captcha recognition. The proposed method requires fewer images for training and therefore reduces training time. Moreover, our method offers the additional advantage of finding the approximate orientation of the object in an image, without any additional computational complexity.
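To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of the multiple-instance scheme: the same trained CNN scores several counter-rotated copies of the input, and the most confident copy yields both the predicted class and the approximate orientation. The function name, the angle set, and the use of softmax confidence as the selection rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: reuse one trained CNN on several rotated copies of
# the input; the most confident copy gives the class and the orientation.
import torch
import torchvision.transforms.functional as TF

def classify_with_rotations(cnn, image, angles=(0, 90, 180, 270)):
    """image: (1, C, H, W) tensor; cnn maps images to class logits."""
    best = None
    for angle in angles:
        rotated = TF.rotate(image, angle)           # undo an assumed rotation
        conf, cls = torch.softmax(cnn(rotated), dim=1).max(dim=1)
        if best is None or conf.item() > best[0]:
            best = (conf.item(), cls.item(), angle)
    confidence, predicted_class, orientation = best
    return predicted_class, orientation, confidence
```

Because every copy reuses the same weights, no extra training data or parameters are needed; only inference is repeated once per candidate angle.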
Citations
Journal ArticleDOI
21 Oct 2020 - Symmetry
TL;DR: Results reveal that the CNN-ELM-DL4J approach outperforms the conventional CNN models in terms of accuracy and computational time.
Abstract: Optical character recognition is gaining immense importance in the domain of deep learning. With each passing day, handwritten digit (0–9) data are increasing rapidly, and plenty of research has been conducted thus far. However, there is still a need to develop a robust model that can fetch useful information and investigate self-built handwritten digit data efficiently and effectively. Convolutional neural network (CNN) models incorporating a sigmoid activation function, with its large number of derivative computations, have low efficiency in terms of feature extraction. Here, we designed a novel CNN model integrated with the extreme learning machine (ELM) algorithm. In this model, the sigmoid activation function is upgraded to the rectified linear unit (ReLU) activation function, and the CNN unit together with the ReLU activation function is used as a feature extractor. The ELM unit works as the image classifier, which makes the perfect symmetry for handwritten digit recognition. A Deeplearning4j (DL4J) framework-based CNN-ELM model was developed and trained using the Modified National Institute of Standards and Technology (MNIST) database. Validation of the model was performed on self-built handwritten digits and the USPS test dataset. Furthermore, we observed how accuracy varies as hidden layers are added to the architecture. Results reveal that the CNN-ELM-DL4J approach outperforms conventional CNN models in terms of accuracy and computational time.
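The division of labour described above (CNN as feature extractor, ELM as classifier) is easy to sketch. The following NumPy fragment shows the core ELM step under the usual formulation: the hidden-layer weights are random and fixed, and only the output weights are solved in closed form via a pseudoinverse. Array shapes, the hidden-layer size, and the use of ReLU in the hidden layer are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal ELM classifier sketch over pre-extracted CNN features: random
# fixed hidden weights, closed-form output weights via the pseudoinverse.
import numpy as np

def train_elm(features, labels, hidden=1000, seed=0):
    """features: (n, d) CNN feature matrix; labels: (n,) integer classes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((features.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.maximum(features @ W + b, 0.0)      # ReLU hidden activations
    T = np.eye(labels.max() + 1)[labels]       # one-hot targets
    beta = np.linalg.pinv(H) @ T               # solved, not gradient-trained
    return W, b, beta

def predict_elm(features, W, b, beta):
    H = np.maximum(features @ W + b, 0.0)
    return (H @ beta).argmax(axis=1)
```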

21 citations

Journal Article
TL;DR: The proposed networks have fewer free parameters and better generalization ability than feedforward neural networks, and outperform conventional convolutional neural networks.
Abstract: This article addresses the problem of rotation-invariant face detection using convolutional neural networks. Recently, we developed a new class of convolutional neural networks for visual pattern recognition. These networks have a simple architecture and use shunting inhibitory neurons as the basic computing elements for feature extraction. Three networks with different connection schemes have been developed for in-plane rotation-invariant face detection: fully-connected, Toeplitz-connected, and binary-connected networks. The three networks are trained using a variant of the Levenberg-Marquardt algorithm and tested on a set of 40,000 rotated face patterns. As face/non-face classifiers, these networks achieve 97.3% classification accuracy for rotation angles in the range ±90° and 95.9% for full in-plane rotation. The proposed networks have fewer free parameters and better generalization ability than feedforward neural networks, and outperform conventional convolutional neural networks.
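The distinguishing element here is the shunting inhibitory neuron, in which inhibition acts divisively rather than subtractively. A rough NumPy sketch of one such unit follows; the exact activation functions and the form of the decay term vary across the literature, so everything below is an illustrative assumption rather than the authors' precise formulation.

```python
# Rough sketch of a shunting inhibitory unit: the excitatory response is
# divided by a passive decay term plus the inhibitory response, instead of
# the two simply being summed as in an ordinary perceptron.
import numpy as np

def shunting_neuron(x, w_exc, w_inh, bias, decay):
    """x: flattened input patch; decay is assumed large enough to keep
    the denominator positive."""
    excitation = np.tanh(w_exc @ x + bias)
    inhibition = np.tanh(w_inh @ x)
    return excitation / (decay + inhibition)
```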

9 citations

Journal ArticleDOI
TL;DR: This letter presents three types of rotation-invariance learning methods, applies them to five popular CNN architectures, and finds that multi-task learning on ResNet-50 is the best combination.

5 citations

Proceedings ArticleDOI
25 Mar 2022
TL;DR: In this article, an R-CNN is used for object detection in which there is a single object in the whole image, and the part filters of a DPM, a deformable part-based model, are used for detecting hidden objects.
Abstract: Dynamic multiple-object detection is a significantly difficult task in object detection. In this paper, we perform object detection in which there is a single object in the whole image. The CNN takes an image and outputs two things: the class or category of the object (e.g., dog, person) and the bounding-box coordinates. For detecting multiple objects in a single frame, existing methods do not perform well. An R-CNN is a region-based convolutional network that has achieved success in region-based feature extraction, and the part filters in a DPM, a deformable part-based model, are very suitable for detecting hidden objects.
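The single-object setting described above reduces to a backbone with two output heads. A minimal PyTorch sketch is given below; the feature dimension, class count, and box parameterization are illustrative assumptions, not the paper's specification.

```python
# Minimal single-object detector sketch: one CNN backbone feeds a
# classification head and a 4-value bounding-box regression head.
import torch.nn as nn

class SingleObjectDetector(nn.Module):
    def __init__(self, backbone, feat_dim=512, num_classes=20):
        super().__init__()
        self.backbone = backbone                 # any CNN mapping images to (B, feat_dim)
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.box_head = nn.Linear(feat_dim, 4)   # (x, y, w, h)

    def forward(self, images):
        feats = self.backbone(images)
        return self.cls_head(feats), self.box_head(feats)
```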

2 citations

Proceedings ArticleDOI
01 Feb 2020
TL;DR: An accurate, scalable, data-free approach based on eigenvectors and convolutional neural networks is proposed for rotated object detection; it efficiently detects multiple objects at any angle in an image and determines the actual image orientation without any prior information.
Abstract: In this paper, we propose an accurate, scalable, data-free approach based on eigenvectors and Convolutional Neural Networks (CNNs) for rotated object detection. Detecting an arbitrarily rotated object poses a challenging problem, as features extracted by CNNs are variant to small changes in shift and scale, and performance degrades on images whose orientation differs from the training data. Hence, we introduce a novel two-step architecture, which efficiently detects multiple objects at any angle in an image. We apply eigenvector analysis to the input image based on its bright-pixel distribution. The vertical and horizontal vectors are used as a reference to detect the deviation of an image from its original orientation. This analysis gives four orientations of the input image, which then pass through a pre-trained YOLOv3 with the proposed decision criteria. Our approach, referred to as "Eigen Vectors based Rotation Invariant Multi-Object Deep Detector" (EVRI-MODD), produces rotation-invariant detection without any additional training on augmented data and also determines the actual image orientation without any prior information. The proposed network achieves high performance on the Pascal-VOC 2012 dataset. We evaluate performance at three rotation angles, 90°, 180°, and 270°, and achieve significant accuracy gains of 48%, 50%, and 47%, respectively, over YOLOv3.
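The eigenvector step admits a compact illustration: the covariance of the bright-pixel coordinates yields a principal axis, and its angle against the horizontal estimates how far the image deviates from its upright pose. The NumPy sketch below is an assumption-laden reading of that step; the brightness threshold and the angle convention are illustrative, not taken from the paper.

```python
# Illustrative sketch of orientation estimation from bright-pixel spread:
# the dominant eigenvector of the coordinate covariance gives the image's
# principal axis, whose angle hints at the deviation from upright.
import numpy as np

def estimate_orientation(gray, threshold=200):
    """gray: 2-D grayscale array; returns principal-axis angle in degrees."""
    ys, xs = np.nonzero(gray > threshold)            # bright-pixel coordinates
    coords = np.stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=1, keepdims=True)     # centre the point cloud
    eigvals, eigvecs = np.linalg.eigh(np.cov(coords))
    principal = eigvecs[:, np.argmax(eigvals)]       # dominant axis
    return np.degrees(np.arctan2(principal[1], principal[0]))
```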

1 citation


Cites background or methods from "Rotation Invariant Digit Recognitio..."

  • ...Motivated by the work in [12], multiple orientations of the same image are fed to YOLOv3....


  • ...In recent works like RIMCNN [12], the image is fed into the network N times, where N depends on the degree of rotation....


  • ...Rotational Invariance using Multiple instances of Convolutional Neural Network (RIMCNN) [12] proposes the idea of using the same trained CNN multiple times....


  • ...Further, if two or more classes have the maximum likelihood, then the one with the maximum objectness score is selected [12].... (see the sketch below)

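The decision rule quoted in the last excerpt can be sketched in a few lines; the tuple layout below is an assumed representation of one detection per rotated pass, not the paper's actual data structure.

```python
# Hedged sketch of the quoted decision rule: across the N rotated passes,
# keep the class with the highest likelihood; if several detections tie,
# the higher objectness score wins.
def select_detection(detections):
    """detections: list of (class_id, likelihood, objectness), one per pass."""
    best_likelihood = max(d[1] for d in detections)
    tied = [d for d in detections if d[1] == best_likelihood]
    return max(tied, key=lambda d: d[2])
```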

References
Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, a graph transformer network (GTN) is proposed that allows multi-module recognition systems to be trained globally with gradient-based methods; gradient-based learning is used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules, including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multi-module systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
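For reference, the convolutional pipeline this paper made standard fits in a few lines of modern PyTorch. The sketch below follows the classic LeNet-5 dimensions for 32x32 digit images; the layer sizes match the published architecture, while the nn.Sequential form and the tanh/average-pooling choices are a simplification of the original.

```python
# Compact LeNet-5-style sketch: alternating convolution and subsampling,
# then fully-connected layers, for 32x32 single-channel digit images.
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # 32x32 -> 6@14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),  # -> 16@5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                                            # ten digit classes
)
```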

42,067 citations

Journal ArticleDOI
TL;DR: A deep learning method is proposed for single image super-resolution (SR), which directly learns an end-to-end mapping between the low- and high-resolution images.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.
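The three-layer mapping described above is small enough to write out. The PyTorch sketch below uses the widely cited 9-1-5 filter sizes and 64/32 channel widths; it assumes the low-resolution input has already been upscaled to the target size (e.g., by bicubic interpolation), as in the paper's pipeline.

```python
# SRCNN-style three-layer sketch: patch extraction, non-linear mapping,
# reconstruction; operates on a pre-upscaled single-channel image.
import torch.nn as nn

srcnn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.ReLU(),  # patch extraction
    nn.Conv2d(64, 32, kernel_size=1), nn.ReLU(),            # non-linear mapping
    nn.Conv2d(32, 1, kernel_size=5, padding=2),             # reconstruction
)
```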

6,122 citations

Journal ArticleDOI
TL;DR: A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information and depth and RGB images are the multimodal input observations.
Abstract: This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information and depth and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, thereby opening the door to the use of deep learning techniques to further explore multimodal time series data.
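The coupling between the deep networks and the HMM follows standard hybrid NN/HMM practice: the networks emit state posteriors per frame, which are rescaled by the state priors into quantities proportional to emission likelihoods. The one-line NumPy sketch below shows only that rescaling step; the Viterbi decoding and the specific DBN/3DCNN outputs are omitted, and the shapes are assumptions rather than the paper's implementation.

```python
# Hybrid NN/HMM emission sketch: divide network posteriors p(state|frame)
# by the state priors to obtain scaled emission likelihoods p(frame|state).
import numpy as np

def posteriors_to_emissions(posteriors, state_priors, eps=1e-8):
    """posteriors: (T, S) per-frame network outputs; state_priors: (S,)."""
    return posteriors / (state_priors + eps)
```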

401 citations

Journal ArticleDOI
TL;DR: Radiologists' reading procedure was modelled in order to instruct the artificial neural network to recognize predefined image patterns and those of interest to experts, and an unconventional method using rotation and shift invariance is proposed to enhance the neural network's performance.

291 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A deep neural network topology is presented that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING), which is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes.
Abstract: In this paper we present a deep neural network topology that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods make use of dataset augmentation to address this issue, but this requires a larger number of model parameters and more training data, and results in significantly increased training time and a larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all possible transformations of the input. We instead formulate features in convolutional neural networks to be transformation-invariant. We achieve this by using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the optimal "canonical" instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with fewer parameters compared to standard convolutional neural networks with dataset augmentation and to other baselines.
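The pooling operator itself is simple to express: every transformed copy of the input goes through the same (siamese) convolutional branch, and an element-wise maximum over the branch outputs feeds the fully-connected layers. The PyTorch sketch below captures that step; the function boundaries and tensor shapes are illustrative assumptions.

```python
# TI-pooling sketch: run one shared branch over all transformed copies of
# the input, then take the element-wise max, which is invariant to the
# transformation set by construction.
import torch

def ti_pool(branch, transformed_inputs):
    """transformed_inputs: list of (B, C, H, W) tensors, one per transform."""
    feats = torch.stack([branch(x) for x in transformed_inputs])  # (K, B, D)
    return feats.max(dim=0).values
```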

272 citations