Book ChapterDOI

Rotation Invariant Digit Recognition Using Convolutional Neural Network

TL;DR: This work proposes using multiple instances of a CNN to enhance the overall rotation invariance of the architecture, even for larger rotations in the input image; the method requires fewer training images and therefore reduces training time.
Abstract: Deep learning architectures use a set of layers to learn hierarchical features from the input. The learnt features are discriminative and can therefore be used for classification tasks. Convolutional neural networks (CNNs) are among the most widely used deep learning architectures. A CNN extracts prominent features from the input by passing it through layers of convolution and nonlinear activation. These features are invariant to scaling and to small distortions in the input image, but they offer rotation invariance only for small angles of rotation. We propose using multiple instances of a CNN to enhance the overall rotation invariance of the architecture, even for larger rotations in the input image. The architecture is then applied to handwritten digit classification and captcha recognition. The proposed method requires fewer images for training and therefore reduces training time. Moreover, our method offers the additional advantage of finding the approximate orientation of the object in an image, without any additional computational complexity.
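To make the idea concrete, here is a minimal, hypothetical PyTorch sketch of the multiple-instance scheme: the same trained CNN scores several counter-rotated copies of the input, and the most confident copy yields both the predicted class and the approximate orientation. The function name, the angle set, and the use of softmax confidence as the selection rule are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch: reuse one trained CNN on several rotated copies of
# the input; the most confident copy gives the class and the orientation.
import torch
import torchvision.transforms.functional as TF

def classify_with_rotations(cnn, image, angles=(0, 90, 180, 270)):
    """image: (1, C, H, W) tensor; cnn maps images to class logits."""
    best = None
    for angle in angles:
        rotated = TF.rotate(image, angle)           # undo an assumed rotation
        conf, cls = torch.softmax(cnn(rotated), dim=1).max(dim=1)
        if best is None or conf.item() > best[0]:
            best = (conf.item(), cls.item(), angle)
    confidence, predicted_class, orientation = best
    return predicted_class, orientation, confidence
```

Because every copy reuses the same weights, no extra training data or parameters are needed; only inference is repeated once per candidate angle.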
Citations
Journal ArticleDOI
21 Oct 2020 - Symmetry
TL;DR: Results reveal that the CNN-ELM-DL4J approach outperforms the conventional CNN models in terms of accuracy and computational time.
Abstract: Optical character recognition is gaining immense importance in the domain of deep learning. With each passing day, handwritten digit (0–9) data are increasing rapidly, and plenty of research has been conducted thus far. However, there is still a need to develop a robust model that can fetch useful information and investigate self-built handwritten digit data efficiently and effectively. Convolutional neural network (CNN) models incorporating a sigmoid activation function, with its large number of derivative computations, have low efficiency in terms of feature extraction. Here, we designed a novel CNN model integrated with the extreme learning machine (ELM) algorithm. In this model, the sigmoid activation function is upgraded to the rectified linear unit (ReLU) activation function, and the CNN unit together with the ReLU activation function is used as a feature extractor. The ELM unit works as the image classifier, which makes the perfect symmetry for handwritten digit recognition. A Deeplearning4j (DL4J) framework-based CNN-ELM model was developed and trained using the Modified National Institute of Standards and Technology (MNIST) database. Validation of the model was performed on self-built handwritten digits and the USPS test dataset. Furthermore, we observed how accuracy varies as hidden layers are added to the architecture. Results reveal that the CNN-ELM-DL4J approach outperforms conventional CNN models in terms of accuracy and computational time.
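The division of labour described above (CNN as feature extractor, ELM as classifier) is easy to sketch. The following NumPy fragment shows the core ELM step under the usual formulation: the hidden-layer weights are random and fixed, and only the output weights are solved in closed form via a pseudoinverse. Array shapes, the hidden-layer size, and the use of ReLU in the hidden layer are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal ELM classifier sketch over pre-extracted CNN features: random
# fixed hidden weights, closed-form output weights via the pseudoinverse.
import numpy as np

def train_elm(features, labels, hidden=1000, seed=0):
    """features: (n, d) CNN feature matrix; labels: (n,) integer classes."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((features.shape[1], hidden))
    b = rng.standard_normal(hidden)
    H = np.maximum(features @ W + b, 0.0)      # ReLU hidden activations
    T = np.eye(labels.max() + 1)[labels]       # one-hot targets
    beta = np.linalg.pinv(H) @ T               # solved, not gradient-trained
    return W, b, beta

def predict_elm(features, W, b, beta):
    H = np.maximum(features @ W + b, 0.0)
    return (H @ beta).argmax(axis=1)
```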

21 citations

Journal Article
TL;DR: The proposed networks have fewer free parameters and better generalization ability than feedforward neural networks, and outperform conventional convolutional neural networks.
Abstract: This article addresses the problem of rotation-invariant face detection using convolutional neural networks. Recently, we developed a new class of convolutional neural networks for visual pattern recognition. These networks have a simple architecture and use shunting inhibitory neurons as the basic computing elements for feature extraction. Three networks with different connection schemes have been developed for in-plane rotation-invariant face detection: fully-connected, Toeplitz-connected, and binary-connected networks. The three networks are trained using a variant of the Levenberg-Marquardt algorithm and tested on a set of 40,000 rotated face patterns. As face/non-face classifiers, these networks achieve 97.3% classification accuracy for rotation angles in the range ±90° and 95.9% for full in-plane rotation. The proposed networks have fewer free parameters and better generalization ability than feedforward neural networks, and outperform conventional convolutional neural networks.
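The distinguishing element here is the shunting inhibitory neuron, in which inhibition acts divisively rather than subtractively. A rough NumPy sketch of one such unit follows; the exact activation functions and the form of the decay term vary across the literature, so everything below is an illustrative assumption rather than the authors' precise formulation.

```python
# Rough sketch of a shunting inhibitory unit: the excitatory response is
# divided by a passive decay term plus the inhibitory response, instead of
# the two simply being summed as in an ordinary perceptron.
import numpy as np

def shunting_neuron(x, w_exc, w_inh, bias, decay):
    """x: flattened input patch; decay is assumed large enough to keep
    the denominator positive."""
    excitation = np.tanh(w_exc @ x + bias)
    inhibition = np.tanh(w_inh @ x)
    return excitation / (decay + inhibition)
```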

9 citations

Journal ArticleDOI
TL;DR: This letter presents three types of rotation-invariance learning methods, applies them to five popular CNN architectures, and finds that multi-task learning on ResNet-50 is the best combination.

5 citations

Proceedings ArticleDOI
25 Mar 2022
TL;DR: In this article, an R-CNN is used for object detection in which there is a single object in the whole image, and the part filters of a DPM, a deformable part-based model, are used for detecting hidden objects.
Abstract: Dynamic multiple-object detection is a significantly difficult task in object detection. In this paper, we perform object detection in which there is a single object in the whole image. The CNN takes an image and outputs two things: the class or category of the object (e.g., dog, person) and the bounding-box coordinates. For detecting multiple objects in a single frame, existing methods do not perform well. An R-CNN is a region-based convolutional network that has achieved success in region-based feature extraction, and the part filters in a DPM, a deformable part-based model, are very suitable for detecting hidden objects.
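The single-object setting described above reduces to a backbone with two output heads. A minimal PyTorch sketch is given below; the feature dimension, class count, and box parameterization are illustrative assumptions, not the paper's specification.

```python
# Minimal single-object detector sketch: one CNN backbone feeds a
# classification head and a 4-value bounding-box regression head.
import torch.nn as nn

class SingleObjectDetector(nn.Module):
    def __init__(self, backbone, feat_dim=512, num_classes=20):
        super().__init__()
        self.backbone = backbone                 # any CNN mapping images to (B, feat_dim)
        self.cls_head = nn.Linear(feat_dim, num_classes)
        self.box_head = nn.Linear(feat_dim, 4)   # (x, y, w, h)

    def forward(self, images):
        feats = self.backbone(images)
        return self.cls_head(feats), self.box_head(feats)
```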

2 citations

Proceedings ArticleDOI
01 Feb 2020
TL;DR: An accurate, scalable, data-free approach based on eigenvectors and convolutional neural networks is proposed for rotated object detection; it efficiently detects multiple objects at any angle in an image and determines the actual image orientation without any prior information.
Abstract: In this paper, we propose an accurate, scalable, data-free approach based on eigenvectors and Convolutional Neural Networks (CNNs) for rotated object detection. Detecting an arbitrarily rotated object poses a challenging problem, as features extracted by CNNs are variant to small changes in shift and scale, and performance degrades on images whose orientation differs from the training data. Hence, we introduce a novel two-step architecture, which efficiently detects multiple objects at any angle in an image. We apply eigenvector analysis to the input image based on its bright-pixel distribution. The vertical and horizontal vectors are used as a reference to detect the deviation of an image from its original orientation. This analysis gives four orientations of the input image, which then pass through a pre-trained YOLOv3 with the proposed decision criteria. Our approach, referred to as "Eigen Vectors based Rotation Invariant Multi-Object Deep Detector" (EVRI-MODD), produces rotation-invariant detection without any additional training on augmented data and also determines the actual image orientation without any prior information. The proposed network achieves high performance on the Pascal-VOC 2012 dataset. We evaluate performance at three rotation angles, 90°, 180°, and 270°, and achieve significant accuracy gains of 48%, 50%, and 47%, respectively, over YOLOv3.
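The eigenvector step admits a compact illustration: the covariance of the bright-pixel coordinates yields a principal axis, and its angle against the horizontal estimates how far the image deviates from its upright pose. The NumPy sketch below is an assumption-laden reading of that step; the brightness threshold and the angle convention are illustrative, not taken from the paper.

```python
# Illustrative sketch of orientation estimation from bright-pixel spread:
# the dominant eigenvector of the coordinate covariance gives the image's
# principal axis, whose angle hints at the deviation from upright.
import numpy as np

def estimate_orientation(gray, threshold=200):
    """gray: 2-D grayscale array; returns principal-axis angle in degrees."""
    ys, xs = np.nonzero(gray > threshold)            # bright-pixel coordinates
    coords = np.stack([xs, ys]).astype(float)
    coords -= coords.mean(axis=1, keepdims=True)     # centre the point cloud
    eigvals, eigvecs = np.linalg.eigh(np.cov(coords))
    principal = eigvecs[:, np.argmax(eigvals)]       # dominant axis
    return np.degrees(np.arctan2(principal[1], principal[0]))
```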

1 citation


Cites background or methods from "Rotation Invariant Digit Recognitio..."

  • ...Motivated by the work in [12], multiple orientations of the same image are fed to YOLOv3....


  • ...In recent works like RIMCNN [12], the image is fed into the network N times, where N depends on the degree of rotation....


  • ...Rotational Invariance using Multiple instances of Convolutional Neural Network (RIMCNN) [12] proposes the idea of using the same trained CNN multiple times....


  • ...Further, if two or more classes have the maximum likelihood, then the one with the maximum objectness score is selected [12].... (see the sketch below)

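The decision rule quoted in the last excerpt can be sketched in a few lines; the tuple layout below is an assumed representation of one detection per rotated pass, not the paper's actual data structure.

```python
# Hedged sketch of the quoted decision rule: across the N rotated passes,
# keep the class with the highest likelihood; if several detections tie,
# the higher objectness score wins.
def select_detection(detections):
    """detections: list of (class_id, likelihood, objectness), one per pass."""
    best_likelihood = max(d[1] for d in detections)
    tied = [d for d in detections if d[1] == best_likelihood]
    return max(tied, key=lambda d: d[2])
```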

References
Journal ArticleDOI
01 Jan 1998
TL;DR: In this article, a graph transformer network (GTN) is proposed that allows multi-module recognition systems to be trained globally with gradient-based methods; gradient-based learning is used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient-based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules, including field extraction, segmentation, recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multi-module systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
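For reference, the convolutional pipeline this paper made standard fits in a few lines of modern PyTorch. The sketch below follows the classic LeNet-5 dimensions for 32x32 digit images; the layer sizes match the published architecture, while the nn.Sequential form and the tanh/average-pooling choices are a simplification of the original.

```python
# Compact LeNet-5-style sketch: alternating convolution and subsampling,
# then fully-connected layers, for 32x32 single-channel digit images.
import torch.nn as nn

lenet5 = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),   # 32x32 -> 6@14x14
    nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(), nn.AvgPool2d(2),  # -> 16@5x5
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
    nn.Linear(120, 84), nn.Tanh(),
    nn.Linear(84, 10),                                            # ten digit classes
)
```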

42,067 citations

Journal ArticleDOI
TL;DR: A deep learning method is proposed for single image super-resolution (SR), which directly learns an end-to-end mapping between the low- and high-resolution images.
Abstract: We propose a deep learning method for single image super-resolution (SR). Our method directly learns an end-to-end mapping between the low/high-resolution images. The mapping is represented as a deep convolutional neural network (CNN) that takes the low-resolution image as the input and outputs the high-resolution one. We further show that traditional sparse-coding-based SR methods can also be viewed as a deep convolutional network. But unlike traditional methods that handle each component separately, our method jointly optimizes all layers. Our deep CNN has a lightweight structure, yet demonstrates state-of-the-art restoration quality, and achieves fast speed for practical on-line usage. We explore different network structures and parameter settings to achieve trade-offs between performance and speed. Moreover, we extend our network to cope with three color channels simultaneously, and show better overall reconstruction quality.
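The three-layer mapping described above is small enough to write out. The PyTorch sketch below uses the widely cited 9-1-5 filter sizes and 64/32 channel widths; it assumes the low-resolution input has already been upscaled to the target size (e.g., by bicubic interpolation), as in the paper's pipeline.

```python
# SRCNN-style three-layer sketch: patch extraction, non-linear mapping,
# reconstruction; operates on a pre-upscaled single-channel image.
import torch.nn as nn

srcnn = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=9, padding=4), nn.ReLU(),  # patch extraction
    nn.Conv2d(64, 32, kernel_size=1), nn.ReLU(),            # non-linear mapping
    nn.Conv2d(32, 1, kernel_size=5, padding=2),             # reconstruction
)
```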

6,122 citations

Journal ArticleDOI
TL;DR: A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information and depth and RGB images are the multimodal input observations.
Abstract: This paper describes a novel method called Deep Dynamic Neural Networks (DDNN) for multimodal gesture recognition. A semi-supervised hierarchical dynamic framework based on a Hidden Markov Model (HMM) is proposed for simultaneous gesture segmentation and recognition, where skeleton joint information and depth and RGB images are the multimodal input observations. Unlike most traditional approaches that rely on the construction of complex handcrafted features, our approach learns high-level spatio-temporal representations using deep neural networks suited to the input modality: a Gaussian-Bernoulli Deep Belief Network (DBN) to handle skeletal dynamics, and a 3D Convolutional Neural Network (3DCNN) to manage and fuse batches of depth and RGB images. This is achieved through the modeling and learning of the emission probabilities of the HMM required to infer the gesture sequence. This purely data-driven approach achieves a Jaccard index score of 0.81 in the ChaLearn LAP gesture spotting challenge. The performance is on par with a variety of state-of-the-art hand-tuned feature-based approaches and other learning-based methods, thereby opening the door to the use of deep learning techniques to further explore multimodal time series data.
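The coupling between the deep networks and the HMM follows standard hybrid NN/HMM practice: the networks emit state posteriors per frame, which are rescaled by the state priors into quantities proportional to emission likelihoods. The one-line NumPy sketch below shows only that rescaling step; the Viterbi decoding and the specific DBN/3DCNN outputs are omitted, and the shapes are assumptions rather than the paper's implementation.

```python
# Hybrid NN/HMM emission sketch: divide network posteriors p(state|frame)
# by the state priors to obtain scaled emission likelihoods p(frame|state).
import numpy as np

def posteriors_to_emissions(posteriors, state_priors, eps=1e-8):
    """posteriors: (T, S) per-frame network outputs; state_priors: (S,)."""
    return posteriors / (state_priors + eps)
```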

401 citations

Journal ArticleDOI
TL;DR: Radiologists' reading procedure was modelled in order to instruct the artificial neural network to recognize predefined image patterns and those of interest to experts, and an unconventional method using rotation and shift invariance is proposed to enhance the neural network's performance.

291 citations

Proceedings ArticleDOI
27 Jun 2016
TL;DR: A deep neural network topology is presented that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING), which is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes.
Abstract: In this paper we present a deep neural network topology that incorporates a simple-to-implement transformation-invariant pooling operator (TI-POOLING). This operator is able to efficiently handle prior knowledge on nuisance variations in the data, such as rotation or scale changes. Most current methods make use of dataset augmentation to address this issue, but this requires a larger number of model parameters and more training data, and results in significantly increased training time and a larger chance of under- or overfitting. The main reason for these drawbacks is that the learned model needs to capture adequate features for all possible transformations of the input. We instead formulate features in convolutional neural networks to be transformation-invariant. We achieve this by using parallel siamese architectures for the considered transformation set and applying the TI-POOLING operator on their outputs before the fully-connected layers. We show that this topology internally finds the optimal "canonical" instance of the input image for training and therefore limits the redundancy in learned features. This more efficient use of training data results in better performance on popular benchmark datasets with fewer parameters compared to standard convolutional neural networks with dataset augmentation and to other baselines.
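The pooling operator itself is simple to express: every transformed copy of the input goes through the same (siamese) convolutional branch, and an element-wise maximum over the branch outputs feeds the fully-connected layers. The PyTorch sketch below captures that step; the function boundaries and tensor shapes are illustrative assumptions.

```python
# TI-pooling sketch: run one shared branch over all transformed copies of
# the input, then take the element-wise max, which is invariant to the
# transformation set by construction.
import torch

def ti_pool(branch, transformed_inputs):
    """transformed_inputs: list of (B, C, H, W) tensors, one per transform."""
    feats = torch.stack([branch(x) for x in transformed_inputs])  # (K, B, D)
    return feats.max(dim=0).values
```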

272 citations