Proceedings ArticleDOI

Scale and Rotation Corrected CNNs (SRC-CNNs) for Scale and Rotation Invariant Character Recognition

TL;DR: The paper demonstrates how basic PCA-based rotation- and scale-invariant image recognition can be integrated into a CNN to achieve better rotation and scale invariance in classification.
Abstract: The last decade has witnessed rapid growth in the popularity of Convolutional Neural Networks (CNNs) for detecting and classifying objects. The self-trainable nature of CNNs makes them a strong candidate as both classifier and feature extractor. However, many existing CNN architectures fail to recognize text or objects under input rotation and scaling. This paper introduces an elegant approach, 'Scale and Rotation Corrected CNN (SRC-CNN)', for scale- and rotation-invariant text recognition, exploiting the concept of the principal component of characters. Prior to training and testing with a baseline CNN, SRC-CNN maps each character image to a reference orientation and scale, both derived from the character image itself. SRC-CNN is capable of recognizing characters in a document even when they differ greatly in orientation and scale. The proposed method does not demand any training with scaled or rotated samples. Its performance is validated on character datasets such as MNIST, MNIST_rot_12k, and English alphabets, and compared with state-of-the-art rotation-invariant classification networks. SRC-CNN is a generalized approach and can be extended to rotation- and scale-invariant classification of many other datasets, choosing any appropriate baseline CNN. We demonstrate this generality on the Fashion-MNIST dataset, where SRC-CNN is found to perform well in rotation- and scale-invariant classification of objects too. This paper demonstrates how basic PCA-based rotation- and scale-invariant image recognition can be integrated into a CNN to achieve better rotation and scale invariance in classification.
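
The correction step at the heart of SRC-CNN lends itself to a compact sketch. Below is a minimal, hedged illustration of PCA-based orientation and scale normalisation, assuming a single grayscale character image with a bright foreground; the threshold, reference size, and function name are illustrative rather than taken from the paper:

```python
import numpy as np
from scipy import ndimage

def pca_correct(img, ref_size=28, thresh=0.5):
    """Rotate a character image so its principal axis is vertical,
    then crop and rescale it to a reference size."""
    ys, xs = np.nonzero(img > thresh)              # foreground pixel coordinates
    coords = np.stack([xs, ys], axis=1).astype(float)
    coords -= coords.mean(axis=0)                  # centre on the character centroid
    # Principal component = eigenvector of the coordinate covariance
    # matrix with the largest eigenvalue.
    eigvals, eigvecs = np.linalg.eigh(np.cov(coords.T))
    px, py = eigvecs[:, np.argmax(eigvals)]
    # Angle between the principal axis and the vertical reference axis.
    # (The eigenvector sign leaves a 180-degree ambiguity, which the
    # paper resolves separately; it is ignored in this sketch.)
    angle = np.degrees(np.arctan2(px, py))
    rotated = ndimage.rotate(img, -angle, reshape=True, order=1)
    # Scale correction: crop the character's bounding box, then resize
    # so its longer side matches ref_size.
    ys, xs = np.nonzero(rotated > thresh)
    crop = rotated[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return ndimage.zoom(crop, ref_size / max(crop.shape), order=1)
```

Because both the reference orientation and the reference scale are computed from the character itself, a rotated or rescaled copy of the same character maps to (approximately) the same corrected image, which is why the baseline CNN needs no augmented training data.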
Citations
Journal ArticleDOI
TL;DR: In this paper, the watershed algorithm is used for marker-driven segmentation of corneal endothelial cells, with an encoder-decoder convolutional neural network trained in a sliding-window setup to predict the probability of cell centers (markers) and cell borders.
Abstract: Quantitative information about corneal endothelium cell morphometry is vital for assessing cornea pathologies. Nevertheless, everyday clinical routine is dominated by qualitative assessment based on visual inspection of microscopy images. Although several systems exist for automatic segmentation of corneal endothelial cells, they exhibit certain limitations. The main one is sensitivity to low contrast and uneven illumination, resulting in over-segmentation. Consequently, segmentation results often require manual editing of missing or false cell edges. This paper therefore further investigates the problem of corneal endothelium cell segmentation. A fully automatic pipeline is proposed that incorporates the watershed algorithm for marker-driven segmentation of corneal endothelial cells and an encoder-decoder convolutional neural network trained in a sliding-window setup to predict the probability of cell centers (markers) and cell borders. The predicted markers are used for watershed segmentation of the edge probability maps output by the network. The method's performance is analyzed on a heterogeneous dataset comprising four publicly available corneal endothelium image datasets, and three convolutional neural network models (U-Net, SegNet, and W-Net) incorporated into the pipeline are examined. The results are compared to a state-of-the-art competitor and are promising: regardless of the convolutional model used, the pipeline notably outperforms the competitor, scoring 97.72% cell detection accuracy versus the competitor's 87.38%. The advantage of the introduced method is also apparent for cell size, the DICE coefficient, and the Modified Hausdorff distance.
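
The marker-driven watershed step described above can be sketched compactly. A minimal illustration, assuming the network has already produced per-pixel probability maps `p_center` and `p_border` (H x W float arrays); the function name and threshold are illustrative, not from the paper:

```python
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

def segment_cells(p_center, p_border, center_thresh=0.5):
    # Each connected blob of high centre probability becomes one marker.
    markers, _ = ndimage.label(p_center > center_thresh)
    # Flood the border-probability map from the markers: watershed lines
    # settle on the ridges of p_border, i.e. the predicted cell edges.
    return watershed(p_border, markers)
```

Letting the CNN supply the markers is what suppresses the over-segmentation that plagues intensity-based watershed on low-contrast, unevenly illuminated images.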

12 citations

Journal ArticleDOI
TL;DR: This paper presents a deep image-restoration model that restores adversarial examples so that the target model classifies them correctly again, and shows experimentally that its results are better than those of rival methods.
Abstract: Deep learning and computer vision are rapidly growing fields in the modern world of information technology. Deep learning algorithms and computer vision have achieved great success in applications such as image classification, speech recognition, self-driving vehicles, disease diagnostics, and many more. Despite this success, these learning algorithms face severe threats from adversarial attacks. Adversarial examples are inputs, such as images in the computer vision field, that are intentionally perturbed by slight changes. These changes are imperceptible to humans but cause a model to misclassify with high probability, severely affecting its performance and predictions. In this scenario, we present a deep image-restoration model that restores adversarial examples so that the target model classifies them correctly again. We show that our restoration-based defense against adversarial attacks is simple and state-of-the-art by providing strong experimental evidence, using the MNIST and CIFAR10 datasets for experiments and analysis. Finally, we compare our method to other state-of-the-art defense methods and find that our results are better than those of rival methods.
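
At inference time, the defense described above reduces to a simple composition. A minimal PyTorch sketch, where `restorer` and `classifier` stand in for the paper's trained restoration model and unmodified target model (hypothetical names):

```python
import torch

@torch.no_grad()
def defended_predict(x_adv, restorer, classifier):
    x_clean = restorer(x_adv)     # undo the adversarial perturbation
    logits = classifier(x_clean)  # target model, left unchanged
    return logits.argmax(dim=1)
```

The appeal of this design is that the target model itself needs no retraining; the restoration network is an input-side preprocessing stage.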

4 citations

References
Journal ArticleDOI
TL;DR: This work proposes the Deep Rotation Equivariant Network (DREN), consisting of cycle, isotonic, and decycle layers, evaluates it on the Rotated MNIST and CIFAR-10 datasets, and demonstrates that it can improve the performance of state-of-the-art architectures.
Abstract: Recently, learning equivariant representations has attracted considerable research attention. Dieleman et al. introduce four operations that can be inserted into a convolutional neural network to learn deep representations equivariant to rotation. However, in their approach feature maps must be copied and rotated four times in each layer, which incurs substantial running-time and memory overhead. To address this problem, we propose the Deep Rotation Equivariant Network (DREN), consisting of cycle layers, isotonic layers, and decycle layers. Our proposed layers apply the rotation transformation to filters rather than feature maps, achieving a speed-up of more than 2x with even less memory overhead. We evaluate DREN on the Rotated MNIST and CIFAR-10 datasets and demonstrate that it can improve the performance of state-of-the-art architectures.
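
The key trick, rotating filters instead of feature maps, can be illustrated in a few lines. A hedged PyTorch sketch of a cycle-layer-like convolution for the four 90-degree rotations; it conveys the idea only and is not the authors' implementation of the cycle/isotonic/decycle layers:

```python
import torch
import torch.nn.functional as F

def cycle_conv(x, weight):
    # weight: (out_ch, in_ch, k, k) with odd k. Rotating each filter by
    # 0/90/180/270 degrees and convolving once per orientation avoids
    # copying and rotating the (much larger) feature maps.
    k = weight.shape[-1]
    outs = [F.conv2d(x, torch.rot90(weight, r, dims=(2, 3)), padding=k // 2)
            for r in range(4)]
    return torch.cat(outs, dim=1)  # stack the four orientation responses
```

Since filter banks are far smaller than feature maps, rotating the weights is where the reported speed and memory savings come from.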

46 citations


"Scale and Rotation Corrected CNNs (..." refers background or methods in this paper

  • ...Deep Rotation Equivariant Network (DREN) [8] replaces ordinary convolutional and pooling layers with cycle, isotonic, and decycle layers to ensure rotational equivariance, exploiting the properties of permutations of rotated copies of the same filters....


  • ...Exploiting Cyclic Symmetry in Convolutional Neural Networks [3] describes an approach similar to DREN, but it uses cyclic permutations of feature maps instead of convolution filter kernels....



  • ...Unlike STN [5], ORN [18], RIFD-CNN [2], DREN [8], etc., SRC-CNN does not demand any architectural change in the baseline CNN, which would result in an increase in the number of network parameters and in complexity....


Book ChapterDOI
05 Nov 2016
TL;DR: This paper proposes a new method to construct FOFMMs using a continuous parameter t; the FOFMM radial polynomials have the same number of zeros as OFMMs of the same degree, but their zeros are more uniformly distributed than those of OFMMs and the first zero is closer to the origin.
Abstract: In this paper, we generalize the orthogonal Fourier-Mellin moments (OFMMs) to the fractional orthogonal Fourier-Mellin moments (FOFMMs), which are based on fractional radial polynomials. We propose a new method to construct FOFMMs using a continuous parameter t (t > 0). The fractional radial polynomials of FOFMMs have the same number of zeros as those of OFMMs with the same degree, but the zeros of the FOFMM polynomials are more uniformly distributed than those of OFMMs, and the first zero is closer to the origin. A recursive method is also given to reduce computation time and improve numerical stability. Experimental results show that the proposed FOFMMs have better performance.
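
To make the construction concrete, the standard fractional-order substitution r -> r^t can be sketched as follows. This is a hedged reconstruction from the abstract, with the normalising factor derived so that orthogonality with weight r on [0,1] is preserved; the chapter's exact conventions may differ:

```latex
% Hedged reconstruction, not the chapter's verbatim definitions.
% Q_n denotes the usual OFMM radial polynomial, orthogonal on [0,1]
% with weight r. Substituting r -> r^t and renormalising gives
\[
  Q_n^{t}(r) = \sqrt{t}\, r^{\,t-1}\, Q_n\!\left(r^{t}\right), \qquad t > 0,
\]
\[
  \int_0^1 Q_n^{t}(r)\, Q_m^{t}(r)\, r\, \mathrm{d}r
    = \frac{\delta_{nm}}{2(n+1)},
\]
% so the fractional moments keep the familiar OFMM form:
\[
  \Phi_{nm}^{t} = \frac{n+1}{\pi} \int_0^{2\pi}\!\!\int_0^1
    f(r,\theta)\, Q_n^{t}(r)\, e^{-\mathrm{j} m\theta}\, r\, \mathrm{d}r\, \mathrm{d}\theta .
\]
```

Varying t stretches the radial axis, which is what redistributes the polynomial zeros toward the origin without changing their count.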

45 citations


"Scale and Rotation Corrected CNNs (..." refers background in this paper

  • ...The Scale Invariant Feature Transform (SIFT) [9], Speeded-Up Robust Features (SURF) [1], and Fourier-Mellin moments [17] are some examples of rotation-invariant features that can be extracted from images....


Posted Content
TL;DR: The Polar Transformer Network (PTN), as discussed by the authors, combines the Spatial Transformer Network (STN) with canonical coordinate representations and achieves state-of-the-art performance on the rotated MNIST and SIM2MNIST datasets.
Abstract: Convolutional neural networks (CNNs) are inherently equivariant to translation. Efforts to embed other forms of equivariance have concentrated solely on rotation. We expand the notion of equivariance in CNNs through the Polar Transformer Network (PTN). PTN combines ideas from the Spatial Transformer Network (STN) and canonical coordinate representations. The result is a network invariant to translation and equivariant to both rotation and scale. PTN is trained end-to-end and composed of three distinct stages: a polar origin predictor, the newly introduced polar transformer module, and a classifier. PTN achieves state-of-the-art results on rotated MNIST and the newly introduced SIM2MNIST dataset, an MNIST variation obtained by adding clutter and perturbing digits with translation, rotation, and scaling. The ideas of PTN are extensible to 3D, which we demonstrate through the Cylindrical Transformer Network.
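
The coordinate idea underlying PTN is easy to illustrate: after a log-polar resampling about the (predicted) object origin, rotations become circular shifts along the angular axis and scalings become shifts along the log-radius axis, both of which convolution handles equivariantly. A minimal sketch using OpenCV's warpPolar; PTN itself learns the origin and is trained end-to-end, which this sketch omits:

```python
import cv2
import numpy as np

def log_polar(img, center, max_radius):
    """Resample `img` into log-polar coordinates about `center`.
    Rotation about `center` -> circular shift along the angle axis;
    scaling about `center`  -> shift along the log-radius axis."""
    h, w = img.shape[:2]
    return cv2.warpPolar(img, (w, h), center, max_radius,
                         cv2.WARP_POLAR_LOG + cv2.INTER_LINEAR)

# Example: a rotated/rescaled digit and its canonical version differ
# only by a translation after the transform.
# lp = log_polar(img, center=(img.shape[1] / 2, img.shape[0] / 2),
#                max_radius=img.shape[0] / 2)
```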

35 citations

Posted Content
TL;DR: This paper proposes a multi-scale CNN method to encourage the recognition of both scale-invariant and scale-variant features, evaluates it on a challenging image classification task involving task-relevant characteristics at multiple scales, and shows that the multi-scale CNN outperforms a single-scale CNN.
Abstract: Convolutional Neural Networks (CNNs) require large image corpora to be trained on classification tasks. Variation in image resolutions, in the sizes of depicted objects and patterns, and in image scales hampers CNN training and performance, because the task-relevant information varies over spatial scales. Previous work attempting to deal with such scale variations focused on encouraging scale-invariant CNN representations. However, scale-invariant representations are incomplete representations of images, because images contain scale-variant information as well. This paper addresses the combined development of scale-invariant and scale-variant representations. We propose a multi-scale CNN method to encourage the recognition of both types of features and evaluate it on a challenging image classification task involving task-relevant characteristics at multiple scales. The results show that our multi-scale CNN outperforms a single-scale CNN, leading to the conclusion that encouraging the combined development of scale-invariant and scale-variant representations in CNNs is beneficial to image recognition performance.
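
A hedged sketch of a multi-scale CNN in this spirit: each scale gets its own column, so scale-variant features can be learned per scale, and the per-scale features are fused for classification. Layer sizes, the scale set, and the fusion scheme are illustrative, not the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_column(n_feat=64):
    # Small feature extractor; assumes single-channel input (e.g. MNIST).
    return nn.Sequential(
        nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(32, n_feat, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class MultiScaleCNN(nn.Module):
    def __init__(self, scales=(1.0, 0.5, 0.25), n_feat=64, n_class=10):
        super().__init__()
        self.scales = scales
        # One independent column per scale -> scale-variant features.
        self.columns = nn.ModuleList(make_column(n_feat) for _ in scales)
        self.head = nn.Linear(n_feat * len(scales), n_class)

    def forward(self, x):
        feats = [col(F.interpolate(x, scale_factor=s, mode='bilinear',
                                   align_corners=False))
                 for s, col in zip(self.scales, self.columns)]
        return self.head(torch.cat(feats, dim=1))
```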

29 citations


"Scale and Rotation Corrected CNNs (..." refers background in this paper

  • ...The training here takes much time and the filters learned will be learning scale-variant features [11]....


Proceedings Article
11 Nov 2017
TL;DR: This paper proposes a novel network architecture, the weight-shared multi-stage network (WSMS-Net), which focuses on acquiring scale invariance by constructing multiple stages of CNNs, and achieves higher classification accuracy on the CIFAR-10, CIFAR-100, and ImageNet datasets.
Abstract: Deep convolutional neural networks (CNNs) have become one of the most successful methods for image processing tasks in the past few years. Recent studies on modern residual architectures, which enable CNNs to be much deeper, have achieved much better results thanks to the high expressive ability of their numerous parameters. In general, CNNs are known to be robust to small parallel shifts of objects in images thanks to their local receptive fields, the weight parameters shared by each unit, and the pooling layers sandwiching them. However, CNNs have limited robustness to other geometric transformations such as scaling and rotation, and this lack remains an obstacle to performance improvement. This paper proposes a novel network architecture, the weight-shared multi-stage network (WSMS-Net), which focuses on acquiring scale invariance by constructing multiple stages of CNNs. The WSMS-Net is easily combined with existing deep CNNs, enables them to acquire robustness to scaling, and therefore achieves higher classification accuracy on the CIFAR-10, CIFAR-100, and ImageNet datasets.
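
One way to picture the weight-sharing idea: a single feature extractor (one set of weights) is run as successive stages over progressively downscaled inputs, and the per-stage features are concatenated before classification. This is an illustrative reading of the concept, not the authors' architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightSharedMultiStage(nn.Module):
    def __init__(self, stage: nn.Module, n_stages=3, n_feat=64, n_class=10):
        super().__init__()
        self.stage = stage          # one module, reused by every stage
        self.n_stages = n_stages
        self.head = nn.Linear(n_feat * n_stages, n_class)

    def forward(self, x):
        feats = []
        for _ in range(self.n_stages):
            f = self.stage(x)       # shared weights across all stages;
                                    # `stage` must output n_feat channels
            feats.append(F.adaptive_avg_pool2d(f, 1).flatten(1))
            x = F.avg_pool2d(x, 2)  # next stage sees half the resolution
        return self.head(torch.cat(feats, dim=1))

# Usage sketch: wrap an existing backbone, e.g.
# model = WeightSharedMultiStage(
#     nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU()))
```

Because every stage reuses the same weights, the same features are matched at several scales without multiplying the parameter count, which is how the architecture buys scale robustness cheaply.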

7 citations


"Scale and Rotation Corrected CNNs (..." refers background in this paper

  • ...[13] introduces a multi-stage architecture consisting of multiple CNNs arranged in parallel....
