Scale and Rotation Corrected CNNs (SRC-CNNs) for Scale and Rotation Invariant Character Recognition
18 Dec 2018
TL;DR: It is demonstrated how basic PCA-based rotation and scale invariant image recognition can be integrated into a CNN to achieve better rotation and scale invariance in classification.
Abstract: The last decade has witnessed rapid growth in the popularity of Convolutional Neural Networks (CNNs) for detecting and classifying objects. The self-trainable nature of CNNs makes them a strong candidate as both a classifier and a feature extractor. However, many existing CNN architectures fail to recognize texts or objects under input rotation and scaling. This paper introduces an elegant approach, the 'Scale and Rotation Corrected CNN (SRC-CNN)', for scale and rotation invariant text recognition, exploiting the concept of the principal component of characters. Prior to training and testing with a baseline CNN, SRC-CNN maps each character image to a reference orientation and scale, both derived from the character image itself. SRC-CNN is capable of recognizing characters in a document even when they differ greatly in orientation and scale, and it does not require any training samples that are scaled or rotated. The performance of the proposed approach is validated on character datasets such as MNIST, MNIST_rot_12k, and English alphabets, and compared with state-of-the-art rotation invariant classification networks. SRC-CNN is a generalized approach and can be extended to rotation and scale invariant classification of many other datasets, choosing any appropriate baseline CNN. We demonstrate this generality on the Fashion-MNIST dataset, where SRC-CNN performs well in rotation and scale invariant classification of objects too. This paper demonstrates how basic PCA-based rotation and scale invariant image recognition can be integrated into a CNN to achieve better rotation and scale invariance in classification.
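To make the pre-correction step concrete, below is a minimal sketch of PCA-based orientation and scale normalization in the spirit the abstract describes. All function names, thresholds, and the choice of canonical pose are illustrative assumptions, not the paper's exact procedure; in particular, PCA leaves a 180° orientation ambiguity that a complete system must resolve.

```python
import numpy as np
from scipy import ndimage

def pca_correct(img, fg_thresh=0.5, target_extent=20.0):
    """Rotate and scale a grayscale character image to a canonical pose."""
    ys, xs = np.nonzero(img > fg_thresh)           # foreground pixel coordinates
    coords = np.stack([xs, ys], axis=1).astype(float)
    coords -= coords.mean(axis=0)                  # centre the point cloud

    # The principal component of the pixel cloud gives the dominant axis.
    cov = np.cov(coords, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigenvalues in ascending order
    principal = eigvecs[:, -1]                     # axis of largest variance
    angle = np.degrees(np.arctan2(principal[1], principal[0]))

    # Rotate so the principal axis becomes vertical (the reference orientation);
    # the exact sign convention depends on the image coordinate frame.
    rotated = ndimage.rotate(img, 90.0 - angle, reshape=True, order=1)

    # Scale so the spread along the principal axis matches a fixed extent.
    spread = 2.0 * np.sqrt(eigvals[-1])            # ~one std-dev extent each way
    return ndimage.zoom(rotated, target_extent / max(spread, 1e-6), order=1)
```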
Citations
TL;DR: This paper presents a deep image restoration model that restores adversarial examples so that the target model classifies them correctly again, and shows experimentally that it outperforms rival defense methods.
Abstract: Deep learning and computer vision are fast-growing fields in the modern world of information technology. Deep learning algorithms and computer vision have achieved great success in applications such as image classification, speech recognition, self-driving vehicles, and disease diagnostics. Despite this success, these learning algorithms face severe threats from adversarial attacks. Adversarial examples are inputs, such as images in computer vision, that are intentionally perturbed by small, humanly imperceptible changes, yet are misclassified by a model with high probability, severely affecting its performance. We present a deep image restoration model that restores adversarial examples so that the target model classifies them correctly again. We show that this defense against adversarial attacks, based on a deep image restoration model, is simple and state-of-the-art, supporting the claim with strong experimental evidence. We use the MNIST and CIFAR10 datasets for experiments and analysis, and compare our method with other state-of-the-art defenses, showing that our results are better than those of rival methods.
4 citations
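A minimal sketch of the restore-then-classify pipeline the abstract outlines is shown below; both networks are untrained stand-ins, not the authors' architectures, and serve only to show where the restoration model sits in front of the unchanged target classifier.

```python
import torch
import torch.nn as nn

restorer = nn.Sequential(                  # stand-in restoration network
    nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 1, 3, padding=1),
)
classifier = nn.Sequential(                # stand-in target model
    nn.Flatten(), nn.Linear(28 * 28, 10),
)

def defended_predict(x):
    """Restore a (possibly adversarial) input, then classify it."""
    with torch.no_grad():
        restored = restorer(x)             # intended to strip the perturbation
        return classifier(restored).argmax(dim=1)

preds = defended_predict(torch.rand(4, 1, 28, 28))   # batch of MNIST-sized inputs
```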
TL;DR: In this paper, the watershed algorithm is combined with an encoder-decoder convolutional neural network, trained in a sliding-window setup to predict the probability of cell centers (markers) and cell borders, for marker-driven segmentation of corneal endothelial cells.
Abstract: Quantitative information about the morphometry of corneal endothelium cells is vital for assessing corneal pathologies. Nevertheless, everyday clinical routine is dominated by qualitative assessment based on visual inspection of microscopy images. Although several systems exist for automatic segmentation of corneal endothelial cells, they exhibit certain limitations, the main one being sensitivity to low contrast and uneven illumination, which results in over-segmentation. Consequently, segmentation results often require manual editing of missing or false cell edges. This paper therefore further investigates the problem of corneal endothelium cell segmentation. A fully automatic pipeline is proposed that combines the watershed algorithm for marker-driven segmentation of corneal endothelial cells with an encoder-decoder convolutional neural network, trained in a sliding-window setup, that predicts the probability of cell centers (markers) and cell borders. The predicted markers are used for watershed segmentation of the edge probability maps output by the neural network. The pipeline's performance is analyzed on a heterogeneous dataset comprising four publicly available corneal endothelium image datasets, and three convolutional neural network models (U-Net, SegNet, and W-Net) incorporated into the pipeline are examined. The results are compared to a state-of-the-art competitor and are promising: regardless of the convolutional model used, the pipeline notably outperforms the competitor, achieving 97.72% cell detection accuracy against the competitor's 87.38%. The advantage of the introduced method is also apparent for cell size, the DICE coefficient, and the Modified Hausdorff distance.
1 citation
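The marker-driven watershed step can be sketched as follows, assuming a CNN (not shown) has already produced per-pixel center and border probability maps; the threshold value is illustrative, not the paper's.

```python
from scipy import ndimage
from skimage.segmentation import watershed

def segment_cells(center_prob, border_prob, center_thresh=0.7):
    """Watershed segmentation from CNN-predicted center/border maps."""
    # Connected blobs in the thresholded center map become the markers.
    markers, _ = ndimage.label(center_prob > center_thresh)
    # Flood the border-probability landscape outward from the markers;
    # watershed lines settle where the border probability is highest.
    return watershed(border_prob, markers)
```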
References
Proceedings Article
TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.
Abstract: Convolutional Neural Networks define an exceptionally powerful class of models, but are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.
4,869 citations
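A minimal sketch of a spatial transformer layer in PyTorch is shown below. The localisation network is a deliberately small stand-in (the paper's localisation networks are task-specific), but the affine_grid/grid_sample pattern is the standard differentiable sampler this kind of module relies on.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    def __init__(self, channels, height, width):
        super().__init__()
        self.loc = nn.Sequential(          # localisation network (stand-in)
            nn.Flatten(),
            nn.Linear(channels * height * width, 32), nn.ReLU(),
            nn.Linear(32, 6),              # the 6 affine parameters theta (2x3)
        )
        # Initialise to the identity transform so training starts stable.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

stn = SpatialTransformer(1, 28, 28)
warped = stn(torch.rand(4, 1, 28, 28))     # same shape, spatially transformed
```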
Posted Content
TL;DR: Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format and the structure of training and testing splits.
Abstract: We present Fashion-MNIST, a new dataset comprising 28x28 grayscale images of 70,000 fashion products from 10 categories, with 7,000 images per category. The training set has 60,000 images and the test set has 10,000 images. Fashion-MNIST is intended to serve as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms, as it shares the same image size, data format, and structure of training and testing splits. The dataset is freely available at this https URL
3,707 citations
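Because Fashion-MNIST mirrors MNIST's image size, format, and splits, swapping it in is typically a one-line change. A usage sketch with torchvision (the download path is arbitrary):

```python
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
# Previously: datasets.MNIST("data/", train=True, download=True, transform=to_tensor)
train = datasets.FashionMNIST("data/", train=True, download=True, transform=to_tensor)
test = datasets.FashionMNIST("data/", train=False, download=True, transform=to_tensor)

img, label = train[0]
print(img.shape, label)    # torch.Size([1, 28, 28]) and a class index in 0..9
```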
"Scale and Rotation Corrected CNNs (..." refers methods in this paper
[...]
[...]
[...]
Posted Content
TL;DR: Harmonic Networks as mentioned in this paper replace regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch, which can encode complicated rotational invariants.
Abstract: Translating or rotating an input image should not affect the results of many computer vision tasks. Convolutional neural networks (CNNs) are already translation equivariant: input image translations produce proportionate feature map translations. This is not the case for rotations. Global rotation equivariance is typically sought through data augmentation, but patch-wise equivariance is more difficult. We present Harmonic Networks, or H-Nets, a CNN exhibiting equivariance to patch-wise translation and 360° rotation. We achieve this by replacing regular CNN filters with circular harmonics, returning a maximal response and orientation for every receptive field patch.
H-Nets use a rich, parameter-efficient and low computational complexity representation, and we show that deep feature maps within the network encode complicated rotational invariants. We demonstrate that our layers are general enough to be used in conjunction with the latest architectures and techniques, such as deep supervision and batch normalization. We also achieve state-of-the-art classification on rotated-MNIST, and competitive results on other benchmark challenges.
292 citations
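The core constraint can be sketched as follows: a filter is restricted to a circular harmonic, i.e. a radial profile R(r) modulated by a fixed angular phase e^{imφ} of rotation order m, so that rotating the input only shifts the phase of the response. The radial profile below is random noise standing in for the learned parameters; this is an illustration of the filter construction, not the paper's implementation.

```python
import numpy as np

def circular_harmonic(size=9, m=1, n_rings=4, seed=0):
    """Complex filter R(r) * exp(i*m*phi) sampled on a size x size grid."""
    rng = np.random.default_rng(seed)
    radial = rng.standard_normal(n_rings)          # stand-in for the learned R(r)
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size] - c
    r = np.sqrt(x**2 + y**2)
    phi = np.arctan2(y, x)
    # Interpolate the ring profile at each pixel's radius.
    ring_pos = r / r.max() * (n_rings - 1)
    R = np.interp(ring_pos, np.arange(n_rings), radial)
    return R * np.exp(1j * m * phi)                # angular phase of order m

w = circular_harmonic()
print(w.shape, w.dtype)                            # (9, 9) complex128
```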
Proceedings Article
TL;DR: This work introduces four operations which can be inserted into neural network models as layers, and which can be combined to make these models partially equivariant to rotations while enabling parameter sharing across different orientations.
Abstract: Many classes of images exhibit rotational symmetry. Convolutional neural networks are sometimes trained using data augmentation to exploit this, but they are still required to learn the rotation equivariance properties from the data. Encoding these properties into the network architecture, as we are already used to doing for translation equivariance by using convolutional layers, could result in a more efficient use of the parameter budget by relieving the model from learning them. We introduce four operations which can be inserted into neural network models as layers, and which can be combined to make these models partially equivariant to rotations. They also enable parameter sharing across different orientations. We evaluate the effect of these architectural modifications on three datasets which exhibit rotational symmetry and demonstrate improved performance with smaller models.
235 citations
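Two of the four operations (cyclic slicing and cyclic pooling) can be sketched as follows: the input is expanded into its four 90° rotations, all copies share one convolutional stack, and the orientation dimension is pooled away at the end. The convolutional stack here is a placeholder, not the paper's architecture.

```python
import torch
import torch.nn as nn

conv = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU())   # placeholder stack

def cyclic_slice(x):
    """Stack the four 90-degree rotations of x along the batch dimension."""
    return torch.cat([torch.rot90(x, k, dims=(2, 3)) for k in range(4)], dim=0)

def cyclic_pool(features, batch_size):
    """Pool over the four orientation copies (mean pooling here)."""
    return features.view(4, batch_size, *features.shape[1:]).mean(dim=0)

x = torch.rand(2, 1, 28, 28)
pooled = cyclic_pool(conv(cyclic_slice(x)), batch_size=2)        # (2, 8, 28, 28)
```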
"Scale and Rotation Corrected CNNs (..." refers methods in this paper
[...]
[...]
TL;DR: RotEqNet as discussed by the authors is a convolutional neural network (CNN) architecture encoding rotation equivariance, invariance and covariance; treating rotation explicitly instead of as just another variation leads to a reduction in the size of the required model.
Abstract: In many computer vision tasks, we expect a particular behavior of the output with respect to rotations of the input image. If this relationship is explicitly encoded, instead of treated as just another variation, the complexity of the problem decreases, leading to a reduction in the size of the required model. In this paper, we propose the Rotation Equivariant Vector Field Networks (RotEqNet), a Convolutional Neural Network (CNN) architecture encoding rotation equivariance, invariance and covariance. Each convolutional filter is applied at multiple orientations and returns a vector field representing the magnitude and angle of the highest scoring orientation at every spatial location. We develop a modified convolution operator relying on this representation to obtain deep architectures. We test RotEqNet on several problems requiring different responses with respect to the inputs' rotation: image classification, biomedical image segmentation, orientation estimation and patch matching. In all cases, we show that RotEqNet offers extremely compact models in terms of number of parameters and provides results in line with those of networks orders of magnitude larger.
177 citations
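The rotating-filter idea can be sketched as follows: one kernel is applied at several orientations, and each spatial location keeps the magnitude and angle of the best-responding orientation, forming a vector field. For simplicity this sketch rotates the kernel in 90° steps via rot90, whereas RotEqNet interpolates finer angles.

```python
import math
import torch
import torch.nn.functional as F

def oriented_conv(x, kernel):
    """Return (magnitude, angle) of the best response over 4 orientations."""
    responses = torch.stack(
        [F.conv2d(x, torch.rot90(kernel, k, dims=(2, 3)), padding=1)
         for k in range(4)], dim=0)                # (4, N, C_out, H, W)
    magnitude, best = responses.max(dim=0)         # strongest response wins
    angle = best.float() * (math.pi / 2)           # its orientation in radians
    return magnitude, angle

x = torch.rand(1, 1, 28, 28)
kernel = torch.rand(4, 1, 3, 3)                    # 4 output channels
mag, ang = oriented_conv(x, kernel)                # each (1, 4, 28, 28)
```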