CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review
Alhassan Mumuni, Fuseini Mumuni
Vol. 2, Iss. 5, pp. 1-23
TLDR
This article reviews the most promising approaches for extending CNN architectures to handle nontrivial geometric transformations, along with the application domains of the various approaches.
Abstract:
One of the main challenges in machine vision relates to the problem of obtaining robust representations of visual features that remain unaffected by geometric transformations. This challenge arises naturally in many practical machine vision tasks. For example, in mobile robot applications like simultaneous localization and mapping (SLAM) and visual tracking, object shapes change depending on their orientation in the 3D world, camera proximity, viewpoint, or perspective. In addition, natural phenomena such as occlusion, deformation, and clutter can cause geometric appearance changes of the underlying objects, leading to geometric transformations of the resulting images. Recently, deep learning techniques have proven very successful in visual recognition tasks, but they typically perform poorly with small data or when deployed in environments that deviate from training conditions. While convolutional neural networks (CNNs) have inherent representation power that provides a high degree of invariance to geometric image transformations, they are unable to satisfactorily handle nontrivial transformations. In view of this limitation, several techniques have been devised to extend CNNs to handle these situations. This article reviews some of the most promising approaches to extend CNN architectures to handle nontrivial geometric transformations. Key strengths and weaknesses, as well as the application domains of the various approaches, are also highlighted. The review shows that although an adequate model for generalized geometric transformations has not yet been formulated, several techniques exist for solving specific problems. Using these methods, it is possible to develop task-oriented solutions to deal with nontrivial transformations.
Citations
Journal Article
An Overview on Visual SLAM: From Tradition to Semantic
Weifeng Chen, Guang Peng Shang, A. Xiaolan Ji, Chengjun Zhou, Xiyang Wang, Chonghui Xu, Zhenxiong Li, Kai Hu
TL;DR: This paper introduces the development of VSLAM technology from two aspects: traditional VSLAM and semantic VSLAM combined with deep learning, and focuses on the development of semantic VSLAM based on deep learning.
Journal Article
Fire-YOLO: A Small Target Object Detection Method for Fire Inspection
TL;DR: In this paper, an improved Fire-YOLO deep learning algorithm is proposed for detecting small targets and fire-like and smoke-like targets in forest fire images, as well as for fire detection under different natural lighting conditions.
Journal Article
A Comparison of Pooling Methods for Convolutional Neural Networks
Afia Zafar, Muhammad Aamir, Nazri Mohd Nawi, Alif Syamil Arshad, Saman Riaz, Abdulrahman Alruban, Ashit Kumar Dutta, Sultan Almotairi
TL;DR: A critical understanding of traditional and modern pooling techniques is provided, and their strengths and weaknesses are highlighted for readers.
Journal Article
Data augmentation: A comprehensive survey of modern approaches
TL;DR: This paper presents data augmentation as the most effective way of alleviating the time and resource costs of data collection and annotation; its main goal is to increase the volume, quality and diversity of training data.
Journal Article
A Real-Time Complex Road AI Perception Based on 5G-V2X for Smart City Security
TL;DR: Experimental results show that the proposed real-time road perception method combined with the 5G-V2X framework has a faster processing speed and can sense road conditions robustly under various complex actual conditions.
References
Proceedings Article
Deep Residual Learning for Image Recognition
TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.
Proceedings Article
ImageNet Classification with Deep Convolutional Neural Networks
TL;DR: The authors achieved state-of-the-art performance with a deep convolutional neural network consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Proceedings Article
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Journal Article
Deep learning
TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.