Open Access, Journal Article

CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review

Alhassan Mumuni, +1 more
Vol. 2, Iss. 5, pp. 1-23
TLDR
This article reviews the most promising approaches to extending CNN architectures to handle nontrivial geometric transformations, along with the application domains of the various approaches.
Abstract
One of the main challenges in machine vision is obtaining robust representations of visual features that remain unaffected by geometric transformations. This challenge arises naturally in many practical machine vision tasks. For example, in mobile robot applications like simultaneous localization and mapping (SLAM) and visual tracking, object shapes change depending on their orientation in the 3D world, camera proximity, viewpoint, or perspective. In addition, natural phenomena such as occlusion, deformation, and clutter can change the geometric appearance of the underlying objects, leading to geometric transformations of the resulting images. Recently, deep learning techniques have proven very successful in visual recognition tasks, but they typically perform poorly with small datasets or when deployed in environments that deviate from training conditions. While convolutional neural networks (CNNs) have inherent representation power that provides a high degree of invariance to geometric image transformations, they cannot satisfactorily handle nontrivial transformations. In view of this limitation, several techniques have been devised to extend CNNs to handle these situations. This article reviews some of the most promising approaches to extending CNN architectures to handle nontrivial geometric transformations. Key strengths and weaknesses, as well as the application domains of the various approaches, are also highlighted. The review shows that although an adequate model for generalized geometric transformations has not yet been formulated, several techniques exist for solving specific problems. Using these methods, it is possible to develop task-oriented solutions to deal with nontrivial transformations.
