CNN Architectures for Geometric Transformation-Invariant Feature Representation in Computer Vision: A Review

doi:10.1007/S42979-021-00735-0

Open Access, Journal Article

Vol. 2, Iss. 5, pp. 1-23

TLDR

This article reviews the most promising approaches for extending CNN architectures to handle nontrivial geometric transformations, and highlights the application domains of each approach.

Abstract:

One of the main challenges in machine vision is obtaining robust representations of visual features that remain unaffected by geometric transformations. This challenge arises naturally in many practical machine vision tasks. For example, in mobile robot applications such as simultaneous localization and mapping (SLAM) and visual tracking, object shapes change depending on their orientation in the 3D world, camera proximity, viewpoint, or perspective. In addition, natural phenomena such as occlusion, deformation, and clutter can change the geometric appearance of the underlying objects, leading to geometric transformations of the resulting images. Recently, deep learning techniques have proven very successful in visual recognition tasks, but they typically perform poorly with small datasets or when deployed in environments that deviate from training conditions. While convolutional neural networks (CNNs) have inherent representation power that provides a high degree of invariance to geometric image transformations, they are unable to satisfactorily handle nontrivial transformations. In view of this limitation, several techniques have been devised to extend CNNs to handle these situations. This article reviews some of the most promising approaches to extend CNN architectures to handle nontrivial geometric transformations. Key strengths and weaknesses, as well as the application domains of the various approaches, are also highlighted. The review shows that although an adequate model for generalized geometric transformations has not yet been formulated, several techniques exist for solving specific problems. Using these methods, it is possible to develop task-oriented solutions to deal with nontrivial transformations.
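The abstract's point that CNNs cope well with some geometric transformations but not with nontrivial ones can be illustrated with a small numerical check. The sketch below is not taken from the article; it assumes a single convolutional filter with circular (wrap-around) boundaries, and the helper name `circular_correlate` is hypothetical. It suggests that such a filter commutes with image shifts, that global max pooling then yields a shift-invariant response, and that the same filter does not commute with a 90-degree rotation.

```python
import numpy as np

def circular_correlate(image, kernel):
    """2D cross-correlation with wrap-around boundaries (hypothetical helper)."""
    out = np.zeros_like(image, dtype=float)
    kh, kw = kernel.shape
    for i in range(kh):
        for j in range(kw):
            # After rolling by (-i, -j), position (y, x) holds image[y + i, x + j].
            out += kernel[i, j] * np.roll(image, shift=(-i, -j), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.random((3, 3))  # generic, non-symmetric filter (assumption)

response = circular_correlate(image, kernel)

# Translation: filtering a shifted image equals shifting the filtered image.
shifted = np.roll(image, shift=(2, 3), axis=(0, 1))
shifted_response = circular_correlate(shifted, kernel)
translation_equivariant = np.allclose(
    shifted_response, np.roll(response, shift=(2, 3), axis=(0, 1))
)

# Global max pooling discards position, so the pooled value ignores the shift.
translation_invariant = bool(np.isclose(shifted_response.max(), response.max()))

# Rotation: the same check with a 90-degree rotation in place of the shift.
rotated_response = circular_correlate(np.rot90(image), kernel)
rotation_equivariant = np.allclose(rotated_response, np.rot90(response))

print(translation_equivariant, translation_invariant, rotation_equivariant)
```

The circular boundary is a simplification chosen so that the shift identity holds exactly; with zero padding, as commonly used in practice, it would hold only away from the image border.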
