Deep Learning in Object Recognition, Detection, and Segmentation

Open AccessBook

Deep Learning in Object Recognition, Detection, and Segmentation

Chats0

TLDR

A historical overview of deep learning is provided and its applications in object recognition, detection, and segmentation, which are key challenges of computer vision and have numerous applications to images and videos are focused on.

Abstract:

As a major breakthrough in artificial intelligence, deep learning has achieved very impressive success in solving grand challenges in many fields including speech recognition, natural language processing, computer vision, image and video processing, and multimedia. This article provides a historical overview of deep learning and focus on its applications in object recognition, detection, and segmentation, which are key challenges of computer vision and have numerous applications to images and videos. The discussed research topics on object recognition include image classification on ImageNet, face recognition, and video classification. The detection part covers general object detection on ImageNet, pedestrian detection, face landmark detection face alignment, and human landmark detection pose estimation. On the segmentation side, thearticle discusses the most recent progress on scene labeling, semantic segmentation, face parsing, human parsing and saliency detection. Object recognition is considered as whole-image classification, while detection and segmentation are pixelwise classification tasks. Their fundamental differences will be discussed in this article. Fully convolutional neural networks and highly efficient forward and backward propagation algorithms specially designed for pixelwise classification task will be introduced. The covered application domains are also much diversified. Human and face images have regular structures, while general object and scene images have much more complex variations in geometric structures and layout. Videos include the temporal dimension. Therefore, they need to be processed with different deep models. All the selected domain applications have received tremendous attentions in the computer vision and multimedia communities. Through concrete examples of these applications, we explain the key points which make deep learning outperform conventional computer vision systems. 1 Different than traditional pattern recognition systems, which heavily rely on manually designed features, deep learning automatically learns hierarchical feature representations from massive training data and disentangles hidden factors of input data through multi-level nonlinear mappings. 2 Different than existing pattern recognition systems which sequentially design or train their key components, deep learning is able to jointly optimize all the components and crate synergy through close interactions among them. 3 While most machine learning models can be approximated with neural networks with shallow structures, for some tasks, the expressive power of deep models increases exponentially as their architectures go deep. Deep models are especially good at learning global contextual feature representation with their deep structures. 4 Benefitting from the large learning capacity of deep models, some classical computer vision challenges can be recast as high-dimensional data transform problems and can be solved from new perspectives. Finally, some open questions and future works regarding to deep learning in object recognition, detection, and segmentation will be discussed.

Deep Learning in Object Recognition, Detection, and Segmentation

Citations

Plant diseases and pests detection based on deep learning: a review.

Deep Learning-Based Real-Time Multiple-Object Detection and Tracking from Aerial Imagery via a Flying Robot with GPU-Based Embedded Devices.

Network representation learning: a systematic literature review

A critical and comprehensive review on power quality disturbance detection and classification

Deep Learning Framework for Vehicle and Pedestrian Detection in Rural Roads on an Embedded GPU

References

ImageNet Classification with Deep Convolutional Neural Networks

Long short-term memory

Very Deep Convolutional Networks for Large-Scale Image Recognition

Very Deep Convolutional Networks for Large-Scale Image Recognition

ImageNet: A large-scale hierarchical image database

Related Papers (5)

Understanding Deep Learning Techniques for Image Segmentation

Deep CNN and Deep GAN in Computational Visual Perception-Driven Image Analysis

Object Detection with Deep Learning: A Review

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Deep Learning Advances in Computer Vision with 3D Data: A Survey