Open Access · Book
Deep Learning in Object Recognition, Detection, and Segmentation
TL;DR: This article gives a historical overview of deep learning and focuses on its applications in object recognition, detection, and segmentation, which are key challenges of computer vision with numerous applications to images and videos.
Abstract:
As a major breakthrough in artificial intelligence, deep learning has achieved impressive success in solving grand challenges in many fields, including speech recognition, natural language processing, computer vision, image and video processing, and multimedia. This article provides a historical overview of deep learning and focuses on its applications in object recognition, detection, and segmentation, which are key challenges of computer vision and have numerous applications to images and videos. The research topics discussed under object recognition include image classification on ImageNet, face recognition, and video classification. The detection part covers general object detection on ImageNet, pedestrian detection, face landmark detection (face alignment), and human landmark detection (pose estimation). On the segmentation side, the article discusses the most recent progress on scene labeling, semantic segmentation, face parsing, human parsing, and saliency detection. Object recognition is treated as whole-image classification, while detection and segmentation are pixelwise classification tasks. Their fundamental differences are discussed in this article. Fully convolutional neural networks and highly efficient forward and backward propagation algorithms specially designed for pixelwise classification tasks are introduced. The application domains covered are also highly diverse. Human and face images have regular structures, while general object and scene images have much more complex variations in geometric structure and layout. Videos add the temporal dimension. These domains therefore need to be processed with different deep models. All the selected domain applications have received tremendous attention in the computer vision and multimedia communities. Through concrete examples of these applications, we explain the key points that make deep learning outperform conventional computer vision systems.
(1) Unlike traditional pattern recognition systems, which rely heavily on manually designed features, deep learning automatically learns hierarchical feature representations from massive training data and disentangles hidden factors of the input data through multi-level nonlinear mappings. (2) Unlike existing pattern recognition systems, which design or train their key components sequentially, deep learning is able to jointly optimize all the components and create synergy through close interactions among them. (3) While most machine learning models can be approximated by neural networks with shallow structures, for some tasks the expressive power of deep models increases exponentially as their architectures grow deeper. Deep models are especially good at learning global contextual feature representations with their deep structures. (4) Benefiting from the large learning capacity of deep models, some classical computer vision challenges can be recast as high-dimensional data transformation problems and solved from new perspectives. Finally, some open questions and future work regarding deep learning in object recognition, detection, and segmentation are discussed.
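The abstract's distinction between whole-image classification and pixelwise classification can be illustrated with a toy sketch. It is not the article's method: the tiny two-channel "image", the weights, and the function names (`classify_pixels`, `classify_image`) are all hypothetical, and the sketch assumes a single linear layer applied at every pixel (the role a 1x1 convolution plays in a fully convolutional network). It shows only the difference in output shape, not the efficient forward and backward propagation algorithms the article refers to.

```python
# Toy contrast between whole-image and pixelwise classification.
# All names, weights, and the tiny "image" below are hypothetical illustrations.

def scores(features, weights, biases):
    """Linear class scores for one feature vector (a 1x1 convolution at one pixel)."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, biases)]

def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)

def classify_pixels(image, weights, biases):
    """Pixelwise classification: one label per pixel, same weights everywhere."""
    return [[argmax(scores(px, weights, biases)) for px in row] for row in image]

def classify_image(image, weights, biases):
    """Whole-image classification: average features over all pixels, then one label."""
    pixels = [px for row in image for px in row]
    n_channels = len(pixels[0])
    pooled = [sum(px[c] for px in pixels) / len(pixels) for c in range(n_channels)]
    return argmax(scores(pooled, weights, biases))

# 2x3 image with 2 feature channels per pixel.
image = [
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]],
]
# Class 0 responds to channel 0, class 1 responds to channel 1.
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]

label_map = classify_pixels(image, weights, biases)   # one label per pixel
image_label = classify_image(image, weights, biases)  # one label for the image
print(label_map)    # [[0, 0, 1], [0, 0, 1]]
print(image_label)  # 0
```

The same weights yield a 2x3 label map in the pixelwise case and a single label in the whole-image case, which is the structural difference the abstract points to.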