You Only Look Once: Unified, Real-Time Object Detection

doi:10.1109/CVPR.2016.91

Open Access Proceedings Article

Joseph Redmon, +3 more

- pp 779-788

Abstract:

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background. Finally, YOLO learns very general representations of objects. It outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.
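The regression framing in the abstract can be illustrated with a minimal decoding sketch. It assumes the layout described in the full paper rather than in the abstract: the image is divided into an S x S grid, each cell predicts B boxes as (x, y, w, h, confidence) plus C class probabilities, and a box's class score is its confidence times the class probability. The function name `decode_grid`, the flat per-cell ordering, and the 0.2 threshold are hypothetical choices for illustration, not the authors' implementation.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    x: float        # box center x, as a fraction of image width
    y: float        # box center y, as a fraction of image height
    w: float        # box width, as a fraction of image width
    h: float        # box height, as a fraction of image height
    score: float    # box confidence * best class probability
    class_id: int   # index of the best class


def decode_grid(pred: List[List[List[float]]], S: int, B: int, C: int,
                threshold: float = 0.2) -> List[Detection]:
    """Turn an S x S grid of raw predictions into scored boxes.

    Assumed layout (hypothetical): pred[row][col] is a flat list holding
    B groups of (x, y, w, h, confidence) followed by C class probabilities.
    x and y are offsets inside the cell; w and h are relative to the image.
    """
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row][col]
            class_probs = cell[B * 5:]
            # Class probabilities are shared by all boxes in the cell.
            best = max(range(C), key=lambda k: class_probs[k])
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                score = conf * class_probs[best]
                if score >= threshold:
                    # Convert the cell-relative center to image coordinates.
                    cx = (col + x) / S
                    cy = (row + y) / S
                    detections.append(Detection(cx, cy, w, h, score, best))
    return detections


# Toy example: a 2 x 2 grid, one box per cell, two classes.
S, B, C = 2, 1, 2
empty = [0.0] * (B * 5 + C)
pred = [[list(empty) for _ in range(S)] for _ in range(S)]
pred[0][1] = [0.5, 0.5, 0.4, 0.6, 0.9, 0.1, 0.8]  # one confident box
dets = decode_grid(pred, S, B, C)
print(dets)
```

A full detector would typically follow this step with non-maximum suppression to drop duplicate boxes; that step is omitted here.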

Citations

Journal Article

Deep learning for visual understanding

Yanming Guo, +5 more

- 26 Apr 2016 -

Neurocomputing

TL;DR: The state-of-the-art in deep learning algorithms in computer vision is reviewed by highlighting the contributions and challenges from over 210 recent research papers, and the future trends and challenges in designing and training deep neural networks are summarized.

Book Chapter

CornerNet: Detecting objects as paired keypoints

Hei Law, +1 more

TL;DR: CornerNet detects an object bounding box as a pair of keypoints, the top-left corner and the bottom-right corner, using a single convolutional neural network.

Journal Article

SECOND: Sparsely Embedded Convolutional Detection

Yan Yan, +2 more

- 06 Oct 2018 -

Sensors

TL;DR: An improved sparse convolution method for voxel-based 3D convolutional networks is investigated, which significantly increases the speed of both training and inference; a new form of angle loss regression is also introduced to improve orientation estimation performance.

Posted Content

DSSD: Deconvolutional Single Shot Detector

Cheng-Yang Fu, +4 more

- 23 Jan 2017 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper combines a state-of-the-art classifier with a fast detection framework and augments SSD+Residual-101 with deconvolution layers to introduce additional large-scale context in object detection and improve accuracy, especially for small objects.

Proceedings Article

Generalized Intersection Over Union: A Metric and a Loss for Bounding Box Regression

Hamid Rezatofighi, +5 more

TL;DR: In this paper, a generalized IoU (GIoU) metric is proposed for non-overlapping bounding boxes, which can be directly used as a regression loss.

References

Proceedings Article

Going deeper with convolutions

Christian Szegedy, +8 more

TL;DR: Inception is a deep convolutional neural network architecture that achieved a new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).

Proceedings Article

Histograms of oriented gradients for human detection

Navneet Dalal, +1 more

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.

Journal Article

ImageNet Large Scale Visual Recognition Challenge

Olga Russakovsky, +11 more

- 01 Dec 2015 -

International Journal of Computer Vision

TL;DR: The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is a benchmark in object category classification and detection on hundreds of object categories and millions of images, which has been run annually from 2010 to present, attracting participation from more than fifty institutions.

Journal Article

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 01 Jun 2017 -

IEEE Transactions on Pattern Analysis an...

TL;DR: This work introduces a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals, and further merges RPN and Fast R-CNN into a single network by sharing their convolutional features.

Posted Content

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

Shaoqing Ren, +3 more

- 04 Jun 2015 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: Faster R-CNN introduces a Region Proposal Network (RPN) to generate high-quality region proposals, which are used by Fast R-CNN for detection.
