Open Access · Posted Content

You Only Look Once: Unified, Real-Time Object Detection

TLDR
YOLO predicts bounding boxes and class probabilities directly from full images in a single evaluation; because the whole detection pipeline is one network, it can be optimized end-to-end on detection performance and achieves state-of-the-art real-time detection.
Abstract
We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.
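As a rough illustration of the regression framing described in the abstract, the sketch below decodes a YOLO-style output tensor into scored boxes. It assumes the PASCAL VOC configuration mentioned in the paper (a 7x7 grid, 2 boxes per cell, 20 classes); the tensor layout, threshold, and helper names are illustrative assumptions, not the authors' reference implementation.

# Minimal sketch: decode a YOLO-style output tensor into detections.
# Assumes the PASCAL VOC setup from the paper: S x S = 7 x 7 grid cells,
# B = 2 boxes per cell, C = 20 classes. Layout and names are illustrative.
import numpy as np

S, B, C = 7, 2, 20

def decode(pred, conf_thresh=0.2):
    """pred: array of shape (S, S, B*5 + C) -> list of ((cx, cy, w, h), score, class)."""
    detections = []
    for row in range(S):
        for col in range(S):
            cell = pred[row, col]
            class_probs = cell[B * 5:]                  # Pr(class | object)
            for b in range(B):
                x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
                cx = (col + x) / S                      # x, y are offsets within the grid cell
                cy = (row + y) / S                      # w, h are relative to the whole image
                scores = conf * class_probs             # class-specific confidence scores
                cls = int(np.argmax(scores))
                if scores[cls] > conf_thresh:
                    detections.append(((cx, cy, w, h), float(scores[cls]), cls))
    return detections

# Usage: decode a random tensor standing in for the network output.
dets = decode(np.random.rand(S, S, B * 5 + C))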



Citations
Posted Content

YOLO9000: Better, Faster, Stronger

TL;DR: YOLO9000, a state-of-the-art, real-time object detection system that can detect over 9000 object categories, is introduced, together with a method to jointly train on object detection and classification; the improvements it applies over YOLO are both novel and drawn from prior work.
Journal Article

Deep learning for visual understanding

TL;DR: The state-of-the-art in deep learning algorithms in computer vision is reviewed by highlighting the contributions and challenges from over 210 recent research papers, and the future trends and challenges in designing and training deep neural networks are summarized.
Proceedings Article

DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks

TL;DR: DeblurGAN achieves state-of-the-art performance in both the structural similarity measure and visual appearance, and is 5 times faster than its closest competitor, Deep-Deblur.
Posted Content

DenseCap: Fully Convolutional Localization Networks for Dense Captioning

TL;DR: A Fully Convolutional Localization Network (FCLN) architecture is proposed that processes an image with a single, efficient forward pass, requires no external region proposals, and can be trained end-to-end with a single round of optimization.
Journal Article

An overview of deep learning in medical imaging focusing on MRI

TL;DR: This paper indicates how deep learning has been applied to the entire MRI processing chain, from acquisition to image retrieval, from segmentation to disease prediction, and provides a starting point for people interested in experimenting and contributing to the field of deep learning for medical imaging.
References
Posted Content

Network In Network

TL;DR: With enhanced local modeling via the micro network, the proposed deep network structure NIN is able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers.
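As a rough sketch of the global-average-pooling classification layer that the summary above refers to, the snippet below averages each class feature map to a single score and applies softmax in place of a fully connected layer; the shapes and names are illustrative assumptions, not the NIN authors' code.

# Sketch of a global-average-pooling classification head: each of the C feature
# maps is averaged to one value, and those C values go straight to softmax,
# replacing a traditional fully connected classification layer.
import numpy as np

def gap_classify(feature_maps):
    """feature_maps: (C, H, W) array, one map per class -> class probabilities."""
    logits = feature_maps.mean(axis=(1, 2))   # global average pooling -> shape (C,)
    exp = np.exp(logits - logits.max())       # numerically stable softmax
    return exp / exp.sum()

# Usage: 10 class feature maps of spatial size 6 x 6.
probs = gap_classify(np.random.rand(10, 6, 6))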
Proceedings Article

DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition

TL;DR: DeCAF is an open-source implementation of deep convolutional activation features, released along with all associated network parameters to enable vision researchers to experiment with deep representations across a range of visual concept learning paradigms.
Proceedings Article

An extended set of Haar-like features for rapid object detection

TL;DR: This paper introduces a novel set of rotated Haar-like features that significantly enrich the simple features used in the Viola et al. detection scheme, which is based on a boosted cascade of simple feature classifiers.
Book Chapter

Edge Boxes: Locating Object Proposals from Edges

TL;DR: A novel method for generating object bounding box proposals using edges is proposed, showing results that are significantly more accurate than the current state-of-the-art while being faster to compute.
Posted Content

Going Deeper with Convolutions

TL;DR: A deep convolutional neural network architecture codenamed Inception is proposed that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).