Open Access, Posted Content

Region Proposal by Guided Anchoring

TLDR
Guided Anchoring leverages semantic features to predict where anchors should be placed, together with their scales and aspect ratios at each location, achieving 9.1% higher recall on MS COCO with 90% fewer anchors than the RPN baseline.
Abstract
Region anchors are the cornerstone of modern object detection techniques. State-of-the-art detectors mostly rely on a dense anchoring scheme, where anchors are sampled uniformly over the spatial domain with a predefined set of scales and aspect ratios. In this paper, we revisit this foundational stage. Our study shows that it can be done much more effectively and efficiently. Specifically, we present an alternative scheme, named Guided Anchoring, which leverages semantic features to guide the anchoring. The proposed method jointly predicts the locations where the centers of objects of interest are likely to exist as well as the scales and aspect ratios at different locations. On top of predicted anchor shapes, we mitigate the feature inconsistency with a feature adaption module. We also study the use of high-quality proposals to improve detection performance. The anchoring scheme can be seamlessly integrated into proposal methods and detectors. With Guided Anchoring, we achieve 9.1% higher recall on MS COCO with 90% fewer anchors than the RPN baseline. We also adopt Guided Anchoring in Fast R-CNN, Faster R-CNN and RetinaNet, respectively improving the detection mAP by 2.2%, 2.7% and 1.2%. Code will be available at this https URL.
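
The contrast between the two schemes described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the 0.5 location threshold, the base scale of 8, and the exponential decoding of predicted shapes are all assumptions made for illustration, and the location and shape maps are hand-written stand-ins for network outputs.

```python
import math

def dense_anchors(feat_h, feat_w, stride, scales, ratios):
    """Dense scheme: every location gets every (scale, ratio) combination."""
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:  # r is height / width
                    w = s * stride / math.sqrt(r)
                    h = s * stride * math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors

def guided_anchors(loc_prob, shape_pred, stride, threshold=0.5, base_scale=8.0):
    """Guided scheme: keep likely object centers, one predicted shape each.

    threshold, base_scale and the exp() decoding are illustrative assumptions.
    """
    anchors = []
    for y, row in enumerate(loc_prob):
        for x, p in enumerate(row):
            if p < threshold:
                continue  # skip locations unlikely to contain an object center
            dw, dh = shape_pred[y][x]
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            w = base_scale * stride * math.exp(dw)
            h = base_scale * stride * math.exp(dh)
            anchors.append((cx, cy, w, h))
    return anchors

# Toy 4x4 feature map with stride 16.
dense = dense_anchors(4, 4, stride=16, scales=[8], ratios=[0.5, 1.0, 2.0])

# Hand-written stand-ins for the predicted location and shape maps.
loc_prob = [[0.0] * 4 for _ in range(4)]
loc_prob[1][2] = 0.9
loc_prob[3][0] = 0.7
shape_pred = [[(0.0, 0.0)] * 4 for _ in range(4)]
shape_pred[1][2] = (0.0, math.log(2.0))  # twice as tall as wide

guided = guided_anchors(loc_prob, shape_pred, stride=16)
print(len(dense), len(guided))
```

In this toy setting the dense scheme emits one anchor per location, scale, and ratio, while the guided scheme emits a single anchor only at the two locations whose predicted probability clears the threshold. The counts are specific to this example and say nothing about the reductions reported in the paper.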


Citations
Journal Article

A Survey of Deep Learning-Based Object Detection

TL;DR: This survey provides a comprehensive overview of a variety of object detection methods in a systematic manner, covering the one-stage and two-stage detectors, and lists the traditional and new applications.
Posted Content

Learning Spatial Fusion for Single-Shot Object Detection

TL;DR: This work proposes a novel, data-driven strategy for pyramidal feature fusion, referred to as adaptively spatial feature fusion (ASFF), which learns to spatially filter conflicting information and suppress inconsistency, thereby improving the scale invariance of features while adding almost no inference overhead.
Proceedings Article

VarifocalNet: An IoU-aware Dense Object Detector

TL;DR: VarifocalNet learns an IoU-aware classification score (IACS) as a joint representation of object presence confidence and localization accuracy, so that candidate detections can be ranked more accurately by their IACS.
Posted Content

DetNAS: Backbone Search for Object Detection

TL;DR: This work presents DetNAS, which uses Neural Architecture Search (NAS) to design better backbones for object detection, and empirically finds that networks searched on object detection show consistent superiority over those searched on ImageNet classification.
Journal Article

A survey and performance evaluation of deep learning methods for small object detection

TL;DR: A comprehensive review of recently developed deep learning methods for small object detection can be found in this article, where the authors summarize challenges and solutions of small-object detection, and present major deep learning techniques, including fusing feature maps, adding context information, balancing foreground-background examples, and creating sufficient positive examples.
References
Proceedings Article

Deep Residual Learning for Image Recognition

TL;DR: The authors propose a residual learning framework to ease the training of networks that are substantially deeper than those used previously; the resulting residual networks won 1st place on the ILSVRC 2015 classification task.
Proceedings Article

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Proceedings Article

Histograms of oriented gradients for human detection

TL;DR: It is shown experimentally that grids of histograms of oriented gradient (HOG) descriptors significantly outperform existing feature sets for human detection, and the influence of each stage of the computation on performance is studied.
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: A new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding, gathering images of complex everyday scenes that contain common objects in their natural context.
Proceedings Article

Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation

TL;DR: R-CNN applies CNNs to bottom-up region proposals to localize and segment objects, and shows that when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost.