scispace - formally typeset
Open AccessProceedings ArticleDOI

Soft-NMS — Improving Object Detection with One Line of Code

TLDR
Soft-NMS as mentioned in this paper decays the detection scores of all other objects as a continuous function of their overlap with M. As per the design of the algorithm, if an object lies within the predefined overlap threshold, it leads to a miss.
Abstract
Non-maximum suppression is an integral part of the object detection pipeline. First, it sorts all detection boxes on the basis of their scores. The detection box M with the maximum score is selected and all other detection boxes with a significant overlap (using a pre-defined threshold) with M are suppressed. This process is recursively applied on the remaining boxes. As per the design of the algorithm, if an object lies within the predefined overlap threshold, it leads to a miss. To this end, we propose Soft-NMS, an algorithm which decays the detection scores of all other objects as a continuous function of their overlap with M. Hence, no object is eliminated in this process. Soft-NMS obtains consistent improvements for the coco-style mAP metric on standard datasets like PASCAL VOC2007 (1.7% for both R-FCN and Faster-RCNN) and MS-COCO (1.3% for R-FCN and 1.1% for Faster-RCNN) by just changing the NMS algorithm without any additional hyper-parameters. Using Deformable-RFCN, Soft-NMS improves state-of-the-art in object detection from 39.8% to 40.9% with a single model. Further, the computational complexity of Soft-NMS is the same as traditional NMS and hence it can be efficiently implemented. Since Soft-NMS does not require any extra training and is simple to implement, it can be easily integrated into any object detection pipeline. Code for Soft-NMS is publicly available on GitHub http://bit.ly/2nJLNMu.

read more

Citations
More filters
Posted Content

YOLOv4: Optimal Speed and Accuracy of Object Detection

TL;DR: This work uses new features: WRC, CSP, CmBN, SAT, Mish activation, Mosaic data augmentation, C mBN, DropBlock regularization, and CIoU loss, and combine some of them to achieve state-of-the-art results: 43.5% AP for the MS COCO dataset at a realtime speed of ~65 FPS on Tesla V100.
Posted Content

End-to-End Object Detection with Transformers

TL;DR: This work presents a new method that views object detection as a direct set prediction problem, and demonstrates accuracy and run-time performance on par with the well-established and highly-optimized Faster RCNN baseline on the challenging COCO object detection dataset.
Posted Content

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows.

TL;DR: Wang et al. as mentioned in this paper proposed a new vision Transformer called Swin Transformer, which is computed with shifted windows to address the differences between the two domains, such as large variations in the scale of visual entities and the high resolution of pixels in images compared to words in text.
Book ChapterDOI

End-to-End Object Detection with Transformers

TL;DR: DetR as mentioned in this paper proposes a set-based global loss that forces unique predictions via bipartite matching, and a transformer encoder-decoder architecture to directly output the final set of predictions in parallel.
Journal ArticleDOI

Deep Learning for Generic Object Detection: A Survey

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
References
More filters
Proceedings ArticleDOI

Object retrieval with large vocabularies and fast spatial matching

TL;DR: To improve query performance, this work adds an efficient spatial verification stage to re-rank the results returned from the bag-of-words model and shows that this consistently improves search quality, though by less of a margin when the visual vocabulary is large.
Book ChapterDOI

Edge Boxes: Locating Object Proposals from Edges

TL;DR: A novel method for generating object bounding box proposals using edges is proposed, showing results that are significantly more accurate than the current state-of-the-art while being faster to compute.
Journal ArticleDOI

W/sup 4/: real-time surveillance of people and their activities

TL;DR: W/sup 4/ employs a combination of shape analysis and tracking to locate people and their parts and to create models of people's appearance so that they can be tracked through interactions such as occlusions.
Proceedings ArticleDOI

Pedestrian detection: A benchmark

TL;DR: The Caltech Pedestrian Dataset is introduced, which is two orders of magnitude larger than existing datasets and proposes improved evaluation metrics, demonstrating that commonly used per-window measures are flawed and can fail to predict performance on full images.
Journal ArticleDOI

Edge and Curve Detection for Visual Scene Analysis

TL;DR: Simple sets of parallel operations are described which can be used to detect texture edges, "spots," and "streaks" in digitized pictures and it is shown that a composite output is constructed in which edges between differently textured regions are detected, and isolated objects are also detected, but the objects composing the textures are ignored.