Soft-NMS — Improving Object Detection with One Line of Code

doi:10.1109/ICCV.2017.593

Open Access | Proceedings Article

- pp 5562-5570

TLDR

Soft-NMS decays the detection scores of all boxes that overlap the top-scoring box M as a continuous function of their overlap with M, instead of discarding them outright. Traditional NMS, by design, misses any object whose box lies within the predefined overlap threshold of M.

Abstract:

Non-maximum suppression is an integral part of the object detection pipeline. First, it sorts all detection boxes on the basis of their scores. The detection box M with the maximum score is selected and all other detection boxes with a significant overlap (using a pre-defined threshold) with M are suppressed. This process is recursively applied on the remaining boxes. As per the design of the algorithm, if an object lies within the predefined overlap threshold, it leads to a miss. To this end, we propose Soft-NMS, an algorithm which decays the detection scores of all other objects as a continuous function of their overlap with M. Hence, no object is eliminated in this process. Soft-NMS obtains consistent improvements for the COCO-style mAP metric on standard datasets like PASCAL VOC2007 (1.7% for both R-FCN and Faster-RCNN) and MS-COCO (1.3% for R-FCN and 1.1% for Faster-RCNN) by just changing the NMS algorithm without any additional hyper-parameters. Using Deformable-RFCN, Soft-NMS improves the state-of-the-art in object detection from 39.8% to 40.9% with a single model. Further, the computational complexity of Soft-NMS is the same as traditional NMS and hence it can be efficiently implemented. Since Soft-NMS does not require any extra training and is simple to implement, it can be easily integrated into any object detection pipeline. Code for Soft-NMS is publicly available on GitHub at http://bit.ly/2nJLNMu.
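
The rescoring idea described in the abstract can be sketched in a few lines of pure Python. This is an illustrative sketch and not the authors' released code: the function names `iou` and `soft_nms`, the `(x1, y1, x2, y2)` box format, and the default values of `iou_thresh`, `sigma`, and `score_floor` are all assumptions made for the example. The "hard" mode reproduces traditional NMS for comparison; the "linear" and "gaussian" modes decay a neighbor's score as a function of its overlap with M instead of removing it.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def soft_nms(boxes, scores, method="gaussian",
             iou_thresh=0.3, sigma=0.5, score_floor=0.001):
    """Return (index, rescored value) pairs in order of selection.

    All parameter names and defaults here are assumptions for illustration.
    """
    if method not in ("gaussian", "linear", "hard"):
        raise ValueError(f"unknown method: {method}")

    scores = list(scores)              # work on a copy
    remaining = list(range(len(boxes)))
    kept = []

    while remaining:
        # Select M, the remaining box with the maximum (possibly decayed) score.
        m = max(remaining, key=lambda i: scores[i])
        remaining.remove(m)
        if scores[m] < score_floor:
            break                      # every other remaining score is lower still
        kept.append((m, scores[m]))

        # The single step that differs between hard NMS and Soft-NMS:
        # how the scores of boxes overlapping M are updated.
        for i in remaining:
            overlap = iou(boxes[m], boxes[i])
            if method == "gaussian":
                scores[i] *= math.exp(-(overlap * overlap) / sigma)
            elif method == "linear":
                if overlap >= iou_thresh:
                    scores[i] *= 1.0 - overlap
            else:  # "hard": traditional NMS zeroes the score
                if overlap >= iou_thresh:
                    scores[i] = 0.0

    return kept


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(soft_nms(boxes, scores, method="hard"))
    print(soft_nms(boxes, scores, method="gaussian"))
```

In the example at the bottom, the second box overlaps the first heavily. Hard NMS drops it, which is the "miss" the abstract refers to when it belongs to a different object. The Gaussian variant keeps it with a reduced score, so it still appears in the output, ranked below the non-overlapping third box.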
