Topic

Bounding overwatch

About: Bounding overwatch is a research topic. Over the lifetime, 966 publications have been published within this topic, receiving 15,156 citations.


Papers
Journal ArticleDOI
TL;DR: Wang et al. propose a Scale-Sensitive IoU (SIOU) loss for object detection on multi-scale targets, especially remote sensing images, to address the problem that the gradients of current loss functions tend to be smooth and cannot distinguish certain bounding boxes during training, which can lead to unreasonable loss values and slow convergence.
Abstract: The regression loss function in an object detection model plays an important role during training. IoU-based loss functions, such as the CIoU loss, achieve remarkable performance but still have inherent shortcomings that can slow convergence. This paper proposes a Scale-Sensitive IoU (SIOU) loss for object detection on multi-scale targets, especially in remote sensing images, to address the problem that the gradients of current loss functions tend to be smooth and cannot distinguish some special bounding boxes during training in multi-scale object detection, which may cause unreasonable loss values and slow convergence. A new geometric factor affecting the loss calculation, namely the area difference, is introduced to extend the three existing factors in the CIoU loss. By introducing an area regulatory factor γ into the loss function, the SIOU loss can adjust the loss values of bounding boxes and distinguish different boxes quantitatively. Furthermore, we also apply the SIOU loss to oriented bounding box detection and obtain better optimization. Extensive experiments show that the detection accuracies of YOLOv4, Faster R-CNN, and SSD with the SIOU loss improve more than with previous loss functions on two horizontal bounding box datasets, NWPU VHR-10 and DIOR, and on the oriented bounding box dataset DOTA, all of which are remote sensing datasets. The proposed loss function therefore achieves state-of-the-art performance on multi-scale object detection.
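
The abstract describes the idea (an extra area-difference factor weighted by a regulatory factor γ) but not the exact formula. The sketch below is only a hedged illustration of that idea layered on a plain IoU loss; the function name siou_loss, the normalization of the area difference, and the way γ enters are assumptions, not the authors' published formulation.

```python
def siou_loss(pred, target, gamma=0.5):
    """Toy sketch of an IoU loss extended with an area-difference term.

    pred, target: axis-aligned boxes as [x1, y1, x2, y2].
    gamma: assumed regulatory weight on the area-difference penalty;
    the paper's exact formulation (and its CIoU terms) may differ.
    """
    # Intersection rectangle
    ix1, iy1 = max(pred[0], target[0]), max(pred[1], target[1])
    ix2, iy2 = min(pred[2], target[2]), min(pred[3], target[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)

    area_p = (pred[2] - pred[0]) * (pred[3] - pred[1])
    area_t = (target[2] - target[0]) * (target[3] - target[1])
    iou = inter / (area_p + area_t - inter + 1e-9)

    # Normalized area difference: separates boxes that share the same IoU
    # but differ in scale, which is the effect the abstract describes.
    area_diff = abs(area_p - area_t) / (max(area_p, area_t) + 1e-9)

    return 1.0 - iou + gamma * area_diff


# Example: a smaller prediction inside a larger target box.
print(siou_loss([1, 1, 9, 9], [0, 0, 10, 10]))
```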

10 citations

Patent
23 Apr 2019
TL;DR: A system for preprocessing an image for object recognition may include at least one memory storing instructions and at least one processor configured to execute the instructions to perform operations.
Abstract: The present disclosure relates to image preprocessing to improve object recognition. In one implementation, a system for preprocessing an image for object recognition may include at least one memory storing instructions and at least one processor configured to execute the instructions to perform operations. The operations may include receiving the image, detecting a plurality of bounding boxes within the image, grouping the plurality of bounding boxes into a plurality of groups such that bounding boxes within a group have shared areas exceeding an area threshold, deriving a first subset of the plurality of bounding boxes by selecting bounding boxes having highest class confidence scores from at least one group, selecting a bounding box from the first subset having a highest score based on area and class confidence score, and outputting the selected bounding box.
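
Because the claim language is abstract, the sketch below is a loose, assumed reading of the claimed flow: group boxes by shared area, keep the most confident box per group, then pick the best by a combined area/confidence score. The helper names, the grouping heuristic, and the product used as the combined score are illustrative, not taken from the patent.

```python
def intersection_area(a, b):
    # a, b: [x1, y1, x2, y2]
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def select_box(boxes, scores, area_threshold=0.5):
    """Hedged sketch of the grouping/selection flow in the patent abstract.

    Whether the threshold is an absolute area or a ratio, and how the final
    area/confidence score is combined, are assumptions.
    """
    # 1. Group boxes whose shared (overlap) area with a group's first
    #    member exceeds the threshold.
    groups = []
    for i, box in enumerate(boxes):
        for group in groups:
            if intersection_area(box, boxes[group[0]]) > area_threshold:
                group.append(i)
                break
        else:
            groups.append([i])

    # 2. From each group, keep the box with the highest class confidence.
    subset = [max(group, key=lambda i: scores[i]) for group in groups]

    # 3. From that subset, pick the box with the highest combined
    #    area-and-confidence score (simple product here).
    def area(b):
        return (b[2] - b[0]) * (b[3] - b[1])

    best = max(subset, key=lambda i: area(boxes[i]) * scores[i])
    return boxes[best]
```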

10 citations

Book ChapterDOI
TL;DR: Wang et al. propose generalized multiple instance learning and a smooth maximum approximation to integrate the bounding box tightness prior into a deep neural network in an end-to-end manner.
Abstract: This paper presents a weakly supervised image segmentation method that adopts tight bounding box annotations. It proposes generalized multiple instance learning (MIL) and smooth maximum approximation to integrate the bounding box tightness prior into a deep neural network in an end-to-end manner. In generalized MIL, positive bags are defined by parallel crossing lines at a set of different angles, and negative bags are defined as individual pixels outside of any bounding box. Two variants of smooth maximum approximation, the α-softmax function and the α-quasimax function, are exploited to overcome the numerical instability introduced by the maximum function in bag prediction. The proposed approach was evaluated on two public medical datasets using the Dice coefficient. The results demonstrate that it outperforms the state-of-the-art methods. The code is available at https://github.com/wangjuan313/wsis-boundingbox.
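
In their commonly used forms, the two smooth-maximum approximations named in the abstract are an exponentially weighted mean (α-softmax) and a shifted log-sum-exp (α-quasimax). The sketch below shows those common forms; the exact definitions and the α value used in the paper may differ.

```python
import math
import torch


def alpha_softmax(x, alpha=4.0):
    """Smooth maximum as an exponentially weighted mean of x.
    Approaches x.max() as alpha grows."""
    w = torch.softmax(alpha * x, dim=-1)
    return (w * x).sum(dim=-1)


def alpha_quasimax(x, alpha=4.0):
    """Smooth maximum via a shifted log-sum-exp; subtracting log(n)/alpha
    keeps the value from overshooting x.max()."""
    n = x.shape[-1]
    return (torch.logsumexp(alpha * x, dim=-1) - math.log(n)) / alpha


# Toy "bag": predicted foreground probabilities along one crossing line.
bag = torch.tensor([0.10, 0.70, 0.30, 0.90])
print(alpha_softmax(bag), alpha_quasimax(bag), bag.max())
```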

10 citations

Journal ArticleDOI
TL;DR: Wang et al. propose a corner-guided anchor-free single-stage 3D object detection model (CG-SSD) that estimates the locations of partially visible and invisible corners to obtain a more accurate object feature representation, especially for small or partially occluded objects.
Abstract: Detecting accurate 3D bounding boxes of objects from point clouds is a major task in autonomous driving perception. At present, anchor-based and anchor-free models that use LiDAR point clouds for 3D object detection rely on a center assigner strategy to infer the 3D bounding boxes. However, in real-world scenes, due to occlusions and the effective detection range of the LiDAR system, only part of the object surface is covered by the collected point clouds, and there are no measured 3D points corresponding to the physical object center. Obtaining the object by aggregating incomplete surface point clouds therefore loses accuracy in direction and dimension estimation. To address this problem, we propose a corner-guided anchor-free single-stage 3D object detection model (CG-SSD). First, the point clouds within a single frame are assigned to regular 3D grids, and a 3D sparse convolution backbone network composed of residual layers and submanifold sparse convolutional layers constructs bird's-eye-view (BEV) features that are further mined by a lightweight U-shaped network. Second, a novel corner-guided auxiliary module (CGAM) with an adaptive corner classification algorithm is proposed to incorporate corner supervision signals into the neural network. CGAM is explicitly designed and trained to estimate the locations of partially visible and invisible corners to obtain a more accurate object feature representation, especially for small or partially occluded objects. Finally, the deep features from both the backbone network and the CGAM are concatenated and fed into the head module to predict the classification and 3D bounding boxes of the objects in the scene. The experiments demonstrate that CG-SSD achieves state-of-the-art performance on the ONCE benchmark for supervised 3D object detection using single-frame point cloud data, with 62.77% mAP. Additionally, experiments on ONCE and the Waymo Open Dataset show that CGAM can be extended as a plug-in to most anchor-based models that use BEV features, bringing an AP improvement of +1.17% to +14.23%. The code is available at https://github.com/mrqrs/CG-SSD.
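
Corner supervision of the kind CGAM uses presupposes turning each ground-truth box (center, dimensions, heading) into explicit corner coordinates. The sketch below computes the four bird's-eye-view corners of an oriented box; the corner ordering and yaw convention are assumptions and need not match the ONCE or CG-SSD conventions.

```python
import numpy as np


def bev_corners(cx, cy, length, width, yaw):
    """Bird's-eye-view corners of an oriented box, a typical ingredient of
    corner-based supervision. Ordering and yaw convention are assumed."""
    # Corners in the box's local frame, centered at the origin.
    local = np.array([
        [ length / 2,  width / 2],
        [ length / 2, -width / 2],
        [-length / 2, -width / 2],
        [-length / 2,  width / 2],
    ])
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s], [s, c]])
    # Rotate into the world frame, then translate to the box center.
    return local @ rot.T + np.array([cx, cy])


# Example: a 4 m x 2 m box rotated by 30 degrees.
print(bev_corners(10.0, 5.0, 4.0, 2.0, np.pi / 6))
```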

10 citations

Journal ArticleDOI
TL;DR: Without bells and whistles, BBENet outperforms existing methods by a large margin at comparable speed, achieving state-of-the-art single-shot detection.
Abstract: Accurate single-shot object detection is an extremely challenging task in real environments because of complex scenes, occlusion, ambiguity, blur, and shadow; together, these factors constitute the uncertainty problem. It leads to unreliable bounding box annotations and makes it difficult for detectors to learn bounding box localization. Previous methods treat ground truth box coordinates as a rigid distribution, ignoring the localization uncertainty present in real datasets. This article proposes a novel bounding box encoding algorithm integrated into a single-shot detector (BBENet) to model the flexible distribution of bounding box localization. First, discretized ground truth labels are generated by decomposing each object's boundary into multiple boundaries. This new representation of ground truth boxes is more flexible and covers the varied cases found in complex scenes. During training, the detector directly learns discretized box locations instead of a continuous domain. Second, the bounding box encoding algorithm reorganizes bounding box predictions to be more accurate. Furthermore, another problem in existing methods is the inconsistent estimation of detection quality. Single-shot detection consists of classification and localization tasks, but popular detectors use the classification score alone as the final detection quality. This ignores localization quality and hinders overall performance, because the two tasks are positively correlated. To overcome this problem, BBENet introduces a detection quality that combines localization and classification quality to rank detections during non-maximum suppression. The localization quality is computed from how uncertain the predicted boxes are, which is a new perspective in the detection literature. The proposed BBENet is evaluated on three benchmark datasets: MS-COCO, Pascal VOC, and CrowdHuman. Without bells and whistles, BBENet outperforms existing methods by a large margin at comparable speed, achieving state-of-the-art single-shot detection.
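
The ranking idea at the end of the abstract (order candidates during non-maximum suppression by both classification and localization quality) can be sketched in a few lines. The product used to combine the two scores is an assumption; how BBENet derives the localization quality from box uncertainty is not reproduced here.

```python
import torch
from torchvision.ops import nms


def rank_and_suppress(boxes, cls_scores, loc_quality, iou_thr=0.5):
    """Rank detections by a joint detection quality before NMS.

    boxes: Tensor[N, 4] in (x1, y1, x2, y2); cls_scores, loc_quality: Tensor[N].
    The product below is an assumed way to combine the two scores.
    """
    quality = cls_scores * loc_quality   # joint classification + localization quality
    keep = nms(boxes, quality, iou_thr)  # standard hard NMS ranked by that quality
    return boxes[keep], quality[keep]
```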

10 citations


Network Information
Related Topics (5)
Robustness (computer science): 94.7K papers, 1.6M citations, 85% related
Optimization problem: 96.4K papers, 2.1M citations, 85% related
Matrix (mathematics): 105.5K papers, 1.9M citations, 82% related
Nonlinear system: 208.1K papers, 4M citations, 81% related
Artificial neural network: 207K papers, 4.5M citations, 80% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    714
2022    1,629
2021    155
2020    75
2019    73
2018    50