Rotated region based CNN for ship detection

doi:10.1109/ICIP.2017.8296411

Proceedings Article•DOI•

Zikun Liu¹, Hu Jingao¹, Lubin Weng¹, Yiping Yang¹•Institutions (1)

Chinese Academy of Sciences¹

01 Sep 2017-pp 900-904

TL;DR: Experimental results on the public ship dataset HRSC2016 confirm that RR-CNN outperforms baselines by a large margin and can learn and accurately extract features of rotated regions and locate rotated objects precisely.

Abstract: State-of-the-art object detection networks for natural images have recently demonstrated impressive performance. However, the complexity of ship detection in high-resolution satellite images exposes the limited capacity of these networks to detect strip-like, rotated, assembled objects, which are common in remote sensing images. In this paper, we embrace this observation and introduce the rotated region based CNN (RR-CNN), which can learn and accurately extract features of rotated regions and locate rotated objects precisely. RR-CNN has three important new components: a rotated region of interest (RRoI) pooling layer, a rotated bounding box regression model, and a multi-task method for non-maximal suppression (NMS) between different classes. Experimental results on the public ship dataset HRSC2016 confirm that RR-CNN outperforms baselines by a large margin.
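The abstract refers to rotated regions and rotated bounding boxes without giving their parameterization. As a minimal sketch, and not the paper's implementation, the following assumes a rotated box is described by its center (cx, cy), width w, height h, and an angle theta in radians measured counter-clockwise. The function name `rotated_box_corners` and this parameterization are illustrative assumptions.

```python
import math


def rotated_box_corners(cx, cy, w, h, theta):
    """Return the four corners of a rotated rectangle as (x, y) tuples.

    Assumed parameterization: center (cx, cy), size (w, h), and a
    counter-clockwise rotation of theta radians about the center.
    """
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets in the box's own, unrotated frame.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    corners = []
    for dx, dy in offsets:
        # Rotate each offset, then translate it to the box center.
        x = cx + dx * cos_t - dy * sin_t
        y = cy + dx * sin_t + dy * cos_t
        corners.append((x, y))
    return corners
```

With theta = 0 the result is an ordinary axis-aligned box, which is a quick sanity check for the convention used here.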

Citations

Posted Content•

Object Detection in 20 Years: A Survey

Zhengxia Zou¹, Zhenwei Shi², Yuhong Guo, Jieping Ye¹•Institutions (2)

University of Michigan¹, Beihang University²

13 May 2019-arXiv: Computer Vision and Pattern Recognition

TL;DR: This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019), and makes an in-depth analysis of their challenges as well as technical improvements in recent years.

Abstract: Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, text detection, etc., and makes an in-depth analysis of their challenges as well as technical improvements in recent years.

802 citations

Cites background from "Rotated region based CNN for ship d..."

...proved the ROI Pooling layer for better rotation invariance [272, 409]....

Proceedings Article•DOI•

Learning RoI Transformer for Oriented Object Detection in Aerial Images

Jian Ding¹, Nan Xue¹, Yang Long¹, Gui-Song Xia¹, Qikai Lu¹•Institutions (1)

Wuhan University¹

01 Jun 2019

TL;DR: The core idea of RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations.

Abstract: Object detection in aerial images is an active yet challenging task in computer vision because of the bird’s-eye view perspective, the highly complex backgrounds, and the variant appearances of objects. Especially when detecting densely packed objects in aerial images, methods relying on horizontal proposals for common object detection often introduce mismatches between the Regions of Interest (RoIs) and objects. This leads to a common misalignment between the final object classification confidence and localization accuracy. In this paper, we propose a RoI Transformer to address these problems. The core idea of RoI Transformer is to apply spatial transformations on RoIs and learn the transformation parameters under the supervision of oriented bounding box (OBB) annotations. RoI Transformer is lightweight and can be easily embedded into detectors for oriented object detection. Simply applying the RoI Transformer to Light-Head R-CNN achieves state-of-the-art performance on two common and challenging aerial datasets, i.e., DOTA and HRSC2016, with a negligible reduction in detection speed. Our RoI Transformer exceeds the deformable Position Sensitive RoI pooling when oriented bounding-box annotations are available. Extensive experiments have also validated the flexibility and effectiveness of our RoI Transformer.

634 citations

Proceedings Article•DOI•

SCRDet: Towards More Robust Detection for Small, Cluttered and Rotated Objects

Xue Yang¹, Jirui Yang², Junchi Yan¹, Yue Zhang², Tengfei Zhang², Zhi Guo², Xian Sun, Kun Fu²•Institutions (2)

Shanghai Jiao Tong University¹, Chinese Academy of Sciences²

01 Oct 2019

TL;DR: A sampling fusion network is devised which fuses multi-layer features with effective anchor sampling to improve the sensitivity to small objects, and the IoU constant factor is added to the smooth L1 loss to address the boundary problem for the rotating bounding box.

Abstract: Object detection has been a building block in computer vision. Though considerable progress has been made, there still exist challenges for objects with small size, arbitrary direction, and dense distribution. Apart from natural images, such issues are especially pronounced for aerial images of great importance. This paper presents a novel multi-category rotation detector for small, cluttered and rotated objects, namely SCRDet. Specifically, a sampling fusion network is devised which fuses multi-layer features with effective anchor sampling to improve the sensitivity to small objects. Meanwhile, the supervised pixel attention network and the channel attention network are jointly explored for small and cluttered object detection by suppressing the noise and highlighting the object features. For more accurate rotation estimation, the IoU constant factor is added to the smooth L1 loss to address the boundary problem for the rotating bounding box. Extensive experiments on two public remote sensing datasets, DOTA and NWPU VHR-10, as well as the natural image datasets COCO and VOC2007 and the scene text dataset ICDAR2015, show the state-of-the-art performance of our detector. The code and models will be available at https://github.com/DetectionTeamUCAS.

552 citations

Additional excerpts

...vehicle [36], ship [41, 42, 28, 43, 27], aircraft [25] etc....

Proceedings Article•DOI•

Rotation-Sensitive Regression for Oriented Scene Text Detection

Minghui Liao¹, Zhen Zhu¹, Baoguang Shi¹, Gui-Song Xia², Xiang Bai¹•Institutions (2)

Huazhong University of Science and Technology¹, Wuhan University²

18 Jun 2018

TL;DR: The proposed method named Rotation-sensitive Regression Detector (RRD) achieves state-of-the-art performance on several oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17, and COCO-Text, and achieves a significant improvement on a ship collection dataset, demonstrating its generality on oriented object detection.

Abstract: Text in natural images is of arbitrary orientations, requiring detection in terms of oriented bounding boxes. A multi-oriented text detector typically involves two key tasks: 1) text presence detection, which is a classification problem disregarding text orientation; 2) oriented bounding box regression, which concerns text orientation. Previous methods rely on shared features for both tasks, resulting in degraded performance due to the incompatibility of the two tasks. To address this issue, we propose to perform classification and regression on features of different characteristics, extracted by two network branches of different designs. Concretely, the regression branch extracts rotation-sensitive features by actively rotating the convolutional filters, while the classification branch extracts rotation-invariant features by pooling the rotation-sensitive features. The proposed method, named Rotation-sensitive Regression Detector (RRD), achieves state-of-the-art performance on several oriented scene text benchmark datasets, including ICDAR 2015, MSRA-TD500, RCTW-17, and COCO-Text. Furthermore, RRD achieves a significant improvement on a ship collection dataset, demonstrating its generality on oriented object detection.

415 citations

Additional excerpts

...For detailed description, refer to [30] ....

Journal Article•DOI•

Gliding Vertex on the Horizontal Bounding Box for Multi-Oriented Object Detection

Yongchao Xu¹, Fu Mingtao¹, Qimeng Wang¹, Yukang Wang¹, Kai Chen², Gui-Song Xia³, Xiang Bai¹•Institutions (3)

Huazhong University of Science and Technology¹, Shanghai Jiao Tong University², Wuhan University³

01 Apr 2021-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: An obliquity factor, based on the area ratio between the object and its horizontal bounding box, is introduced to guide the selection of horizontal or oriented detection for each object, and five extra target variables are added to the regression head of Faster R-CNN, which requires negligible extra computation time.

Abstract: Object detection has recently experienced substantial progress. Yet, the widely adopted horizontal bounding box representation is not appropriate for ubiquitous oriented objects such as objects in aerial images and scene texts. In this paper, we propose a simple yet effective framework to detect multi-oriented objects. Instead of directly regressing the four vertices, we glide the vertex of the horizontal bounding box on each corresponding side to accurately describe a multi-oriented object. Specifically, we regress four length ratios characterizing the relative gliding offset on each corresponding side. This may facilitate the offset learning and avoid the confusion issue of sequential label points for oriented objects. To further remedy the confusion issue for nearly horizontal objects, we also introduce an obliquity factor based on the area ratio between the object and its horizontal bounding box, guiding the selection of horizontal or oriented detection for each object. We add these five extra target variables to the regression head of Faster R-CNN, which requires negligible extra computation time. Extensive experimental results demonstrate that, without bells and whistles, the proposed method achieves superior performance on multiple multi-oriented object detection benchmarks, including object detection in aerial images, scene text detection, and pedestrian detection in fisheye images.
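The obliquity factor is described above only as an area ratio between the object and its horizontal bounding box. A minimal sketch of that ratio, assuming the object is given as a simple quadrilateral with vertices listed in order around its boundary, might look like the following. The function names are hypothetical and this is not the authors' code; the area is computed with the standard shoelace formula.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def obliquity_factor(quad):
    """Ratio of the quadrilateral's area to its horizontal bounding box area.

    Values near 1 indicate a nearly horizontal object; smaller values
    indicate a more oblique one.
    """
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    hbb_area = (max(xs) - min(xs)) * (max(ys) - min(ys))
    if hbb_area == 0:
        raise ValueError("degenerate quadrilateral: bounding box has zero area")
    return polygon_area(quad) / hbb_area
```

An axis-aligned square gives a ratio of 1.0, while the same square rotated by 45 degrees fills only half of its horizontal bounding box.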

395 citations

Additional excerpts

...Otherwise, the proposed method using FPN runs at 9.4 FPS instead of 10.0 FPS.

TABLE 2: Quantitative Comparison With Some State-of-the-Art Methods on HRSC2016

Method            mAP
RC2 [44]          75.7
R2PN [25]         79.6
RRD [32]          84.3
RoI Trans.* [5]   86.2
Ours*             87.4
Ours              88.2

* indicates that Light-head R-CNN is adopted....

References

Proceedings Article•DOI•

Deep Residual Learning for Image Recognition

Kaiming He¹, Xiangyu Zhang¹, Shaoqing Ren¹, Jian Sun¹•Institutions (1)

Microsoft¹

27 Jun 2016

TL;DR: In this article, the authors proposed a residual learning framework to ease the training of networks that are substantially deeper than those used previously, which won the 1st place on the ILSVRC 2015 classification task.

Abstract: Deeper neural networks are more difficult to train. We present a residual learning framework to ease the training of networks that are substantially deeper than those used previously. We explicitly reformulate the layers as learning residual functions with reference to the layer inputs, instead of learning unreferenced functions. We provide comprehensive empirical evidence showing that these residual networks are easier to optimize, and can gain accuracy from considerably increased depth. On the ImageNet dataset we evaluate residual nets with a depth of up to 152 layers (8× deeper than VGG nets [40]) but still having lower complexity. An ensemble of these residual nets achieves 3.57% error on the ImageNet test set. This result won the 1st place on the ILSVRC 2015 classification task. We also present analysis on CIFAR-10 with 100 and 1000 layers. The depth of representations is of central importance for many visual recognition tasks. Solely due to our extremely deep representations, we obtain a 28% relative improvement on the COCO object detection dataset. Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

123,388 citations

Proceedings Article•

ImageNet Classification with Deep Convolutional Neural Networks

Alex Krizhevsky¹, Ilya Sutskever¹, Geoffrey E. Hinton¹•Institutions (1)

University of Toronto¹

03 Dec 2012

TL;DR: A large, deep convolutional neural network, consisting of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax, achieved error rates considerably better than the previous state of the art on ImageNet classification.

Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0%, which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.

73,978 citations

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

04 Sep 2014

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.

Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

55,235 citations

"Rotated region based CNN for ship d..." refers background or methods in this paper

...RR-CNN approach includes auxiliary structures and a backbone network which could be any classical model, such as Alexnet [1] and VGG-16 [2], etc....
...Recently, advances in object detection are driven by progresses of backbone deep convolutional neural networks [1, 2, 3, 4] and improvements of object detection frameworks including R-CNN [5], Fast R-CNN [6], Faster R-CNN [7] and SSD [8], etc....

Proceedings Article•

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan¹, Andrew Zisserman¹•Institutions (1)

University of Oxford¹

01 Jan 2015

TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.

49,914 citations

Proceedings Article•DOI•

Densely Connected Convolutional Networks

Gao Huang¹, Zhuang Liu², Laurens van der Maaten³, Kilian Q. Weinberger¹•Institutions (3)

Cornell University¹, Tsinghua University², Facebook³

21 Jul 2017

TL;DR: DenseNet as mentioned in this paper proposes to connect each layer to every other layer in a feed-forward fashion, which can alleviate the vanishing gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters.

Abstract: Recent work has shown that convolutional networks can be substantially deeper, more accurate, and efficient to train if they contain shorter connections between layers close to the input and those close to the output. In this paper, we embrace this observation and introduce the Dense Convolutional Network (DenseNet), which connects each layer to every other layer in a feed-forward fashion. Whereas traditional convolutional networks with L layers have L connections—one between each layer and its subsequent layer—our network has L(L+1)/2 direct connections. For each layer, the feature-maps of all preceding layers are used as inputs, and its own feature-maps are used as inputs into all subsequent layers. DenseNets have several compelling advantages: they alleviate the vanishing-gradient problem, strengthen feature propagation, encourage feature reuse, and substantially reduce the number of parameters. We evaluate our proposed architecture on four highly competitive object recognition benchmark tasks (CIFAR-10, CIFAR-100, SVHN, and ImageNet). DenseNets obtain significant improvements over the state-of-the-art on most of them, whilst requiring less memory and computation to achieve high performance. Code and pre-trained models are available at https://github.com/liuzhuang13/DenseNet.
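The L(L+1)/2 figure in the abstract can be checked with a short count. The sketch below assumes that layer i (1-indexed) receives the network input plus the feature maps of the i-1 preceding layers, so it has i incoming connections; the function names are illustrative only.

```python
def dense_connections(num_layers):
    """Closed-form count of direct connections in a dense block: L(L+1)/2."""
    return num_layers * (num_layers + 1) // 2


def dense_connections_by_enumeration(num_layers):
    """Count connections by summing the inputs of each layer.

    Assumption: layer i receives the network input and the outputs of
    all i-1 preceding layers, i.e. i incoming connections.
    """
    return sum(i for i in range(1, num_layers + 1))


def traditional_connections(num_layers):
    """A plain feed-forward chain has one incoming connection per layer."""
    return num_layers
```

For a 5-layer block this gives 15 direct connections, compared with 5 in a traditional chain of the same depth.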

27,821 citations