DOTA: A Large-Scale Dataset for Object Detection in Aerial Images
Frequently Asked Questions
Q2. What are the main arguments for a good aerial image dataset?
The authors argue that a good aerial image dataset should possess four properties, namely, 1) a large number of images, 2) many instances per category, 3) properly oriented object annotation, and 4) many different classes of objects, which bring it closer to real-world applications.
Q3. What is the way to annotate oriented objects?
One option for annotating oriented objects is the θ-based oriented bounding box adopted in some text detection benchmarks [37], denoted (xc, yc, w, h, θ), where θ is the angle of rotation from the horizontal direction of the standard bounding box.
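As an illustration of the θ-based representation described above, the following sketch converts a (xc, yc, w, h, θ) box to its four corner points. This is a generic geometric conversion, not the paper's own code; the function name and the convention that θ is given in radians are assumptions.

```python
import math

def theta_obb_to_quad(xc, yc, w, h, theta):
    """Convert a theta-based oriented box (xc, yc, w, h, theta) into its
    four corner points. theta is the rotation angle in radians measured
    from the horizontal direction of the standard bounding box.
    (Illustrative helper, not the benchmark's own implementation.)"""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    # Corner offsets of the axis-aligned box before rotation,
    # listed clockwise starting from the top-left corner.
    offsets = [(-w / 2, -h / 2), (w / 2, -h / 2),
               (w / 2, h / 2), (-w / 2, h / 2)]
    # Rotate each offset by theta and translate to the box center.
    return [(xc + dx * cos_t - dy * sin_t,
             yc + dx * sin_t + dy * cos_t) for dx, dy in offsets]
```

With θ = 0 the result is simply the axis-aligned box, which is a quick sanity check for the convention.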
Q4. How do the authors generate ground truths for HBB experiments?
Ground truths for HBB experiments are generated by calculating the axis-aligned bounding boxes over original annotated bounding boxes.
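The HBB conversion described above amounts to taking coordinate-wise minima and maxima over the quadrilateral's vertices; a minimal sketch (hypothetical helper name) is:

```python
def quad_to_hbb(quad):
    """Compute the axis-aligned (horizontal) bounding box enclosing an
    arbitrary quadrilateral given as four (x, y) vertices.
    Returns (xmin, ymin, xmax, ymax)."""
    xs = [x for x, _ in quad]
    ys = [y for _, y in quad]
    return min(xs), min(ys), max(xs), max(ys)
```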
Q5. How many instances are used in the dataset?
The fully annotated DOTA dataset contains 188,282 instances, each of which is labeled by an arbitrary quadrilateral, instead of an axis-aligned bounding box, as is typically used for object annotation in natural scenes.
Q6. What is the way to filter outliers?
Spatial resolution can also be used to filter mislabeled outliers in their dataset, as intra-class varieties of actual sizes for most categories are limited.
Q7. How do the authors ensure that the training data and test data distributions approximately match?
In order to ensure that the training data and test data distributions approximately match, the authors randomly select half of the original images as the training set, 1/6 as the validation set, and 1/3 as the testing set.
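A random split with those 1/2, 1/6, 1/3 proportions can be sketched as follows. This is a hypothetical helper for illustration; the paper does not publish its exact splitting script, and the fixed seed is an assumption added for reproducibility.

```python
import random

def split_dataset(image_ids, seed=0):
    """Randomly shuffle image ids, then take 1/2 for training,
    1/6 for validation, and the remaining 1/3 for testing.
    (Illustrative sketch, not the authors' published script.)"""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n = len(ids)
    n_train = n // 2
    n_val = n // 6
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test
```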
Q8. How large are the images in the original dataset?
The original size of images in their dataset ranges from about 800 × 800 to about 4k × 4k, while most images in regular datasets (e.g. PASCAL VOC and MS COCO) are no more than 1k × 1k.
Q9. What are the main shortcomings of existing aerial image datasets?
Existing aerial image datasets [41, 18, 16, 25] share several common shortcomings: insufficient data and classes, lack of detailed annotations, and low image resolution.
Q10. What is the way to detect objects in aerial images?
Existing annotated datasets for object detection in aerial images, such as UCAS-AOD [41] and NWPU VHR-10 [2], tend to use images in ideal conditions (clear backgrounds and without densely distributed instances), which cannot adequately reflect the problem complexity.
Q11. What is the way to describe the vertices of the newly generated parts?
For the newly generated parts, the authors use a fitting method to ensure that the vertices can be described as an oriented bounding box with four vertices in clockwise order.
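One way to enforce the clockwise vertex order mentioned above is the shoelace formula: in image coordinates (y axis pointing down), a clockwise quadrilateral has positive signed area. The sketch below (assumed helper names, not the authors' fitting code) normalizes a quadrilateral to clockwise order on this basis.

```python
def signed_area(quad):
    """Shoelace signed area of a quadrilateral given as four (x, y)
    vertices. In image coordinates (y pointing down), a positive value
    means the vertices run clockwise on screen."""
    area = 0.0
    for i in range(4):
        x1, y1 = quad[i]
        x2, y2 = quad[(i + 1) % 4]
        area += x1 * y2 - x2 * y1
    return area / 2.0

def ensure_clockwise(quad):
    """Reverse the vertex order if the quadrilateral is counter-clockwise,
    so every box is stored with its vertices in clockwise order."""
    return list(quad) if signed_area(quad) > 0 else list(reversed(quad))
```

The actual box fitting for the cut-off parts would additionally shrink the polygon to a four-vertex oriented rectangle (e.g. a minimum-area rotated rectangle); only the ordering step is shown here.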
Q12. How many categories are selected and annotated in DOTA?
Fifteen categories are chosen and annotated in their DOTA dataset, including plane, ship, storage tank, baseball diamond, tennis court, swimming pool, ground track field, harbor, bridge, large vehicle, small vehicle, helicopter, roundabout, soccer ball field and basketball court.
Q13. What is the main argument for a large-scale and challenging aerial object detection benchmark?
A large-scale and challenging aerial object detection benchmark, being as close as possible to real-world applications, is imperative for promoting research in this field.
Q14. What makes DOTA unique among the above mentioned large-scale general object detection benchmarks?
What makes DOTA unique among the above-mentioned large-scale general object detection benchmarks is that the objects in DOTA are annotated with properly oriented bounding boxes (OBB for short).
Q15. What is the way to annotate objects in aerial images?
Axis-aligned bounding boxes cannot accurately or compactly outline oriented instances such as text and objects in aerial images; hence each instance in DOTA is annotated with an arbitrary quadrilateral instead.
Q16. What is the description of the dataset?
The authors believe this dataset is challenging yet closely resembles natural aerial scenes, making it more appropriate for practical applications.