Open Access · Proceedings Article

Cut, Paste and Learn: Surprisingly Easy Synthesis for Instance Detection

TLDR
This paper proposes a simple approach to generating large annotated instance datasets with minimal effort. The method outperforms existing synthesis approaches and, when combined with real images, improves relative performance by more than 21% on benchmark datasets.
Abstract
A major impediment to rapidly deploying object detection models for instance detection is the lack of large annotated datasets. For example, finding a large labeled dataset containing instances in a particular kitchen is unlikely. Each new environment with new instances requires expensive data collection and annotation. In this paper, we propose a simple approach to generate large annotated instance datasets with minimal effort. Our key insight is that ensuring only patch-level realism provides enough training signal for current object detector models. We automatically ‘cut’ object instances and ‘paste’ them on random backgrounds. Doing this naively produces pixel artifacts that lead to poor performance for trained models. We show how to make detectors ignore these artifacts during training and generate data that gives competitive performance on real data. Our method outperforms existing synthesis approaches and, when combined with real images, improves relative performance by more than 21% on benchmark datasets. In a cross-domain setting, our synthetic data combined with just 10% real data outperforms models trained on all real data.
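The cut-and-paste idea in the abstract can be sketched in a few lines. The abstract does not say how detectors are made to ignore pasting artifacts, so this sketch assumes one plausible mechanism: render the same placement several times with different boundary softening, so the boundary pixels vary while the object and its box stay fixed. All names (`box_blur`, `paste_instance`, `synthesize`) are hypothetical, and the mean filter is only a stand-in for whatever blending the authors actually use.

```python
import numpy as np


def box_blur(mask: np.ndarray, radius: int) -> np.ndarray:
    """Soften a 2-D mask with a (2*radius+1)^2 mean filter (zero padding)."""
    if radius == 0:
        return mask.astype(np.float64)
    k = 2 * radius + 1
    padded = np.pad(mask.astype(np.float64), radius, mode="constant")
    out = np.zeros(mask.shape, dtype=np.float64)
    # Sum the k*k shifted copies of the mask, then average.
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + mask.shape[0], dx:dx + mask.shape[1]]
    return out / (k * k)


def paste_instance(background, obj, mask, top, left, blur_radius=0):
    """Composite `obj` onto a copy of `background`; return (image, box).

    background: (H, W, 3) array, obj: (h, w, 3), mask: (h, w) with 0/1 values.
    The box is (x_min, y_min, x_max, y_max), computed from the hard mask,
    so the annotation does not depend on the blending strength.
    """
    h, w = mask.shape
    alpha = box_blur(mask, blur_radius)[..., None]  # (h, w, 1) blend weights
    out = background.astype(np.float64).copy()
    region = out[top:top + h, left:left + w]
    out[top:top + h, left:left + w] = alpha * obj + (1.0 - alpha) * region
    ys, xs = np.nonzero(mask)
    box = (int(left + xs.min()), int(top + ys.min()),
           int(left + xs.max() + 1), int(top + ys.max() + 1))
    return out, box


def synthesize(backgrounds, obj, mask, rng, blur_radii=(0, 1, 2)):
    """Yield (image, box, blur_radius): one random placement per background,
    rendered once per blur radius (assumed artifact-variation strategy)."""
    h, w = mask.shape
    for bg in backgrounds:
        top = int(rng.integers(0, bg.shape[0] - h + 1))
        left = int(rng.integers(0, bg.shape[1] - w + 1))
        for r in blur_radii:
            yield paste_instance(bg, obj, mask, top, left, r) + (r,)
```

Each yielded image comes with a bounding box for free, which is the sense in which annotation needs minimal effort: the label follows directly from where the mask was pasted.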


Citations
Proceedings Article

CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features

TL;DR: CutMix augments the training data by cutting and pasting patches among training images, with the ground-truth labels mixed in proportion to the area of the patches.
Journal Article

Deep Learning for Generic Object Detection: A Survey

TL;DR: A comprehensive survey of the recent achievements in this field brought about by deep learning techniques, covering many aspects of generic object detection: detection frameworks, object feature representation, object proposal generation, context modeling, training strategies, and evaluation metrics.
Proceedings Article

Training Deep Networks with Synthetic Data: Bridging the Reality Gap by Domain Randomization

TL;DR: This work presents a system for training deep neural networks for object detection using synthetic images that relies upon the technique of domain randomization, in which the parameters of the simulator are randomized in non-realistic ways to force the neural network to learn the essential features of the object of interest.
Posted Content

Simple Copy-Paste is a Strong Data Augmentation Method for Instance Segmentation

TL;DR: A systematic study of Copy-Paste augmentation for instance segmentation, in which objects are randomly pasted onto an image, finds that this simple mechanism is good enough and can provide solid gains on top of strong baselines.
Proceedings Article

Conditional Generative Adversarial Network for Structured Domain Adaptation

TL;DR: This work proposes a principled way to conduct structured domain adaptation for semantic segmentation by integrating a GAN into the FCN framework to mitigate the gap between source and target domains.
References
Proceedings Article

ImageNet Classification with Deep Convolutional Neural Networks

TL;DR: This work trains a deep convolutional neural network, consisting of five convolutional layers (some of which are followed by max-pooling layers) and three fully-connected layers with a final 1000-way softmax, that achieved state-of-the-art performance on ImageNet classification.
Proceedings Article

Very Deep Convolutional Networks for Large-Scale Image Recognition

TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Book Chapter

Microsoft COCO: Common Objects in Context

TL;DR: This work presents a new dataset that aims to advance the state of the art in object recognition by placing it in the context of the broader question of scene understanding. The dataset gathers images of complex everyday scenes containing common objects in their natural context.
Proceedings Article

Fully convolutional networks for semantic segmentation

TL;DR: The key insight is to build “fully convolutional” networks that take input of arbitrary size and produce correspondingly-sized output with efficient inference and learning.
Proceedings Article

You Only Look Once: Unified, Real-Time Object Detection

TL;DR: Compared to state-of-the-art detection systems, YOLO makes more localization errors but is less likely to predict false positives on background, and outperforms other detection methods, including DPM and R-CNN, when generalizing from natural images to other domains like artwork.