
The Caltech-UCSD Birds-200-2011 Dataset

TLDR
CUB-200-2011, as described in this paper, is an extended version of CUB-200 that roughly doubles the number of images per category and adds new part localization annotations; all images are annotated with bounding boxes, part locations, and attribute labels.
Abstract
CUB-200-2011 is an extended version of CUB-200 [7], a challenging dataset of 200 bird species. The extended version roughly doubles the number of images per category and adds new part localization annotations. All images are annotated with bounding boxes, part locations, and attribute labels. Images and annotations were filtered by multiple users of Mechanical Turk. We introduce benchmarks and baseline experiments for multi-class categorization and part localization.
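For orientation, a minimal loading sketch, assuming the standard layout of the public CUB-200-2011 release (plain-text index files such as images.txt, image_class_labels.txt, bounding_boxes.txt, and parts/part_locs.txt); the root path below is a placeholder for a local copy.

```python
# Minimal sketch: reading CUB-200-2011 annotations into Python dicts.
# Assumes the standard release layout; adjust ROOT to your local copy.
from pathlib import Path

ROOT = Path("CUB_200_2011")  # placeholder path to the extracted dataset

def read_lines(name):
    with open(ROOT / name) as f:
        return [line.split() for line in f if line.strip()]

# image_id -> relative image path
images = {int(i): p for i, p in read_lines("images.txt")}

# image_id -> class_id (1..200)
labels = {int(i): int(c) for i, c in read_lines("image_class_labels.txt")}

# image_id -> (x, y, width, height) bounding box in pixels
boxes = {int(i): tuple(map(float, rest))
         for i, *rest in read_lines("bounding_boxes.txt")}

# (image_id, part_id) -> (x, y, visible) part location annotations
parts = {(int(i), int(p)): (float(x), float(y), int(v))
         for i, p, x, y, v in read_lines("parts/part_locs.txt")}

print(len(images), "images,", len(set(labels.values())), "classes")
```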



Citations
Proceedings Article

Spatial transformer networks

TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.
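A minimal sketch of the idea, not the authors' implementation: a small localization network regresses affine parameters, which then drive a differentiable grid sampler. Layer sizes here are arbitrary assumptions.

```python
# Minimal sketch of a spatial transformer block (affine case) in PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SpatialTransformer(nn.Module):
    def __init__(self, in_channels):
        super().__init__()
        # Localization network: regresses 6 affine parameters from the input.
        self.loc = nn.Sequential(
            nn.Conv2d(in_channels, 8, kernel_size=7), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(8 * 4 * 4, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Initialize to the identity transform so training starts stably.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(
            torch.tensor([1, 0, 0, 0, 1, 0], dtype=torch.float))

    def forward(self, x):
        theta = self.loc(x).view(-1, 2, 3)                  # per-sample affine matrix
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)  # differentiable resampling

x = torch.randn(2, 3, 64, 64)
print(SpatialTransformer(3)(x).shape)  # torch.Size([2, 3, 64, 64])
```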
Posted Content

CNN Features off-the-shelf: an Astounding Baseline for Recognition

TL;DR: A series of experiments conducted for different recognition tasks using the publicly available code and model of the OverFeat network which was trained to perform object classification on ILSVRC13 suggest that features obtained from deep learning with convolutional nets should be the primary candidate in most visual recognition tasks.
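A minimal sketch of that recipe under stated assumptions: OverFeat itself is not packaged in torchvision, so a pretrained ResNet-18 stands in as the frozen feature extractor, with a linear head on top (the weights enum requires a recent torchvision).

```python
# Off-the-shelf features: freeze a pretrained CNN, use its penultimate
# activations as a generic image representation, and fit a linear classifier.
import torch
import torch.nn as nn
from torchvision import models

backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Identity()          # expose the 512-d pooled features
backbone.eval()

@torch.no_grad()
def extract(images):                 # images: (N, 3, 224, 224), normalized
    return backbone(images)          # (N, 512) feature vectors

# A linear head (or an SVM on saved features) is then trained per task.
head = nn.Linear(512, 200)           # e.g. 200 bird classes for CUB-200-2011
logits = head(extract(torch.randn(4, 3, 224, 224)))
print(logits.shape)                  # torch.Size([4, 200])
```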
Journal ArticleDOI

Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations

TL;DR: The Visual Genome dataset as mentioned in this paper contains over 108k images where each image has an average of 35 objects, 26 attributes, and 21 pairwise relationships between objects.
Proceedings ArticleDOI

CNN Features Off-the-Shelf: An Astounding Baseline for Recognition

TL;DR: In this paper, features extracted from the OverFeat network are used as a generic image representation to tackle the diverse range of recognition tasks of object image classification, scene recognition, fine grained recognition, attribute detection and image retrieval applied to a diverse set of datasets.
Proceedings ArticleDOI

CutMix: Regularization Strategy to Train Strong Classifiers With Localizable Features

TL;DR: CutMix as discussed by the authors augments the training data by cutting and pasting patches among training images, where the ground truth labels are also mixed proportionally to the area of the patches.
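A simplified sketch of that augmentation, not the authors' reference implementation: a random patch from a shuffled copy of the batch is pasted in place, and labels are mixed in proportion to the patch area.

```python
# Simplified CutMix: paste a random patch from a permuted batch and mix
# one-hot labels by the (clipped) patch area. Modifies the batch in place.
import numpy as np
import torch

def cutmix(images, labels_onehot, alpha=1.0):
    lam = np.random.beta(alpha, alpha)           # target mixing ratio
    idx = torch.randperm(images.size(0))         # pairing permutation
    _, _, H, W = images.shape
    # Sample a box whose area fraction is roughly (1 - lam).
    rh, rw = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = np.random.randint(H), np.random.randint(W)
    y1, y2 = np.clip([cy - rh // 2, cy + rh // 2], 0, H)
    x1, x2 = np.clip([cx - rw // 2, cx + rw // 2], 0, W)
    images[:, :, y1:y2, x1:x2] = images[idx, :, y1:y2, x1:x2]
    # Recompute the ratio from the actual (clipped) patch area.
    lam = float(1 - (y2 - y1) * (x2 - x1) / (H * W))
    return images, lam * labels_onehot + (1 - lam) * labels_onehot[idx]
```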
References
Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Journal ArticleDOI

The Pascal Visual Object Classes (VOC) Challenge

TL;DR: The state of the art in evaluated methods for both classification and detection is reviewed, along with whether the methods are statistically different, what they are learning from the images, and what they find easy or confusing.
Proceedings ArticleDOI

A discriminatively trained, multiscale, deformable part model

TL;DR: A discriminatively trained, multiscale, deformable part model for object detection, which achieves a two-fold improvement in average precision over the best performance in the 2006 PASCAL person detection challenge and outperforms the best results in the 2007 challenge in ten out of twenty categories.

Caltech-256 Object Category Dataset

TL;DR: A challenging set of 256 object categories containing a total of 30607 images is introduced and the clutter category is used to train an interest detector which rejects uninformative background regions.
Proceedings ArticleDOI

Learning to detect unseen object classes by between-class attribute transfer

TL;DR: The experiments show that by using an attribute layer it is indeed possible to build a learning object detection system that does not require any training images of the target classes, and assembled a new large-scale dataset, “Animals with Attributes”, of over 30,000 animal images that match the 50 classes in Osherson's classic table of how strongly humans associate 85 semantic attributes with animal classes.
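A toy sketch of the attribute-layer idea: per-attribute probabilities (here hard-coded, standing in for classifiers trained on seen classes) are matched against a made-up class–attribute table to score unseen classes. This is a simplification of the paper's probabilistic formulation.

```python
# Zero-shot scoring through an attribute layer (simplified).
import numpy as np

# class_attributes[c, a] = 1 if unseen class c is associated with attribute a
# (placeholder table; the real dataset uses 85 attributes over 50 classes).
class_attributes = np.array([[1, 0, 1, 1],
                             [0, 1, 1, 0]])

def predict_unseen(attribute_probs):
    """attribute_probs: per-image attribute probabilities from classifiers
    trained only on seen classes; returns the best-matching unseen class."""
    scores = attribute_probs @ class_attributes.T  # match attribute signatures
    return scores.argmax(axis=1)

probs = np.array([[0.9, 0.1, 0.8, 0.7]])
print(predict_unseen(probs))  # -> [0]
```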