Proceedings Article (Open Access)

Large-Scale Long-Tailed Recognition in an Open World

TLDR
An integrated open long-tailed recognition (OLTR) algorithm is developed that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world.
Abstract
Real-world data often have a long-tailed and open-ended distribution. A practical recognition system must classify among majority and minority classes, generalize from a few known instances, and acknowledge novelty upon encountering a never-seen instance. We define Open Long-Tailed Recognition (OLTR) as learning from such naturally distributed data and optimizing the classification accuracy over a balanced test set that includes head, tail, and open classes. OLTR must handle imbalanced classification, few-shot learning, and open-set recognition in one integrated algorithm, whereas existing classification approaches focus on only one aspect and perform poorly over the entire class spectrum. The key challenges are how to share visual knowledge between head and tail classes and how to reduce confusion between tail and open classes. We develop an integrated OLTR algorithm that maps an image to a feature space such that visual concepts can easily relate to each other based on a learned metric that respects the closed-world classification while acknowledging the novelty of the open world. Our so-called dynamic meta-embedding combines a direct image feature and an associated memory feature, with the feature norm indicating the familiarity to known classes. On three large-scale OLTR datasets we curate from object-centric ImageNet, scene-centric Places, and face-centric MS1M data, our method consistently outperforms the state of the art. Our code, datasets, and models enable future OLTR research and are publicly available at https://liuziwei7.github.io/projects/LongTail.html.
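The abstract states only that the dynamic meta-embedding combines a direct image feature with a memory feature and that the feature norm signals familiarity to known classes. The sketch below fills in one plausible form under stated assumptions; it is not the authors' code. It assumes the memory holds one centroid per known class, a learned matrix (here the hypothetical `w_hallucinator`, shape K x dim) yields coefficients over those centroids, a gate (the hypothetical `w_selector`, shape dim x dim) mixes the memory feature into the direct feature, and the sum is divided by the distance to the nearest centroid, so inputs far from every known class end up with a small norm that a threshold can flag as open.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = Sequence[Sequence[float]]


def matvec(weights: Matrix, v: Sequence[float]) -> Vector:
    """Multiply a (rows x len(v)) matrix by the vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in weights]


def feature_norm(v: Sequence[float]) -> float:
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def dynamic_meta_embedding(
    v_direct: Sequence[float],
    centroids: Matrix,       # K x dim, one centroid per known class (assumption)
    w_hallucinator: Matrix,  # K x dim, hypothetical learned weights
    w_selector: Matrix,      # dim x dim, hypothetical learned weights
) -> Vector:
    """Sketch of a memory-augmented embedding (assumed form, not the authors' code).

    v_meta = (v_direct + e * v_memory) / gamma, where
      o        = tanh(w_hallucinator @ v_direct)   one coefficient per centroid
      v_memory = sum_k o[k] * centroids[k]         feature read from memory
      e        = tanh(w_selector @ v_direct)       element-wise gate on v_memory
      gamma    = min_k ||v_direct - centroids[k]|| distance to nearest known class
    """
    dim = len(v_direct)
    o = [math.tanh(z) for z in matvec(w_hallucinator, v_direct)]
    v_memory = [sum(o[k] * centroids[k][d] for k in range(len(centroids))) for d in range(dim)]
    e = [math.tanh(z) for z in matvec(w_selector, v_direct)]
    gamma = min(feature_norm([a - b for a, b in zip(v_direct, c)]) for c in centroids)
    gamma = max(gamma, 1e-6)  # guard against division by zero on an exact centroid
    return [(v + g * m) / gamma for v, g, m in zip(v_direct, e, v_memory)]


def is_open_class(v_meta: Sequence[float], threshold: float) -> bool:
    """Flag an input as novel when its meta-embedding norm falls below the threshold."""
    return feature_norm(v_meta) < threshold


if __name__ == "__main__":
    # Toy setup: two known classes in a 2-D feature space (all values hypothetical).
    centroids = [[1.0, 0.0], [0.0, 1.0]]
    w_h = [[1.0, 0.0], [0.0, 1.0]]
    w_s = [[0.5, 0.0], [0.0, 0.5]]
    for v in ([0.9, 0.1], [5.0, 5.0]):
        meta = dynamic_meta_embedding(v, centroids, w_h, w_s)
        print(v, round(feature_norm(meta), 3), is_open_class(meta, threshold=3.0))
```

With these toy values, the input near the first centroid receives a much larger meta-embedding norm than the input at (5, 5), so only the latter is flagged as open. In a real system the weights would be learned and the threshold tuned on held-out data; both are assumptions of this sketch.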

Citations
Proceedings Article

Decoupling Representation and Classifier for Long-Tailed Recognition

TL;DR: It is shown that it is possible to outperform carefully designed losses, sampling strategies, even complex modules with memory, by using a straightforward approach that decouples representation and classification.
Proceedings Article

BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition

TL;DR: The authors propose a unified Bilateral-Branch Network (BBN) that handles both representation learning and classifier learning simultaneously, with each branch performing its own duty separately.
Proceedings Article

Towards Open World Object Detection

TL;DR: In this paper, the authors propose a novel computer vision problem called "Open World Object Detection", where a model is tasked to identify objects that have not been introduced to it as "unknown" and incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received.
Posted Content

Long-tail learning via logit adjustment

TL;DR: These techniques revisit the classic idea of logit adjustment based on the label frequencies, either applied post-hoc to a trained model, or enforced in the loss during training, to encourage a large relative margin between logits of rare versus dominant labels.
Proceedings Article

Meta-Learning to Detect Rare Objects

TL;DR: A conceptually simple but powerful meta-learning-based framework that simultaneously tackles few-shot classification and few-shot localization in a unified, coherent way, and introduces a weight-prediction meta-model that predicts the parameters of category-specific components from few examples.