Deep Learning for Fine-Grained Image Analysis: A Survey

Open Access, Posted Content

- 06 Jul 2019 -

arXiv: Computer Vision and Pattern Recog...

TLDR

This survey systematically reviews recent advances in deep learning based FGIA techniques and organizes the existing studies into three major categories: fine-grained image recognition, fine-grained image retrieval, and fine-grained image generation.

Abstract:

Computer vision (CV) is the process of using machines to understand and analyze imagery, and it is an integral branch of artificial intelligence. Among the various research areas of CV, fine-grained image analysis (FGIA) is a longstanding and fundamental problem that has become ubiquitous in diverse real-world applications. FGIA targets the analysis of visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class variations and large intra-class variations caused by this fine-grained nature make it a challenging problem. With the boom of deep learning, recent years have witnessed remarkable progress in FGIA using deep learning techniques. In this paper, we aim to survey recent advances in deep learning based FGIA techniques in a systematic way. Specifically, we organize the existing studies of FGIA techniques into three major categories: fine-grained image recognition, fine-grained image retrieval, and fine-grained image generation. In addition, we cover other important issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. Finally, we conclude by highlighting several directions and open problems that need to be further explored by the community.

Citations

Proceedings Article

BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition

Boyan Zhou, +3 more

TL;DR: This paper proposes a unified Bilateral-Branch Network (BBN) that handles representation learning and classifier learning simultaneously, with each branch performing its own duty separately.

Posted Content

BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition

Boyan Zhou, +3 more

- 05 Dec 2019 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: A unified Bilateral-Branch Network (BBN) is proposed to handle representation learning and classifier learning simultaneously, with each branch performing its own duty separately.

Proceedings Article

Your “Flamingo” is My “Bird”: Fine-Grained, or Not

Dongliang Chang, +5 more

TL;DR: In this paper, the authors re-envisage the traditional setting of fine-grained visual classification (FGVC), moving from single-label classification to top-down traversal of a pre-defined coarse-to-fine label hierarchy.

Proceedings Article

ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

Weiqing Min, +6 more

TL;DR: This paper proposes a stacked global-local attention network consisting of two sub-networks for food recognition. One sub-network first uses hybrid spatial-channel attention to extract more discriminative features, then aggregates these multi-scale features from multiple layers into a global-level representation (e.g., texture and shape information about food).

Posted Content

ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

Weiqing Min, +6 more

- 13 Aug 2020 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This work introduces ISIA Food-500, a dataset with 500 categories drawn from a Wikipedia list and 399,726 images. It is a more comprehensive food dataset that surpasses existing popular benchmark datasets in category coverage and data volume. The work also proposes a stacked global-local attention network, which consists of two sub-networks for food recognition.

References

Journal Article

Deep learning

Yann LeCun, +4 more

- 28 May 2015 -

Nature

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.

Journal Article

Generative Adversarial Nets

Ian Goodfellow, +7 more

TL;DR: A new framework for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.

Proceedings Article

Spatial transformer networks

Max Jaderberg, +3 more

TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.

The Caltech-UCSD Birds-200-2011 Dataset

Catherine Wah, +4 more

TL;DR: CUB-200-2011 is an extended version of CUB-200 that roughly doubles the number of images per category and adds new part localization annotations. Images are annotated with bounding boxes, part locations, and attribute labels.

Journal Article

DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia

Jens Lehmann, +11 more

- 01 Jan 2015 -

Semantic Web

TL;DR: An overview of the DBpedia community project is given, covering its architecture, technical implementation, maintenance, internationalisation, usage statistics, and applications, as well as DBpedia's role as one of the central interlinking hubs in the Linked Open Data (LOD) cloud.

Related Papers (5)

Deep Residual Learning for Image Recognition

ImageNet: A large-scale hierarchical image database

The Caltech-UCSD Birds-200-2011 Dataset

3D Object Representations for Fine-Grained Categorization

Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization