Open Access, Posted Content

Deep Learning for Fine-Grained Image Analysis: A Survey

TLDR
This survey systematically reviews recent advances in deep learning based FGIA techniques and organizes the existing studies into three major categories: fine-grained image recognition, fine-grained image retrieval, and fine-grained image generation.
Abstract
Computer vision (CV) is the process of using machines to understand and analyze imagery, and it is an integral branch of artificial intelligence. Among the various research areas of CV, fine-grained image analysis (FGIA) is a longstanding and fundamental problem that has become ubiquitous in diverse real-world applications. FGIA targets the analysis of visual objects from subordinate categories, e.g., species of birds or models of cars. The small inter-class variations and large intra-class variations caused by this fine-grained nature make it a challenging problem. With the boom in deep learning, recent years have witnessed remarkable progress in FGIA using deep learning techniques. In this paper, we systematically survey recent advances in deep learning based FGIA techniques. Specifically, we organize the existing studies into three major categories: fine-grained image recognition, fine-grained image retrieval, and fine-grained image generation. We also cover other important issues of FGIA, such as publicly available benchmark datasets and related domain-specific applications. Finally, we conclude by highlighting several directions and open problems that need to be further explored by the community.


Citations
Proceedings Article

BBN: Bilateral-Branch Network With Cumulative Learning for Long-Tailed Visual Recognition

TL;DR: This paper proposes a unified Bilateral-Branch Network (BBN) that handles representation learning and classifier learning simultaneously, with each branch performing its own duty separately.
Proceedings Article

Your “Flamingo” is My “Bird”: Fine-Grained, or Not

TL;DR: In this paper, the authors re-envisage the traditional setting of fine-grained visual classification (FGVC), moving from single-label classification to top-down traversal of a pre-defined coarse-to-fine label hierarchy.
Proceedings Article

ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

TL;DR: This paper proposes a stacked global-local attention network consisting of two sub-networks for food recognition. One sub-network first uses hybrid spatial-channel attention to extract more discriminative features, and then aggregates these multi-scale features from multiple layers into a global-level representation (e.g., texture and shape information about food).
Posted Content

ISIA Food-500: A Dataset for Large-Scale Food Recognition via Stacked Global-Local Attention Network

TL;DR: This work introduces ISIA Food-500, a dataset of 399,726 images in 500 categories drawn from a Wikipedia list. It is a more comprehensive food dataset that surpasses existing popular benchmark datasets in category coverage and data volume. The work also proposes a stacked global-local attention network, which consists of two sub-networks for food recognition.
References
Journal Article

Deep learning

TL;DR: Deep learning is making major advances in solving problems that have resisted the best attempts of the artificial intelligence community for many years, and will have many more successes in the near future because it requires very little engineering by hand and can easily take advantage of increases in the amount of available computation and data.
Journal Article

Generative Adversarial Nets

TL;DR: A new framework is proposed for estimating generative models via an adversarial process, in which two models are trained simultaneously: a generative model G that captures the data distribution and a discriminative model D that estimates the probability that a sample came from the training data rather than G.
Proceedings Article

Spatial transformer networks

TL;DR: This work introduces a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network, and can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps.

The Caltech-UCSD Birds-200-2011 Dataset

TL;DR: CUB-200-2011 is an extended version of CUB-200 that roughly doubles the number of images per category and adds new part localization annotations. Images are annotated with bounding boxes, part locations, and attribute labels.
Journal Article

DBpedia - A Large-scale, Multilingual Knowledge Base Extracted from Wikipedia

TL;DR: An overview of the DBpedia community project is given, covering its architecture, technical implementation, maintenance, internationalisation, usage statistics, and applications, which have made DBpedia one of the central interlinking hubs in the Linked Open Data (LOD) cloud.