
Showing papers by Jia Deng published in 2014


Posted Content
TL;DR: The paper describes the creation of this benchmark dataset and the advances in object recognition it has made possible, and compares state-of-the-art computer vision accuracy with human accuracy.
Abstract: The ImageNet Large Scale Visual Recognition Challenge is a benchmark in object category classification and detection on hundreds of object categories and millions of images. The challenge has been run annually from 2010 to present, attracting participation from more than fifty institutions. This paper describes the creation of this benchmark dataset and the advances in object recognition that have been possible as a result. We discuss the challenges of collecting large-scale ground truth annotation, highlight key breakthroughs in categorical object recognition, provide a detailed analysis of the current state of the field of large-scale image classification and object detection, and compare the state-of-the-art computer vision accuracy with human accuracy. We conclude with lessons learned in the five years of the challenge, and propose future directions and improvements.

519 citations


Book Chapter
06 Sep 2014
TL;DR: The paper develops a new model that can encode flexible relations between labels, proposes a probabilistic classification model based on HEX graphs, and shows that exploiting these label relations can significantly improve object classification.
Abstract: In this paper we study how to perform object classification in a principled way that exploits the rich structure of real world labels. We develop a new model that allows encoding of flexible relations between labels. We introduce Hierarchy and Exclusion (HEX) graphs, a new formalism that captures semantic relations between any two labels applied to the same object: mutual exclusion, overlap and subsumption. We then provide rigorous theoretical analysis that illustrates properties of HEX graphs such as consistency, equivalence, and computational implications of the graph structure. Next, we propose a probabilistic classification model based on HEX graphs and show that it enjoys a number of desirable properties. Finally, we evaluate our method using a large-scale benchmark. Empirical results demonstrate that our model can significantly improve object classification by exploiting the label relations.

448 citations
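To make the HEX-graph idea concrete, here is a minimal sketch, assuming that a subsumption edge (parent, child) means "child = 1 forces parent = 1", an exclusion edge (a, b) means "a and b may not both be 1", and a missing edge means the two labels may overlap. The probabilistic part is an assumed simple form (scores exponentiated and normalized over legal assignments only), not necessarily the paper's exact model, and the label names and scores are hypothetical. The brute-force enumeration is exponential in the number of labels and is only meant for toy graphs.

```python
from itertools import product
import math


def legal_assignments(labels, hierarchy, exclusion):
    # hierarchy: (parent, child) pairs; child = 1 forces parent = 1 (subsumption).
    # exclusion: (a, b) pairs; a and b may not both be 1 (mutual exclusion).
    # Pairs with no edge are unconstrained, i.e. they may overlap.
    legal = []
    for bits in product((0, 1), repeat=len(labels)):
        y = dict(zip(labels, bits))
        if any(y[c] and not y[p] for p, c in hierarchy):
            continue
        if any(y[a] and y[b] for a, b in exclusion):
            continue
        legal.append(y)
    return legal


def marginals(labels, scores, legal):
    # Assumed model: P(y) proportional to exp(sum_i scores[i] * y_i), legal y only.
    weights = [math.exp(sum(scores[l] * y[l] for l in labels)) for y in legal]
    z = sum(weights)
    return {l: sum(w for w, y in zip(weights, legal) if y[l]) / z for l in labels}


# Hypothetical toy label set.
labels = ["animal", "dog", "cat", "puppy"]
hierarchy = {("animal", "dog"), ("animal", "cat"), ("dog", "puppy")}
exclusion = {("dog", "cat")}

legal = legal_assignments(labels, hierarchy, exclusion)
scores = {"animal": 0.5, "dog": 2.0, "cat": 1.5, "puppy": -1.0}  # hypothetical classifier scores
probs = marginals(labels, scores, legal)
print(len(legal))  # 5
print(probs)
```

Because every legal assignment with "dog" = 1 also has "animal" = 1, the marginal for "dog" can never exceed that of "animal", and because "dog" and "cat" are exclusive, their marginals sum to at most that of "animal"; a label-independent classifier gives no such guarantee.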


Proceedings Article
26 Apr 2014
TL;DR: The paper proposes an algorithm that exploits correlation, hierarchy, and sparsity of the label distribution, reducing human computation time by up to 6x compared to the naive method of querying a human annotator for the presence of every object in every image.
Abstract: We study strategies for scalable multi-label annotation, or for efficiently acquiring multiple labels from humans for a collection of items. We propose an algorithm that exploits correlation, hierarchy, and sparsity of the label distribution. A case study of labeling 200 objects using 20,000 images demonstrates the effectiveness of our approach. The algorithm results in up to 6x reduction in human computation time compared to the naive method of querying a human annotator for the presence of every object in every image.

162 citations
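As a rough illustration of why hierarchy and sparsity save human effort, the sketch below compares the naive strategy (one yes/no question per image and label) with a simplified hierarchical one: ask whether any member of a parent category is present, and ask about the individual children only if the answer is "yes". This is a hypothetical simplification, not the paper's algorithm, which additionally exploits label correlation; the category names and the ground-truth "human" oracle are assumptions, and the toy counts are not meant to reproduce the reported 6x figure.

```python
def annotate_naive(present, leaves):
    # One yes/no question per (image, label) pair.
    found = {leaf for leaf in leaves if leaf in present}
    return found, len(leaves)


def annotate_hierarchical(present, hierarchy):
    # hierarchy: hypothetical dict mapping a parent category to its leaf labels.
    # The simulated annotator answers every question from the ground truth `present`.
    queries = 0
    found = set()
    for parent, children in hierarchy.items():
        queries += 1  # "Is there any <parent> in this image?"
        if not present & set(children):
            continue  # a single "no" rules out every child of this parent
        for child in children:
            queries += 1  # "Is there a <child> in this image?"
            if child in present:
                found.add(child)
    return found, queries


hierarchy = {
    "animal": ["dog", "cat", "bird"],
    "vehicle": ["car", "bus", "bicycle"],
}
leaves = [leaf for children in hierarchy.values() for leaf in children]
images = [{"dog"}, set(), {"car", "dog"}, set()]  # toy, mostly sparse ground truth

naive_total = hier_total = 0
for present in images:
    naive_found, n = annotate_naive(present, leaves)
    hier_found, h = annotate_hierarchical(present, hierarchy)
    assert naive_found == hier_found  # both strategies recover the same labels
    naive_total += n
    hier_total += h
print(naive_total, hier_total)  # 24 17
```

The savings depend on sparsity: when most images contain no member of a category, one question replaces several, but when many children are present, the extra parent question makes the hierarchical strategy slightly more expensive than the naive one.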


01 Jan 2014
TL;DR: This paper proposes an object representation that detects important parts and describes fine-grained appearances, and shows experimentally that combining these two insights is an effective strategy for fine-grained recognition.

98 citations


Proceedings Article
24 Aug 2014
TL;DR: In this paper, an object representation that detects important parts and describes fine-grained appearances is proposed; its part detectors rest on the insight that images with similar poses can be automatically discovered for fine-grained classes in the same domain.
Abstract: This paper addresses the problem of fine-grained recognition: recognizing subordinate categories such as bird species, car models, or dog breeds. We focus on two major challenges: learning expressive appearance descriptors and localizing discriminative parts. To this end, we propose an object representation that detects important parts and describes fine-grained appearances. The part detectors are learned in a fully unsupervised manner, based on the insight that images with similar poses can be automatically discovered for fine-grained classes in the same domain. The appearance descriptors are learned using a convolutional neural network. Our approach requires only image level class labels, without any use of part annotations or segmentation masks, which may be costly to obtain. We show experimentally that combining these two insights is an effective strategy for fine-grained recognition.

92 citations