Proceedings ArticleDOI

ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, Li Fei-Fei
20 Jun 2009-pp 248-255
TL;DR: A new database called “ImageNet” is introduced, a large-scale ontology of images built upon the backbone of the WordNet structure, much larger in scale and diversity and much more accurate than the current image datasets.
Abstract: The explosion of image data on the Internet has the potential to foster more sophisticated and robust models and algorithms to index, retrieve, organize and interact with images and multimedia data. But exactly how such data can be harnessed and organized remains a critical problem. We introduce here a new database called “ImageNet”, a large-scale ontology of images built upon the backbone of the WordNet structure. ImageNet aims to populate the majority of the 80,000 synsets of WordNet with an average of 500-1000 clean and full resolution images. This will result in tens of millions of annotated images organized by the semantic hierarchy of WordNet. This paper offers a detailed analysis of ImageNet in its current state: 12 subtrees with 5247 synsets and 3.2 million images in total. We show that ImageNet is much larger in scale and diversity and much more accurate than the current image datasets. Constructing such a large-scale database is a challenging task. We describe the data collection scheme with Amazon Mechanical Turk. Lastly, we illustrate the usefulness of ImageNet through three simple applications in object recognition, image classification and automatic object clustering. We hope that the scale, accuracy, diversity and hierarchical structure of ImageNet can offer unparalleled opportunities to researchers in the computer vision community and beyond.
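For illustration, the WordNet hierarchy that ImageNet builds on can be explored with NLTK's WordNet interface (a minimal sketch, not part of the ImageNet pipeline; the choice of subtree is arbitrary). Each synset enumerated below is a node that ImageNet aims to populate with roughly 500-1000 cleanly labeled, full-resolution images:

# Minimal sketch (not the authors' pipeline): enumerate a WordNet noun
# subtree with NLTK to see how many synsets such a subtree contains.
from nltk.corpus import wordnet as wn  # requires: nltk.download('wordnet')

root = wn.synset('mammal.n.01')                      # example subtree root
subtree = set(root.closure(lambda s: s.hyponyms()))  # all descendant synsets
subtree.add(root)

print(f"{len(subtree)} synsets under {root.name()}")
for s in sorted(subtree, key=lambda s: s.name())[:5]:
    # each synset would hold its own set of annotated images in ImageNet
    print(s.name(), '-', s.definition())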


Citations
Journal ArticleDOI
TL;DR: This work proposes a new enhancement to Convolutional LSTM networks that supports accommodation of multiple convolutional kernels and layers, and proposes an attention-based mechanism that is specifically designed for the multi-kernel extension.
Abstract: Action recognition greatly benefits motion understanding in video analysis. Recurrent networks such as long short-term memory (LSTM) networks are a popular choice for motion-aware sequence learning tasks. Recently, a convolutional extension of LSTM was proposed, in which input-to-hidden and hidden-to-hidden transitions are modeled through convolution with a single kernel. This implies an unavoidable trade-off between effectiveness and efficiency. Herein, we propose a new enhancement to convolutional LSTM networks that supports accommodation of multiple convolutional kernels and layers. This resembles a Network-in-LSTM approach, which improves upon the aforementioned concern. In addition, we propose an attention-based mechanism that is specifically designed for our multi-kernel extension. We evaluated our proposed extensions in a supervised classification setting on the UCF-101 and Sports-1M datasets, with the findings showing that our enhancements improve accuracy. We also undertook qualitative analysis to reveal the characteristics of our system and the convolutional LSTM baseline.
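As a rough illustration of the multi-kernel idea described above, the following PyTorch sketch (illustrative class and parameter names, not the authors' implementation) defines a convolutional LSTM cell whose gate pre-activations are computed by several parallel convolution kernels:

# Illustrative sketch only (assumes PyTorch): a ConvLSTM cell in which the
# input-to-hidden and hidden-to-hidden transitions use several parallel
# convolution kernels instead of a single one.
import torch
import torch.nn as nn

class MultiKernelConvLSTMCell(nn.Module):
    def __init__(self, in_ch, hid_ch, kernel_sizes=(3, 5)):
        super().__init__()
        # one conv per kernel size; each maps [x, h] to the 4 gate pre-activations
        self.convs = nn.ModuleList([
            nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)
            for k in kernel_sizes
        ])

    def forward(self, x, state):
        h, c = state
        z = torch.cat([x, h], dim=1)
        # average the gate pre-activations across kernel sizes
        # (one simple way to combine them, chosen here for brevity)
        gates = sum(conv(z) for conv in self.convs) / len(self.convs)
        i, f, o, g = torch.chunk(gates, 4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)
        h = o * torch.tanh(c)
        return h, c

# usage: x is (batch, channels, H, W); h and c are zero-initialized
cell = MultiKernelConvLSTMCell(in_ch=3, hid_ch=16)
x = torch.randn(2, 3, 32, 32)
h = c = torch.zeros(2, 16, 32, 32)
h, c = cell(x, (h, c))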

23 citations


Cites background from "ImageNet: A large-scale hierarchica..."

  • ...A notable insight of their work is the fact that convolutional networks based on optical flow features can be fine tuned from DCNs taught on RGB inputs, such as that used for image classification on the ImageNet dataset [11]....


Posted Content
TL;DR: LISA, named after Light-guided Instance Shadow-object Association, is an end-to-end framework designed to automatically predict the shadow and object instances, together with the shadow-object associations and light direction; its applicability is demonstrated on light direction estimation and photo editing.
Abstract: Instance shadow detection is a brand new problem, aiming to find shadow instances paired with object instances. To approach it, we first prepare a new dataset called SOBA, named after Shadow-OBject Association, with 3,623 pairs of shadow and object instances in 1,000 photos, each with individual labeled masks. Second, we design LISA, named after Light-guided Instance Shadow-object Association, an end-to-end framework to automatically predict the shadow and object instances, together with the shadow-object associations and light direction. Then, we pair up the predicted shadow and object instances, and match them with the predicted shadow-object associations to generate the final results. In our evaluations, we formulate a new metric named the shadow-object average precision to measure the performance of our results. Further, we conducted various experiments and demonstrate our method's applicability on light direction estimation and photo editing.

23 citations


Cites methods from "ImageNet: A large-scale hierarchica..."

  • ...Specifically, we adopt the weights of ResNeXt-101-FPN [27, 48] trained on ImageNet [7] to initialize the parameters of the backbone network, and train our framework on two GeForce GTX 1080 Ti GPUs (four images per GPU) for 40k training iterations....

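The excerpt above mentions initializing a backbone from ImageNet-pretrained weights before fine-tuning. A minimal sketch of that general practice, assuming a recent torchvision (not the paper's exact ResNeXt-101-FPN / Detectron-style setup):

# General-practice sketch (assumes torchvision >= 0.13): load a ResNeXt
# backbone with ImageNet-pretrained weights, drop the classification head,
# and attach a new head for the downstream task before fine-tuning.
import torch.nn as nn
import torchvision

backbone = torchvision.models.resnext101_32x8d(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()   # remove the ImageNet classification head

# illustrative downstream head; 2048 is the backbone's feature dimension
head = nn.Linear(2048, 10)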

Journal ArticleDOI
TL;DR: This paper reviews the three main approaches most often used for cell image classification: numerical feature extraction, end-to-end classification with neural networks (NNs), and transport-based morphometry (TBM).
Abstract: Cell image classification methods are currently being used in numerous applications in cell biology and medicine. Applications include understanding the effects of genes and drugs in screening experiments, understanding the role and subcellular localization of different proteins, as well as diagnosis and prognosis of cancer from images acquired using cytological and histological techniques. The article also reviews the three main approaches most often used for cell image classification: numerical feature extraction, end-to-end classification with neural networks (NNs), and transport-based morphometry (TBM). In addition, we provide comparisons on four different cell imaging datasets to highlight the relative strength of each method. The results computed using four publicly available datasets show that numerical features tend to carry the best discriminative information for most of the classification tasks. Results also show that NN-based methods produce state-of-the-art results in the dataset that contains a relatively large number of training samples. Data augmentation or the choice of a more recently reported architecture does not necessarily improve the classification performance of NNs in datasets with a limited number of training samples. If understanding and visualization are desired aspects, TBM methods can offer the ability to invert classification functions, and thus can aid in the interpretation of results. These and other comparison outcomes are discussed with the aim of clarifying the advantages and disadvantages of each method. © 2020 International Society for Advancement of Cytometry.
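As a toy illustration of the first of the three approaches discussed above (numerical feature extraction followed by a classical classifier), with feature choices that are assumptions for the sake of the example rather than those of the reviewed studies:

# Toy sketch of the "numerical feature extraction" route: compute simple
# per-image statistics, then fit a classical classifier on them.
import numpy as np
from sklearn.linear_model import LogisticRegression

def cell_features(img):
    # img: 2-D grayscale array; features are simple intensity/area proxies
    mask = img > img.mean()
    return np.array([img.mean(), img.std(), mask.mean(), img.max() - img.min()])

def train(images, labels):
    # images: list of 2-D arrays; labels: class labels (e.g. cell phenotype)
    X = np.stack([cell_features(im) for im in images])
    return LogisticRegression(max_iter=1000).fit(X, labels)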

23 citations

Journal ArticleDOI
TL;DR: It is argued that multi-scale map representation, object-level simultaneous localization and mapping, and deep neural network-based simultaneous localization and mapping pipeline design could be effective solutions to image semantics-fused visual simultaneous localization and mapping.
Abstract: As one of the typical application-oriented solutions to robot autonomous navigation, visual simultaneous localization and mapping is essentially restricted to simplex environmental understanding ba...

23 citations

Proceedings ArticleDOI
01 Jan 2017
TL;DR: This work focuses on improving modern image classification techniques by considering topological features as well, and shows that incorporating this information allows the models to improve the accuracy, precision and recall on test data, thus providing evidence that topological signatures can be leveraged for enhancing some of the state-of-the art applications in computer vision.
Abstract: Image classification has been a topic of interest for many years. With the advent of Deep Learning, impressive progress has been made on the task, resulting in quite accurate classification. Our work focuses on improving modern image classification techniques by considering topological features as well. We show that incorporating this information allows our models to improve the accuracy, precision and recall on test data, thus providing evidence that topological signatures can be leveraged for enhancing some of the state-of-the art applications in computer vision.
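One simple topological signature that can be appended to image features, offered here only as a hedged illustration and not as the authors' construction, is the number of connected components of thresholded super-level sets traced over a range of intensity thresholds:

# Illustrative sketch: a Betti-0-style curve counting connected components
# of the image's super-level sets across thresholds; the resulting vector
# can be concatenated with other features before a classifier.
import numpy as np
from scipy import ndimage

def betti0_curve(img, thresholds):
    # img: 2-D grayscale array; returns one component count per threshold
    counts = []
    for t in thresholds:
        _, n_components = ndimage.label(img >= t)
        counts.append(n_components)
    return np.array(counts)

# e.g. topo = betti0_curve(img, np.linspace(img.min(), img.max(), 16))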

23 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
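A sketch of this matching pipeline using OpenCV (assumes an OpenCV build with SIFT available; RANSAC homography estimation stands in for the paper's Hough-transform clustering and least-squares pose verification):

# Illustrative OpenCV sketch: SIFT features, nearest-neighbor matching with
# Lowe's ratio test, then a robust geometric fit to verify the object.
import cv2
import numpy as np

def match_object(query_img, scene_img, ratio=0.75):
    sift = cv2.SIFT_create()
    kq, dq = sift.detectAndCompute(query_img, None)
    ks, ds = sift.detectAndCompute(scene_img, None)

    # nearest-neighbor matching with the ratio test
    matches = cv2.BFMatcher().knnMatch(dq, ds, k=2)
    good = [m for m, n in matches if m.distance < ratio * n.distance]
    if len(good) < 4:
        return None

    src = np.float32([kq[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([ks[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # homography locating the query object in the scene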

46,906 citations


"ImageNet: A large-scale hierarchica..." refers methods in this paper

  • ...SIFT [15] descriptors are used in this experiment....


Journal ArticleDOI
01 Sep 2000-Language
TL;DR: The WordNet lexical database is presented, covering nouns, modifiers, and verbs in a semantic network, together with applications of WordNet such as building semantic concordances.
Abstract: Part 1, The lexical database: nouns in WordNet, George A. Miller; modifiers in WordNet, Katherine J. Miller; a semantic network of English verbs, Christiane Fellbaum; design and implementation of the WordNet lexical database and searching software, Randee I. Tengi. Part 2: automated discovery of WordNet relations, Marti A. Hearst; representing verb alternations in WordNet, Karen T. Kohl et al; the formalization of WordNet by methods of relational concept analysis, Uta E. Priss. Part 3, Applications of WordNet: building semantic concordances, Shari Landes et al; performance and confidence in a semantic annotation task, Christiane Fellbaum et al; WordNet and class-based probabilities, Philip Resnik; combining local context and WordNet similarity for word sense identification, Claudia Leacock and Martin Chodorow; using WordNet for text retrieval, Ellen M. Voorhees; lexical chains as representations of context for the detection and correction of malapropisms, Graeme Hirst and David St-Onge; temporal indexing through lexical chaining, Reem Al-Halimi and Rick Kazman; COLOR-X - using knowledge from WordNet for conceptual modelling, J.F.M. Burg and R.P. van de Riet; knowledge processing on an extended WordNet, Sanda M. Harabagiu and Dan I Moldovan; appendix - obtaining and using WordNet.

13,049 citations


"ImageNet: A large-scale hierarchica..." refers background or methods in this paper

  • ...ImageNet uses the hierarchical structure of WordNet [9]....


  • ...The main asset of WordNet [9] lies in its semantic structure, i....


01 Oct 2008
TL;DR: The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life, and exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background.
Abstract: Most face databases have been created under controlled conditions to facilitate the study of specific parameters on the face recognition problem. These parameters include such variables as position, pose, lighting, background, camera quality, and gender. While there are many applications for face recognition technology in which one can control the parameters of image acquisition, there are also many applications in which the practitioner has little or no control over such parameters. This database, Labeled Faces in the Wild, is provided as an aid in studying the latter, unconstrained, recognition problem. The database contains labeled face photographs spanning the range of conditions typically encountered in everyday life. The database exhibits “natural” variability in factors such as pose, lighting, race, accessories, occlusions, and background. In addition to describing the details of the database, we provide specific experimental paradigms for which the database is suitable. This is done in an effort to make research performed with the database as consistent and comparable as possible. We provide baseline results, including results of a state of the art face recognition system combined with a face alignment system. To facilitate experimentation on the database, we provide several parallel databases, including an aligned version.

5,742 citations


"ImageNet: A large-scale hierarchica..." refers methods in this paper

  • ...Special purpose datasets, such as FERET faces [19], Labeled faces in the Wild [13] and the Mammal Benchmark by Fink and Ullman [11] are not included....


01 Jan 1978
TL;DR: On those remote pages it is written that animals are divided into categories such as those that belong to the Emperor, embalmed ones, those that are trained, suckling pigs, and stray dogs.
Abstract: On those remote pages it is written that animals are divided into (a) those that belong to the Emperor, (b) embalmed ones, (c) those that are trained, (d) suckling pigs, (e) mermaids, (f) fabulous ones, (g) stray dogs, (h) those that are included in this classification, (i) those that tremble as if they were mad, (j) innumerable ones, (k) those drawn with a very fine camel's hair brush, (l) others, (m) those that have just broken a flower vase, (n) those that from a distance resemble flies.

4,302 citations


"ImageNet: A large-scale hierarchica..." refers background in this paper

  • ...Rosch and Lloyd [20] have demonstrated that humans tend to label visual objects at an easily accessible semantic level termed as “basic level” (e.g.


Proceedings ArticleDOI
17 Jun 2006
TL;DR: A recognition scheme that scales efficiently to a large number of objects is presented; its vocabulary tree allows a larger and more discriminatory vocabulary to be used efficiently, which is shown experimentally to lead to a dramatic improvement in retrieval quality.
Abstract: A recognition scheme that scales efficiently to a large number of objects is presented. The efficiency and quality is exhibited in a live demonstration that recognizes CD-covers from a database of 40000 images of popular music CD’s. The scheme builds upon popular techniques of indexing descriptors extracted from local regions, and is robust to background clutter and occlusion. The local region descriptors are hierarchically quantized in a vocabulary tree. The vocabulary tree allows a larger and more discriminatory vocabulary to be used efficiently, which we show experimentally leads to a dramatic improvement in retrieval quality. The most significant property of the scheme is that the tree directly defines the quantization. The quantization and the indexing are therefore fully integrated, essentially being one and the same. The recognition quality is evaluated through retrieval on a database with ground truth, showing the power of the vocabulary tree approach, going as high as 1 million images.
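A minimal sketch of the vocabulary-tree idea, hierarchical k-means over local descriptors, with an arbitrary branching factor and depth (illustrative only, using scikit-learn rather than the authors' implementation):

# Illustrative sketch: build a vocabulary tree by recursively clustering
# local descriptors, then quantize a descriptor by descending the tree.
import numpy as np
from sklearn.cluster import KMeans

def build_tree(descriptors, branching=10, depth=3):
    # descriptors: (N, D) array of local-region descriptors (e.g. SIFT)
    if depth == 0 or len(descriptors) < branching:
        return None
    km = KMeans(n_clusters=branching, n_init=5).fit(descriptors)
    children = [
        build_tree(descriptors[km.labels_ == i], branching, depth - 1)
        for i in range(branching)
    ]
    return {"kmeans": km, "children": children}

def quantize(tree, d):
    # the path of child indices visited defines the visual word for d
    path = []
    while tree is not None:
        i = int(tree["kmeans"].predict(d.reshape(1, -1))[0])
        path.append(i)
        tree = tree["children"][i]
    return tuple(path)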

4,024 citations


Additional excerpts

  • ...[16, 17, 28, 18])....
