Proceedings ArticleDOI

Automatic Image Annotation using Deep Learning Representations

TL;DR: It is demonstrated that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image and the CCA model is compared to a simple CNN based linear regression model, which allows the CNN layers to be trained using back-propagation.
Abstract: We propose simple and effective models for the image annotation that make use of Convolutional Neural Network (CNN) features extracted from an image and word embedding vectors to represent their associated tags. Our first set of models is based on the Canonical Correlation Analysis (CCA) framework that helps in modeling both views - visual features (CNN feature) and textual features (word embedding vectors) of the data. Results on all three variants of the CCA models, namely linear CCA, kernel CCA and CCA with k-nearest neighbor (CCA-KNN) clustering, are reported. The best results are obtained using CCA-KNN which outperforms previous results on the Corel-5k and the ESP-Game datasets and achieves comparable results on the IAPRTC-12 dataset. In our experiments we evaluate CNN features in the existing models which bring out the advantages of it over dozens of handcrafted features. We also demonstrate that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image. In addition we compare the CCA model to a simple CNN based linear regression model, which allows the CNN layers to be trained using back-propagation.
Citations
Journal ArticleDOI
TL;DR: It is demonstrated that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field-of-view and depth of field, and can be used to design computational imagers that get better and better as they continue to image specimen and establish new transformations among different modes of imaging.
Abstract: We demonstrate that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field-of-view and depth-of-field. After its training, the only input to this network is an image acquired using a regular optical microscope, without any changes to its design. We blindly tested this deep learning approach using various tissue samples that are imaged with low-resolution and wide-field systems, where the network rapidly outputs an image with remarkably better resolution, matching the performance of higher numerical aperture lenses, also significantly surpassing their limited field-of-view and depth-of-field. These results are transformative for various fields that use microscopy tools, including e.g., life sciences, where optical microscopy is considered as one of the most widely used and deployed techniques. Beyond such applications, our presented approach is broadly applicable to other imaging modalities, also spanning different parts of the electromagnetic spectrum, and can be used to design computational imagers that get better and better as they continue to image specimen and establish new transformations among different modes of imaging.

428 citations

Journal ArticleDOI
20 Nov 2017
TL;DR: In this paper, a deep neural network was used to improve optical microscopy, enhancing its spatial resolution over a large field of view and depth of field; the only input to the network is an image acquired using a regular optical microscope, without any changes to its design.
Abstract: We demonstrate that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field of view and depth of field. After its training, the only input to this network is an image acquired using a regular optical microscope, without any changes to its design. We blindly tested this deep learning approach using various tissue samples that are imaged with low-resolution and wide-field systems, where the network rapidly outputs an image with better resolution, matching the performance of higher numerical aperture lenses and also significantly surpassing their limited field of view and depth of field. These results are significant for various fields that use microscopy tools, including, e.g., life sciences, where optical microscopy is considered as one of the most widely used and deployed techniques. Beyond such applications, the presented approach might be applicable to other imaging modalities, also spanning different parts of the electromagnetic spectrum, and can be used to design computational imagers that get better as they continue to image specimens and establish new transformations among different modes of imaging.

377 citations

Journal ArticleDOI
TL;DR: In this paper, the authors provide water resources scientists and hydrologists with a simple technical overview, trans-disciplinary progress update, and a source of inspiration about the relevance of deep learning to water.
Abstract: Deep learning (DL), a new-generation of artificial neural network research, has transformed industries, daily lives and various scientific disciplines in recent years. DL represents significant progress in the ability of neural networks to automatically engineer problem-relevant features and capture highly complex data distributions. I argue that DL can help address several major new and old challenges facing research in water sciences such as inter-disciplinarity, data discoverability, hydrologic scaling, equifinality, and needs for parameter regionalization. This review paper is intended to provide water resources scientists and hydrologists in particular with a simple technical overview, trans-disciplinary progress update, and a source of inspiration about the relevance of DL to water. The review reveals that various physical and geoscientific disciplines have utilized DL to address data challenges, improve efficiency, and gain scientific insights. DL is especially suited for information extraction from image-like data and sequential data. Techniques and experiences presented in other disciplines are of high relevance to water research. Meanwhile, less noticed is that DL may also serve as a scientific exploratory tool. A new area termed 'AI neuroscience,' where scientists interpret the decision process of deep networks and derive insights, has been born. This budding sub-discipline has demonstrated methods including correlation-based analysis, inversion of network-extracted features, reduced-order approximations by interpretable models, and attribution of network decisions to inputs. Moreover, DL can also use data to condition neurons that mimic problem-specific fundamental organizing units, thus revealing emergent behaviors of these units. Vast opportunities exist for DL to propel advances in water sciences.

260 citations

Journal ArticleDOI
TL;DR: The use of deep learning is reported on to correct distortions introduced by mobile-phone-based microscopes, facilitating the production of high-resolution, denoised and colour-corrected images, matching the performance of benchtop microscopes with high-end objective lenses, also extending their limited depth-of-field.
Abstract: Mobile phones have facilitated the creation of field-portable, cost-effective imaging and sensing technologies that approach laboratory-grade instrument performance. However, the optical imaging interfaces of mobile phones are not designed for microscopy and produce distortions in imaging microscopic specimens. Here, we report on the use of deep learning to correct such distortions introduced by mobile-phone-based microscopes, facilitating the production of high-resolution, denoised, and color-corrected images, matching the performance of benchtop microscopes with high-end objective lenses, also extending their limited depth of field. After training a convolutional neural network, we successfully imaged various samples, including human tissue sections and Papanicolaou and blood smears, where the recorded images were highly compressed to ease storage and transmission. This method is applicable to other low-cost, aberrated imaging systems and could offer alternatives for costly and bulky microscopes, while ...

152 citations

Journal ArticleDOI
01 Mar 2018-Methods
TL;DR: The operation principles and the methods behind lensless digital holographic on-chip microscopy are discussed, including some recent work on air quality monitoring, which utilized machine learning for high-throughput and accurate quantification of particulate matter in air.

149 citations


Cites background from "Automatic Image Annotation using Deep Learning Representations"

  • ...In general, CNNs and deep learning currently form one of the fastest growing areas of computer science, and have been widely applied to various tasks such as image labeling [86], style transfer [87] and even playing games against professional human players [88]....


References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network (later known as AlexNet) achieved state-of-the-art performance on ImageNet classification; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
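The "dropout" regularizer mentioned in the abstract can be sketched in a few lines. This is a generic inverted-dropout illustration in NumPy, not the paper's implementation.

```python
import numpy as np

def dropout(x, p_drop=0.5, rng=None, train=True):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors so the expected activation is unchanged."""
    if not train or p_drop == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p_drop  # True where the unit survives
    return x * mask / (1.0 - p_drop)

acts = np.ones((4, 8))
out = dropout(acts, p_drop=0.5, rng=np.random.default_rng(42))
# With p_drop=0.5, surviving units are scaled to 2.0, dropped units to 0.0.
```

At test time (`train=False`) the input passes through unchanged, which is why the rescaling is applied during training rather than at inference.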

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
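A quick way to see why stacking very small filters works: a stack of k 3x3, stride-1 convolutions has an effective receptive field of 3 + 2(k - 1), so depth substitutes for large filters while using fewer parameters. A toy check (not from the paper):

```python
def receptive_field(num_3x3_layers):
    """Effective receptive field of stacked 3x3, stride-1 convolutions."""
    rf = 1
    for _ in range(num_3x3_layers):
        rf += 2  # each 3x3, stride-1 conv extends the field by 2
    return rf

print(receptive_field(2))  # 5: two 3x3 convs cover a 5x5 region
print(receptive_field(3))  # 7: three 3x3 convs cover a 7x7 region
```

For example, three stacked 3x3 layers match the 7x7 receptive field of a single large filter but with three nonlinearities and fewer weights.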

55,235 citations


"Automatic Image Annotation using Deep Learning Representations" refers methods in this paper

  • ...Here CNN features are extracted for images using a pretrained VGG-16 [26] network, and the word embedding vector for a tag is extracted using a pre-trained Skip-gram architecture (Word2Vec) [19]; both these networks are publicly available....


  • ...Features extracted from Caffe-Net provided by Caffe [7] (similar to AlexNet [9]) did not work as well as VGG-16, hence we used VGG-16 features for all our experiments....


  • ...Here CNN features are extracted for images using a pre-trained VGG-16 [14] network, and the word embedding vector for a tag is extracted using a pre-trained skip-gram architecture (word2vec of [11]); both these networks are publicly available....


  • ...Inspired by the success of deep CNN architectures [16, 26, 6] on the large scale image classification task [25] we intend to make use of this powerful architecture to solve the task of automatic image annotation....


  • ...We explored both VGG-16 and VGG-19 layered architecture features....


Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Posted Content
TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
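To make the skip-gram idea concrete, here is an illustrative (center, context) training-pair generator over a toy corpus; the real models learn word vectors from billions of such pairs. This sketch is not the paper's code.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

The skip-gram objective then trains the embedding of each center word to predict its context words, which is what makes semantically related words end up with nearby vectors.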

20,077 citations

Posted Content
TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012---achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at this http URL.

13,081 citations


"Automatic Image Annotation using Deep Learning Representations" refers methods in this paper

  • ...CNN features are shown to be successful for most of the vision tasks producing significantly improved results on the most challenging datasets like PASCAL VOC and ILSVRC2013 [6, 24]....


  • ...Inspired by the success of deep CNN architectures [16, 26, 6] on the large scale image classification task [25] we intend to make use of this powerful architecture to solve the task of automatic image annotation....
