Proceedings ArticleDOI

Automatic Image Annotation using Deep Learning Representations

TL;DR: It is demonstrated that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image and the CCA model is compared to a simple CNN based linear regression model, which allows the CNN layers to be trained using back-propagation.
Abstract: We propose simple and effective models for the image annotation that make use of Convolutional Neural Network (CNN) features extracted from an image and word embedding vectors to represent their associated tags. Our first set of models is based on the Canonical Correlation Analysis (CCA) framework that helps in modeling both views - visual features (CNN feature) and textual features (word embedding vectors) of the data. Results on all three variants of the CCA models, namely linear CCA, kernel CCA and CCA with k-nearest neighbor (CCA-KNN) clustering, are reported. The best results are obtained using CCA-KNN which outperforms previous results on the Corel-5k and the ESP-Game datasets and achieves comparable results on the IAPRTC-12 dataset. In our experiments we evaluate CNN features in the existing models which bring out the advantages of it over dozens of handcrafted features. We also demonstrate that word embedding vectors perform better than binary vectors as a representation of the tags associated with an image. In addition we compare the CCA model to a simple CNN based linear regression model, which allows the CNN layers to be trained using back-propagation.
Citations
Journal ArticleDOI
TL;DR: It is demonstrated that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field-of-view and depth of field, and can be used to design computational imagers that get better and better as they continue to image specimen and establish new transformations among different modes of imaging.
Abstract: We demonstrate that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field-of-view and depth-of-field. After its training, the only input to this network is an image acquired using a regular optical microscope, without any changes to its design. We blindly tested this deep learning approach using various tissue samples that are imaged with low-resolution and wide-field systems, where the network rapidly outputs an image with remarkably better resolution, matching the performance of higher numerical aperture lenses, also significantly surpassing their limited field-of-view and depth-of-field. These results are transformative for various fields that use microscopy tools, including e.g., life sciences, where optical microscopy is considered as one of the most widely used and deployed techniques. Beyond such applications, our presented approach is broadly applicable to other imaging modalities, also spanning different parts of the electromagnetic spectrum, and can be used to design computational imagers that get better and better as they continue to image specimen and establish new transformations among different modes of imaging.

428 citations

Journal ArticleDOI
20 Nov 2017
TL;DR: In this paper, a deep neural network was used to improve optical microscopy, enhancing its spatial resolution over a large field of view and depth of field; the only input to the network is an image acquired using a regular optical microscope, without any changes to its design.
Abstract: We demonstrate that a deep neural network can significantly improve optical microscopy, enhancing its spatial resolution over a large field of view and depth of field. After its training, the only input to this network is an image acquired using a regular optical microscope, without any changes to its design. We blindly tested this deep learning approach using various tissue samples that are imaged with low-resolution and wide-field systems, where the network rapidly outputs an image with better resolution, matching the performance of higher numerical aperture lenses and also significantly surpassing their limited field of view and depth of field. These results are significant for various fields that use microscopy tools, including, e.g., life sciences, where optical microscopy is considered as one of the most widely used and deployed techniques. Beyond such applications, the presented approach might be applicable to other imaging modalities, also spanning different parts of the electromagnetic spectrum, and can be used to design computational imagers that get better as they continue to image specimens and establish new transformations among different modes of imaging.

377 citations

Journal ArticleDOI
TL;DR: In this paper, the authors provide water resources scientists and hydrologists with a simple technical overview, trans-disciplinary progress update, and a source of inspiration about the relevance of deep learning to water.
Abstract: Deep learning (DL), a new-generation of artificial neural network research, has transformed industries, daily lives and various scientific disciplines in recent years. DL represents significant progress in the ability of neural networks to automatically engineer problem-relevant features and capture highly complex data distributions. I argue that DL can help address several major new and old challenges facing research in water sciences such as inter-disciplinarity, data discoverability, hydrologic scaling, equifinality, and needs for parameter regionalization. This review paper is intended to provide water resources scientists and hydrologists in particular with a simple technical overview, trans-disciplinary progress update, and a source of inspiration about the relevance of DL to water. The review reveals that various physical and geoscientific disciplines have utilized DL to address data challenges, improve efficiency, and gain scientific insights. DL is especially suited for information extraction from image-like data and sequential data. Techniques and experiences presented in other disciplines are of high relevance to water research. Meanwhile, less noticed is that DL may also serve as a scientific exploratory tool. A new area termed 'AI neuroscience,' where scientists interpret the decision process of deep networks and derive insights, has been born. This budding sub-discipline has demonstrated methods including correlation-based analysis, inversion of network-extracted features, reduced-order approximations by interpretable models, and attribution of network decisions to inputs. Moreover, DL can also use data to condition neurons that mimic problem-specific fundamental organizing units, thus revealing emergent behaviors of these units. Vast opportunities exist for DL to propel advances in water sciences.

260 citations

Journal ArticleDOI
TL;DR: The use of deep learning is reported on to correct distortions introduced by mobile-phone-based microscopes, facilitating the production of high-resolution, denoised and colour-corrected images, matching the performance of benchtop microscopes with high-end objective lenses, also extending their limited depth-of-field.
Abstract: Mobile phones have facilitated the creation of field-portable, cost-effective imaging and sensing technologies that approach laboratory-grade instrument performance. However, the optical imaging interfaces of mobile phones are not designed for microscopy and produce distortions in imaging microscopic specimens. Here, we report on the use of deep learning to correct such distortions introduced by mobile-phone-based microscopes, facilitating the production of high-resolution, denoised, and color-corrected images, matching the performance of benchtop microscopes with high-end objective lenses, also extending their limited depth of field. After training a convolutional neural network, we successfully imaged various samples, including human tissue sections and Papanicolaou and blood smears, where the recorded images were highly compressed to ease storage and transmission. This method is applicable to other low-cost, aberrated imaging systems and could offer alternatives for costly and bulky microscopes, while ...

152 citations

Journal ArticleDOI
01 Mar 2018-Methods
TL;DR: The operation principles and the methods behind lensless digital holographic on-chip microscopy are discussed, including some recent work on air quality monitoring, which utilized machine learning for high-throughput and accurate quantification of particulate matter in air.

149 citations


Cites background from "Automatic Image Annotation using Deep Learning Representations"

  • ...In general, CNNs and deep learning currently form one of the fastest growing areas of computer science, and have been widely applied to various tasks such as image labeling [86], style transfer [87] and even playing games against professional human players [88]....


References
Proceedings Article
03 Dec 2012
TL;DR: A large, deep convolutional neural network (later known as AlexNet) achieved state-of-the-art performance on ImageNet classification; it consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax.
Abstract: We trained a large, deep convolutional neural network to classify the 1.2 million high-resolution images in the ImageNet LSVRC-2010 contest into the 1000 different classes. On the test data, we achieved top-1 and top-5 error rates of 37.5% and 17.0% which is considerably better than the previous state-of-the-art. The neural network, which has 60 million parameters and 650,000 neurons, consists of five convolutional layers, some of which are followed by max-pooling layers, and three fully-connected layers with a final 1000-way softmax. To make training faster, we used non-saturating neurons and a very efficient GPU implementation of the convolution operation. To reduce overfitting in the fully-connected layers we employed a recently-developed regularization method called "dropout" that proved to be very effective. We also entered a variant of this model in the ILSVRC-2012 competition and achieved a winning top-5 test error rate of 15.3%, compared to 26.2% achieved by the second-best entry.
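The "dropout" regularizer mentioned in the abstract can be sketched in a few lines. This is a generic inverted-dropout illustration in NumPy, not the paper's implementation.

```python
import numpy as np

def dropout(x, p_drop=0.5, rng=None, train=True):
    """Inverted dropout: zero each unit with probability p_drop and
    rescale the survivors so the expected activation is unchanged."""
    if not train or p_drop == 0.0:
        return x
    rng = rng or np.random.default_rng()
    mask = rng.random(x.shape) >= p_drop  # True where the unit survives
    return x * mask / (1.0 - p_drop)

acts = np.ones((4, 8))
out = dropout(acts, p_drop=0.5, rng=np.random.default_rng(42))
# With p_drop=0.5, surviving units are scaled to 2.0, dropped units to 0.0.
```

At test time (`train=False`) the input passes through unchanged, which is why the rescaling is applied during training rather than at inference.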

73,978 citations

Proceedings Article
04 Sep 2014
TL;DR: This work investigates the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting using an architecture with very small convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.
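A quick way to see why stacking very small filters works: a stack of k 3x3, stride-1 convolutions has an effective receptive field of 3 + 2(k - 1), so depth substitutes for large filters while using fewer parameters. A toy check (not from the paper):

```python
def receptive_field(num_3x3_layers):
    """Effective receptive field of stacked 3x3, stride-1 convolutions."""
    rf = 1
    for _ in range(num_3x3_layers):
        rf += 2  # each 3x3, stride-1 conv extends the field by 2
    return rf

print(receptive_field(2))  # 5: two 3x3 convs cover a 5x5 region
print(receptive_field(3))  # 7: three 3x3 convs cover a 7x7 region
```

For example, three stacked 3x3 layers match the 7x7 receptive field of a single large filter but with three nonlinearities and fewer weights.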

55,235 citations


"Automatic Image Annotation using Deep Learning Representations" refers methods in this paper

  • ...Here CNN features are extracted for images using a pretrained VGG-16 [26] network, and the word embedding vector for a tag is extracted using a pre-trained Skip-gram architecture (Word2Vec) [19]; both these networks are publicly available....


  • ...Features extracted from Caffe-Net provided by Caffe [7] (similar to AlexNet [9]) did not work as well as VGG-16, hence we used VGG-16 features for all our experiments....


  • ...Here CNN features are extracted for images using a pre-trained VGG-16 [14] network, and the word embedding vector for a tag is extracted using a pre-trained skip-gram architecture (word2vec of [11]); both these networks are publicly available....


  • ...Inspired by the success of deep CNN architectures [16, 26, 6] on the large scale image classification task [25] we intend to make use of this powerful architecture to solve the task of automatic image annotation....


  • ...We explored both VGG-16 and VGG-19 layered architecture features....


Proceedings Article
01 Jan 2015
TL;DR: In this paper, the authors investigated the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting and showed that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 layers.
Abstract: In this work we investigate the effect of the convolutional network depth on its accuracy in the large-scale image recognition setting. Our main contribution is a thorough evaluation of networks of increasing depth using an architecture with very small (3x3) convolution filters, which shows that a significant improvement on the prior-art configurations can be achieved by pushing the depth to 16-19 weight layers. These findings were the basis of our ImageNet Challenge 2014 submission, where our team secured the first and the second places in the localisation and classification tracks respectively. We also show that our representations generalise well to other datasets, where they achieve state-of-the-art results. We have made our two best-performing ConvNet models publicly available to facilitate further research on the use of deep visual representations in computer vision.

49,914 citations

Posted Content
TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
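To make the skip-gram idea concrete, here is an illustrative (center, context) training-pair generator over a toy corpus; the real models learn word vectors from billions of such pairs. This sketch is not the paper's code.

```python
def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs within a symmetric window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the center word itself
                pairs.append((center, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
pairs = skipgram_pairs(sentence, window=1)
print(pairs[:3])  # [('the', 'cat'), ('cat', 'the'), ('cat', 'sat')]
```

The skip-gram objective then trains the embedding of each center word to predict its context words, which is what makes semantically related words end up with nearby vectors.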

20,077 citations

Posted Content
TL;DR: This paper proposes a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012 -- achieving a mAP of 53.3%.
Abstract: Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012---achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at this http URL.

13,081 citations


"Automatic Image Annotation using Deep Learning Representations" refers methods in this paper

  • ...CNN features are shown to be successful for most of the vision tasks producing significantly improved results on the most challenging datasets like PASCAL VOC and ILSVRC2013 [6, 24]....


  • ...Inspired by the success of deep CNN architectures [16, 26, 6] on the large scale image classification task [25] we intend to make use of this powerful architecture to solve the task of automatic image annotation....
