Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention

doi:10.1155/2020/8909458

Open Access, Journal Article

Yan Chu, +4 more

- 20 Oct 2020 -

Wireless Communications and Mobile Compu...

- Vol. 2020, pp 1-7

TLDR

AICRL is trained on the large MS COCO 2014 dataset to maximize the likelihood of the target description sentence given the training images and is evaluated with several metrics; the experimental results indicate that it is effective in generating captions for images.

Abstract:

Captioning images automatically with proper descriptions has become an interesting and challenging problem. In this paper, we present a joint model, AICRL, which performs automatic image captioning based on ResNet50 and LSTM with soft attention. AICRL consists of one encoder and one decoder. The encoder uses ResNet50, a convolutional neural network, to create an extensive representation of the given image by embedding it into a fixed-length vector. The decoder combines an LSTM, a recurrent neural network, with a soft attention mechanism that selectively focuses on certain parts of an image to predict the next word of the sentence. We trained AICRL on the large MS COCO 2014 dataset to maximize the likelihood of the target description sentence given the training images, and evaluated it with several metrics, including BLEU, METEOR, and CIDEr. Our experimental results indicate that AICRL is effective in generating captions for images.
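The soft attention step mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes a simple dot-product score between each image-region feature and the decoder hidden state (attention-based captioning models often use a small learned network instead), and the names `soft_attention`, `regions`, and `hidden` are hypothetical. The toy region vectors stand in for ResNet50 feature-map locations.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_attention(region_features, hidden_state):
    """Return (weights, context) for one decoding step.

    Assumption: the relevance score is a plain dot product between each
    region feature and the decoder hidden state.
    """
    scores = [
        sum(f * h for f, h in zip(region, hidden_state))
        for region in region_features
    ]
    weights = softmax(scores)
    dim = len(region_features[0])
    # The context vector is the attention-weighted average of all regions.
    context = [
        sum(w * region[d] for w, region in zip(weights, region_features))
        for d in range(dim)
    ]
    return weights, context


# Toy data: three image regions with 2-dimensional features (hypothetical).
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [2.0, 0.0]
weights, context = soft_attention(regions, hidden)
```

Because every region receives a non-zero weight, the context vector is a smooth mixture of all regions rather than a hard selection of one, which is what makes the attention "soft". In a full decoder, this context vector would be fed to the LSTM together with the previous word at each step.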

Citations

Journal Article

Bag of Features (BoF) Based Deep Learning Framework for Bleached Corals Detection

Sonain Jamil, +2 more

TL;DR: This article proposes a bag of features (BoF) based approach to detect and localize bleached corals before safety measures are applied; the approach achieved 99.08% accuracy with a classification error of 0.92%.

Journal Article

A multiple-stage defect detection model by convolutional neural network

Kung-Jeng Wang, +2 more

- 01 Mar 2022 -

Computers & Industrial Engineering

TL;DR: The authors proposed a four-stage defect detection model, which uses convolutional neural networks (CNNs) to examine product images for defect identification, classification, and positioning to reduce error rates and offer informative quality messages.

Journal Article

Modeling of Hyperparameter Tuned Deep Learning Model for Automated Image Captioning

Mohamed-Saleh Omri, +4 more

- 18 Jan 2022 -

Mathematics

TL;DR: A novel hyperparameter tuned DL for automated image captioning (HPTDL-AIC) technique is proposed. The novelty of the work lies in the use of RMSProp and BSA for the hyperparameter tuning of the Faster SqueezeNet and LSTM models for image captioning, which helps to achieve enhanced image captioning performance.

Journal Article

Automatic Evaluation of the Lung Condition of COVID-19 Patients Using X-ray Images and Convolutional Neural Networks.

Ivan Lorencin, +9 more

- 04 Jan 2021 -

Journal of Personalized Medicine

TL;DR: In this article, the authors presented an examination of the possibility of classifying the clinical picture of a patient using X-ray images and convolutional neural networks and showed that the best classification performance can be achieved if ResNet152 is used.

Proceedings Article

Crime Scene Analysis Using Deep Learning

P. Mahesha, +4 more

TL;DR: In this paper, three deep learning models were proposed for generating sentences: an Inceptionv3-LSTM network, a VGG-16-LSTM network, and a ResNet-50-LSTM network.

References

Proceedings Article

Bleu: a Method for Automatic Evaluation of Machine Translation

Kishore Papineni, +3 more

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.

Posted Content

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

Kelvin Xu, +7 more

- 10 Feb 2015 -

arXiv: Learning

TL;DR: This paper proposed an attention-based model that automatically learns to describe the content of images by focusing on salient objects while generating corresponding words in the output sequence, which achieved state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.

Proceedings Article

Deep visual-semantic alignments for generating image descriptions

Andrej Karpathy, +1 more

TL;DR: This paper presents a model that generates natural language descriptions of images and their regions, based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding.

Posted Content

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

Jeff Donahue, +6 more

- 17 Nov 2014 -

arXiv: Computer Vision and Pattern Recog...

TL;DR: This paper proposes a novel recurrent convolutional architecture suitable for large-scale visual learning that is end-to-end trainable, and shows that such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.

Proceedings Article

METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

Satanjeev Banerjee, +1 more

TL;DR: This paper describes METEOR, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations and can be easily extended to include more advanced matching strategies.

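The unigram matching idea behind METEOR can be sketched in a few lines. This is a simplified illustration, not the full metric: it assumes exact, case-insensitive word matches only, scores against a single reference, and omits the stemming, synonym matching, and fragmentation penalty of the complete method. The function name `unigram_fmean` and the example sentences are hypothetical; the recall-weighted harmonic mean follows the form given in the original METEOR paper.

```python
from collections import Counter


def unigram_fmean(candidate, reference):
    """Return (precision, recall, fmean) from exact unigram matches.

    Assumption: whitespace tokenization and exact lowercase matching.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Counter intersection counts each word at most as often as it
    # appears in both sentences (clipped matching).
    matches = sum((Counter(cand) & Counter(ref)).values())
    if matches == 0:
        return 0.0, 0.0, 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    # Harmonic mean that weights recall nine times more than precision.
    fmean = 10 * precision * recall / (recall + 9 * precision)
    return precision, recall, fmean


precision, recall, fmean = unigram_fmean(
    "a dog runs on the grass",
    "a dog is running on the grass",
)
```

Here five of the six candidate words also appear in the seven-word reference, so precision is 5/6 and recall is 5/7. The exact-match restriction is why "runs" and "running" do not count as a match in this sketch.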