Open Access Journal Article

Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention

TLDR
AICRL, trained on the large MS COCO 2014 dataset to maximize the likelihood of the target description sentence given the training images and evaluated with several metrics, is shown experimentally to be effective at generating image captions.
Abstract
Automatically captioning images with proper descriptions has become an interesting and challenging problem. In this paper, we present a joint model, AICRL, which performs automatic image captioning based on ResNet50 and LSTM with soft attention. AICRL consists of one encoder and one decoder. The encoder adopts ResNet50, a convolutional neural network, which creates an extensive representation of the given image by embedding it into a fixed-length vector. The decoder is designed with an LSTM, a type of recurrent neural network, and a soft attention mechanism, which selectively focuses attention on certain parts of the image to predict the next word of the sentence. We have trained AICRL on the large MS COCO 2014 dataset to maximize the likelihood of the target description sentence given the training images, and evaluated it with metrics such as BLEU, METEOR, and CIDEr. Our experimental results indicate that AICRL is effective in generating captions for the images.
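The soft attention step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the additive attention form used in "Show, Attend and Tell", and it assumes the encoder exposes a set of per-region feature vectors (for example, a flattened convolutional feature map) rather than a single vector. The names `soft_attention`, `W_f`, `W_h`, and `v`, and all toy values, are hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [dot(row, x) for row in M]


def soft_attention(features, hidden, W_f, W_h, v):
    """Additive soft attention (assumed form, not taken from the paper).

    features: list of L region vectors from the encoder, each of size D
    hidden:   current decoder (LSTM) hidden state, size H
    W_f:      A x D projection of region features (hypothetical parameter)
    W_h:      A x H projection of the hidden state (hypothetical parameter)
    v:        size-A vector that turns each projection into a scalar score
    Returns (context, weights): the weighted sum of region vectors and the
    attention weights, which are positive and sum to 1.
    """
    proj_h = matvec(W_h, hidden)
    scores = []
    for region in features:
        proj_f = matvec(W_f, region)
        activ = [math.tanh(pf + ph) for pf, ph in zip(proj_f, proj_h)]
        scores.append(dot(v, activ))
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * region[d] for w, region in zip(weights, features))
        for d in range(dim)
    ]
    return context, weights


# Toy usage: three image regions with 2-dimensional features.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
identity = [[1.0, 0.0], [0.0, 1.0]]
context, weights = soft_attention(features, hidden, identity, identity, [1.0, 1.0])
print("attention weights:", weights)
print("context vector:", context)
```

At each decoding step, the context vector would be fed to the LSTM together with the previous word, so the decoder can weight image regions differently for each word it predicts. Because the weights come from a softmax, the whole step is differentiable and can be trained with the likelihood objective mentioned above.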


Citations
Journal Article

Bag of Features (BoF) Based Deep Learning Framework for Bleached Corals Detection

TL;DR: In this article, a bag of features (BoF) based approach is proposed to detect and localize bleached corals before safety measures are applied; it achieves 99.08% accuracy with a classification error of 0.92%.
Journal Article

A multiple-stage defect detection model by convolutional neural network

TL;DR: Wang et al. proposed a four-stage defect detection model that uses convolutional neural networks (CNNs) to examine product images for defect identification, classification, and positioning, in order to reduce error rates and offer informative quality messages.
Journal Article

Modeling of Hyperparameter Tuned Deep Learning Model for Automated Image Captioning

TL;DR: A novel hyperparameter tuned DL for automated image captioning (HPTDL-AIC) technique is proposed. The novelty of the work lies in the design of RMSProp and BSA for the hyperparameter tuning process of the Faster SqueezeNet and LSTM models for image captioning, which helps to achieve enhanced image captioning performance.
Journal Article

Automatic Evaluation of the Lung Condition of COVID-19 Patients Using X-ray Images and Convolutional Neural Networks.

TL;DR: In this article, the authors presented an examination of the possibility of classifying the clinical picture of a patient using X-ray images and convolutional neural networks and showed that the best classification performance can be achieved if ResNet152 is used.
Proceedings Article

Crime Scene Analysis Using Deep Learning

TL;DR: In this paper, three deep learning models are proposed for generating sentences: an Inceptionv3-LSTM network, a VGG-16-LSTM network, and a ResNet-50-LSTM network.
References
Proceedings Article

Bleu: a Method for Automatic Evaluation of Machine Translation

TL;DR: This paper proposed a method of automatic machine translation evaluation that is quick, inexpensive, and language-independent, that correlates highly with human evaluation, and that has little marginal cost per run.
Posted Content

Show, Attend and Tell: Neural Image Caption Generation with Visual Attention

TL;DR: This paper proposed an attention-based model that automatically learns to describe the content of images by focusing on salient objects while generating corresponding words in the output sequence, which achieved state-of-the-art performance on three benchmark datasets: Flickr8k, Flickr30k and MS COCO.
Proceedings Article

Deep visual-semantic alignments for generating image descriptions

TL;DR: A model that generates natural language descriptions of images and their regions based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding is presented.
Posted Content

Long-term Recurrent Convolutional Networks for Visual Recognition and Description

TL;DR: A novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and shows such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
Proceedings Article

METEOR: An Automatic Metric for MT Evaluation with Improved Correlation with Human Judgments

TL;DR: METEOR is described, an automatic metric for machine translation evaluation that is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations and can be easily extended to include more advanced matching strategies.