Journal ArticleDOI

Review on latest approaches used in Natural Language Processing for generation of Image Captioning

25 Jun 2017-International Journal on Computer Science and Engineering (Seventh Sense Research Group Journals)-Vol. 4, Iss: 6, pp 41-48
TL;DR: A survey of recent approaches used for image captioning, with a discussion of the datasets employed.
Abstract: Recently, the area of image captioning has received considerable attention from researchers and academia, particularly since the development of Deep Learning. Automatically generating a caption for an image is achieved by integrating computer vision and natural language processing: describing the content of an image is inherently a task in both fields. Many image captioning systems have shown that it is possible to describe the most salient information conveyed by images with accurate and meaningful sentences. This paper surveys recent approaches that have been used for image captioning and discusses the datasets they employ.
Index terms – Image captioning, Computer Vision, Natural language processing, Deep Learning.
Citations
Journal ArticleDOI
TL;DR: In this paper, a robust sarcasm detection technique using an Artificial Rabbits Optimizer with a Multilayer Convolutional Encoder-Decoder Neural Network (ARO-MCEDNN) on social media is presented.
Abstract: Nowadays, posting sarcastic comments on social media platforms has become a general trend, and people frequently use sarcasm to pester or taunt others. In speech, sarcasm is often conveyed through tonal stress and inflexion, while in text it surfaces through hyperbolic, lexical, and pragmatic cues. Sarcasm Detection (SD) using Deep Learning (DL) on media platforms is an active research field in Natural Language Processing (NLP). Sarcasm is a figurative language device frequently employed on social networks such as Reddit, Twitter, and Facebook. Detecting sarcasm is essential to various applications such as Sentiment Analysis (SA), opinion mining, and social network monitoring, and DL techniques have been demonstrated to be effective at the task. This study presents a robust sarcasm detection technique using an Artificial Rabbits Optimizer with a Multilayer Convolutional Encoder-Decoder Neural Network (ARO-MCEDNN) on social media. The presented ARO-MCEDNN technique concentrates on detecting sarcasm on social networking sites. Primarily, it applies a series of data pre-processing steps to transform the input data into a compatible format. Next, the GloVe approach is applied for word embedding. The MCEDNN model is then applied as a classifier to identify and categorize distinct kinds of sarcasm. Furthermore, the ARO algorithm is chosen as the hyperparameter optimizer of the MCEDNN model, enhancing sarcasm detection performance. To highlight the performance of the ARO-MCEDNN system, a sequence of simulations was performed.
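
To make the described pipeline concrete, the following is a minimal sketch of the classification stage, assuming PyTorch. The abstract does not specify the MCEDNN layer layout, the ARO search space, or the label scheme, so the layer sizes, vocabulary size, and binary labels below are illustrative assumptions; the GloVe step would initialize the embedding table.

    # Illustrative multilayer convolutional encoder-decoder text classifier
    # (the MCEDNN role in ARO-MCEDNN); all sizes are assumptions.
    import torch
    import torch.nn as nn

    class ConvEncoderDecoderClassifier(nn.Module):
        def __init__(self, vocab_size, embed_dim=100, num_classes=2):
            super().__init__()
            # In the paper's pipeline this table would be initialized
            # from pre-trained GloVe vectors.
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.encoder = nn.Sequential(
                nn.Conv1d(embed_dim, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv1d(128, 64, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.decoder = nn.Sequential(
                nn.Conv1d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            )
            self.classify = nn.Linear(128, num_classes)

        def forward(self, token_ids):                  # (batch, seq_len)
            x = self.embed(token_ids).transpose(1, 2)  # (batch, embed, seq)
            h = self.decoder(self.encoder(x))          # (batch, 128, seq)
            return self.classify(h.mean(dim=2))        # sarcastic / not logits

    model = ConvEncoderDecoderClassifier(vocab_size=20000)
    logits = model(torch.randint(0, 20000, (8, 64)))   # dummy batch

The ARO step would then search over choices such as kernel sizes, channel widths, and the learning rate, retraining the classifier for each candidate configuration.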
Journal ArticleDOI
TL;DR: In this paper, a Spotted Hyena Optimization with Deep Learning based Automatic Text Summarization (SHODL-ATS) model is proposed. The SHODL-ATS technique performs data preprocessing to convert the data into a convenient form, and the SHO algorithm is applied for parameter tuning of the ABiGRU approach.
Abstract: Automatic text summarization is an active research area concerned with extracting snippets or key sentences from a large document and presenting them as a short form of the document. Text summarization can be both cost-efficient and time-efficient. Abstractive and extractive summarization have been studied with various algorithms, including deep learning (DL), graph-based, and statistical techniques, and DL has attained promising results compared to traditional methods. With the development of neural architectures built on the attention mechanism (notably the transformer), summarization remains a growing research area. Hence, this research presents a Spotted Hyena Optimization with Deep Learning based Automatic Text Summarization (SHODL-ATS) model. The SHODL-ATS technique's principal objective is the automated summarization of documents. To accomplish this, the presented SHODL-ATS technique performs data preprocessing to convert the data into a convenient form, then uses an Attention-based Bidirectional Gated Recurrent Unit (ABiGRU) model to summarize the text documents. Finally, the SHO algorithm is applied for parameter tuning of the ABiGRU approach. To examine the performance of the SHODL-ATS model, we validate the outcomes on benchmark datasets. The results indicate the promising performance of the SHODL-ATS method over other existing techniques.
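
A minimal sketch of the ABiGRU component is shown below, assuming PyTorch and framing it as an extractive sentence scorer; the abstract does not give the exact architecture, so the dimensions and the attention and scoring heads are assumptions. The SHO step would correspond to searching over hyperparameters such as the hidden size and learning rate.

    # Illustrative attention-based bidirectional GRU (ABiGRU) that scores
    # sentences for inclusion in a summary; all sizes are assumptions.
    import torch
    import torch.nn as nn

    class ABiGRU(nn.Module):
        def __init__(self, vocab_size, embed_dim=128, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)
            self.bigru = nn.GRU(embed_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)   # token attention weights
            self.score = nn.Linear(2 * hidden, 1)  # sentence salience score

        def forward(self, token_ids):                  # (batch, seq_len)
            h, _ = self.bigru(self.embed(token_ids))   # (batch, seq, 2H)
            w = torch.softmax(self.attn(h), dim=1)     # attend over tokens
            context = (w * h).sum(dim=1)               # weighted sum
            return self.score(context).squeeze(-1)     # one score per sentence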
Journal ArticleDOI
TL;DR: In this paper, a deep learning-based Aspect-Based Sentiment Analysis (ABSA) model is proposed to identify sentiments toward particular aspects or features of a product, service, or experience.
Abstract: Aspect-Based Sentiment Analysis (ABSA) is a subdomain of Sentiment Analysis (SA) that focuses on detecting the sentiment toward particular aspects or features of a product, service, or experience. ABSA aims to go beyond simple sentiment classification of a sentence or document and to present a more granular analysis of sentiment toward different aspects. ABSA has several real-time applications, including social media monitoring, customer feedback analysis, and product reviews. Many difficulties exist in ABSA, including dealing with language variability and complexity, sentiment subjectivity, and managing multiple aspects in a single sentence. Recently, Deep Learning (DL) methods have continued to be an active area of research and have proven promising for ABSA. This study focuses on designing and developing ABSA models using DL concepts. The presented ABSA model aims to identify sentiments toward particular aspects or features of a product, service, or experience. The approach initially performs several data pre-processing phases to convert the input data into a meaningful form. In addition, the word2vec model is applied for feature extraction. For sentiment analysis, three DL models are employed: the Hopfield Network (HN), the Convolutional Neural Network (CNN), and the Bidirectional Long Short-Term Memory (BiLSTM) approaches. The DL models are validated experimentally on a benchmark dataset. The results highlight that the CNN model exhibits improved sentiment classification over the other DL models.
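
As a sketch of the pipeline described above, the code below trains word2vec features with gensim and feeds them to a BiLSTM classifier, one of the three DL models compared; the toy corpus, layer sizes, and three-way label scheme are assumptions rather than the paper's settings.

    # word2vec feature extraction followed by a BiLSTM sentiment classifier.
    import numpy as np
    import torch
    import torch.nn as nn
    from gensim.models import Word2Vec

    corpus = [["the", "battery", "life", "is", "great"],
              ["the", "screen", "quality", "is", "poor"]]
    w2v = Word2Vec(sentences=corpus, vector_size=100, min_count=1)

    class BiLSTMClassifier(nn.Module):
        def __init__(self, embed_dim=100, hidden=64, num_classes=3):
            super().__init__()
            self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.out = nn.Linear(2 * hidden, num_classes)  # neg/neutral/pos

        def forward(self, vectors):          # (batch, seq_len, embed_dim)
            h, _ = self.lstm(vectors)
            return self.out(h[:, -1, :])     # classify from final timestep

    # Look up word2vec features for one sentence and score its sentiment.
    feats = torch.from_numpy(np.stack([w2v.wv[w] for w in corpus[0]]))[None]
    logits = BiLSTMClassifier()(feats)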
References
Journal ArticleDOI
TL;DR: A novel, efficient, gradient based method called long short-term memory (LSTM) is introduced, which can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units.
Abstract: Learning to store information over extended time intervals by recurrent backpropagation takes a very long time, mostly because of insufficient, decaying error backflow. We briefly review Hochreiter's (1991) analysis of this problem, then address it by introducing a novel, efficient, gradient-based method called long short-term memory (LSTM). Truncating the gradient where this does not do harm, LSTM can learn to bridge minimal time lags in excess of 1000 discrete-time steps by enforcing constant error flow through constant error carousels within special units. Multiplicative gate units learn to open and close access to the constant error flow. LSTM is local in space and time; its computational complexity per time step and weight is O(1). Our experiments with artificial data involve local, distributed, real-valued, and noisy pattern representations. In comparisons with real-time recurrent learning, backpropagation through time, recurrent cascade correlation, Elman nets, and neural sequence chunking, LSTM leads to many more successful runs, and learns much faster. LSTM also solves complex, artificial long-time-lag tasks that have never been solved by previous recurrent network algorithms.
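
As a compact illustration of the gating described above, here is one LSTM step in plain numpy; note this is the now-standard formulation, whose forget gate was a later addition (Gers et al., 1999) rather than part of the original 1997 cell.

    # One LSTM step: multiplicative gates guard read/write access to the
    # cell state c, whose additive update gives near-constant error flow.
    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def lstm_step(x, h, c, W, b):
        # W has shape (4*H, D+H): stacked input, forget, output, candidate.
        z = W @ np.concatenate([x, h]) + b
        H = h.shape[0]
        i = sigmoid(z[:H])         # input gate: what to write
        f = sigmoid(z[H:2*H])      # forget gate (post-1997 addition)
        o = sigmoid(z[2*H:3*H])    # output gate: what to expose
        g = np.tanh(z[3*H:])       # candidate values
        c = f * c + i * g          # the "constant error carousel"
        h = o * np.tanh(c)
        return h, c

    H, D = 4, 3
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(4 * H, D + H)), np.zeros(4 * H)
    h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), W, b)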

72,897 citations


"Review on latest approaches used in..." refers background in this paper

  • ...Numerous researchers now use a deep learning RNN called the long short-term memory (LSTM) network [1]....


Book ChapterDOI
06 Sep 2014
TL;DR: A new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding by gathering images of complex everyday scenes containing common objects in their natural context.
Abstract: We present a new dataset with the goal of advancing the state-of-the-art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.
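
For readers who want to browse the dataset, the snippet below shows typical usage of the companion COCO API (pycocotools); the annotation file path is a placeholder for a local copy of the annotations.

    # Querying MS COCO instance annotations with pycocotools.
    from pycocotools.coco import COCO

    coco = COCO("annotations/instances_train2014.json")  # placeholder path
    cat_ids = coco.getCatIds(catNms=["person", "dog"])   # names -> category ids
    img_ids = coco.getImgIds(catIds=cat_ids)             # images with both cats
    ann_ids = coco.getAnnIds(imgIds=img_ids[:1], catIds=cat_ids, iscrowd=None)
    anns = coco.loadAnns(ann_ids)                        # per-instance segmentations
    print(len(img_ids), "images;", len(anns), "annotations in the first image")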

30,462 citations


"Review on latest approaches used in..." refers methods in this paper

  • ...The image captioning datasets used by the authors for experiments are the Flickr8K [8], Flickr30K and MSCOCO [24]....


  • ...Images were taken from the intersection of MS COCO [24] and YFCC100M [23], and annotations were collected on Amazon's Mechanical Turk [32]....


Proceedings Article
08 Dec 2014
TL;DR: The authors used a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
Abstract: Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT-14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous state of the art. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice. Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.
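
A minimal sketch of this encoder-decoder setup, assuming PyTorch, is given below; the vocabulary, embedding, and hidden sizes are placeholders rather than the paper's, and the source-side reversal mirrors the trick described above.

    # Sequence-to-sequence learning with two LSTMs: the encoder compresses
    # the (reversed) source into a fixed-size state, the decoder unrolls it.
    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, embed=256, hidden=512, layers=2):
            super().__init__()
            self.src_embed = nn.Embedding(src_vocab, embed)
            self.tgt_embed = nn.Embedding(tgt_vocab, embed)
            self.encoder = nn.LSTM(embed, hidden, layers, batch_first=True)
            self.decoder = nn.LSTM(embed, hidden, layers, batch_first=True)
            self.project = nn.Linear(hidden, tgt_vocab)

        def forward(self, src, tgt):
            src = torch.flip(src, dims=[1])               # reverse source order
            _, state = self.encoder(self.src_embed(src))  # fixed-size summary
            out, _ = self.decoder(self.tgt_embed(tgt), state)
            return self.project(out)                      # next-word logits

    model = Seq2Seq(src_vocab=10000, tgt_vocab=10000)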

12,299 citations



"Review on latest approaches used in..." refers methods in this paper

  • ...LSTMs have been used to achieve state-of-the-art performance in several tasks such as handwriting recognition, sequence generation, speech recognition and machine translation [7] among others....


Proceedings ArticleDOI
07 Jun 2015
TL;DR: In this paper, a generative model based on a deep recurrent architecture, combining recent advances in computer vision and machine translation, is proposed to generate natural sentences describing an image.
Abstract: Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU-1 score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU-1 score improvements on Flickr30k, from 56 to 66, and on SBU, from 19 to 28. Lastly, on the newly released COCO dataset, we achieve a BLEU-4 of 27.7, which is the current state-of-the-art.
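
A minimal sketch of this generative captioner, assuming PyTorch and torchvision (0.13+ for the weights argument), is shown below; a ResNet-18 stands in for the paper's GoogLeNet-style encoder, the sizes are illustrative, and training would maximize the log-likelihood of the caption via cross-entropy over the returned logits.

    # CNN encoder + LSTM decoder: the image embedding is fed to the LSTM
    # as the first input, then the caption tokens follow.
    import torch
    import torch.nn as nn
    import torchvision.models as models

    class ShowAndTell(nn.Module):
        def __init__(self, vocab_size, embed=256, hidden=512):
            super().__init__()
            backbone = models.resnet18(weights=None)  # stand-in CNN encoder
            backbone.fc = nn.Linear(backbone.fc.in_features, embed)
            self.cnn = backbone
            self.embed = nn.Embedding(vocab_size, embed)
            self.lstm = nn.LSTM(embed, hidden, batch_first=True)
            self.out = nn.Linear(hidden, vocab_size)

        def forward(self, images, captions):
            img = self.cnn(images).unsqueeze(1)       # image as first "word"
            seq = torch.cat([img, self.embed(captions)], dim=1)
            h, _ = self.lstm(seq)
            return self.out(h)  # logits for maximizing log p(caption | image)

    model = ShowAndTell(vocab_size=10000)
    logits = model(torch.randn(2, 3, 224, 224),
                   torch.randint(0, 10000, (2, 12)))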

5,095 citations