Topic

Word embedding

About: Word embedding is a research topic. Over the lifetime, 4683 publications have been published within this topic receiving 153378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
19 Jul 2020
TL;DR: A novel learning-based evaluation metric, Unpaired Image Captioning Evaluation (UICE), which can be trained to distinguish between human-written and generated captions, and which can correctly judge the grammatical correctness of generated captions and the semantic consistency between captions and corresponding images.
Abstract: Recently, instead of pursuing high performance on classical evaluation metrics, the research focus of image captioning has shifted to generating sentences that are more vivid and stylized than human-written ones. However, there is still no applicable metric that can judge how close generated captions are to human-written ones. In this paper, we propose a novel learning-based evaluation metric, Unpaired Image Captioning Evaluation (UICE), which can be trained to distinguish between human-written and generated captions. Unlike existing metrics, UICE consists of two parts: a semantic alignment module that measures the semantic distance between extracted image features and caption meanings, and a syntactic discriminating module that judges how human-like a candidate caption is. The semantic alignment module is implemented by mapping the image features and the word embeddings into a unified tensor space. The syntactic discriminating module is learning-based, and can therefore be adapted to a user's own style by feeding it an additional personalized corpus during training. Extensive experiments indicate that our metric correctly judges the grammatical correctness of generated captions and the semantic consistency between captions and their corresponding images.
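The core of the semantic alignment module — projecting image features and a caption's word-embedding representation into one shared space and scoring their agreement — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name, the use of cosine similarity, and the projection matrices are all assumptions.

```python
import numpy as np

def semantic_alignment_score(img_feat, cap_emb, W_img, W_cap):
    """Project image features and a caption embedding into a shared
    space via (hypothetical) linear maps, then score their cosine
    similarity. Higher scores mean better image-caption agreement."""
    u = W_img @ img_feat          # image features mapped into the shared space
    v = W_cap @ cap_emb           # caption embedding mapped into the same space
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

In the actual metric these projections would be learned jointly; here, identity projections on already-aligned vectors simply recover plain cosine similarity.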
Posted Content
TL;DR: This article proposed two models representing the target words across the periods to predict the changing words using threshold and voting schemes, which achieved competent results, ranking third in the DIACR-Ita shared task at EVALITA 2020.
Abstract: We present our systems and findings on unsupervised lexical semantic change for the Italian language in the DIACR-Ita shared task at EVALITA 2020. The task is to determine whether a target word has evolved its meaning over time, relying only on raw text from two time-specific datasets. We propose two models that represent the target words across the periods to predict the changing words using threshold and voting schemes. Our first model relies solely on part-of-speech usage and an ensemble of distance measures. The second model uses word embedding representations to extract the neighbors' relative distances across spaces, and we propose "the average of absolute differences" to estimate lexical semantic change. Our models achieved competent results, ranking third in the DIACR-Ita competition. Furthermore, we experiment with the k_neighbor parameter of our second model to compare the impact of using "the average of absolute differences" versus the cosine distance used in Hamilton et al. (2016).
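The "average of absolute differences" idea can be sketched as follows: compare a target word's distances to a shared set of neighbor words in each period's embedding space, and average the absolute change. This is a minimal sketch under stated assumptions — the function name, the choice of cosine distance, and the precomputed anchor matrices are hypothetical, not the authors' code.

```python
import numpy as np

def neighbor_aad(vec_t1, vec_t2, anchors_t1, anchors_t2):
    """Average of absolute differences between a target word's cosine
    distances to a shared set of anchor (neighbor) words, computed in
    two time-specific embedding spaces. Rows of anchors_t1/anchors_t2
    are the same anchor words' vectors in period 1 and period 2."""
    def cos_dist(u, V):
        return 1.0 - (V @ u) / (np.linalg.norm(V, axis=1) * np.linalg.norm(u))
    d1 = cos_dist(vec_t1, anchors_t1)   # neighbor distances in period 1
    d2 = cos_dist(vec_t2, anchors_t2)   # neighbor distances in period 2
    return float(np.mean(np.abs(d1 - d2)))
```

A word whose neighborhood is stable across periods scores near zero; a word whose meaning shifted scores higher, so a threshold on this score flags changing words.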
Proceedings ArticleDOI
01 Sep 2019
TL;DR: This paper proposes an effective text classification scheme that incorporates word weights into word embeddings; extensive experimental results verify that its accuracy outperforms state-of-the-art schemes.
Abstract: As a fundamental task of natural language processing, text classification has been widely used in applications such as sentiment analysis and spam detection. In recent years, continuous-valued word embeddings learned by neural networks have attracted extensive attention. Although word embeddings achieve impressive results in capturing similarities and regularities between words, they fail to highlight the words most important for identifying a text's category. This deficiency can be mitigated by word weights, which convey each word's contribution to text categorization. To this end, we propose an effective text classification scheme that incorporates word weights into word embeddings. Specifically, to enrich the word representation, a bidirectional gated recurrent unit (Bi-GRU) is first employed to capture the context of each word. Then the word weights yielded by term frequency (TF) are used to modulate the Bi-GRU word representations when constructing the text representation. Extensive experimental results on several large text datasets verify that the accuracy of our proposed text classification scheme outperforms the state-of-the-art ones.
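The TF-weighting step can be illustrated in isolation: weight each word's vector by its term frequency in the document, then pool. This is a simplified stand-in for the paper's scheme — in the paper the weights modulate contextual Bi-GRU states, whereas here they modulate raw embeddings, and the function name and embedding table are hypothetical.

```python
import numpy as np
from collections import Counter

def tf_weighted_text_vector(tokens, embeddings):
    """Build a document representation by scaling each distinct word's
    embedding by its term frequency (count / document length), then
    averaging the scaled vectors. `embeddings` maps word -> np.ndarray."""
    tf = Counter(tokens)
    total = len(tokens)
    vecs = [(tf[w] / total) * embeddings[w]
            for w in set(tokens) if w in embeddings]
    return np.mean(vecs, axis=0)
```

Frequent (and thus presumably topical) words contribute more to the pooled vector than words that appear once, which is the intuition behind using TF as a word weight.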
Book ChapterDOI
01 Jan 2021
TL;DR: In this article, a deep learning-based approach is introduced that automatically performs sentiment analysis for hotel reviews using word embedding and gated recurrent unit, which outperformed the performance of the traditional machine learning methods in sentiment classification of hotel reviews with 89% accuracy and 92% F-score.
Abstract: As everything shifts online, the demand for sentiment analysis has expanded tremendously in recent years. Sentiment analysis is an automated process of analyzing people's opinions and feelings using natural language processing tools. Organizations in the tourism sector can benefit from sentiment analysis to accurately track their customers' opinions. A deep learning-based approach is introduced in this paper that automatically performs sentiment analysis on hotel reviews using word embeddings and a gated recurrent unit. Our deep learning model outperformed traditional machine learning methods in sentiment classification of hotel reviews, with 89% accuracy and a 92% F-score.
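The gated recurrent unit at the heart of such a classifier can be written out as a single recurrence. The sketch below shows the standard GRU equations over a sequence of word-embedding vectors; the weight layout and function names are assumptions for illustration, not the paper's model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x, h, params):
    """One standard GRU step on input vector x and hidden state h.
    params = (Wz, Uz, Wr, Ur, Wh, Uh); biases omitted for brevity."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = sigmoid(Wz @ x + Uz @ h)               # update gate
    r = sigmoid(Wr @ x + Ur @ h)               # reset gate
    h_tilde = np.tanh(Wh @ x + Uh @ (r * h))   # candidate state
    return (1.0 - z) * h + z * h_tilde         # gated interpolation

def encode_review(embedded_tokens, params, hidden=4):
    """Run the GRU over a review's word embeddings; the final hidden
    state would feed a sentiment classifier (e.g. a logistic layer)."""
    h = np.zeros(hidden)
    for x in embedded_tokens:
        h = gru_step(x, h, params)
    return h
```

The gates let the network decide, word by word, how much of the running review summary to keep versus overwrite, which is why GRUs handle negation and long reviews better than bag-of-words baselines.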

Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (87% related)
- Unsupervised learning: 22.7K papers, 1M citations (86% related)
- Deep learning: 79.8K papers, 2.1M citations (85% related)
- Reinforcement learning: 46K papers, 1M citations (84% related)
- Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance Metrics
No. of papers in the topic in previous years:

Year   Papers
2023   317
2022   716
2021   736
2020   1,025
2019   1,078
2018   788