scispace - formally typeset
Topic

Word embedding

About: Word embedding is a research topic. Over the lifetime, 4683 publications have been published within this topic receiving 153378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
TL;DR: This work tunes generated word vectors to their lemma forms using linear compositionality to produce lemma-based embeddings, and shows improvements over existing state-of-the-art methods for Arabic word embedding.

25 citations
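The "linear compositionality" idea above can be read as vector arithmetic: the offset between an inflected surface form and its lemma is roughly constant, so an average offset learned from known pairs can map other forms to lemma-based vectors. A minimal sketch under that reading, with made-up toy vectors (the pairs, values, and the offset interpretation are illustrative assumptions, not the paper's actual procedure):

```python
import numpy as np

# Toy 2-d vectors (hypothetical values); keys are surface forms and lemmas.
vec = {
    "walked": np.array([0.5, 0.9]), "walk": np.array([0.4, 0.3]),
    "jumped": np.array([0.7, 1.0]), "jump": np.array([0.6, 0.4]),
    "played": np.array([0.3, 0.8]),
}

# Estimate one offset vector that maps an inflected form to its lemma,
# averaged over known (surface form, lemma) pairs.
pairs = [("walked", "walk"), ("jumped", "jump")]
offset = np.mean([vec[lemma] - vec[surface] for surface, lemma in pairs], axis=0)

# Apply the offset to infer a lemma-based vector for another form.
play_hat = vec["played"] + offset
```

With these toy values both training pairs share the offset (-0.1, -0.6), so the averaged offset transfers cleanly to "played".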

Journal ArticleDOI
TL;DR: In this paper, a survey classifies approaches to calculating sentence similarity, based on the adopted methodology, into three categories: word-to-word based, structure-based, and vector-based.
Abstract: Objective/Methods: This study reviews the approaches used for measuring sentence similarity. Measuring similarity between natural language sentences is a crucial task for many Natural Language Processing applications such as text classification, information retrieval, question answering, and plagiarism detection. This survey classifies approaches to calculating sentence similarity, based on the adopted methodology, into three categories: word-to-word based, structure-based, and vector-based, which are the most widely used approaches for finding sentence similarity. Findings/Application: Each approach measures relatedness between short texts from a specific perspective. In addition, the datasets most often used as benchmarks for evaluating techniques in this field are introduced, to provide a complete view of the issue. Approaches that combine more than one perspective give better results. Moreover, structure-based similarity, which measures similarity between sentences' structures, needs further investigation. Keywords: Sentence Representation, Sentences Similarity, Structural Similarity, Word Embedding, Words Similarity

25 citations
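Of the three categories in the survey, the vector-based one is the simplest to illustrate: represent each sentence as the mean of its word embeddings and compare sentences by cosine similarity. A minimal sketch, using made-up toy vectors in place of pretrained embeddings (the vectors and tokens are assumptions for illustration only):

```python
import numpy as np

# Toy word vectors standing in for pretrained embeddings (hypothetical values).
word_vectors = {
    "cat": np.array([0.9, 0.1, 0.0]),
    "sat": np.array([0.2, 0.8, 0.1]),
    "mat": np.array([0.7, 0.2, 0.1]),
    "dog": np.array([0.8, 0.2, 0.1]),
}

def sentence_vector(tokens):
    """Represent a sentence as the mean of its known word vectors."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    return np.mean(vecs, axis=0)

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors, in [-1, 1]."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

s1 = sentence_vector(["cat", "sat", "mat"])
s2 = sentence_vector(["dog", "sat", "mat"])
sim = cosine_similarity(s1, s2)
```

The two sentences differ in one word out of three, so averaging keeps their vectors close and the cosine similarity high; word-to-word and structure-based approaches would instead align tokens or parse trees directly.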

Journal ArticleDOI
TL;DR: A novel negative-sampling (NEG) strategy that samples negatives based on Term Frequency-Inverse Document Frequency (NEG-TFIDF); it outperforms Mikolov's NEG on both word analogy and word similarity test tasks, particularly for medium-frequency words.

25 citations
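The core mechanism can be sketched as follows: instead of word2vec's unigram-based negative-sampling distribution, draw negatives in proportion to each word's TF-IDF weight over the corpus. The toy corpus and the exact weighting formula below are assumptions for illustration; the paper's precise formulation may differ:

```python
import math
import random
from collections import Counter

# Toy corpus (hypothetical); each inner list is one document.
docs = [
    ["the", "cat", "sat"],
    ["the", "dog", "ran"],
    ["the", "cat", "ran", "fast"],
]

# Corpus-level term frequency and per-word document frequency.
tf = Counter(w for d in docs for w in d)
df = Counter(w for d in docs for w in set(d))
n_docs = len(docs)

# TF-IDF weight per word: a word like "the" that occurs in every
# document gets IDF = log(1) = 0 and is never drawn as a negative.
weights = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}

vocab = sorted(weights)

def sample_negatives(k, rng):
    """Draw k negative samples in proportion to TF-IDF weight."""
    return rng.choices(vocab, weights=[weights[w] for w in vocab], k=k)

negs = sample_negatives(5, random.Random(0))
```

Compared with sampling from the raw unigram distribution, this shifts probability mass away from very frequent stop words toward more informative medium-frequency words, which matches where the paper reports its gains.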

Proceedings ArticleDOI
12 Sep 2019
TL;DR: This work proposes a post-processing approach to retrofit the contextualized word embedding with paraphrases, which seeks to minimize the variance of word representations on paraphrased contexts and significantly improves ELMo on various sentence classification and inference tasks.
Abstract: Contextualized word embeddings, such as ELMo, provide meaningful representations for words and their contexts. They have been shown to have a great impact on downstream applications. However, we observe that the contextualized embeddings of a word might change drastically when its contexts are paraphrased. As these embeddings are over-sensitive to the context, the downstream model may make different predictions when the input sentence is paraphrased. To address this issue, we propose a post-processing approach to retrofit the embedding with paraphrases. Our method learns an orthogonal transformation on the input space of the contextualized word embedding model, which seeks to minimize the variance of word representations on paraphrased contexts. Experiments show that the proposed method significantly improves ELMo on various sentence classification and inference tasks.

25 citations
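The quantity this method minimizes can be sketched directly: take the contextualized vectors of one word across paraphrased contexts and measure their spread around the group mean. The random vectors below are hypothetical stand-ins for ELMo outputs, and the learned orthogonal transformation itself is not reproduced here; the sketch only illustrates the variance objective:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for the contextualized vectors of one word as it
# appears in an original sentence and in three paraphrases of it (dim 4).
contexts = rng.normal(loc=1.0, scale=0.3, size=(4, 4))

def paraphrase_variance(vectors):
    """Total squared deviation of each context vector from the group
    mean: the quantity the retrofitting objective drives down."""
    centered = vectors - vectors.mean(axis=0)
    return float((centered ** 2).sum())

before = paraphrase_variance(contexts)

# Any map that pulls a paraphrase group toward its mean lowers the
# objective; halving the deviations cuts it to a quarter.
mean = contexts.mean(axis=0)
after = paraphrase_variance(mean + 0.5 * (contexts - mean))
```

A lower value means the word's representation is more stable under paraphrasing, which is what makes the downstream classifier's predictions consistent across paraphrased inputs.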

Proceedings ArticleDOI
09 Apr 2018
TL;DR: This paper analyses various unsupervised automatic keyphrase extraction methods based on graphs, as well as the impact of word embedding, and shows that there is no difference between using word embeddings and not using them.
Abstract: This paper analyses various unsupervised automatic keyphrase extraction methods based on graphs, as well as the impact of word embedding. Evaluation is made on three datasets. We show that there is no difference between using word embeddings and not using them.

25 citations


Network Information
Related Topics (5)
Recurrent neural network
29.2K papers, 890K citations
87% related
Unsupervised learning
22.7K papers, 1M citations
86% related
Deep learning
79.8K papers, 2.1M citations
85% related
Reinforcement learning
46K papers, 1M citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788