Topic

Word embedding

About: Word embedding is a research topic. Over the lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Proceedings ArticleDOI
14 May 2017
TL;DR: Two schemes for improving the Continuous Bag-of-Words (CBOW) model are proposed: the relative positions of adjacent words are taken as weights for the model's input layer, and a broader context is allowed to take part in training when the next target word is predicted.
Abstract: Data representation is a fundamental task in machine learning that affects the performance of the whole machine learning system. In the past few years, with the rapid development of deep learning, neural-network models for word embedding have brought new inspiration to research in natural language processing. In this paper, two schemes for improving the Continuous Bag-of-Words (CBOW) model are proposed. On the one hand, the relative positions of adjacent words are taken as weights for the input layer of the model; on the other hand, a broader context is considered and takes part in training when the next target word is predicted. Experimental results show that our proposed models outperform the classical CBOW model.

17 citations
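As a minimal sketch of the position-weighting idea described in the abstract above: instead of CBOW's uniform average over context vectors, each context word can be weighted by its distance from the target position. The 1/distance weighting, vocabulary size and dimensions below are illustrative assumptions, not the authors' exact scheme.

```python
# Hedged sketch (not the paper's code): a CBOW-style input layer where context
# vectors are combined with weights that decay with distance from the target
# position, rather than the uniform average used by classic CBOW.
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 10_000, 100
embeddings = rng.normal(scale=0.1, size=(vocab_size, dim))  # input word vectors

def weighted_context_vector(context_ids, distances):
    """Position-weighted average of the context word vectors."""
    weights = 1.0 / np.asarray(distances, dtype=float)  # closer words weigh more (assumed scheme)
    weights /= weights.sum()
    return weights @ embeddings[context_ids]            # (dim,) hidden-layer input

# Example: predict the word at position t from the words at t-2, t-1, t+1, t+2.
context_ids = [12, 7, 431, 9]
distances = [2, 1, 1, 2]
h = weighted_context_vector(context_ids, distances)
print(h.shape)  # (100,)
```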

Proceedings Article
01 May 2020
TL;DR: An extensive evaluation of multiple cross-lingual embedding models, analyzing their strengths and limitations with respect to variables such as target language, training corpora and amount of supervision; the conclusions put in doubt the view that high-quality cross-lingual embeddings can always be learned without much supervision.
Abstract: Cross-lingual word embeddings are vector representations of words in different languages where words with similar meaning are represented by similar vectors, regardless of the language. Recent developments which construct these embeddings by aligning monolingual spaces have shown that accurate alignments can be obtained with little or no supervision, which usually comes in the form of bilingual dictionaries. However, the focus has been on a particular controlled scenario for evaluation, and there is no strong evidence on how current state-of-the-art systems would fare with noisy text or for language pairs with major linguistic differences. In this paper we present an extensive evaluation over multiple cross-lingual embedding models, analyzing their strengths and limitations with respect to different variables such as target language, training corpora and amount of supervision. Our conclusions put in doubt the view that high-quality cross-lingual embeddings can always be learned without much supervision.

17 citations
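For context on the alignment-based systems this evaluation covers, below is a minimal sketch of one standard supervised technique: an orthogonal (Procrustes) mapping learned from a small bilingual dictionary. Whether any given surveyed system uses exactly this mapping is not stated in the abstract; the dimensions, variable names and random data are illustrative assumptions.

```python
# Hedged sketch: align two monolingual embedding spaces with an orthogonal
# mapping fitted on dictionary word pairs (orthogonal Procrustes problem).
import numpy as np

def procrustes_align(X_src, Y_tgt):
    """Solve min_W ||X_src @ W - Y_tgt|| with W orthogonal.

    X_src, Y_tgt: (n_pairs, dim) embeddings of dictionary pairs,
    where row i of X_src translates to row i of Y_tgt.
    """
    U, _, Vt = np.linalg.svd(Y_tgt.T @ X_src)
    return (U @ Vt).T  # W maps source vectors into the target space

rng = np.random.default_rng(1)
dim, n_pairs = 300, 5_000
X = rng.normal(size=(n_pairs, dim))   # e.g. source-language vectors for dictionary entries
Y = rng.normal(size=(n_pairs, dim))   # matching target-language vectors
W = procrustes_align(X, Y)
mapped = X @ W                        # source embeddings projected into the target space
```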

Book ChapterDOI
26 Jun 2019
TL;DR: This work considers the problem of Arabic paraphrase detection and presents different deep neural networks, such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), to study the effectiveness of each in extracting the proper features of sentences without knowledge of the semantic and syntactic structure of the Arabic language.
Abstract: Paraphrasing is the act of reusing original text without proper citation of the source. Different obfuscation operations can be employed, such as addition/deletion of words, synonym substitution, lexical changes, active-to-passive switching, etc. This phenomenon has increased dramatically because of the progressive advancement of the web and automatic text-editing tools. Recently, deep learning methods have achieved more competitive results than traditional methods for Natural Language Processing (NLP). In this context, we consider the problem of Arabic paraphrase detection. We present different deep neural networks, such as Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM). Our aim is to study the effectiveness of each in extracting the proper features of sentences without knowledge of the semantic and syntactic structure of the Arabic language. For the experiments, we propose an automatic corpus construction, given the lack of publicly available Arabic resources. Evaluations reveal that the LSTM model achieved the highest semantic-similarity rate and significantly outperformed other state-of-the-art methods.

17 citations
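To make the LSTM-based setup concrete, here is a minimal sketch of a Siamese LSTM that encodes two sentences from their word embeddings and scores semantic similarity with cosine distance. This is not the paper's exact architecture; vocabulary size, dimensions and the cosine scoring are illustrative assumptions.

```python
# Hedged sketch: Siamese LSTM sentence encoder + cosine similarity for
# paraphrase scoring (an assumed, simplified stand-in for the paper's models).
import torch
import torch.nn as nn

class LSTMSentenceEncoder(nn.Module):
    def __init__(self, vocab_size=30_000, emb_dim=300, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)

    def forward(self, token_ids):          # (batch, seq_len) word indices
        emb = self.embed(token_ids)        # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.lstm(emb)       # final hidden state: (1, batch, hidden)
        return h_n.squeeze(0)              # sentence vector: (batch, hidden)

encoder = LSTMSentenceEncoder()
s1 = torch.randint(1, 30_000, (4, 20))     # a batch of 4 tokenised sentences
s2 = torch.randint(1, 30_000, (4, 20))     # their candidate paraphrases
score = nn.functional.cosine_similarity(encoder(s1), encoder(s2))  # (4,) similarity scores
```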

Journal ArticleDOI
Yu Hao, Xien Liu, Ji Wu, Ping Lv
17 Jul 2019
TL;DR: In this paper, a supervised learning framework is proposed to exploit sentence embedding for the medical question answering task; it consists of two main parts: a sentence-embedding module and a scoring module.
Abstract: Despite the great success of word embedding, sentence embedding remains an open problem. In this paper, we present a supervised learning framework to exploit sentence embedding for the medical question answering task. The learning framework consists of two main parts: 1) a sentence-embedding module, and 2) a scoring module. The former is developed with contextual self-attention and multi-scale techniques to encode a sentence into an embedding tensor; this module is referred to as Contextual self-Attention Multi-scale Sentence Embedding (CAMSE). The latter employs two scoring strategies: Semantic Matching Scoring (SMS) and Semantic Association Scoring (SAS). SMS measures similarity while SAS captures association between sentence pairs: a medical question concatenated with a candidate choice, and a piece of corresponding supportive evidence. The proposed framework is evaluated on two Medical Question Answering (MedicalQA) datasets collected from real-world applications: medical exams and clinical diagnosis based on electronic medical records (EMR). The comparison results show that our proposed framework achieved significant improvements over competitive baseline approaches. Additionally, a series of controlled experiments illustrate that the multi-scale strategy and the contextual self-attention layer play important roles in producing effective sentence embeddings, and that the two scoring strategies are highly complementary for question answering problems.

17 citations
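As a rough illustration of the matching idea only (CAMSE's actual scoring modules are learned, not a fixed metric), the semantic-matching step can be pictured as comparing the embedding of each "question + candidate choice" against the embedding of its supporting evidence and picking the best-matching choice. The cosine stand-in, dimensions and random vectors below are assumptions for illustration.

```python
# Hedged sketch: score each candidate answer by how well its sentence embedding
# matches the embedding of its retrieved supportive evidence.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def answer_question(choice_embeddings, evidence_embeddings):
    """choice_embeddings[i], evidence_embeddings[i]: vectors for choice i and its evidence."""
    scores = [cosine(c, e) for c, e in zip(choice_embeddings, evidence_embeddings)]
    return int(np.argmax(scores)), scores

rng = np.random.default_rng(2)
choices = [rng.normal(size=256) for _ in range(4)]    # question + choice A..D (placeholder vectors)
evidence = [rng.normal(size=256) for _ in range(4)]   # corresponding supportive evidence
best, scores = answer_question(choices, evidence)
```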

Posted Content
TL;DR: This work used equation embeddings to analyze four collections of scientific articles from the arXiv, covering four computer science domains (NLP, IR, AI, and ML) and ~98.5k equations.
Abstract: We present an unsupervised approach for discovering semantic representations of mathematical equations. Equations are challenging to analyze because each is unique, or nearly unique. Our method, which we call equation embeddings, finds good representations of equations by using the representations of their surrounding words. We used equation embeddings to analyze four collections of scientific articles from the arXiv, covering four computer science domains (NLP, IR, AI, and ML) and ~98.5k equations. Quantitatively, we found that equation embeddings provide better models when compared to existing word embedding approaches. Qualitatively, we found that equation embeddings provide coherent semantic representations of equations and can capture semantic similarity to other equations and to words.

17 citations
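The core idea, learning an equation's vector from the words around it, can be sketched by treating each whole equation as a single token in an off-the-shelf skip-gram model. The toy corpus, the EQ_... token names and the use of gensim as the trainer are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: represent each equation as one token and learn its embedding
# from surrounding words with skip-gram, analogous to the equation-embedding idea.
from gensim.models import Word2Vec

# Each "sentence" is a context window with the equation replaced by a single token.
corpus = [
    ["we", "minimize", "the", "loss", "EQ_cross_entropy", "over", "the", "training", "set"],
    ["the", "objective", "EQ_cross_entropy", "is", "optimized", "with", "sgd"],
    ["attention", "weights", "are", "computed", "as", "EQ_softmax_scores"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=4, min_count=1, sg=1, epochs=50)
print(model.wv.most_similar("EQ_cross_entropy", topn=3))  # nearest words/equations in the space
```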


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (87% related)
Unsupervised learning: 22.7K papers, 1M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Reinforcement learning: 46K papers, 1M citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788