
Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
TL;DR: This study presents a machine learning approach to analyzing airline tweets to improve the customer experience, finding that a convolutional neural network (CNN) outperformed the SVM and ANN models.
Abstract: Customer experience is an important concern for the airline industry. Twitter is a popular social media platform where flight travelers share their feedback in the form of tweets. This study presents a machine learning approach to analyzing these tweets to improve the customer experience. Features were extracted from the tweets using word embeddings with the GloVe dictionary approach and an n-gram approach. Support vector machine (SVM) and several artificial neural network (ANN) architectures were then used to develop classification models that map each tweet into a positive or negative category. Additionally, a convolutional neural network (CNN) was developed to classify the tweets, and its results were compared with the most accurate of the SVM and ANN models. The CNN was found to outperform the SVM and ANN models. Finally, association rule mining was performed on different categories of tweets to map their relationship with the sentiment categories. Interesting associations were identified that can help airlines improve the customer experience.
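A minimal sketch of such a GloVe-plus-CNN pipeline, assuming Keras and a pre-trained GloVe file; the file path, vocabulary size, sequence length, and layer layout below are illustrative assumptions, not the paper's reported configuration:

```python
# Sketch: tweet sentiment classification with frozen GloVe embeddings and a
# small 1D CNN. Assumptions (not from the paper): the glove.6B.100d.txt path,
# vocab/sequence sizes, and the specific CNN layout.
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

MAX_WORDS, MAX_LEN, EMB_DIM = 20000, 40, 100

def load_glove(path="glove.6B.100d.txt"):
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype="float32")
    return vectors

def build_and_train(tweets, labels):
    tok = Tokenizer(num_words=MAX_WORDS)
    tok.fit_on_texts(tweets)
    x = pad_sequences(tok.texts_to_sequences(tweets), maxlen=MAX_LEN)

    # Embedding matrix initialized from GloVe; out-of-vocabulary rows stay zero.
    glove = load_glove()
    emb = np.zeros((MAX_WORDS, EMB_DIM), dtype="float32")
    for word, i in tok.word_index.items():
        if i < MAX_WORDS and word in glove:
            emb[i] = glove[word]

    model = models.Sequential([
        layers.Embedding(MAX_WORDS, EMB_DIM, weights=[emb],
                         input_length=MAX_LEN, trainable=False),
        layers.Conv1D(128, 5, activation="relu"),  # n-gram-like local features
        layers.GlobalMaxPooling1D(),
        layers.Dense(64, activation="relu"),
        layers.Dense(1, activation="sigmoid"),     # positive vs. negative
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])
    model.fit(x, np.asarray(labels), epochs=3, validation_split=0.1)
    return model, tok
```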

69 citations

Proceedings Article
Oren Barkan
01 Mar 2016
TL;DR: The authors propose a scalable Bayesian neural word embedding algorithm that relies on a Variational Bayes solution to the Skip-Gram objective, and provide a detailed step-by-step description.
Abstract: Recently, several works in the domain of natural language processing have presented successful methods for word embedding. Among them, Skip-Gram with negative sampling, also known as word2vec, advanced the state of the art on various linguistic tasks. In this paper, we propose a scalable Bayesian neural word embedding algorithm. The algorithm relies on a Variational Bayes solution for the Skip-Gram objective, and a detailed step-by-step description is provided. We present experimental results that demonstrate the performance of the proposed algorithm on word analogy and similarity tasks across six different datasets and show that it is competitive with the original Skip-Gram method.
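The variational Bayes training itself is specific to the paper, but the Skip-Gram with negative sampling baseline it is compared against can be reproduced with gensim; a minimal sketch, assuming a tokenized corpus (the tiny corpus below is a placeholder):

```python
# Sketch: the Skip-Gram with negative sampling (word2vec) baseline that the
# Bayesian variant is benchmarked against. The corpus is a placeholder; the
# paper's variational Bayes training is not part of gensim.
from gensim.models import Word2Vec

corpus = [
    ["bayesian", "neural", "word", "embedding"],
    ["skip", "gram", "negative", "sampling", "word", "embedding"],
]  # placeholder: one tokenized sentence per list

model = Word2Vec(
    sentences=corpus,
    vector_size=100,  # embedding dimensionality
    window=5,         # context window around each target word
    sg=1,             # 1 = Skip-Gram (0 would be CBOW)
    negative=5,       # number of negative samples per positive pair
    min_count=1,      # keep every word in this toy corpus
    workers=4,
)
print(model.wv.most_similar("word", topn=3))
```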

69 citations

Journal ArticleDOI
08 Jan 2019-Entropy
TL;DR: A novel recurrent hybrid convolutional neural network (RHCNN) is proposed for DDI extraction from biomedical literature, applying an improved focal loss function to mitigate the defects of the traditional cross-entropy loss when dealing with class-imbalanced data.
Abstract: Drug-drug interactions (DDIs) may pose serious health risks and dangerous effects when a patient takes two or more drugs at the same time or within a certain period. Therefore, the automatic extraction of unknown DDIs has great potential for the development of pharmaceutical agents and for the safety of drug use. In this article, we propose a novel recurrent hybrid convolutional neural network (RHCNN) for DDI extraction from biomedical literature. In the embedding layer, texts mentioning two entities are represented as a sequence of semantic embeddings and position embeddings. In particular, the complete semantic embedding is obtained by fusing a word embedding with its contextual information, which is learned by a recurrent structure. After that, the hybrid convolutional neural network is employed to learn sentence-level features, consisting of local context features from consecutive words and dependency features between separated words, for DDI extraction. Lastly, and most significantly, to make up for the defects of the traditional cross-entropy loss function when dealing with class-imbalanced data, we apply an improved focal loss function to mitigate this problem on the DDIExtraction 2013 dataset. In our experiments, we achieve automatic DDI extraction with a micro F-score of 75.48% on the DDIExtraction 2013 dataset, outperforming the state-of-the-art approach by 2.49%.
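The paper's improved focal loss is its own contribution and is not reproduced here; as a reference point, the standard focal loss it builds on can be written in a few lines of PyTorch. A sketch, not the paper's exact formulation:

```python
# Sketch: standard multi-class focal loss, FL = (1 - p_t)^gamma * CE, which
# down-weights easy examples so rare DDI classes contribute more. The paper
# applies an *improved* variant; this is the common baseline form.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, gamma=2.0):
    # Per-example cross-entropy equals -log p_t for the true class.
    ce = F.cross_entropy(logits, targets, reduction="none")
    pt = torch.exp(-ce)  # recover p_t
    return ((1.0 - pt) ** gamma * ce).mean()

# Usage: logits of shape (batch, num_ddi_classes), integer class targets.
logits = torch.randn(8, 5)
targets = torch.randint(0, 5, (8,))
print(focal_loss(logits, targets))
```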

69 citations

Posted Content
TL;DR: Counterfactual data augmentation (CDA), a generic methodology for corpus augmentation via causal interventions that break associations between gendered and gender-neutral words, is proposed to mitigate gender bias; CDA effectively decreases bias while preserving accuracy.
Abstract: We examine whether neural natural language processing (NLP) systems reflect historical biases in training data. We define a general benchmark to quantify gender bias in a variety of neural NLP tasks. Our empirical evaluation with state-of-the-art neural coreference resolution and textbook RNN-based language models trained on benchmark datasets finds significant gender bias in how models view occupations. We then mitigate bias with CDA: a generic methodology for corpus augmentation via causal interventions that break associations between gendered and gender-neutral words. We empirically show that CDA effectively decreases gender bias while preserving accuracy. We also explore the space of mitigation strategies with CDA, a prior approach to word embedding debiasing (WED), and their compositions. We show that CDA outperforms WED, drastically so when word embeddings are trained. For pre-trained embeddings, the two methods can be effectively composed. We also find that, as training proceeds on the original dataset with gradient descent, the gender bias grows as the loss is reduced, indicating that the optimization encourages bias; CDA mitigates this behavior.
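At its core, the causal intervention is a swap of gendered word pairs in the corpus; a toy sketch of that idea follows. The pair list is illustrative only, and the paper's full method covers a richer lexicon and handles proper names carefully:

```python
# Toy sketch of counterfactual data augmentation (CDA): duplicate each
# sentence with gendered words swapped. The pair list is illustrative; the
# paper's intervention uses a fuller lexicon and leaves names intact.
PAIRS = [("he", "she"), ("him", "her"), ("his", "hers"),
         ("man", "woman"), ("father", "mother")]
SWAP = {}
for a, b in PAIRS:
    SWAP[a], SWAP[b] = b, a

def cda_augment(tokens):
    """Return the counterfactual copy of a tokenized sentence."""
    return [SWAP.get(t, t) for t in tokens]

corpus = [["he", "is", "a", "doctor"], ["she", "is", "a", "nurse"]]
augmented = corpus + [cda_augment(s) for s in corpus]
# augmented now also contains ["she", "is", "a", "doctor"] and
# ["he", "is", "a", "nurse"], breaking the gender-occupation association.
```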

69 citations

Proceedings ArticleDOI
01 Mar 2018
TL;DR: The authors show that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable to or better than more complex meta-embedding learning methods, despite the incomparability of the source vector spaces.
Abstract: Creating accurate meta-embeddings from pre-trained source embeddings has received attention lately. Methods based on global and locally linear transformations and on concatenation have been shown to produce accurate meta-embeddings. In this paper, we show that the arithmetic mean of two distinct word embedding sets yields a performant meta-embedding that is comparable to or better than more complex meta-embedding learning methods. This result seems counter-intuitive, given that the vector spaces of different source embeddings are not comparable and cannot simply be averaged. We give insight into why averaging can still produce accurate meta-embeddings despite the incomparability of the source vector spaces.
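A minimal sketch of the averaging baseline, assuming two source embeddings given as word-to-vector dicts; zero-padding to a common dimension and L2-normalizing first are common choices here, though the details may differ from the paper's exact setup:

```python
# Sketch: averaged meta-embedding AVG(w) = (e1(w) + e2(w)) / 2 over the shared
# vocabulary. Assumptions: sources are zero-padded to a common dimension and
# L2-normalized first (choices that may differ from the paper's setup).
import numpy as np

def avg_meta(e1, e2):
    dim = max(len(next(iter(e1.values()))), len(next(iter(e2.values()))))

    def prep(v):
        v = np.pad(np.asarray(v, dtype="float32"), (0, dim - len(v)))
        n = np.linalg.norm(v)
        return v / n if n else v

    # Average only words present in both sources.
    return {w: (prep(e1[w]) + prep(e2[w])) / 2.0
            for w in e1.keys() & e2.keys()}

src1 = {"king": [0.1, 0.3], "queen": [0.2, 0.4]}
src2 = {"king": [0.5, 0.1, 0.2], "queen": [0.4, 0.2, 0.3]}
meta = avg_meta(src1, src2)  # 3-dimensional averaged meta-embeddings
```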

69 citations


Network Information
Related Topics (5)
- Recurrent neural network: 29.2K papers, 890K citations (87% related)
- Unsupervised learning: 22.7K papers, 1M citations (86% related)
- Deep learning: 79.8K papers, 2.1M citations (85% related)
- Reinforcement learning: 46K papers, 1M citations (84% related)
- Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance Metrics
Number of papers in the topic in previous years:

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788