Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Journal ArticleDOI
01 Apr 2019
TL;DR: RNNs outperformed traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score.
Abstract: Objectives Natural language processing (NLP) and machine learning approaches were used to build classifiers to identify genomic-related treatment changes in the free-text visit progress notes of cancer patients. Methods We obtained 5889 deidentified progress reports (2439 words on average) for 755 cancer patients who underwent clinical next-generation sequencing (NGS) testing at Wake Forest Baptist Comprehensive Cancer Center for our data analyses. An NLP system was implemented to process the free-text data and extract NGS-related information. Three types of recurrent neural network (RNN), namely gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (LSTM_Bi), were applied to classify documents into the treatment-change and no-treatment-change groups. Further, we compared the performance of the RNNs to five machine learning algorithms: Naive Bayes, K-nearest Neighbor, Support Vector Machine, Random Forest, and Logistic Regression. Results Our results suggested that, overall, RNNs outperformed the traditional machine learning algorithms, and LSTM_Bi showed the best performance among the RNNs in terms of accuracy, precision, recall, and F1 score. In addition, pretrained word embeddings can improve the accuracy of LSTM by 3.4% and reduce the training time by more than 60%. Discussion and Conclusion NLP and RNN-based text mining solutions have demonstrated advantages in information retrieval and document classification tasks for unstructured clinical progress notes.
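The classifier architecture the abstract describes is standard enough to sketch. Below is a minimal sketch, assuming PyTorch and a pretrained embedding matrix, of a bidirectional-LSTM (LSTM_Bi) document classifier over pretrained word embeddings; the layer sizes, fine-tuning choice, and class count are illustrative assumptions, not the authors' reported configuration.

```python
# A minimal sketch (not the authors' code) of an LSTM_Bi classifier:
# a bidirectional LSTM over pretrained word embeddings, ending in a
# binary treatment-change / no-treatment-change decision.
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    def __init__(self, pretrained_vectors, hidden_size=128):
        super().__init__()
        # Initialize from pretrained word vectors; fine-tune during training.
        self.embedding = nn.Embedding.from_pretrained(
            pretrained_vectors, freeze=False)
        self.lstm = nn.LSTM(pretrained_vectors.size(1), hidden_size,
                            batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_size, 2)  # two classes

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)    # (batch, seq, dim)
        _, (hidden, _) = self.lstm(embedded)    # final hidden states
        # Concatenate the last forward and backward hidden states.
        doc_vector = torch.cat([hidden[-2], hidden[-1]], dim=1)
        return self.classifier(doc_vector)
```

Initializing the embedding layer from pretrained vectors rather than random weights is what the abstract credits with the 3.4% accuracy gain and the shorter training time.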

28 citations

Journal ArticleDOI
TL;DR: A method to incorporate word embedding (WE) semantic similarities into existing probabilistic IR models for Arabic in order to deal with term mismatch; results show that the extended models significantly improve over baseline bag-of-words models.
Abstract: Term mismatch is a common limitation of traditional information retrieval (IR) models, where relevance scores are estimated based on exact matching of documents and queries. Typically, a good IR model should consider distinct but semantically similar words in the matching process. In this paper, we propose a method to incorporate word embedding (WE) semantic similarities into existing probabilistic IR models for Arabic in order to deal with term mismatch. Experiments are performed on the standard Arabic TREC collection using three neural word embedding models. The results show that the extended IR models significantly improve over the baseline bag-of-words models. Although the proposed extensions significantly outperform their bag-of-words baselines, the differences between the evaluated neural word embedding models are not statistically significant. Moreover, the overall comparison results show that our extensions significantly improve on the Arabic WordNet-based semantic indexing approach and three recent WE-based IR language models.
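As a rough illustration of the general approach (not the paper's exact Arabic IR formulation), the sketch below mixes a lexical score such as BM25 with an embedding-based similarity term, so that semantically similar but non-matching terms still contribute to relevance; the best-match aggregation rule and the mixing weight `alpha` are assumptions.

```python
# Hedged sketch: soften exact-match retrieval by blending a lexical
# score with word-embedding similarity. `embed` maps words to vectors.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) /
                 (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def semantic_score(query_terms, doc_terms, embed):
    # For each query term, credit its best semantic match in the
    # document, so distinct but similar words still count.
    total = sum(max(cosine(embed[q], embed[d]) for d in doc_terms)
                for q in query_terms)
    return total / len(query_terms)

def combined_score(query_terms, doc_terms, embed, bm25_score, alpha=0.7):
    # Linear mix of exact-match (BM25) and embedding-based evidence.
    return (alpha * bm25_score
            + (1 - alpha) * semantic_score(query_terms, doc_terms, embed))
```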

28 citations

Journal ArticleDOI
TL;DR: A novel global and local word embedding-based topic model (GLTM) for short texts that can distill semantic relatedness information between words, which the Gibbs sampler can further leverage during inference to strengthen the semantic coherence of topics.
Abstract: Short texts have become a prevalent source of information, and discovering topical information from short text collections is valuable for many applications. Due to the length limitation, conventional topic models based on document-level word co-occurrence information often fail to distill semantically coherent topics from short text collections. On the other hand, word embeddings have been successfully applied as a powerful tool in natural language processing. Word embeddings trained on a large corpus encode general semantic and syntactic information of words, and hence they can be leveraged to guide topic modeling for short text collections as supplementary information for sparse co-occurrence patterns. However, word embeddings are trained on a large external corpus, and the encoded information is not necessarily suitable for the training data set of the topic model, which most existing models ignore. In this article, we propose a novel global and local word embedding-based topic model (GLTM) for short texts. In the GLTM, we train global word embeddings on a large external corpus and employ the continuous skip-gram model with negative sampling (SGNS) to obtain local word embeddings. Utilizing both the global and local word embeddings, the GLTM can distill semantic relatedness information between words, which the Gibbs sampler can further leverage during inference to strengthen the semantic coherence of topics. Compared with five state-of-the-art short text topic models on four real-world short text collections, the proposed GLTM exhibits superior performance in most cases.
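A minimal sketch of the two embedding sources described above, using gensim: local vectors are trained with skip-gram and negative sampling (SGNS) on the short-text collection itself, while global vectors come pretrained from a large external corpus. The averaging rule at the end is an illustrative assumption; in GLTM the relatedness information feeds the Gibbs sampler rather than being used directly, and the file name is a placeholder.

```python
from gensim.models import Word2Vec, KeyedVectors

# Target short-text collection as tokenized documents (toy placeholder).
short_texts = [["cheap", "flight", "deals"], ["book", "flight", "online"]]

# Local embeddings: continuous skip-gram (sg=1) with negative sampling.
local_model = Word2Vec(short_texts, vector_size=100, sg=1, negative=5,
                       min_count=1)

# Global embeddings: pretrained vectors from a large external corpus
# (the file name is a placeholder).
global_vectors = KeyedVectors.load_word2vec_format("external_corpus.vec")

def relatedness(w1, w2):
    # Blend corpus-general and collection-specific similarity.
    return (global_vectors.similarity(w1, w2)
            + local_model.wv.similarity(w1, w2)) / 2
```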

28 citations

Journal ArticleDOI
TL;DR: In this article, a joint word-embedding model for long documents in the academic domain is proposed to improve the semantic representation quality of word vectors by incorporating a domain-specific semantic relation constraint into the traditional context constraint.
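Only the TL;DR is available here, so the following is a speculative sketch of the stated idea: a standard context (skip-gram) objective augmented with a penalty that pulls embeddings of domain-related word pairs together. The squared-distance penalty and the weight `lam` are assumptions, not the paper's formulation.

```python
import torch

def joint_loss(context_loss, embeddings, related_pairs, lam=0.1):
    # context_loss: standard skip-gram/SGNS loss for the batch (a tensor).
    # related_pairs: (i, j) index pairs asserted to be domain-related.
    penalty = sum((embeddings[i] - embeddings[j]).pow(2).sum()
                  for i, j in related_pairs)
    return context_loss + lam * penalty
```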

28 citations

Posted Content
TL;DR: This paper proposes a method that combines general pre-trained word embedding vectors with vectors generated on the task-specific training set to address the large number of out-of-vocabulary words in the Ubuntu dialogue corpus.
Abstract: The Ubuntu dialogue corpus is the largest publicly available dialogue corpus, making it feasible to build end-to-end deep neural network models directly from conversation data. One challenge of the Ubuntu dialogue corpus is the large number of out-of-vocabulary words. In this paper, we propose a method which combines general pre-trained word embedding vectors with those generated on the task-specific training set to address this issue. We integrated character embedding into Chen et al.'s Enhanced LSTM method (ESIM) and used it to evaluate the effectiveness of our proposed method. For the task of next utterance selection, the proposed method demonstrated a significant performance improvement over the original ESIM, and the new model achieved state-of-the-art results on both the Ubuntu dialogue corpus and the Douban conversation corpus. In addition, we investigated the performance impact of end-of-utterance and end-of-turn token tags.
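The exact combination rule is not spelled out in the abstract; one plausible reading, sketched below, concatenates each word's general pretrained vector with its task-trained vector, so words that are out-of-vocabulary for the pretrained model still receive a task-specific component. The dimensions and the zero-fill fallback are assumptions.

```python
# Hedged sketch of one way to combine general and task-specific vectors.
import numpy as np

def build_vector(word, pretrained, task_specific, dim_pre=300, dim_task=100):
    # pretrained / task_specific: dicts mapping words to numpy vectors.
    pre = pretrained.get(word, np.zeros(dim_pre))        # zero if OOV
    task = task_specific.get(word, np.zeros(dim_task))   # zero if unseen
    return np.concatenate([pre, task])                   # (dim_pre + dim_task,)
```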

28 citations


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations (87% related)
Unsupervised learning: 22.7K papers, 1M citations (86% related)
Deep learning: 79.8K papers, 2.1M citations (85% related)
Reinforcement learning: 46K papers, 1M citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  317
2022  716
2021  736
2020  1,025
2019  1,078
2018  788