Topic

Word embedding

About: Word embedding is a research topic. Over its lifetime, 4,683 publications have been published within this topic, receiving 153,378 citations. The topic is also known as: word embeddings.


Papers
Posted Content
TL;DR: Improved Word Vectors (IWV) is proposed, which increases the accuracy of pre-trained word embeddings in sentiment analysis, based on Part-of-Speech tagging techniques, lexicon-based approaches and Word2Vec/GloVe methods.
Abstract: Sentiment analysis is one of the well-known tasks and fast-growing research areas in natural language processing (NLP) and text classification. This technique has become an essential part of a wide range of applications, including politics, business, advertising and marketing. There are various techniques for sentiment analysis, but recently word embedding methods have been widely used in sentiment classification tasks. Word2Vec and GloVe are currently among the most accurate and usable word embedding methods, converting words into meaningful vectors. However, these methods ignore the sentiment information of texts and require a huge corpus of texts to train and generate accurate vectors, which are then used as inputs to deep learning models. As a result, because of the small size of some corpora, researchers often have to use pre-trained word embeddings that were trained on another large text corpus, such as Google News with about 100 billion words. Increasing the accuracy of pre-trained word embeddings therefore has a great impact on sentiment analysis research. In this paper we propose a novel method, Improved Word Vectors (IWV), which increases the accuracy of pre-trained word embeddings in sentiment analysis. Our method is based on Part-of-Speech (POS) tagging techniques, lexicon-based approaches and Word2Vec/GloVe methods. We tested the accuracy of our method with different deep learning models and sentiment datasets. Our experimental results show that Improved Word Vectors (IWV) are very effective for sentiment analysis.

49 citations
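The combination the abstract describes (a pre-trained vector enriched with POS and lexicon features) can be sketched as a simple concatenation. The tag inventory, the toy `LEXICON`, and the `pretrained` lookup below are illustrative assumptions, not the paper's exact IWV recipe:

```python
# A minimal sketch of the IWV idea: extend a pre-trained word vector with
# POS one-hot and lexicon-polarity features before feeding a deep model.
import numpy as np
import nltk

POS_TAGS = ["NN", "VB", "JJ", "RB", "OTHER"]        # coarse tag inventory (assumed)
LEXICON = {"good": 1.0, "great": 1.0, "bad": -1.0}  # toy sentiment lexicon (assumed)

def improved_vector(word, tag, pretrained, dim=300):
    """Concatenate a pre-trained vector with a POS one-hot and a lexicon score."""
    base = pretrained.get(word, np.zeros(dim))      # e.g. a 300-d Word2Vec/GloVe vector
    pos = np.zeros(len(POS_TAGS))
    pos[POS_TAGS.index(tag) if tag in POS_TAGS else -1] = 1.0  # -1 hits the OTHER slot
    score = np.array([LEXICON.get(word, 0.0)])
    return np.concatenate([base, pos, score])       # 300 + 5 + 1 = 306 dimensions

pretrained = {"good": np.random.rand(300)}          # stand-in for real embeddings
tagged = nltk.pos_tag(["the", "movie", "was", "good"])  # needs the NLTK tagger data
vectors = [improved_vector(w, t[:2], pretrained) for w, t in tagged]
```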

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A very fast variational information bottleneck (VIB) method is proposed to nonlinearly compress word embeddings, keeping only the information that helps a discriminative parser.
Abstract: Pre-trained word embeddings like ELMo and BERT contain rich syntactic and semantic information, resulting in state-of-the-art performance on various tasks. We propose a very fast variational information bottleneck (VIB) method to nonlinearly compress these embeddings, keeping only the information that helps a discriminative parser. We compress each word embedding to either a discrete tag or a continuous vector. In the discrete version, our automatically compressed tags form an alternative tag set: we show experimentally that our tags capture most of the information in traditional POS tag annotations, but our tag sequences can be parsed more accurately at the same level of tag granularity. In the continuous version, we show experimentally that moderately compressing the word embeddings by our method yields a more accurate parser in 8 of 9 languages, unlike simple dimensionality reduction.

49 citations
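The continuous variant of this compressor can be sketched as a Gaussian bottleneck layer trained with a KL penalty. This is a generic VIB layer in PyTorch, with dimensions and the trade-off weight `beta` assumed for illustration; the paper's discriminative parser is omitted:

```python
import torch
import torch.nn as nn

class VIBCompressor(nn.Module):
    """Compress a word embedding x into a low-dimensional stochastic code z."""
    def __init__(self, in_dim=1024, z_dim=64):
        super().__init__()
        self.mu = nn.Linear(in_dim, z_dim)       # mean of q(z|x)
        self.logvar = nn.Linear(in_dim, z_dim)   # log-variance of q(z|x)

    def forward(self, x):
        mu, logvar = self.mu(x), self.logvar(x)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        # KL(q(z|x) || N(0, I)) is the bottleneck penalty that discards information.
        kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
        return z, kl

# Training would minimize parser_loss(z) + beta * kl.mean(), where beta
# controls how aggressively the embeddings are compressed.
z, kl = VIBCompressor()(torch.randn(32, 1024))   # e.g. a batch of ELMo-sized vectors
```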

Proceedings ArticleDOI
TL;DR: A novel Regularized Multi-Embedding (RME) based recommendation model is proposed that simultaneously encapsulates the following ideas via decomposition: which items a user likes, which two users co-like the same items, and which two items users often co-like.
Abstract: Following recent successes in exploiting both latent factor and word embedding models in recommendation, we propose a novel Regularized Multi-Embedding (RME) based recommendation model that simultaneously encapsulates the following ideas via decomposition: (1) which items a user likes, (2) which two users co-like the same items, (3) which two items users often co-liked, and (4) which two items users often co-disliked. In experimental validation, the RME outperforms competing state-of-the-art models on both explicit and implicit feedback datasets, significantly improving Recall@5 by 5.9~7.0%, NDCG@20 by 4.3~5.6%, and MAP@10 by 7.9~8.9%. In addition, under the cold-start scenario for users with the lowest number of interactions, the RME outperforms the competing models on NDCG@5 by 20.2% and 29.4% in the MovieLens-10M and MovieLens-20M datasets, respectively. Our datasets and source code are available at: this https URL

48 citations
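The four ideas can be read as one joint factorization in which user and item embeddings are shared across a preference matrix and three co-occurrence matrices. The sketch below uses plain squared-error SGD over random stand-in matrices; the actual RME construction of the co-occurrence statistics and its optimizer differ:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 50, 80, 16

R = rng.random((n_users, n_items))                            # (1) user-item preferences
UU = rng.random((n_users, n_users)); UU = (UU + UU.T) / 2     # (2) user co-like matrix
IL = rng.random((n_items, n_items)); IL = (IL + IL.T) / 2     # (3) item co-like matrix
ID = rng.random((n_items, n_items)); ID = (ID + ID.T) / 2     # (4) item co-dislike matrix

U = rng.normal(0, 0.1, (n_users, k))   # user embeddings, shared by terms (1) and (2)
V = rng.normal(0, 0.1, (n_items, k))   # item embeddings, shared by (1), (3) and (4)
W = rng.normal(0, 0.1, (n_items, k))   # item "disliked-context" embeddings

lr, lam = 1e-3, 0.1
for _ in range(200):
    # Gradients of the four squared-error terms; sharing U and V is what
    # couples the preference and co-occurrence decompositions.
    gU = (U @ V.T - R) @ V + lam * (U @ U.T - UU) @ U
    gV = (U @ V.T - R).T @ U + lam * (V @ V.T - IL) @ V + lam * (V @ W.T - ID) @ W
    gW = lam * (V @ W.T - ID).T @ V
    U, V, W = U - lr * gU, V - lr * gV, W - lr * gW
```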

Journal ArticleDOI
TL;DR: The proposed MWE model combines the two variants of word2vec in a seamless way by sharing a common encoding structure, capturing the syntactic information of words more accurately, and incorporates a global text vector into the CBOW variant so as to capture more semantic information.
Abstract: Learning distributed word representations has been a popular method for various natural language processing applications such as word analogy and similarity, document classification and sentiment analysis. However, most existing word embedding models only exploit a shallow sliding window as the context to predict the target word. Because the semantics of each word are also influenced by its global context, as distributional models usually induce word representations from the global co-occurrence matrix, window-based models are insufficient to capture semantic knowledge. In this paper, we propose a novel hybrid model called mixed word embedding (MWE) based on the well-known word2vec toolbox. Specifically, the proposed MWE model combines the two variants of word2vec, i.e., SKIP-GRAM and CBOW, in a seamless way by sharing a common encoding structure, which is able to capture the syntactic information of words more accurately. Furthermore, it incorporates a global text vector into the CBOW variant so as to capture more semantic information. Our MWE preserves the same time complexity as the SKIP-GRAM. To evaluate our MWE model efficiently and adaptively, we study it from both linguistic and application perspectives, with both English and Chinese datasets. For linguistics, we conduct empirical studies on word analogies and similarities. From the application point of view, the learned latent representations are evaluated on both document classification and sentiment analysis. The experimental results show that our MWE model is very competitive on all tasks compared with state-of-the-art word embedding models such as CBOW, SKIP-GRAM, and GloVe.

48 citations
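The shared-encoder idea can be sketched as one embedding table feeding both a SKIP-GRAM loss and a CBOW loss whose averaged context is augmented with a document vector (as in PV-DM). The vocabulary size, dimensions, and full-softmax losses below are simplifying assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedWordEmbedding(nn.Module):
    def __init__(self, vocab=10000, n_docs=100, dim=100):
        super().__init__()
        self.word = nn.Embedding(vocab, dim)   # the shared encoding structure
        self.doc = nn.Embedding(n_docs, dim)   # global text vectors
        self.out = nn.Linear(dim, vocab)       # shared output layer

    def forward(self, center, context, doc_id):
        # SKIP-GRAM branch: predict a context word from the center word.
        sg_loss = F.cross_entropy(self.out(self.word(center)), context[:, 0])
        # CBOW branch: predict the center word from mean(context) + doc vector.
        ctx = self.word(context).mean(dim=1) + self.doc(doc_id)
        cbow_loss = F.cross_entropy(self.out(ctx), center)
        return sg_loss + cbow_loss             # both losses update self.word

model = MixedWordEmbedding()
center = torch.randint(0, 10000, (32,))
context = torch.randint(0, 10000, (32, 4))     # 4-word context window
doc_id = torch.randint(0, 100, (32,))
model(center, context, doc_id).backward()
```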

Journal ArticleDOI
TL;DR: The recent advancement of neural network-based approaches for classifying biomedical relations, including convolutional neural networks (CNNs) and recurrent neural networks (RNNs), is described; the remaining challenges are discussed and future directions are outlined.

48 citations
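A minimal example of the CNN family of approaches this survey covers: convolve over embedded tokens and max-pool into a relation prediction. Dimensions, filter sizes, and the relation label set are illustrative assumptions rather than any specific system from the survey:

```python
import torch
import torch.nn as nn

class CNNRelationClassifier(nn.Module):
    def __init__(self, vocab=5000, dim=100, n_relations=5):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.conv = nn.Conv1d(dim, 64, kernel_size=3, padding=1)  # n-gram features
        self.fc = nn.Linear(64, n_relations)

    def forward(self, tokens):                          # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)            # (batch, dim, seq_len)
        x = torch.relu(self.conv(x)).max(dim=2).values  # max-pool over positions
        return self.fc(x)                               # relation logits

logits = CNNRelationClassifier()(torch.randint(0, 5000, (8, 40)))
```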


Network Information
Related Topics (5)
Recurrent neural network: 29.2K papers, 890K citations, 87% related
Unsupervised learning: 22.7K papers, 1M citations, 86% related
Deep learning: 79.8K papers, 2.1M citations, 85% related
Reinforcement learning: 46K papers, 1M citations, 84% related
Graph (abstract data type): 69.9K papers, 1.2M citations, 84% related
Performance Metrics

No. of papers in the topic in previous years:

Year    Papers
2023    317
2022    716
2021    736
2020    1,025
2019    1,078
2018    788