
Showing papers on "Word embedding published in 2016"


Proceedings Article
05 Dec 2016
TL;DR: The authors showed that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent, which raises concerns because their widespread use often tends to amplify these biases.
Abstract: The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender-neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.

1,379 citations
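As an illustration of the geometric debiasing the paper describes, here is a minimal sketch of the neutralization step: projecting the gender direction out of a gender-neutral word's vector. The random toy vectors and the single she/he pair are illustrative assumptions; the paper derives the gender direction via PCA over several definitional pairs and also applies an equalization step not shown here.

```python
import numpy as np

def neutralize(w, g):
    """Remove the gender component of a word vector by projecting
    out the (unit-norm) gender direction g, then re-normalizing."""
    w_debiased = w - np.dot(w, g) * g
    return w_debiased / np.linalg.norm(w_debiased)

# Hypothetical toy vectors; in practice these come from a trained embedding.
rng = np.random.default_rng(0)
emb = {word: rng.normal(size=300) for word in ("she", "he", "receptionist")}

# Gender direction from one definitional pair (the paper uses PCA over several).
g = emb["she"] - emb["he"]
g /= np.linalg.norm(g)

emb["receptionist"] = neutralize(emb["receptionist"], g)
print(np.dot(emb["receptionist"], g))  # ~0: no remaining gender component
```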


Posted Content
TL;DR: This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties such as the ability to cluster related concepts and to solve analogy tasks.
Abstract: The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender-neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.

1,074 citations


Proceedings ArticleDOI
14 Mar 2016
TL;DR: Item2vec as mentioned in this paper is an item-based collaborative filtering method based on skip-gram with negative sampling (SGNS) that produces embeddings for items in a latent space.
Abstract: Many Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) suggested learning a latent representation of words using neural embedding algorithms. Among them, Skip-gram with Negative Sampling (SGNS), also known as word2vec, was shown to provide state-of-the-art results on various linguistic tasks. In this paper, we show that item-based CF can be cast in the same framework as neural word embedding. Inspired by SGNS, we describe a method we name item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of the item2vec method and show it is competitive with SVD.

440 citations
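Since item2vec amounts to running SGNS over per-user item lists, it can be sketched with an off-the-shelf word2vec implementation. The snippet below assumes gensim (version 4+ keyword names) and uses hypothetical toy baskets; the oversized window is one common way to approximate the paper's set-based objective, in which every item in a basket serves as context for every other.

```python
from gensim.models import Word2Vec

# Each "sentence" is the list of items consumed by one user (order ignored).
# Hypothetical toy data; real item IDs would come from interaction logs.
user_baskets = [
    ["item_a", "item_b", "item_c"],
    ["item_b", "item_c", "item_d"],
    ["item_a", "item_d"],
]

# SGNS (sg=1) with negative sampling; the large window approximates the
# set-based objective described in the paper.
model = Word2Vec(user_baskets, vector_size=32, window=100,
                 sg=1, negative=5, min_count=1, epochs=50)

print(model.wv.most_similar("item_a", topn=2))
```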


Proceedings ArticleDOI
07 Jul 2016
TL;DR: A simple, fast, and effective topic model for short texts, named GPU-DMM, based on the Dirichlet Multinomial Mixture model, which achieves comparable or better topic representations than state-of-the-art models, measured by topic coherence.
Abstract: For many applications that require semantic understanding of short texts, inferring discriminative and coherent latent topics from short texts is a critical and fundamental task. Conventional topic models largely rely on word co-occurrences to derive topics from a collection of documents. However, due to their short length, short texts are much sparser in terms of word co-occurrences. Data sparsity therefore becomes a bottleneck for conventional topic models to achieve good results on short texts. On the other hand, when a human being interprets a piece of short text, the understanding is based not solely on its content words but also on her background knowledge (e.g., semantically related words). Recent advances in word embedding offer effective learning of word semantic relations from a large corpus. Exploiting such auxiliary word embeddings to enrich topic modeling for short texts is the main focus of this paper. To this end, we propose a simple, fast, and effective topic model for short texts, named GPU-DMM. Based on the Dirichlet Multinomial Mixture (DMM) model, GPU-DMM promotes semantically related words under the same topic during the sampling process by using the generalized Polya urn (GPU) model. In this sense, background knowledge about word semantic relatedness learned from millions of external documents can be easily exploited to improve topic modeling for short texts. Through extensive experiments on two real-world short text collections in two languages, we show that GPU-DMM achieves comparable or better topic representations than state-of-the-art models, as measured by topic coherence. The learned topic representation leads to the best accuracy in a text classification task, which serves as an indirect evaluation.

293 citations
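The core twist GPU-DMM adds to DMM Gibbs sampling is the generalized Polya urn update: assigning a word to a topic also adds a fractional count for its semantically related words. A toy sketch, with an illustrative promotion weight mu and a hypothetical related-word map (in the paper, relatedness comes from similarities between pre-trained word embeddings):

```python
import numpy as np

def gpu_promote(n_zw, topic, word, related, mu=0.3):
    """Generalized Polya urn update: incrementing the count of `word`
    under `topic` also promotes its semantically related words.
    `related` maps a word id to word ids whose embedding similarity
    exceeds a threshold (learned offline from external embeddings)."""
    n_zw[topic, word] += 1.0
    for w2 in related.get(word, ()):
        n_zw[topic, w2] += mu

# Toy example: 2 topics, 5 words; word 0 is related to words 1 and 2.
counts = np.zeros((2, 5))
gpu_promote(counts, topic=0, word=0, related={0: [1, 2]})
print(counts[0])  # [1.0, 0.3, 0.3, 0.0, 0.0]
```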


Journal ArticleDOI
Duyu Tang1, Furu Wei2, Bing Qin1, Nan Yang2, Ting Liu1, Ming Zhou2 
TL;DR: This work develops a number of neural networks with tailored loss functions, and applies sentiment embeddings to word-level sentiment analysis, sentence-level sentiment classification, and building sentiment lexicons, showing results that consistently outperform context-based embeddings on several benchmark datasets for these tasks.
Abstract: We propose learning sentiment-specific word embeddings, dubbed sentiment embeddings, in this paper. Existing word embedding learning algorithms typically only use the contexts of words but ignore the sentiment of texts. This is problematic for sentiment analysis because words with similar contexts but opposite sentiment polarity, such as good and bad, are mapped to neighboring word vectors. We address this issue by encoding sentiment information of texts (e.g., sentences and words) together with contexts of words in sentiment embeddings. By combining context- and sentiment-level evidence, the nearest neighbors in sentiment embedding space are semantically similar, and words with the same sentiment polarity are favored. To learn sentiment embeddings effectively, we develop a number of neural networks with tailored loss functions, and automatically collect massive texts with sentiment signals such as emoticons as the training data. Sentiment embeddings can be naturally used as word features for a variety of sentiment analysis tasks without feature engineering. We apply sentiment embeddings to word-level sentiment analysis, sentence-level sentiment classification, and building sentiment lexicons. Experimental results show that sentiment embeddings consistently outperform context-based embeddings on several benchmark datasets of these tasks. This work provides insights on the design of neural networks for learning task-specific word embeddings in other natural language processing tasks.

290 citations


Journal ArticleDOI
TL;DR: A unified framework is proposed to expand short texts based on word embedding clustering and a convolutional neural network, with semantic cliques discovered via fast clustering; experiments on two open benchmarks validate the effectiveness of the proposed method.

268 citations


Journal ArticleDOI
TL;DR: This paper analyzed three critical components in training word embeddings: model, corpus, and training parameters, and provided several simple guidelines for training good word embeddings.
Abstract: The authors analyze three critical components in training word embeddings: model, corpus, and training parameters. They systematize existing neural-network-based word embedding methods and experimentally compare them using the same corpus. They then evaluate each word embedding in three ways: analyzing its semantic properties, using it as a feature for supervised tasks, and using it to initialize neural networks. They also provide several simple guidelines for training good word embeddings.

257 citations


Proceedings ArticleDOI
07 Sep 2016
TL;DR: A co-factorization model, CoFactor, which jointly decomposes the user-item interaction matrix and the item-item co-occurrence matrix with shared item latent factors and provides qualitative results that explain how CoFactor improves the quality of the inferred factors.
Abstract: Matrix factorization (MF) models and their extensions are standard in modern recommender systems. MF models decompose the observed user-item interaction matrix into user and item latent factors. In this paper, we propose a co-factorization model, CoFactor, which jointly decomposes the user-item interaction matrix and the item-item co-occurrence matrix with shared item latent factors. For each pair of items, the co-occurrence matrix encodes the number of users that have consumed both items. CoFactor is inspired by the recent success of word embedding models (e.g., word2vec) which can be interpreted as factorizing the word co-occurrence matrix. We show that this model significantly improves the performance over MF models on several datasets with little additional computational overhead. We provide qualitative results that explain how CoFactor improves the quality of the inferred factors and characterize the circumstances where it provides the most significant improvements.

254 citations
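A minimal sketch of CoFactor's joint objective: the usual matrix factorization fit on the user-item matrix plus a second fit on the item-item co-occurrence matrix, sharing the item factors beta. The actual model also includes per-item bias terms, confidence weights, and an SPPMI co-occurrence matrix; the random toy matrices and plain squared loss below are illustrative assumptions.

```python
import numpy as np

def cofactor_loss(Y, M, theta, beta, gamma, lam=0.01):
    """Joint objective sketch: squared error on the user-item matrix Y
    plus squared error on the item-item co-occurrence matrix M, with
    the item factors `beta` shared between the two terms."""
    mf_term = np.sum((Y - theta @ beta.T) ** 2)      # user-item fit
    cooc_term = np.sum((M - beta @ gamma.T) ** 2)    # item-item fit
    reg = lam * (np.sum(theta**2) + np.sum(beta**2) + np.sum(gamma**2))
    return mf_term + cooc_term + reg

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 3
Y = rng.random((n_users, n_items))    # toy interactions
M = rng.random((n_items, n_items))    # toy co-occurrence (SPPMI) matrix
theta, beta, gamma = (rng.normal(size=(n, k)) for n in (n_users, n_items, n_items))
print(cofactor_loss(Y, M, theta, beta, gamma))
```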


Journal ArticleDOI
TL;DR: A weight-based model and a learning procedure based on a novel median-based loss function are designed to mitigate the negative effect of outliers; the method outperforms the baseline approaches in the experiments and generalizes well to different word embeddings without retraining.

188 citations


Journal ArticleDOI
Zhehuan Zhao1, Zhihao Yang1, Ling Luo1, Hongfei Lin1, Jian Wang1 
TL;DR: A syntax convolutional neural network (SCNN) based DDI extraction method that uses a novel word embedding, syntax word embedding, to employ the syntactic information of a sentence to extract DDIs from the biomedical literature.
Abstract: Motivation: Detecting drug-drug interactions (DDIs) has become a vital part of public health safety. Therefore, using text mining techniques to extract DDIs from the biomedical literature has received great attention. However, this research is still at an early stage and its performance has much room to improve. Results: In this article, we present a syntax convolutional neural network (SCNN) based DDI extraction method. In this method, a novel word embedding, syntax word embedding, is proposed to employ the syntactic information of a sentence. Then position and part-of-speech features are introduced to extend the embedding of each word. Later, an auto-encoder is introduced to encode the traditional bag-of-words feature (a sparse 0–1 vector) as a dense real-valued vector. Finally, a combination of embedding-based convolutional features and traditional features is fed to the softmax classifier to extract DDIs from the biomedical literature. Experimental results on the DDIExtraction 2013 corpus show that SCNN obtains better performance (an F-score of 0.686) than other state-of-the-art methods. Availability and Implementation: The source code is available for academic use at http://202.118.75.18:8080/DDI/SCNN-DDI.zip. Contact: yangzh@dlut.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

184 citations


Proceedings ArticleDOI
11 Apr 2016
TL;DR: This paper investigates the popular neural word embedding method Word2vec as a source of evidence in document ranking and proposes the Dual Embedding Space Model (DESM), which provides evidence that a document is about a query term.
Abstract: This paper investigates the popular neural word embedding method Word2vec as a source of evidence in document ranking. In contrast to NLP applications of word2vec, which tend to use only the input embeddings, we retain both the input and the output embeddings, allowing us to calculate a different word similarity that may be more suitable for document ranking. We map the query words into the input space and the document words into the output space, and compute a relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) provides evidence that a document is about a query term, in addition to and complementing the traditional term frequency based approach.
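A minimal sketch of the DESM scoring rule described above: each query word's IN vector is compared by cosine against the centroid of the document's normalized OUT vectors, and the similarities are averaged. The random toy vectors stand in for the two embedding matrices of a trained word2vec model.

```python
import numpy as np

def desm_score(query_in_vecs, doc_out_vecs):
    """Dual Embedding Space Model: average cosine between each query
    word's IN vector and the centroid of the document's OUT vectors."""
    doc_centroid = np.mean(
        [v / np.linalg.norm(v) for v in doc_out_vecs], axis=0)
    doc_centroid /= np.linalg.norm(doc_centroid)
    sims = [np.dot(q / np.linalg.norm(q), doc_centroid) for q in query_in_vecs]
    return float(np.mean(sims))

rng = np.random.default_rng(0)
query = [rng.normal(size=50) for _ in range(2)]   # IN-space query vectors
doc = [rng.normal(size=50) for _ in range(10)]    # OUT-space document vectors
print(desm_score(query, doc))
```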

Proceedings Article
05 Dec 2016
TL;DR: This paper proposes an efficient technique to learn a supervised metric, called the Supervised-WMD (S-WMD) metric, and provides an arbitrarily close approximation of the original WMD distance that results in a practical and efficient update rule.
Abstract: Recently, a new document metric called the word mover's distance (WMD) has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to a document metric by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised-WMD (S-WMD) metric. The supervised training minimizes the stochastic leave-one-out nearest neighbor classification error on a per-document level by updating an affine transformation of the underlying word embedding space and a word-importance weight vector. As the gradient of the original WMD distance would result in an inefficient nested optimization problem, we provide an arbitrarily close approximation that results in a practical and efficient update rule. We evaluate S-WMD on eight real-world text classification tasks on which it consistently outperforms almost all of our 26 competitive baselines.
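For reference, the unsupervised WMD that S-WMD builds on can be sketched as an optimal transport problem; the snippet below assumes the POT library (`ot`) and shared vocabulary support for both documents. The supervised variant would additionally learn an affine transformation of the word vectors and a word-importance reweighting of the histograms before this computation.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed installed)

def wmd(bow1, bow2, word_vecs):
    """Word mover's distance sketch: optimal transport between two
    normalized bag-of-words histograms over a shared vocabulary, with
    Euclidean distances between word embeddings as the ground cost."""
    cost = ot.dist(word_vecs, word_vecs, metric="euclidean")
    return ot.emd2(bow1 / bow1.sum(), bow2 / bow2.sum(), cost)

rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 20))               # embeddings for a 5-word vocabulary
doc_a = np.array([2.0, 1.0, 0.0, 0.0, 1.0])   # term counts of document A
doc_b = np.array([0.0, 0.0, 3.0, 1.0, 0.0])   # term counts of document B
print(wmd(doc_a, doc_b, vecs))
```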

Proceedings Article
12 Feb 2016
TL;DR: This paper proposes a context-based neural network model for Twitter sentiment analysis, incorporating contextualized features from relevant Tweets into the model in the form of word embedding vectors.
Abstract: Sentiment classification on Twitter has attracted increasing research in recent years. Most existing work focuses on feature engineering according to the tweet content itself. In this paper, we propose a context-based neural network model for Twitter sentiment analysis, incorporating contextualized features from relevant Tweets into the model in the form of word embedding vectors. Experiments on both balanced and unbalanced datasets show that our proposed models outperform the current state-of-the-art.

Proceedings ArticleDOI
12 Sep 2016
TL;DR: This paper proposes to use word embeddings to incorporate and weight terms that do not occur in the query, but are semantically related to the query terms, and develops an embedding-based relevance model, an extension of the effective and robust relevance model approach.
Abstract: Word embeddings, which are low-dimensional vector representations of vocabulary terms that capture the semantic similarity between them, have recently been shown to achieve impressive performance in many natural language processing tasks. The use of word embeddings in information retrieval, however, has only begun to be studied. In this paper, we explore the use of word embeddings to enhance the accuracy of query language models in the ad-hoc retrieval task. To this end, we propose to use word embeddings to incorporate and weight terms that do not occur in the query, but are semantically related to the query terms. We describe two embedding-based query expansion models with different assumptions. Since pseudo-relevance feedback methods that use the top retrieved documents to update the original query model are well-known to be effective, we also develop an embedding-based relevance model, an extension of the effective and robust relevance model approach. In these models, we transform the similarity values obtained by the widely-used cosine similarity with a sigmoid function to have more discriminative semantic similarity values. We evaluate our proposed methods using three TREC newswire and web collections. The experimental results demonstrate that the embedding-based methods significantly outperform competitive baselines in most cases. The embedding-based methods are also shown to be more robust than the baselines.
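A minimal sketch of the expansion idea under illustrative assumptions: candidate terms are scored by their sigmoid-transformed cosine similarity to the query terms, and the top-scoring terms are added to the query model. The sigmoid parameters and the product (AND-style) combination are placeholders; the paper tunes the transformation and defines two expansion models plus an embedding-based relevance model.

```python
import numpy as np

def sharp_sigmoid(x, a=10.0, c=0.6):
    # Sharpened sigmoid so only strongly similar terms receive weight;
    # a and c are illustrative values standing in for tuned parameters.
    return 1.0 / (1.0 + np.exp(-a * (x - c)))

def expand_query(query_terms, emb, k=5):
    """Score every non-query vocabulary term by transformed cosine
    similarity to the query terms; return the top-k expansion terms."""
    unit = {w: v / np.linalg.norm(v) for w, v in emb.items()}
    scores = {}
    for w in emb:
        if w in query_terms:
            continue
        sims = (float(unit[w] @ unit[q]) for q in query_terms)
        scores[w] = np.prod([sharp_sigmoid(s) for s in sims])
    return sorted(scores, key=scores.get, reverse=True)[:k]

rng = np.random.default_rng(0)
vocab = ["jaguar", "car", "engine", "feline", "zoo"]
emb = {w: rng.normal(size=50) for w in vocab}  # toy embeddings
print(expand_query(["jaguar", "car"], emb, k=2))
```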

Proceedings Article
01 Dec 2016
TL;DR: This model makes full use of word embedding, part-of-speech tag embedding and position embedding information and enables learning some important features from task-specific labeled data, forgoing the need for external knowledge such as explicit dependency structures.
Abstract: Nowadays, neural networks play an important role in the task of relation classification. In this paper, we propose a novel attention-based convolutional neural network architecture for this task. Our model makes full use of word embedding, part-of-speech tag embedding and position embedding information. Word level attention mechanism is able to better determine which parts of the sentence are most influential with respect to the two entities of interest. This architecture enables learning some important features from task-specific labeled data, forgoing the need for external knowledge such as explicit dependency structures. Experiments on the SemEval-2010 Task 8 benchmark dataset show that our model achieves better performances than several state-of-the-art neural network models and can achieve a competitive performance just with minimal feature engineering.

Journal ArticleDOI
TL;DR: Experimental results on the dataset show that topic-enhanced word embedding is very effective for Twitter sentiment classification.

Journal ArticleDOI
TL;DR: A deep and bidirectional representation learning model is proposed to address the issue of image-text cross-modal retrieval; results show that the proposed architecture is effective and the learned representations have good semantics, achieving superior cross-modal retrieval performance.
Abstract: Cross-modal retrieval emphasizes understanding inter-modality semantic correlations, which is often achieved by designing a similarity function. Generally, one of the most important things considered by the similarity function is how to make the cross-modal similarity computable. In this paper, a deep and bidirectional representation learning model is proposed to address the issue of image–text cross-modal retrieval. Owing to the solid progress of deep learning in computer vision and natural language processing, it is reliable to extract semantic representations from both raw image and text data by using deep neural networks. Therefore, in the proposed model, two convolution-based networks are adopted to accomplish representation learning for images and texts. By passing through the networks, images and texts are mapped to a common space, in which the cross-modal similarity is measured by cosine distance. Subsequently, a bidirectional network architecture is designed to capture a key property of cross-modal retrieval: bidirectional search. Such an architecture is characterized by simultaneously involving the matched and unmatched image–text pairs for training. Accordingly, a learning framework with a maximum likelihood criterion is developed. The network parameters are optimized via backpropagation and stochastic gradient descent. Extensive experiments are conducted to evaluate the proposed method on three publicly released datasets: IAPRTC-12, Flickr30k, and Flickr8k. The overall results show that the proposed architecture is effective and the learned representations have good semantics to achieve superior cross-modal retrieval performance.

Proceedings ArticleDOI
01 Nov 2016
TL;DR: This article explores whether prior work can be enhanced using semantic similarity/discordance between word embeddings, and augments word embedding-based features to four feature sets reported in the past.
Abstract: This paper makes a simple increment to state-of-the-art in sarcasm detection research. Existing approaches are unable to capture subtle forms of context incongruity which lies at the heart of sarcasm. We explore if prior work can be enhanced using semantic similarity/discordance between word embeddings. We augment word embedding-based features to four feature sets reported in the past. We also experiment with four types of word embeddings. We observe an improvement in sarcasm detection, irrespective of the word embedding used or the original feature set to which our features are augmented. For example, this augmentation results in an improvement in F-score of around 4% for three out of these four feature sets, and a minor degradation in case of the fourth, when Word2Vec embeddings are used. Finally, a comparison of the four embeddings shows that Word2Vec and dependency weight-based features outperform LSA and GloVe, in terms of their benefit to sarcasm detection.

Posted Content
TL;DR: It is shown for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language---the same sort of language humans are exposed to every day.
Abstract: Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language: the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model, namely the GloVe word embedding, trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
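The WEAT statistic introduced here is straightforward to compute from any embedding: each target word's association is its mean cosine similarity to one attribute set minus its mean to the other, and the effect size compares the two target sets. A minimal sketch with random toy vectors (the permutation-based significance test is omitted):

```python
import numpy as np

def cos(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference of mean target-set associations, normalized by the
    pooled standard deviation over all target words."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y)

# Toy vectors; real usage would index target/attribute word lists
# (e.g. flowers vs. insects, pleasant vs. unpleasant) into GloVe.
rng = np.random.default_rng(0)
X, Y, A, B = (list(rng.normal(size=(4, 50))) for _ in range(4))
print(weat_effect_size(X, Y, A, B))
```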

Proceedings ArticleDOI
12 Sep 2016
TL;DR: A theoretical framework for estimating query embedding vectors based on the individual embedding vector of vocabulary terms is proposed and a number of different implementations of this framework are provided and it is shown that the AWE method is a special case of the proposed framework.
Abstract: The dense vector representation of vocabulary terms, also known as word embeddings, have been shown to be highly effective in many natural language processing tasks. Word embeddings have recently begun to be studied in a number of information retrieval (IR) tasks. One of the main steps in leveraging word embeddings for IR tasks is to estimate the embedding vectors of queries. This is a challenging task, since queries are not always available during the training phase of word embedding vectors. Previous work has considered the average or sum of embedding vectors of all query terms (AWE) to model the query embedding vectors, but no theoretical justification has been presented for such a model. In this paper, we propose a theoretical framework for estimating query embedding vectors based on the individual embedding vectors of vocabulary terms. We then provide a number of different implementations of this framework and show that the AWE method is a special case of the proposed framework. We also introduce pseudo query vectors, the query embedding vectors estimated using pseudo-relevant documents. We further extrinsically evaluate the proposed methods using two well-known IR tasks: query expansion and query classification. The estimated query embedding vectors are evaluated via query expansion experiments over three newswire and web TREC collections as well as query classification experiments over the KDD Cup 2005 test set. The experiments show that the introduced pseudo query vectors significantly outperform the AWE method.
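The AWE baseline the framework generalizes, plus one plausible reading of the pseudo query vector idea, can be sketched as below. The term-weighting function is an assumption for illustration; the paper derives its estimators formally and shows AWE falls out as a special case.

```python
import numpy as np

def awe(query_terms, emb):
    """Average word embedding (AWE): the baseline query vector."""
    return np.mean([emb[t] for t in query_terms], axis=0)

def pseudo_query_vector(feedback_docs, emb, weight):
    """Pseudo query vector sketch: a weighted centroid of the terms in
    pseudo-relevant (top-ranked) documents, with `weight` standing in
    for a term-importance function such as a feedback language model."""
    terms = [t for doc in feedback_docs for t in doc]
    num = sum(weight(t) * emb[t] for t in terms)
    return num / sum(weight(t) for t in terms)

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ("new", "york", "city", "travel")}
print(awe(["new", "york"], emb).shape)
print(pseudo_query_vector([["new", "york", "travel"]], emb, lambda t: 1.0).shape)
```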

Proceedings Article
01 May 2016
TL;DR: This paper provides new perceptions of the intrinsic qualities of the well-known word embedding families, which can differ from those provided by works previously published in the scientific literature.
Abstract: Word embeddings have been successfully used in several natural language processing (NLP) tasks and in speech processing. Different approaches have been introduced to calculate word embeddings through neural networks. In the literature, many studies have focused on word embedding evaluation, but to our knowledge, there are still some gaps. This paper presents a study focusing on a rigorous comparison of the performances of different kinds of word embeddings. These performances are evaluated on different NLP and linguistic tasks, while all the word embeddings are estimated on the same training data using the same vocabulary, the same number of dimensions, and other similar characteristics. The evaluation results reported in this paper match those in the literature, since they point out that the improvements achieved by a word embedding in one task are not consistently observed across all tasks. For that reason, this paper investigates and evaluates approaches to combine word embeddings in order to take advantage of their complementarity, and to look for the effective word embeddings that can achieve good performance on all tasks. As a conclusion, this paper provides new perceptions of the intrinsic qualities of the well-known word embedding families, which can differ from those provided by works previously published in the scientific literature.

Book ChapterDOI
20 Mar 2016
TL;DR: This paper compared the effectiveness of three widespread approaches, namely Latent Semantic Indexing, Random Indexing, and Word2Vec, in the task of learning a vector space representation of both the items to be recommended and user profiles.
Abstract: In this paper we present a preliminary investigation towards the adoption of Word Embedding techniques in a content-based recommendation scenario. Specifically, we compared the effectiveness of three widespread approaches, namely Latent Semantic Indexing, Random Indexing, and Word2Vec, in the task of learning a vector space representation of both the items to be recommended and user profiles.

Proceedings ArticleDOI
24 Oct 2016
TL;DR: This paper proposes a linked document embedding framework LDE, which combines link and label information with content information to learn document representations for classification, and experiments demonstrate the effectiveness of the proposed framework.
Abstract: Word and document embedding algorithms such as Skip-gram and Paragraph Vector have been proven to help various text analysis tasks such as document classification, document clustering and information retrieval. The vast majority of these algorithms are designed to work with independent and identically distributed documents. However, in many real-world applications, documents are inherently linked. For example, web documents such as blogs and online news often have hyperlinks to other web documents, and scientific articles usually cite other articles. Linked documents present new challenges to traditional document embedding algorithms. In addition, most existing document embedding algorithms are unsupervised and their learned representations may not be optimal for classification when labeling information is available. In this paper, we study the problem of linked document embedding for classification and propose a linked document embedding framework LDE, which combines link and label information with content information to learn document representations for classification. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of link and label information in the proposed framework LDE.

Journal ArticleDOI
TL;DR: This work addresses the relation classification task with a convolutional neural network approach that automatically learns features from raw sentences and minimizes dependence on external toolkits and resources, showing that the proposed architecture significantly outperforms state-of-the-art systems.

Posted Content
TL;DR: The authors showed that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance and proposed a new method of regularizing the output embedding.
Abstract: We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
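Weight tying as recommended above is a one-line change in most frameworks: point the output projection at the input embedding matrix. A minimal PyTorch sketch with illustrative sizes (tying requires the hidden and embedding dimensions to match, or an extra projection):

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Language model sketch with tied input and output embeddings:
    the softmax projection reuses the embedding matrix."""
    def __init__(self, vocab_size=10000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.Linear(dim, vocab_size, bias=False)
        self.decoder.weight = self.embed.weight  # weight tying: one shared matrix

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.decoder(h)  # logits over the vocabulary

model = TiedLM()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # the 10000 x 256 matrix is counted once, not twice
```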

Posted Content
TL;DR: This paper investigates the impact on classification performance of the background dataset used to train the embedding models, as well as the parameters of the word embedding training process, namely the context window size, the dimensionality, and the number of negative samples, and finds that large context window and dimension sizes are preferable.
Abstract: Word embeddings and convolutional neural networks (CNN) have attracted extensive attention in various classification tasks for Twitter, e.g. sentiment classification. However, the effect of the configuration used to train and generate the word embeddings on the classification performance has not been studied in the existing literature. In this paper, using a Twitter election classification task that aims to detect election-related tweets, we investigate the impact of the background dataset used to train the embedding models, the context window size and the dimensionality of word embeddings on the classification performance. By comparing the classification results of two word embedding models, which are trained using different background corpora (e.g. Wikipedia articles and Twitter microposts), we show that the background data type should align with the Twitter classification dataset to achieve a better performance. Moreover, by evaluating the results of word embeddings models trained using various context window sizes and dimensionalities, we found that large context window and dimension sizes are preferable to improve the performance. Our experimental results also show that using word embeddings and CNN leads to statistically significant improvements over various baselines such as random, SVM with TF-IDF and SVM with word embeddings.
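The hyperparameters studied here map directly onto word2vec training options. A toy gensim (4+) sketch, with the paper's finding that larger windows and dimensionalities helped reflected in the illustrative values:

```python
from gensim.models import Word2Vec

# Hypothetical tokenized tweets; in the paper, the background corpus
# (Wikipedia vs. Twitter) is itself one of the variables under study.
tweets = [["vote", "early", "today"], ["election", "day", "queues"]]

# The studied knobs: context window, dimensionality, negative samples.
model = Word2Vec(tweets, window=10, vector_size=300, negative=10,
                 sg=1, min_count=1, epochs=5)
print(model.wv["vote"].shape)  # (300,)
```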

Posted Content
TL;DR: A generative topic embedding model is proposed that performs better than eight existing methods, with fewer features, and can generate coherent topics even based on only one document.
Abstract: Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a low-dimensional continuous space. In two document classification tasks, our method performs better than eight existing methods, with fewer features. In addition, we illustrate with an example that our method can generate coherent topics even based on only one document.

Proceedings ArticleDOI
01 Aug 2016
TL;DR: This article proposed a generative topic embedding model to combine word embedding and topic modeling, where topics are represented by embedding vectors and are shared across documents, and the probability of each word is influenced by both its local context and its topic.
Abstract: Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a low-dimensional continuous space. In two document classification tasks, our method performs better than eight existing methods, with fewer features. In addition, we illustrate with an example that our method can generate coherent topics even based on only one document.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A constrained system (unconstrained for English) achieves competitive results across all languages and domains, placing first or second in 5 and 7 out of 11 language-domain pairs for aspect category detection and sentiment polarity respectively, thereby demonstrating the viability of a deep learning-based approach for multilingual aspect-based sentiment analysis.
Abstract: This paper describes our deep learning-based approach to multilingual aspect-based sentiment analysis as part of SemEval 2016 Task 5. We use a convolutional neural network (CNN) for both aspect extraction and aspect-based sentiment analysis. We cast aspect extraction as a multi-label classification problem, outputting probabilities over aspects parameterized by a threshold. To determine the sentiment towards an aspect, we concatenate an aspect vector with every word embedding and apply a convolution over it. Our constrained system (unconstrained for English) achieves competitive results across all languages and domains, placing first or second in 5 and 7 out of 11 language-domain pairs for aspect category detection (slot 1) and sentiment polarity (slot 3) respectively, thereby demonstrating the viability of a deep learning-based approach for multilingual aspect-based sentiment analysis.
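The aspect-conditioning step described above, concatenating an aspect vector onto every word embedding before convolution, can be sketched as follows (toy dimensions; the real system learns the aspect vectors jointly with the CNN):

```python
import numpy as np

def aspect_augmented_input(word_vecs, aspect_vec):
    """Build the CNN input for aspect-based sentiment: the aspect vector
    is concatenated onto every word embedding, so each row is
    [word_embedding ; aspect_embedding]."""
    n = len(word_vecs)
    return np.hstack([word_vecs, np.tile(aspect_vec, (n, 1))])

rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 100))   # 7 words, 100-dim embeddings
aspect = rng.normal(size=50)           # 50-dim aspect vector (toy sizes)
X = aspect_augmented_input(sentence, aspect)
print(X.shape)  # (7, 150): convolution filters then slide over these rows
```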

Journal ArticleDOI
TL;DR: A simple, principled, direct metric recovery algorithm is proposed that performs on par with the state-of-the-art word embedding and manifold learning methods and is complemented by constructing two new inductive reasoning datasets and demonstrating that word embeddings can be used to solve them.
Abstract: Continuous word representations have been remarkably useful across NLP tasks but remain poorly understood. We ground word embeddings in semantic spaces studied in the cognitive-psychometric literature, taking these spaces as the primary objects to recover. To this end, we relate log co-occurrences of words in large corpora to semantic similarity assessments and show that co-occurrences are indeed consistent with a Euclidean semantic space hypothesis. Framing word embedding as metric recovery of a semantic space unifies existing word embedding algorithms, ties them to manifold learning, and demonstrates that existing algorithms are consistent metric recovery methods given co-occurrence counts from random walks. Furthermore, we propose a simple, principled, direct metric recovery algorithm that performs on par with state-of-the-art word embedding and manifold learning methods. Finally, we complement the recent focus on analogies by constructing two new inductive reasoning datasets, series completion and classification, and demonstrate that word embeddings can be used to solve them as well.