
Showing papers on "Word embedding published in 2016"


Proceedings Article
05 Dec 2016
TL;DR: The authors showed that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent, which raises concerns because their widespread use often tends to amplify these biases.
Abstract: The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender-neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.

1,379 citations
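As an illustration of the geometric debiasing the paper describes, here is a minimal sketch of the neutralization step: projecting the gender direction out of a gender-neutral word's vector. The random toy vectors and the single she/he pair are illustrative assumptions; the paper derives the gender direction via PCA over several definitional pairs and also applies an equalization step not shown here.

```python
import numpy as np

def neutralize(w, g):
    """Remove the gender component of a word vector by projecting
    out the (unit-norm) gender direction g, then re-normalizing."""
    w_debiased = w - np.dot(w, g) * g
    return w_debiased / np.linalg.norm(w_debiased)

# Hypothetical toy vectors; in practice these come from a trained embedding.
rng = np.random.default_rng(0)
emb = {word: rng.normal(size=300) for word in ("she", "he", "receptionist")}

# Gender direction from one definitional pair (the paper uses PCA over several).
g = emb["she"] - emb["he"]
g /= np.linalg.norm(g)

emb["receptionist"] = neutralize(emb["receptionist"], g)
print(np.dot(emb["receptionist"], g))  # ~0: no remaining gender component
```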


Posted Content
TL;DR: This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties such as the ability to cluster related concepts and to solve analogy tasks.
Abstract: The blind application of machine learning runs the risk of amplifying biases present in data. Such a danger is facing us with word embedding, a popular framework to represent text data as vectors which has been used in many machine learning and natural language processing tasks. We show that even word embeddings trained on Google News articles exhibit female/male gender stereotypes to a disturbing extent. This raises concerns because their widespread use, as we describe, often tends to amplify these biases. Geometrically, gender bias is first shown to be captured by a direction in the word embedding. Second, gender-neutral words are shown to be linearly separable from gender definition words in the word embedding. Using these properties, we provide a methodology for modifying an embedding to remove gender stereotypes, such as the association between the words receptionist and female, while maintaining desired associations such as between the words queen and female. We define metrics to quantify both direct and indirect gender biases in embeddings, and develop algorithms to "debias" the embedding. Using crowd-worker evaluation as well as standard benchmarks, we empirically demonstrate that our algorithms significantly reduce gender bias in embeddings while preserving their useful properties such as the ability to cluster related concepts and to solve analogy tasks. The resulting embeddings can be used in applications without amplifying gender bias.

1,074 citations


Proceedings ArticleDOI
14 Mar 2016
TL;DR: Item2vec as mentioned in this paper is an item-based collaborative filtering method based on skip-gram with negative sampling (SGNS) that produces embeddings for items in a latent space.
Abstract: Many Collaborative Filtering (CF) algorithms are item-based in the sense that they analyze item-item relations in order to produce item similarities. Recently, several works in the field of Natural Language Processing (NLP) suggested learning a latent representation of words using neural embedding algorithms. Among them, Skip-gram with Negative Sampling (SGNS), also known as word2vec, was shown to provide state-of-the-art results on various linguistic tasks. In this paper, we show that item-based CF can be cast in the same framework as neural word embedding. Inspired by SGNS, we describe a method we name item2vec for item-based CF that produces embeddings for items in a latent space. The method is capable of inferring item-item relations even when user information is not available. We present experimental results that demonstrate the effectiveness of the item2vec method and show it is competitive with SVD.

440 citations
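Since item2vec amounts to running SGNS over per-user item lists, it can be sketched with an off-the-shelf word2vec implementation. The snippet below assumes gensim (version 4+ keyword names) and uses hypothetical toy baskets; the oversized window is one common way to approximate the paper's set-based objective, in which every item in a basket serves as context for every other.

```python
from gensim.models import Word2Vec

# Each "sentence" is the list of items consumed by one user (order ignored).
# Hypothetical toy data; real item IDs would come from interaction logs.
user_baskets = [
    ["item_a", "item_b", "item_c"],
    ["item_b", "item_c", "item_d"],
    ["item_a", "item_d"],
]

# SGNS (sg=1) with negative sampling; the large window approximates the
# set-based objective described in the paper.
model = Word2Vec(user_baskets, vector_size=32, window=100,
                 sg=1, negative=5, min_count=1, epochs=50)

print(model.wv.most_similar("item_a", topn=2))
```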


Proceedings ArticleDOI
07 Jul 2016
TL;DR: A simple, fast, and effective topic model for short texts, named GPU-DMM, based on the Dirichlet Multinomial Mixture model, which achieves comparable or better topic representations than state-of-the-art models, measured by topic coherence.
Abstract: For many applications that require semantic understanding of short texts, inferring discriminative and coherent latent topics from short texts is a critical and fundamental task. Conventional topic models largely rely on word co-occurrences to derive topics from a collection of documents. However, due to their short length, short texts are much sparser in terms of word co-occurrences. Data sparsity therefore becomes a bottleneck for conventional topic models to achieve good results on short texts. On the other hand, when a human being interprets a piece of short text, the understanding is based not solely on its content words but also on her background knowledge (e.g., semantically related words). Recent advances in word embedding offer effective learning of word semantic relations from a large corpus. Exploiting such auxiliary word embeddings to enrich topic modeling for short texts is the main focus of this paper. To this end, we propose a simple, fast, and effective topic model for short texts, named GPU-DMM. Based on the Dirichlet Multinomial Mixture (DMM) model, GPU-DMM promotes semantically related words under the same topic during the sampling process by using the generalized Polya urn (GPU) model. In this sense, background knowledge about word semantic relatedness learned from millions of external documents can be easily exploited to improve topic modeling for short texts. Through extensive experiments on two real-world short text collections in two languages, we show that GPU-DMM achieves comparable or better topic representations than state-of-the-art models, as measured by topic coherence. The learned topic representation leads to the best accuracy in a text classification task, which serves as an indirect evaluation.

293 citations
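The core twist GPU-DMM adds to DMM Gibbs sampling is the generalized Polya urn update: assigning a word to a topic also adds a fractional count for its semantically related words. A toy sketch, with an illustrative promotion weight mu and a hypothetical related-word map (in the paper, relatedness comes from similarities between pre-trained word embeddings):

```python
import numpy as np

def gpu_promote(n_zw, topic, word, related, mu=0.3):
    """Generalized Polya urn update: incrementing the count of `word`
    under `topic` also promotes its semantically related words.
    `related` maps a word id to word ids whose embedding similarity
    exceeds a threshold (learned offline from external embeddings)."""
    n_zw[topic, word] += 1.0
    for w2 in related.get(word, ()):
        n_zw[topic, w2] += mu

# Toy example: 2 topics, 5 words; word 0 is related to words 1 and 2.
counts = np.zeros((2, 5))
gpu_promote(counts, topic=0, word=0, related={0: [1, 2]})
print(counts[0])  # [1.0, 0.3, 0.3, 0.0, 0.0]
```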


Journal ArticleDOI
Duyu Tang1, Furu Wei2, Bing Qin1, Nan Yang2, Ting Liu1, Ming Zhou2 
TL;DR: This work develops a number of neural networks with tailored loss functions, and applies sentiment embeddings to word-level sentiment analysis, sentence-level sentiment classification, and building sentiment lexicons, showing results that consistently outperform context-based embeddings on several benchmark datasets for these tasks.
Abstract: We propose learning sentiment-specific word embeddings, dubbed sentiment embeddings, in this paper. Existing word embedding learning algorithms typically only use the contexts of words but ignore the sentiment of texts. This is problematic for sentiment analysis because words with similar contexts but opposite sentiment polarity, such as good and bad, are mapped to neighboring word vectors. We address this issue by encoding sentiment information of texts (e.g., sentences and words) together with contexts of words in sentiment embeddings. By combining context- and sentiment-level evidence, the nearest neighbors in sentiment embedding space are semantically similar, and words with the same sentiment polarity are favored. To learn sentiment embeddings effectively, we develop a number of neural networks with tailored loss functions, and automatically collect massive texts with sentiment signals such as emoticons as the training data. Sentiment embeddings can be naturally used as word features for a variety of sentiment analysis tasks without feature engineering. We apply sentiment embeddings to word-level sentiment analysis, sentence-level sentiment classification, and building sentiment lexicons. Experimental results show that sentiment embeddings consistently outperform context-based embeddings on several benchmark datasets of these tasks. This work provides insights on the design of neural networks for learning task-specific word embeddings in other natural language processing tasks.

290 citations


Journal ArticleDOI
TL;DR: A unified framework is proposed to expand short texts based on word embedding clustering and a convolutional neural network, with semantic cliques discovered via fast clustering; experiments on two open benchmarks validate the effectiveness of the proposed method.

268 citations


Journal ArticleDOI
TL;DR: This paper analyzed three critical components in training word embeddings: model, corpus, and training parameters, and provided several simple guidelines for training good word embeddings.
Abstract: The authors analyze three critical components in training word embeddings: model, corpus, and training parameters. They systematize existing neural-network-based word embedding methods and experimentally compare them using the same corpus. They then evaluate each word embedding in three ways: analyzing its semantic properties, using it as a feature for supervised tasks, and using it to initialize neural networks. They also provide several simple guidelines for training good word embeddings.

257 citations


Proceedings ArticleDOI
07 Sep 2016
TL;DR: A co-factorization model, CoFactor, which jointly decomposes the user-item interaction matrix and the item-item co-occurrence matrix with shared item latent factors and provides qualitative results that explain how CoFactor improves the quality of the inferred factors.
Abstract: Matrix factorization (MF) models and their extensions are standard in modern recommender systems. MF models decompose the observed user-item interaction matrix into user and item latent factors. In this paper, we propose a co-factorization model, CoFactor, which jointly decomposes the user-item interaction matrix and the item-item co-occurrence matrix with shared item latent factors. For each pair of items, the co-occurrence matrix encodes the number of users that have consumed both items. CoFactor is inspired by the recent success of word embedding models (e.g., word2vec) which can be interpreted as factorizing the word co-occurrence matrix. We show that this model significantly improves the performance over MF models on several datasets with little additional computational overhead. We provide qualitative results that explain how CoFactor improves the quality of the inferred factors and characterize the circumstances where it provides the most significant improvements.

254 citations
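A minimal sketch of CoFactor's joint objective: the usual matrix factorization fit on the user-item matrix plus a second fit on the item-item co-occurrence matrix, sharing the item factors beta. The actual model also includes per-item bias terms, confidence weights, and an SPPMI co-occurrence matrix; the random toy matrices and plain squared loss below are illustrative assumptions.

```python
import numpy as np

def cofactor_loss(Y, M, theta, beta, gamma, lam=0.01):
    """Joint objective sketch: squared error on the user-item matrix Y
    plus squared error on the item-item co-occurrence matrix M, with
    the item factors `beta` shared between the two terms."""
    mf_term = np.sum((Y - theta @ beta.T) ** 2)      # user-item fit
    cooc_term = np.sum((M - beta @ gamma.T) ** 2)    # item-item fit
    reg = lam * (np.sum(theta**2) + np.sum(beta**2) + np.sum(gamma**2))
    return mf_term + cooc_term + reg

rng = np.random.default_rng(0)
n_users, n_items, k = 4, 6, 3
Y = rng.random((n_users, n_items))    # toy interactions
M = rng.random((n_items, n_items))    # toy co-occurrence (SPPMI) matrix
theta, beta, gamma = (rng.normal(size=(n, k)) for n in (n_users, n_items, n_items))
print(cofactor_loss(Y, M, theta, beta, gamma))
```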


Journal ArticleDOI
TL;DR: A weight-based model and a learning procedure based on a novel median-based loss function are designed to mitigate the negative effect of outliers; the method outperforms the baseline approaches in the experiments and generalizes well to different word embeddings without retraining.

188 citations


Journal ArticleDOI
Zhehuan Zhao1, Zhihao Yang1, Ling Luo1, Hongfei Lin1, Jian Wang1 
TL;DR: A syntax convolutional neural network (SCNN) based DDI extraction method that uses a novel word embedding, syntax word embedding, to employ the syntactic information of a sentence to extract DDIs from the biomedical literature.
Abstract: Motivation: Detecting drug-drug interactions (DDIs) has become a vital part of public health safety. Therefore, using text mining techniques to extract DDIs from the biomedical literature has received great attention. However, this research is still at an early stage and its performance has much room to improve. Results: In this article, we present a syntax convolutional neural network (SCNN) based DDI extraction method. In this method, a novel word embedding, syntax word embedding, is proposed to employ the syntactic information of a sentence. Then position and part-of-speech features are introduced to extend the embedding of each word. Later, an auto-encoder is introduced to encode the traditional bag-of-words feature (a sparse 0–1 vector) as a dense real-valued vector. Finally, a combination of embedding-based convolutional features and traditional features is fed to the softmax classifier to extract DDIs from the biomedical literature. Experimental results on the DDIExtraction 2013 corpus show that SCNN obtains better performance (an F-score of 0.686) than other state-of-the-art methods. Availability and Implementation: The source code is available for academic use at http://202.118.75.18:8080/DDI/SCNN-DDI.zip. Contact: yangzh@dlut.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online.

184 citations


Proceedings ArticleDOI
11 Apr 2016
TL;DR: This paper investigates the popular neural word embedding method Word2vec as a source of evidence in document ranking and proposes the Dual Embedding Space Model (DESM), which provides evidence that a document is about a query term.
Abstract: This paper investigates the popular neural word embedding method Word2vec as a source of evidence in document ranking. In contrast to NLP applications of word2vec, which tend to use only the input embeddings, we retain both the input and the output embeddings, allowing us to calculate a different word similarity that may be more suitable for document ranking. We map the query words into the input space and the document words into the output space, and compute a relevance score by aggregating the cosine similarities across all the query-document word pairs. We postulate that the proposed Dual Embedding Space Model (DESM) provides evidence that a document is about a query term, in addition to and complementing the traditional term frequency based approach.
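A minimal sketch of the DESM scoring rule described above: each query word's IN vector is compared by cosine against the centroid of the document's normalized OUT vectors, and the similarities are averaged. The random toy vectors stand in for the two embedding matrices of a trained word2vec model.

```python
import numpy as np

def desm_score(query_in_vecs, doc_out_vecs):
    """Dual Embedding Space Model: average cosine between each query
    word's IN vector and the centroid of the document's OUT vectors."""
    doc_centroid = np.mean(
        [v / np.linalg.norm(v) for v in doc_out_vecs], axis=0)
    doc_centroid /= np.linalg.norm(doc_centroid)
    sims = [np.dot(q / np.linalg.norm(q), doc_centroid) for q in query_in_vecs]
    return float(np.mean(sims))

rng = np.random.default_rng(0)
query = [rng.normal(size=50) for _ in range(2)]   # IN-space query vectors
doc = [rng.normal(size=50) for _ in range(10)]    # OUT-space document vectors
print(desm_score(query, doc))
```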

Proceedings Article
05 Dec 2016
TL;DR: This paper proposes an efficient technique to learn a supervised metric, called the Supervised-WMD (S-WMD) metric, and provides an arbitrarily close approximation of the original WMD distance that results in a practical and efficient update rule.
Abstract: Recently, a new document metric called the word mover's distance (WMD) has been proposed with unprecedented results on kNN-based document classification. The WMD elevates high-quality word embeddings to a document metric by formulating the distance between two documents as an optimal transport problem between the embedded words. However, the document distances are entirely unsupervised and lack a mechanism to incorporate supervision when available. In this paper we propose an efficient technique to learn a supervised metric, which we call the Supervised-WMD (S-WMD) metric. The supervised training minimizes the stochastic leave-one-out nearest neighbor classification error on a per-document level by updating an affine transformation of the underlying word embedding space and a word-importance weight vector. As the gradient of the original WMD distance would result in an inefficient nested optimization problem, we provide an arbitrarily close approximation that results in a practical and efficient update rule. We evaluate S-WMD on eight real-world text classification tasks on which it consistently outperforms almost all of our 26 competitive baselines.
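For reference, the unsupervised WMD that S-WMD builds on can be sketched as an optimal transport problem; the snippet below assumes the POT library (`ot`) and shared vocabulary support for both documents. The supervised variant would additionally learn an affine transformation of the word vectors and a word-importance reweighting of the histograms before this computation.

```python
import numpy as np
import ot  # POT: Python Optimal Transport (assumed installed)

def wmd(bow1, bow2, word_vecs):
    """Word mover's distance sketch: optimal transport between two
    normalized bag-of-words histograms over a shared vocabulary, with
    Euclidean distances between word embeddings as the ground cost."""
    cost = ot.dist(word_vecs, word_vecs, metric="euclidean")
    return ot.emd2(bow1 / bow1.sum(), bow2 / bow2.sum(), cost)

rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 20))               # embeddings for a 5-word vocabulary
doc_a = np.array([2.0, 1.0, 0.0, 0.0, 1.0])   # term counts of document A
doc_b = np.array([0.0, 0.0, 3.0, 1.0, 0.0])   # term counts of document B
print(wmd(doc_a, doc_b, vecs))
```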

Proceedings Article
12 Feb 2016
TL;DR: This paper proposes a context-based neural network model for Twitter sentiment analysis, incorporating contextualized features from relevant Tweets into the model in the form of word embedding vectors.
Abstract: Sentiment classification on Twitter has attracted increasing research in recent years. Most existing work focuses on feature engineering according to the tweet content itself. In this paper, we propose a context-based neural network model for Twitter sentiment analysis, incorporating contextualized features from relevant Tweets into the model in the form of word embedding vectors. Experiments on both balanced and unbalanced datasets show that our proposed models outperform the current state-of-the-art.

Proceedings ArticleDOI
12 Sep 2016
TL;DR: This paper proposes to use word embeddings to incorporate and weight terms that do not occur in the query, but are semantically related to the query terms, and develops an embedding-based relevance model, an extension of the effective and robust relevance model approach.
Abstract: Word embeddings, which are low-dimensional vector representations of vocabulary terms that capture the semantic similarity between them, have recently been shown to achieve impressive performance in many natural language processing tasks. The use of word embeddings in information retrieval, however, has only begun to be studied. In this paper, we explore the use of word embeddings to enhance the accuracy of query language models in the ad-hoc retrieval task. To this end, we propose to use word embeddings to incorporate and weight terms that do not occur in the query, but are semantically related to the query terms. We describe two embedding-based query expansion models with different assumptions. Since pseudo-relevance feedback methods that use the top retrieved documents to update the original query model are well-known to be effective, we also develop an embedding-based relevance model, an extension of the effective and robust relevance model approach. In these models, we transform the similarity values obtained by the widely-used cosine similarity with a sigmoid function to have more discriminative semantic similarity values. We evaluate our proposed methods using three TREC newswire and web collections. The experimental results demonstrate that the embedding-based methods significantly outperform competitive baselines in most cases. The embedding-based methods are also shown to be more robust than the baselines.
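A minimal sketch of the expansion idea under illustrative assumptions: candidate terms are scored by their sigmoid-transformed cosine similarity to the query terms, and the top-scoring terms are added to the query model. The sigmoid parameters and the product (AND-style) combination are placeholders; the paper tunes the transformation and defines two expansion models plus an embedding-based relevance model.

```python
import numpy as np

def sharp_sigmoid(x, a=10.0, c=0.6):
    # Sharpened sigmoid so only strongly similar terms receive weight;
    # a and c are illustrative values standing in for tuned parameters.
    return 1.0 / (1.0 + np.exp(-a * (x - c)))

def expand_query(query_terms, emb, k=5):
    """Score every non-query vocabulary term by transformed cosine
    similarity to the query terms; return the top-k expansion terms."""
    unit = {w: v / np.linalg.norm(v) for w, v in emb.items()}
    scores = {}
    for w in emb:
        if w in query_terms:
            continue
        sims = (float(unit[w] @ unit[q]) for q in query_terms)
        scores[w] = np.prod([sharp_sigmoid(s) for s in sims])
    return sorted(scores, key=scores.get, reverse=True)[:k]

rng = np.random.default_rng(0)
vocab = ["jaguar", "car", "engine", "feline", "zoo"]
emb = {w: rng.normal(size=50) for w in vocab}  # toy embeddings
print(expand_query(["jaguar", "car"], emb, k=2))
```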

Proceedings Article
01 Dec 2016
TL;DR: This model makes full use of word embedding, part-of-speech tag embedding and position embedding information and enables learning some important features from task-specific labeled data, forgoing the need for external knowledge such as explicit dependency structures.
Abstract: Nowadays, neural networks play an important role in the task of relation classification. In this paper, we propose a novel attention-based convolutional neural network architecture for this task. Our model makes full use of word embedding, part-of-speech tag embedding and position embedding information. Word level attention mechanism is able to better determine which parts of the sentence are most influential with respect to the two entities of interest. This architecture enables learning some important features from task-specific labeled data, forgoing the need for external knowledge such as explicit dependency structures. Experiments on the SemEval-2010 Task 8 benchmark dataset show that our model achieves better performances than several state-of-the-art neural network models and can achieve a competitive performance just with minimal feature engineering.

Journal ArticleDOI
TL;DR: Experimental results on the dataset show that topic-enhanced word embedding is very effective for Twitter sentiment classification.

Journal ArticleDOI
TL;DR: A deep and bidirectional representation learning model is proposed to address the issue of image-text cross-modal retrieval; results show that the proposed architecture is effective and the learned representations have good semantics, achieving superior cross-modal retrieval performance.
Abstract: Cross-modal retrieval emphasizes understanding inter-modality semantic correlations, which is often achieved by designing a similarity function. Generally, one of the most important things considered by the similarity function is how to make the cross-modal similarity computable. In this paper, a deep and bidirectional representation learning model is proposed to address the issue of image–text cross-modal retrieval. Owing to the solid progress of deep learning in computer vision and natural language processing, it is reliable to extract semantic representations from both raw image and text data by using deep neural networks. Therefore, in the proposed model, two convolution-based networks are adopted to accomplish representation learning for images and texts. By passing through the networks, images and texts are mapped to a common space, in which the cross-modal similarity is measured by cosine distance. Subsequently, a bidirectional network architecture is designed to capture a key property of cross-modal retrieval: bidirectional search. Such an architecture is characterized by simultaneously involving the matched and unmatched image–text pairs for training. Accordingly, a learning framework with a maximum likelihood criterion is developed. The network parameters are optimized via backpropagation and stochastic gradient descent. Extensive experiments are conducted to evaluate the proposed method on three publicly released datasets: IAPRTC-12, Flickr30k, and Flickr8k. The overall results show that the proposed architecture is effective and the learned representations have good semantics to achieve superior cross-modal retrieval performance.

Proceedings ArticleDOI
01 Nov 2016
TL;DR: This article explores whether prior work can be enhanced using semantic similarity/discordance between word embeddings, and augments word embedding-based features to four feature sets reported in the past.
Abstract: This paper makes a simple increment to state-of-the-art in sarcasm detection research. Existing approaches are unable to capture subtle forms of context incongruity which lies at the heart of sarcasm. We explore if prior work can be enhanced using semantic similarity/discordance between word embeddings. We augment word embedding-based features to four feature sets reported in the past. We also experiment with four types of word embeddings. We observe an improvement in sarcasm detection, irrespective of the word embedding used or the original feature set to which our features are augmented. For example, this augmentation results in an improvement in F-score of around 4% for three out of these four feature sets, and a minor degradation in case of the fourth, when Word2Vec embeddings are used. Finally, a comparison of the four embeddings shows that Word2Vec and dependency weight-based features outperform LSA and GloVe, in terms of their benefit to sarcasm detection.

Posted Content
TL;DR: It is shown for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language---the same sort of language humans are exposed to every day.
Abstract: Artificial intelligence and machine learning are in a period of astounding growth. However, there are concerns that these technologies may be used, either with or without intention, to perpetuate the prejudice and unfairness that unfortunately characterizes many human institutions. Here we show for the first time that human-like semantic biases result from the application of standard machine learning to ordinary language: the same sort of language humans are exposed to every day. We replicate a spectrum of standard human biases as exposed by the Implicit Association Test and other well-known psychological studies. We replicate these using a widely used, purely statistical machine-learning model, namely the GloVe word embedding, trained on a corpus of text from the Web. Our results indicate that language itself contains recoverable and accurate imprints of our historic biases, whether these are morally neutral as towards insects or flowers, problematic as towards race or gender, or even simply veridical, reflecting the status quo for the distribution of gender with respect to careers or first names. These regularities are captured by machine learning along with the rest of semantics. In addition to our empirical findings concerning language, we also contribute new methods for evaluating bias in text, the Word Embedding Association Test (WEAT) and the Word Embedding Factual Association Test (WEFAT). Our results have implications not only for AI and machine learning, but also for the fields of psychology, sociology, and human ethics, since they raise the possibility that mere exposure to everyday language can account for the biases we replicate here.
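The WEAT statistic introduced here is straightforward to compute from any embedding: each target word's association is its mean cosine similarity to one attribute set minus its mean to the other, and the effect size compares the two target sets. A minimal sketch with random toy vectors (the permutation-based significance test is omitted):

```python
import numpy as np

def cos(u, v):
    return float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

def association(w, A, B):
    """s(w, A, B): mean cosine to attribute set A minus mean cosine to B."""
    return np.mean([cos(w, a) for a in A]) - np.mean([cos(w, b) for b in B])

def weat_effect_size(X, Y, A, B):
    """Difference of mean target-set associations, normalized by the
    pooled standard deviation over all target words."""
    s_x = [association(x, A, B) for x in X]
    s_y = [association(y, A, B) for y in Y]
    return (np.mean(s_x) - np.mean(s_y)) / np.std(s_x + s_y)

# Toy vectors; real usage would index target/attribute word lists
# (e.g. flowers vs. insects, pleasant vs. unpleasant) into GloVe.
rng = np.random.default_rng(0)
X, Y, A, B = (list(rng.normal(size=(4, 50))) for _ in range(4))
print(weat_effect_size(X, Y, A, B))
```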

Proceedings ArticleDOI
12 Sep 2016
TL;DR: A theoretical framework for estimating query embedding vectors based on the individual embedding vector of vocabulary terms is proposed and a number of different implementations of this framework are provided and it is shown that the AWE method is a special case of the proposed framework.
Abstract: The dense vector representation of vocabulary terms, also known as word embeddings, have been shown to be highly effective in many natural language processing tasks. Word embeddings have recently begun to be studied in a number of information retrieval (IR) tasks. One of the main steps in leveraging word embeddings for IR tasks is to estimate the embedding vectors of queries. This is a challenging task, since queries are not always available during the training phase of word embedding vectors. Previous work has considered the average or sum of embedding vectors of all query terms (AWE) to model the query embedding vectors, but no theoretical justification has been presented for such a model. In this paper, we propose a theoretical framework for estimating query embedding vectors based on the individual embedding vectors of vocabulary terms. We then provide a number of different implementations of this framework and show that the AWE method is a special case of the proposed framework. We also introduce pseudo query vectors, the query embedding vectors estimated using pseudo-relevant documents. We further extrinsically evaluate the proposed methods using two well-known IR tasks: query expansion and query classification. The estimated query embedding vectors are evaluated via query expansion experiments over three newswire and web TREC collections as well as query classification experiments over the KDD Cup 2005 test set. The experiments show that the introduced pseudo query vectors significantly outperform the AWE method.
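The AWE baseline the framework generalizes, plus one plausible reading of the pseudo query vector idea, can be sketched as below. The term-weighting function is an assumption for illustration; the paper derives its estimators formally and shows AWE falls out as a special case.

```python
import numpy as np

def awe(query_terms, emb):
    """Average word embedding (AWE): the baseline query vector."""
    return np.mean([emb[t] for t in query_terms], axis=0)

def pseudo_query_vector(feedback_docs, emb, weight):
    """Pseudo query vector sketch: a weighted centroid of the terms in
    pseudo-relevant (top-ranked) documents, with `weight` standing in
    for a term-importance function such as a feedback language model."""
    terms = [t for doc in feedback_docs for t in doc]
    num = sum(weight(t) * emb[t] for t in terms)
    return num / sum(weight(t) for t in terms)

rng = np.random.default_rng(0)
emb = {w: rng.normal(size=50) for w in ("new", "york", "city", "travel")}
print(awe(["new", "york"], emb).shape)
print(pseudo_query_vector([["new", "york", "travel"]], emb, lambda t: 1.0).shape)
```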

Proceedings Article
01 May 2016
TL;DR: This paper provides new perceptions of the intrinsic qualities of the well-known word embedding families, which can differ from those provided by works previously published in the scientific literature.
Abstract: Word embeddings have been successfully used in several natural language processing (NLP) tasks and in speech processing. Different approaches have been introduced to calculate word embeddings through neural networks. In the literature, many studies have focused on word embedding evaluation, but to our knowledge, there are still some gaps. This paper presents a study focusing on a rigorous comparison of the performances of different kinds of word embeddings. These performances are evaluated on different NLP and linguistic tasks, while all the word embeddings are estimated on the same training data using the same vocabulary, the same number of dimensions, and other similar characteristics. The evaluation results reported in this paper match those in the literature, since they point out that the improvements achieved by a word embedding in one task are not consistently observed across all tasks. For that reason, this paper investigates and evaluates approaches to combine word embeddings in order to take advantage of their complementarity, and to look for the effective word embeddings that can achieve good performance on all tasks. As a conclusion, this paper provides new perceptions of the intrinsic qualities of the well-known word embedding families, which can differ from those provided by works previously published in the scientific literature.

Book ChapterDOI
20 Mar 2016
TL;DR: This paper compared the effectiveness of three widespread approaches, namely Latent Semantic Indexing, Random Indexing, and Word2Vec, in the task of learning a vector space representation of both the items to be recommended and user profiles.
Abstract: In this paper we present a preliminary investigation towards the adoption of Word Embedding techniques in a content-based recommendation scenario. Specifically, we compared the effectiveness of three widespread approaches, namely Latent Semantic Indexing, Random Indexing, and Word2Vec, in the task of learning a vector space representation of both the items to be recommended and user profiles.

Proceedings ArticleDOI
24 Oct 2016
TL;DR: This paper proposes a linked document embedding framework LDE, which combines link and label information with content information to learn document representations for classification, and experiments demonstrate the effectiveness of the proposed framework.
Abstract: Word and document embedding algorithms such as Skip-gram and Paragraph Vector have been proven to help various text analysis tasks such as document classification, document clustering and information retrieval. The vast majority of these algorithms are designed to work with independent and identically distributed documents. However, in many real-world applications, documents are inherently linked. For example, web documents such as blogs and online news often have hyperlinks to other web documents, and scientific articles usually cite other articles. Linked documents present new challenges to traditional document embedding algorithms. In addition, most existing document embedding algorithms are unsupervised and their learned representations may not be optimal for classification when labeling information is available. In this paper, we study the problem of linked document embedding for classification and propose a linked document embedding framework LDE, which combines link and label information with content information to learn document representations for classification. Experimental results on real-world datasets demonstrate the effectiveness of the proposed framework. Further experiments are conducted to understand the importance of link and label information in the proposed framework LDE.

Journal ArticleDOI
TL;DR: This work addresses the relation classification task with a convolutional neural network approach that automatically learns features from raw sentences and minimizes dependence on external toolkits and resources, showing that the proposed architecture significantly outperforms state-of-the-art systems.

Posted Content
TL;DR: The authors showed that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance and proposed a new method of regularizing the output embedding.
Abstract: We study the topmost weight matrix of neural network language models. We show that this matrix constitutes a valid word embedding. When training language models, we recommend tying the input embedding and this output embedding. We analyze the resulting update rules and show that the tied embedding evolves in a more similar way to the output embedding than to the input embedding in the untied model. We also offer a new method of regularizing the output embedding. Our methods lead to a significant reduction in perplexity, as we are able to show on a variety of neural network language models. Finally, we show that weight tying can reduce the size of neural translation models to less than half of their original size without harming their performance.
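Weight tying as recommended above is a one-line change in most frameworks: point the output projection at the input embedding matrix. A minimal PyTorch sketch with illustrative sizes (tying requires the hidden and embedding dimensions to match, or an extra projection):

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Language model sketch with tied input and output embeddings:
    the softmax projection reuses the embedding matrix."""
    def __init__(self, vocab_size=10000, dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.rnn = nn.LSTM(dim, dim, batch_first=True)
        self.decoder = nn.Linear(dim, vocab_size, bias=False)
        self.decoder.weight = self.embed.weight  # weight tying: one shared matrix

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.decoder(h)  # logits over the vocabulary

model = TiedLM()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # the 10000 x 256 matrix is counted once, not twice
```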

Posted Content
TL;DR: This paper investigates the impact on classification performance of the background dataset used to train the embedding models, as well as the parameters of the word embedding training process, namely the context window size, the dimensionality, and the number of negative samples, and finds that large context window and dimension sizes are preferable.
Abstract: Word embeddings and convolutional neural networks (CNN) have attracted extensive attention in various classification tasks for Twitter, e.g. sentiment classification. However, the effect of the configuration used to train and generate the word embeddings on the classification performance has not been studied in the existing literature. In this paper, using a Twitter election classification task that aims to detect election-related tweets, we investigate the impact of the background dataset used to train the embedding models, the context window size and the dimensionality of word embeddings on the classification performance. By comparing the classification results of two word embedding models, which are trained using different background corpora (e.g. Wikipedia articles and Twitter microposts), we show that the background data type should align with the Twitter classification dataset to achieve a better performance. Moreover, by evaluating the results of word embeddings models trained using various context window sizes and dimensionalities, we found that large context window and dimension sizes are preferable to improve the performance. Our experimental results also show that using word embeddings and CNN leads to statistically significant improvements over various baselines such as random, SVM with TF-IDF and SVM with word embeddings.
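The hyperparameters studied here map directly onto word2vec training options. A toy gensim (4+) sketch, with the paper's finding that larger windows and dimensionalities helped reflected in the illustrative values:

```python
from gensim.models import Word2Vec

# Hypothetical tokenized tweets; in the paper, the background corpus
# (Wikipedia vs. Twitter) is itself one of the variables under study.
tweets = [["vote", "early", "today"], ["election", "day", "queues"]]

# The studied knobs: context window, dimensionality, negative samples.
model = Word2Vec(tweets, window=10, vector_size=300, negative=10,
                 sg=1, min_count=1, epochs=5)
print(model.wv["vote"].shape)  # (300,)
```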

Posted Content
TL;DR: A generative topic embedding model is proposed that performs better than eight existing methods, with fewer features, and can generate coherent topics even based on only one document.
Abstract: Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a low-dimensional continuous space. In two document classification tasks, our method performs better than eight existing methods, with fewer features. In addition, we illustrate with an example that our method can generate coherent topics even based on only one document.

Proceedings ArticleDOI
01 Aug 2016
TL;DR: This article proposed a generative topic embedding model to combine word embedding and topic modeling, where topics are represented by embedding vectors and are shared across documents, and the probability of each word is influenced by both its local context and its topic.
Abstract: Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a low-dimensional continuous space. In two document classification tasks, our method performs better than eight existing methods, with fewer features. In addition, we illustrate with an example that our method can generate coherent topics even based on only one document.

Proceedings ArticleDOI
01 Jun 2016
TL;DR: A constrained system (unconstrained for English) achieves competitive results across all languages and domains, placing first or second in 5 and 7 out of 11 language-domain pairs for aspect category detection and sentiment polarity respectively, thereby demonstrating the viability of a deep learning-based approach for multilingual aspect-based sentiment analysis.
Abstract: This paper describes our deep learning-based approach to multilingual aspect-based sentiment analysis as part of SemEval 2016 Task 5. We use a convolutional neural network (CNN) for both aspect extraction and aspect-based sentiment analysis. We cast aspect extraction as a multi-label classification problem, outputting probabilities over aspects parameterized by a threshold. To determine the sentiment towards an aspect, we concatenate an aspect vector with every word embedding and apply a convolution over it. Our constrained system (unconstrained for English) achieves competitive results across all languages and domains, placing first or second in 5 and 7 out of 11 language-domain pairs for aspect category detection (slot 1) and sentiment polarity (slot 3) respectively, thereby demonstrating the viability of a deep learning-based approach for multilingual aspect-based sentiment analysis.
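The aspect-conditioning step described above, concatenating an aspect vector onto every word embedding before convolution, can be sketched as follows (toy dimensions; the real system learns the aspect vectors jointly with the CNN):

```python
import numpy as np

def aspect_augmented_input(word_vecs, aspect_vec):
    """Build the CNN input for aspect-based sentiment: the aspect vector
    is concatenated onto every word embedding, so each row is
    [word_embedding ; aspect_embedding]."""
    n = len(word_vecs)
    return np.hstack([word_vecs, np.tile(aspect_vec, (n, 1))])

rng = np.random.default_rng(0)
sentence = rng.normal(size=(7, 100))   # 7 words, 100-dim embeddings
aspect = rng.normal(size=50)           # 50-dim aspect vector (toy sizes)
X = aspect_augmented_input(sentence, aspect)
print(X.shape)  # (7, 150): convolution filters then slide over these rows
```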

Journal ArticleDOI
TL;DR: A simple, principled, direct metric recovery algorithm is proposed that performs on par with the state-of-the-art word embedding and manifold learning methods and is complemented by constructing two new inductive reasoning datasets and demonstrating that word embeddings can be used to solve them.
Abstract: Continuous word representations have been remarkably useful across NLP tasks but remain poorly understood. We ground word embeddings in semantic spaces studied in the cognitive-psychometric literature, taking these spaces as the primary objects to recover. To this end, we relate log co-occurrences of words in large corpora to semantic similarity assessments and show that co-occurrences are indeed consistent with a Euclidean semantic space hypothesis. Framing word embedding as metric recovery of a semantic space unifies existing word embedding algorithms, ties them to manifold learning, and demonstrates that existing algorithms are consistent metric recovery methods given co-occurrence counts from random walks. Furthermore, we propose a simple, principled, direct metric recovery algorithm that performs on par with state-of-the-art word embedding and manifold learning methods. Finally, we complement the recent focus on analogies by constructing two new inductive reasoning datasets, series completion and classification, and demonstrate that word embeddings can be used to solve them as well.