Showing papers on "Word embedding published in 2018"
Posted Content•
TL;DR: It is found that transfer learning using sentence embeddings tends to outperform word-level transfer, achieving surprisingly good performance with minimal amounts of supervised training data for a transfer task.
Abstract: We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
908 citations
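As a rough usage sketch (not from the paper itself): the released encoder is typically consumed from TF Hub roughly as follows. The module handle below is the publicly listed Universal Sentence Encoder handle and is an assumption here, as is the downstream usage.

# Minimal sketch of sentence-level transfer with a TF Hub encoder.
# Assumes tensorflow>=2 and tensorflow_hub are installed.
import tensorflow_hub as hub

encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
sentences = ["The movie was surprisingly good.", "I would not watch it again."]
embeddings = encoder(sentences)          # shape (2, 512), one vector per sentence
print(embeddings.shape)
# The fixed-size vectors can then feed a small classifier trained on the
# (possibly tiny) labeled data of the transfer task.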
Proceedings Article•
01 Aug 2018
TL;DR: This paper proposes to leverage the internal states of a trained character language model to produce a novel type of word embedding which the authors refer to as contextual string embeddings, which fundamentally model words as sequences of characters and are contextualized by their surrounding text.
Abstract: Recent advances in language modeling using recurrent neural networks have made it viable to model language as distributions over characters. By learning to predict the next character on the basis of previous characters, such models have been shown to automatically internalize linguistic concepts such as words, sentences, subclauses and even sentiment. In this paper, we propose to leverage the internal states of a trained character language model to produce a novel type of word embedding which we refer to as contextual string embeddings. Our proposed embeddings have the distinct properties that they (a) are trained without any explicit notion of words and thus fundamentally model words as sequences of characters, and (b) are contextualized by their surrounding text, meaning that the same word will have different embeddings depending on its contextual use. We conduct a comparative evaluation against previous embeddings and find that our embeddings are highly useful for downstream tasks: across four classic sequence labeling tasks we consistently outperform the previous state-of-the-art. In particular, we significantly outperform previous work on English and German named entity recognition (NER), allowing us to report new state-of-the-art F1-scores on the CoNLL03 shared task. We release all code and pre-trained language models in a simple-to-use framework to the research community, to enable reproduction of these experiments and application of our proposed embeddings to other tasks: https://github.com/zalandoresearch/flair
900 citations
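A minimal usage sketch of the released framework; the flair package and the "news-forward"/"news-backward" model names follow the linked repository's public documentation and should be treated as assumptions here.

# Sketch: obtaining contextual string embeddings with the released flair package.
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings, StackedEmbeddings

# forward and backward character language models, stacked
embedding = StackedEmbeddings([
    FlairEmbeddings("news-forward"),
    FlairEmbeddings("news-backward"),
])

sentence = Sentence("George Washington went to Washington .")
embedding.embed(sentence)

for token in sentence:
    # the same surface form ("Washington") receives different vectors in context
    print(token.text, token.embedding.shape)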
01 Nov 2018
TL;DR: Transfer learning using sentence-level embeddings is shown to outperform models without transfer learning and often those that use only word-level transfer.
Abstract: We present easy-to-use TensorFlow Hub sentence embedding models having good task transfer performance. Model variants allow for trade-offs between accuracy and compute resources. We report the relationship between model complexity, resources, and transfer performance. Comparisons are made with baselines without transfer learning and to baselines that incorporate word-level transfer. Transfer learning using sentence-level embeddings is shown to outperform models without transfer learning and often those that use only word-level transfer. We show good transfer task performance with minimal training data and obtain encouraging results on word embedding association tests (WEAT) of model bias.
501 citations
TL;DR: A framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States is developed.
Abstract: Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts, e.g., the women's movement in the 1960s and Asian immigration into the United States, and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection between machine learning and quantitative social science.
488 citations
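The core measurement behind such analyses is a relative association between group words and attribute words in the embedding space. The following is a simplified, hedged sketch of one such score; the word lists and placeholder vectors are assumptions, not the paper's exact setup.

import numpy as np

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def relative_association(vecs, attribute_words, group_a, group_b):
    """Average similarity of attribute words to group A minus group B.

    Positive values mean the attributes sit closer to group A in the embedding
    space; tracking this over decade-specific embeddings gives the kind of
    temporal signal the paper analyzes."""
    a = np.mean([vecs[w] for w in group_a], axis=0)
    b = np.mean([vecs[w] for w in group_b], axis=0)
    return np.mean([cosine(vecs[w], a) - cosine(vecs[w], b) for w in attribute_words])

# toy random vectors stand in for decade-specific embeddings
rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=100) for w in ["nurse", "engineer", "she", "her", "he", "him"]}
print(relative_association(vecs, ["nurse", "engineer"], ["she", "her"], ["he", "him"]))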
18 Jun 2018
TL;DR: In this article, a graph convolutional network (GCN) takes a semantic embedding for each node (representing a visual category) in a learned knowledge graph (KG) and predicts the visual classifiers of unseen categories, an approach shown to be robust to noise in the KG.
Abstract: We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, just using the word embedding of the category and its relationship to other categories, for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this paper, we build upon the recently introduced Graph Convolutional Network (GCN) and propose an approach that uses both semantic embeddings and the categorical relationships to predict the classifiers. Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing a visual category). After a series of graph convolutions, we predict the visual classifier for each category. During training, the visual classifiers for a few categories are given to learn the GCN parameters. At test time, these filters are used to predict the visual classifiers of unseen categories. We show that our approach is robust to noise in the KG. More importantly, our approach provides significant improvement in performance compared to the current state-of-the-art results (from 2 ~ 3% on some metrics to a whopping 20% on a few).
372 citations
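A minimal sketch of the core mechanism: a graph convolution that propagates word-embedding inputs over a (toy) knowledge-graph adjacency to predict classifier weights. The dimensions, random adjacency, and two-layer depth are illustrative assumptions, not the paper's configuration.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """One propagation step: mix each node with its KG neighbours, then project."""
    def __init__(self, in_dim, out_dim, act=True):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.act = act

    def forward(self, x, a_norm):
        h = self.linear(a_norm @ x)
        return torch.relu(h) if self.act else h

# toy graph: 6 categories, 300-d word embeddings in, 2048-d classifier weights out
n, d_word, d_cls = 6, 300, 2048
adj = (torch.rand(n, n) > 0.6).float()
adj = ((adj + adj.T) > 0).float()
adj.fill_diagonal_(1.0)                               # add self loops
a_norm = adj / adj.sum(dim=1, keepdim=True)           # simple row normalization
x = torch.randn(n, d_word)                            # word embeddings of category names

gcn = nn.ModuleList([GCNLayer(d_word, 512), GCNLayer(512, d_cls, act=False)])
h = x
for layer in gcn:
    h = layer(h, a_norm)
# h[i] is the predicted visual classifier for category i; only the rows of seen
# categories would be supervised during training, unseen rows used at test time.
print(h.shape)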
01 Jan 2018
TL;DR: There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
Abstract: Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphology-based at the word embedding layer, through local syntax-based in the lower contextual layers, to longer-range semantics such as coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
266 citations
09 May 2018
TL;DR: This article showed that weak supervision from identical words enables more robust dictionary induction and established a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
Abstract: Unsupervised machine translation - i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora - seems impossible, but nevertheless, Lample et al. (2017) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised cross-lingual word embedding technique for bilingual dictionary induction (Conneau et al., 2017), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, as well as when the monolingual corpora come from different domains or when different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
241 citations
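The "simple trick" pairs naturally with a standard orthogonal (Procrustes) mapping; the sketch below illustrates that general recipe on toy data and is not the authors' exact pipeline. The seed-extraction helper and the tiny vocabularies are hypothetical.

import numpy as np

def identical_word_seed(src_vecs, tgt_vecs):
    """Seed dictionary from strings that appear in both vocabularies."""
    shared = sorted(set(src_vecs) & set(tgt_vecs))
    X = np.array([src_vecs[w] for w in shared])
    Z = np.array([tgt_vecs[w] for w in shared])
    return X, Z

def procrustes(X, Z):
    """Orthogonal map W minimizing ||XW - Z||_F (closed form via SVD)."""
    U, _, Vt = np.linalg.svd(X.T @ Z)
    return U @ Vt

# toy vocabularies sharing a few identical strings (e.g. names, numbers)
rng = np.random.default_rng(0)
src = {w: rng.normal(size=50) for w in ["berlin", "2018", "obama", "haus", "katze"]}
tgt = {w: rng.normal(size=50) for w in ["berlin", "2018", "obama", "house", "cat"]}

X, Z = identical_word_seed(src, tgt)
W = procrustes(X, Z)
mapped = {w: v @ W for w, v in src.items()}   # source vectors in target space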
Proceedings Article•
27 Apr 2018
TL;DR: A multi-step framework of linear transformations that generalizes a substantial body of previous work is proposed; it allows new insights into the behavior of existing methods, including the effectiveness of inverse regression, and the design of a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction.
Abstract: Using a dictionary to map independently trained word embeddings to a shared space has been shown to be an effective approach to learn bilingual word embeddings. In this work, we propose a multi-step framework of linear transformations that generalizes a substantial body of previous work. The core step of the framework is an orthogonal transformation, and existing methods can be explained in terms of the additional normalization, whitening, re-weighting, de-whitening and dimensionality reduction steps. This allows us to gain new insights into the behavior of existing methods, including the effectiveness of inverse regression, and design a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction. The corresponding software is released as an open source project.
202 citations
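A condensed sketch of the kind of multi-step pipeline described: length normalization, mean centering, whitening, and an orthogonal core step. This illustrates the framework's flavor under toy data, not the released software, and it omits the re-weighting and de-whitening steps.

import numpy as np

def normalize(E):
    E = E / np.linalg.norm(E, axis=1, keepdims=True)   # length normalization
    return E - E.mean(axis=0)                          # mean centering

def whitener(E):
    """Matrix W such that (E @ W) has approximately identity covariance."""
    vals, vecs = np.linalg.eigh(E.T @ E)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T

# X, Z: embeddings of dictionary pairs, row i of X translates to row i of Z
rng = np.random.default_rng(0)
X, Z = rng.normal(size=(1000, 50)), rng.normal(size=(1000, 50))

X, Z = normalize(X), normalize(Z)
Wx, Wz = whitener(X), whitener(Z)
U, s, Vt = np.linalg.svd((X @ Wx).T @ (Z @ Wz))        # orthogonal core step
src_to_shared = Wx @ U                                  # whiten, then rotate
tgt_to_shared = Wz @ Vt.T
shared_x, shared_z = X @ src_to_shared, Z @ tgt_to_shared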
01 Aug 2018
TL;DR: This article proposed a novel training procedure for learning gender-neutral word embeddings, which aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence.
Abstract: Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that reflect social constructs. To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. Our approach aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Based on the proposed method, we generate a Gender-Neutral variant of GloVe (GN-GloVe). Quantitative and qualitative experiments demonstrate that GN-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.
202 citations
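Not the paper's training objective, but a quick diagnostic in the same spirit: split each word vector into a component along a "gender direction" and an orthogonal remainder. The he/she seed pair, word list, and random vectors are assumptions for illustration.

import numpy as np

rng = np.random.default_rng(0)
vecs = {w: rng.normal(size=100) for w in ["he", "she", "doctor", "nurse", "grandfather"]}

g = vecs["he"] - vecs["she"]
g = g / np.linalg.norm(g)                      # unit gender direction

for w in ["doctor", "nurse", "grandfather"]:
    v = vecs[w]
    along = (v @ g) * g                        # gender component
    ortho = v - along                          # gender-free remainder
    print(w, float(np.linalg.norm(along)), float(np.linalg.norm(ortho)))
# GN-GloVe's goal is, roughly, to confine meaningful gender signal (grandfather)
# to a reserved component while keeping neutral words (doctor) out of it.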
Posted Content•
TL;DR: This article showed that the choice of neural architecture (e.g., LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned.
Abstract: Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphology-based at the word embedding layer, through local syntax-based in the lower contextual layers, to longer-range semantics such as coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
167 citations
01 Jan 2018
TL;DR: This paper conducted a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models.
Abstract: Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
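A hedged sketch of the parameter-free pooling operations such models are built from; the dimensions and the toy input are illustrative, not taken from the paper.

import numpy as np

def swem_aver(E):           # E: (seq_len, dim) word embeddings of one text
    return E.mean(axis=0)

def swem_max(E):
    return E.max(axis=0)    # per-dimension max, used for interpretability

def swem_hier(E, n=3):
    """Hierarchical pooling: average within sliding n-gram windows, then max."""
    windows = [E[i:i + n].mean(axis=0) for i in range(len(E) - n + 1)]
    return np.max(windows, axis=0)

rng = np.random.default_rng(0)
E = rng.normal(size=(12, 300))                            # a 12-token text, 300-d embeddings
features = np.concatenate([swem_aver(E), swem_max(E)])    # concatenated pooled features
print(features.shape, swem_hier(E).shape)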
31 Aug 2018
TL;DR: The authors cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms, and exploit the Gromov-Wasserstein distance that measures how similarities between pairs of words relate across languages.
Abstract: Cross-lingual or cross-domain correspondences play key roles in tasks ranging from machine translation to transfer learning. Recently, purely unsupervised methods operating on monolingual embeddings have become effective alignment tools. Current state-of-the-art methods, however, involve multiple steps, including heuristic post-hoc refinement strategies. In this paper, we cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms. Indeed, we exploit the Gromov-Wasserstein distance that measures how similarities between pairs of words relate across languages. We show that our OT objective can be estimated efficiently, requires little or no tuning, and results in performance comparable with the state-of-the-art in various unsupervised word translation tasks.
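A hedged sketch of the alignment idea using the POT (Python Optimal Transport) library; the gromov_wasserstein call follows POT's documented interface, but the snippet as a whole is an illustration on random data, not the authors' released code.

import numpy as np
import ot   # POT: Python Optimal Transport

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))                  # source-language embeddings
Y = rng.normal(size=(200, 50))                  # target-language embeddings

# intra-language dissimilarity matrices; Gromov-Wasserstein compares these,
# so the two spaces never need to share coordinates
nx, ny = np.linalg.norm(X, axis=1), np.linalg.norm(Y, axis=1)
Cx = 1 - (X @ X.T) / (nx[:, None] * nx[None, :])
Cy = 1 - (Y @ Y.T) / (ny[:, None] * ny[None, :])

p = np.full(len(X), 1 / len(X))                 # uniform word distributions
q = np.full(len(Y), 1 / len(Y))

T = ot.gromov.gromov_wasserstein(Cx, Cy, p, q, "square_loss")
pairs = T.argmax(axis=1)                        # most probable translation per source word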
Proceedings Article•
09 Jun 2018
TL;DR: This paper surveys the current state of academic research related to diachronic word embeddings and semantic shifts detection, proposes several axes along which these methods can be compared, and outlines the main challenges before this emerging subfield of NLP.
Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.
TL;DR: Senti4SD as mentioned in this paper is a classifier specifically trained to support sentiment analysis in developers' communication channels, which is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity.
Abstract: The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.
TL;DR: The authors demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods.
Abstract: We demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods. Word embeddings represent semantic relations between words as geometric relationships between vectors in a high-dimensional space, operationalizing a relational model of meaning consistent with contemporary theories of identity and culture. We show that dimensions induced by word differences (e.g. man - woman, rich - poor, black - white, liberal - conservative) in these vector spaces closely correspond to dimensions of cultural meaning, and the projection of words onto these dimensions reflects widely shared cultural connotations when compared to surveyed responses and labeled historical data. We pilot a method for testing the stability of these associations, then demonstrate applications of word embeddings for macro-cultural investigation with a longitudinal analysis of the coevolution of gender and class associations in the United States over the 20th century and a comparative analysis of historic distinctions between markers of gender and class in the U.S. and Britain. We argue that the success of these high-dimensional models motivates a move towards "high-dimensional theorizing" of meanings, identities and cultural processes.
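A simplified sketch of inducing a cultural dimension from anchor word-pair differences and projecting other words onto it; the word lists and placeholder vectors are assumptions, not the paper's data.

import numpy as np

def dimension(vecs, pairs):
    """Cultural dimension as the averaged difference of anchor word pairs."""
    d = np.mean([vecs[a] - vecs[b] for a, b in pairs], axis=0)
    return d / np.linalg.norm(d)

def project(vecs, word, dim):
    v = vecs[word]
    return float(v @ dim / np.linalg.norm(v))   # cosine projection onto the dimension

rng = np.random.default_rng(0)
words = ["rich", "poor", "affluent", "impoverished", "ballet", "bowling"]
vecs = {w: rng.normal(size=100) for w in words}          # placeholder embeddings

affluence = dimension(vecs, [("rich", "poor"), ("affluent", "impoverished")])
for w in ["ballet", "bowling"]:
    print(w, project(vecs, w, affluence))                # where the practice sits on rich-poor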
Posted Content•
TL;DR: A novel training procedure for learning gender-neutral word embeddings that preserves gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence is proposed.
Abstract: Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that reflect social constructs. To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. Our approach aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Based on the proposed method, we generate a Gender-Neutral variant of GloVe (GN-GloVe). Quantitative and qualitative experiments demonstrate that GN-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.
Posted Content•
TL;DR: A survey of the current state of academic research related to diachronic word embeddings and semantic shifts detection can be found in this article, where the authors discuss the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models.
Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.
Posted Content•
TL;DR: This paper casts the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms and exploits the Gromov-Wasserstein distance that measures how similarities between pairs of words relate across languages.
Abstract: Cross-lingual or cross-domain correspondences play key roles in tasks ranging from machine translation to transfer learning. Recently, purely unsupervised methods operating on monolingual embeddings have become effective alignment tools. Current state-of-the-art methods, however, involve multiple steps, including heuristic post-hoc refinement strategies. In this paper, we cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms. Indeed, we exploit the Gromov-Wasserstein distance that measures how similarities between pairs of words relate across languages. We show that our OT objective can be estimated efficiently, requires little or no tuning, and results in performance comparable with the state-of-the-art in various unsupervised word translation tasks.
Proceedings Article•
03 Dec 2018
TL;DR: In this article, the Pairwise Inner Product (PIP) loss is proposed to measure the dissimilarity between word embeddings and to reveal a fundamental bias-variance trade-off in dimensionality selection.
Abstract: In this paper, we provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, we propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. Using techniques from matrix perturbation theory, we reveal a fundamental bias-variance trade-off in dimensionality selection for word embeddings. This bias-variance trade-off sheds light on many empirical observations which were previously unexplained, for example the existence of an optimal dimensionality. Moreover, new insights and discoveries, like when and how word embeddings are robust to over-fitting, are revealed. By optimizing over the bias-variance trade-off of the PIP loss, we can explicitly answer the open question of dimensionality selection for word embedding.
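A minimal sketch of the PIP loss itself; the toy check that a rotated copy of an embedding matrix has near-zero PIP loss illustrates the unitary invariance the paper builds on.

import numpy as np

def pip_loss(E1, E2):
    """Pairwise Inner Product loss: ||E1 E1^T - E2 E2^T||_F.

    Unitary-invariant, so it compares embeddings up to rotation, which makes it
    a sensible dissimilarity between two trained embedding matrices."""
    return np.linalg.norm(E1 @ E1.T - E2 @ E2.T, ord="fro")

rng = np.random.default_rng(0)
E = rng.normal(size=(500, 100))
R, _ = np.linalg.qr(rng.normal(size=(100, 100)))    # random rotation
print(pip_loss(E, E @ R))                           # ~0: rotations do not change PIP
print(pip_loss(E, rng.normal(size=(500, 100))))     # large for unrelated embeddings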
08 Jul 2018
TL;DR: The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time.
Abstract: The recent growth of the Internet of Things (IoT) has resulted in a rise in IoT based DDoS attacks. This paper presents a solution to the detection of botnet activity within consumer IoT devices and networks. A novel application of Deep Learning is used to develop a detection model based on a Bidirectional Long Short Term Memory based Recurrent Neural Network (BLSTM-RNN). Word Embedding is used for text recognition and conversion of attack packets into tokenised integer format. The developed BLSTM-RNN detection model is compared to an LSTM-RNN for detecting four attack vectors used by the Mirai botnet, and evaluated for accuracy and loss. The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time. A labelled dataset was generated as part of this research, and is available upon request.
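A hedged sketch of the general model family described: an embedding layer over tokenised packet text feeding a bidirectional LSTM and a binary output. Vocabulary size, sequence length, and layer widths are illustrative, not the paper's values.

import tensorflow as tf

vocab_size, seq_len = 5000, 100      # tokenised packet fields mapped to integer ids

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, 64),                          # word embedding
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64)),
    tf.keras.layers.Dense(1, activation="sigmoid"),                     # attack vs benign
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
# model.fit(X_train, y_train, ...) would then be run on the labelled dataset.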
Posted Content•
TL;DR: The authors proposed a densely-connected co-attentive recurrent neural network, each layer of which uses concatenated information of attentive features as well as hidden features of all the preceding recurrent layers.
Abstract: Sentence matching is widely used in various natural language tasks such as natural language inference, paraphrase identification, and question answering. For these tasks, understanding logical and semantic relationship between two sentences is required but it is yet challenging. Although attention mechanism is useful to capture the semantic relationship and to properly align the elements of two sentences, previous methods of attention mechanism simply use a summation operation which does not retain original features enough. Inspired by DenseNet, a densely connected convolutional network, we propose a densely-connected co-attentive recurrent neural network, each layer of which uses concatenated information of attentive features as well as hidden features of all the preceding recurrent layers. It enables preserving the original and the co-attentive feature information from the bottommost word embedding layer to the uppermost recurrent layer. To alleviate the problem of an ever-increasing size of feature vectors due to dense concatenation operations, we also propose to use an autoencoder after dense concatenation. We evaluate our proposed architecture on highly competitive benchmark datasets related to sentence matching. Experimental results show that our architecture, which retains recurrent and attentive features, achieves state-of-the-art performances for most of the tasks.
TL;DR: The successes of neural IR thus far are highlighted, obstacles to its wider adoption are cataloged, and potentially promising directions for future research are suggested.
Abstract: A recent “third wave” of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.
01 Jul 2018
TL;DR: This paper restores interpretability to adversarial training methods by restricting the directions of perturbations toward the existing words in the input embedding space; each perturbed input can then be straightforwardly reconstructed as an actual text by considering the perturbation to be the replacement of words in a sentence, while maintaining or even improving the task performance.
Abstract: Following great success in the image processing field, the idea of adversarial training has been applied to tasks in the natural language processing (NLP) field. One promising approach directly applies adversarial training developed in the image processing field to the input word embedding space instead of the discrete input space of texts. However, this approach abandons such interpretability as generating adversarial texts to significantly improve the performance of NLP tasks. This paper restores interpretability to such methods by restricting the directions of perturbations toward the existing words in the input embedding space. As a result, we can straightforwardly reconstruct each input with perturbations to an actual text by considering the perturbations to be the replacement of words in the sentence while maintaining or even improving the task performance.
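A rough, simplified sketch of the direction-restricted idea: weight the unit directions from a word's vector toward other vocabulary vectors by their alignment with the loss gradient, so the resulting perturbation can still be read as a word replacement. This is an illustration under assumptions, not the authors' exact formulation.

import numpy as np

def restricted_perturbation(w, vocab, grad, eps=1.0):
    """Perturb w only inside the cone of directions toward existing word vectors."""
    dirs = vocab - w                                    # directions toward real words
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True) + 1e-8
    align = dirs @ grad                                 # alignment with d(loss)/d(embedding)
    weights = np.exp(align - align.max())
    weights /= weights.sum()
    return eps * (weights[:, None] * dirs).sum(axis=0)

rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 300))        # embedding matrix
w = vocab[42]                               # embedding of the word being perturbed
grad = rng.normal(size=300)                 # stand-in for the task-loss gradient
r = restricted_perturbation(w, vocab, grad)
nearest = int(np.argmin(np.linalg.norm(vocab - (w + r), axis=1)))   # interpretable replacement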
Posted Content•
TL;DR: This paper takes advantage of style-preference information and word embedding similarity to produce pseudo-parallel data with a statistical machine translation (SMT) framework and introduces a style classifier to guarantee the accuracy of style transfer and penalize bad candidates in the generated pseudo data.
Abstract: Language style transferring rephrases text with specific stylistic attributes while preserving the original attribute-independent content. One main challenge in learning a style transfer system is a lack of parallel data where the source sentence is in one style and the target sentence in another style. With this constraint, in this paper, we adapt unsupervised machine translation methods for the task of automatic style transfer. We first take advantage of style-preference information and word embedding similarity to produce pseudo-parallel data with a statistical machine translation (SMT) framework. Then the iterative back-translation approach is employed to jointly train two neural machine translation (NMT) based transfer systems. To control the noise generated during joint training, a style classifier is introduced to guarantee the accuracy of style transfer and penalize bad candidates in the generated pseudo data. Experiments on benchmark datasets show that our proposed method outperforms previous state-of-the-art models in terms of both accuracy of style transfer and quality of input-output correspondence.
01 Jan 2018
TL;DR: This paper proposed an unsupervised learning approach that does not require any cross-lingual labeled data and optimizes the transformation functions in both directions simultaneously based on distributional matching as well as minimizing the back-translation losses.
Abstract: Cross-lingual transfer of word embeddings aims to establish the semantic mappings among words in different languages by learning the transformation functions over the corresponding word embedding spaces. Successfully solving this problem would benefit many downstream tasks such as translating text classification models from resource-rich languages (e.g., English) to low-resource languages. Supervised methods for this problem rely on the availability of cross-lingual supervision, either using parallel corpora or bilingual lexicons as the labeled data for training, which may not be available for many low-resource languages. This paper proposes an unsupervised learning approach that does not require any cross-lingual labeled data. Given two monolingual word embedding spaces for any language pair, our algorithm optimizes the transformation functions in both directions simultaneously based on distributional matching as well as minimizing the back-translation losses. We use a neural network implementation to calculate the Sinkhorn distance, a well-defined distributional similarity measure, and optimize our objective through back-propagation. Our evaluation on benchmark datasets for bilingual lexicon induction and cross-lingual word similarity prediction shows stronger or competitive performance of the proposed method compared to other state-of-the-art supervised and unsupervised baseline methods over many language pairs.
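The Sinkhorn distance mentioned above is the standard entropy-regularized optimal transport cost; below is a minimal numpy sketch of the fixed-point iterations on toy data. The paper optimizes this quantity inside a neural network via back-propagation, which the sketch does not show.

import numpy as np

def sinkhorn_distance(C, reg=0.1, n_iter=200):
    """Entropy-regularised OT cost between two uniform distributions, cost matrix C."""
    n, m = C.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-C / reg)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iter):                  # Sinkhorn fixed-point updates
        u = a / (K @ v)
        v = b / (K.T @ u)
    P = u[:, None] * K * v[None, :]          # transport plan
    return float((P * C).sum())

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 20))                # mapped source embeddings
Y = rng.normal(size=(60, 20))                # target embeddings
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)   # squared Euclidean costs
print(sinkhorn_distance(C))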
TL;DR: A novel framework to predict the directions of stock prices by using both financial news and sentiment dictionary is proposed and outperforms state-of-the-art models and is more efficient in dealing with financial datasets.
Abstract: Financial news has been proven to be a crucial factor which causes fluctuations in stock prices. However, previous studies heavily relied on analyzing shallow features and ignored the structural relation among words in a sentence. Several sentiment analysis studies have tried to point out the relationship between investors' reaction and news events. However, the sentiment dataset was usually constructed from a lingual dataset unrelated to the financial sector, which led to poor performance. This paper proposes a novel framework to predict the directions of stock prices by using both financial news and a sentiment dictionary. The original contributions of this paper include the proposal of a novel two-stream gated recurrent unit network and Stock2Vec, a sentiment word embedding trained on a financial news dataset and Harvard IV-4. Two main experiments are conducted, the first of which predicts the directions of the S&P index; the results show that: 1) the proposed model outperforms state-of-the-art models; 2) Stock2Vec is more efficient in dealing with financial datasets; and 3) applying the model, a simulation scenario proves that our model is effective for the stock sector.
TL;DR: A word vector refinement model is proposed to refine existing pretrained word vectors using real-valued sentiment intensity scores provided by sentiment lexicons to improve each word vector such that it can be closer in the lexicon to both semantically and sentimentally similar words.
Abstract: Word embeddings that provide continuous low-dimensional vector representations of words have been extensively used for various natural language processing tasks. However, existing context-based word embeddings such as Word2vec and GloVe typically fail to capture sufficient sentiment information, which may result in words with similar vector representations having an opposite sentiment polarity (e.g., good and bad ), thus degrading sentiment analysis performance. To tackle this problem, recent studies have suggested learning sentiment embeddings to incorporate the sentiment polarity (positive and negative) information from labeled corpora. This study adopts another strategy to learn sentiment embeddings. Instead of creating a new word embedding from labeled corpora, we propose a word vector refinement model to refine existing pretrained word vectors using real-valued sentiment intensity scores provided by sentiment lexicons. The idea of the refinement model is to improve each word vector such that it can be closer in the lexicon to both semantically and sentimentally similar words (i.e., those with similar intensity scores) and further away from sentimentally dissimilar words (i.e., those with dissimilar intensity scores). An obvious advantage of the proposed method is that it can be applied to any pretrained word embeddings. In addition, the intensity scores can provide more fine-grained (real-valued) sentiment information than binary polarity labels to guide the refinement process. Experimental results show that the proposed refinement model can improve both conventional word embeddings and previously proposed sentiment embeddings for binary, ternary, and fine-grained sentiment classification on the SemEval and Stanford Sentiment Treebank datasets.
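A hedged, simplified sketch of the refinement idea: nudge each vector toward its nearest neighbours, re-weighted by how close their lexicon intensity scores are. The update rule, neighbourhood size, and toy lexicon are assumptions, not the paper's exact model.

import numpy as np

def refine(vecs, intensity, k=5, beta=0.8, n_iter=5):
    """Move each vector closer to neighbours with similar sentiment intensity."""
    words = list(vecs)
    V = np.array([vecs[w] for w in words])
    for _ in range(n_iter):
        Vn = V / np.linalg.norm(V, axis=1, keepdims=True)
        sims = Vn @ Vn.T                                   # cosine similarities
        V_new = V.copy()
        for i, w in enumerate(words):
            nbrs = np.argsort(-sims[i])[1:k + 1]           # nearest neighbours (excluding self)
            wts = np.array([1.0 / (1.0 + abs(intensity[words[j]] - intensity[w])) for j in nbrs])
            wts /= wts.sum()
            V_new[i] = beta * V[i] + (1 - beta) * (wts[:, None] * V[nbrs]).sum(axis=0)
        V = V_new
    return {w: V[i] for i, w in enumerate(words)}

rng = np.random.default_rng(0)
lex = {"good": 0.9, "great": 0.95, "bad": 0.1, "awful": 0.05, "okay": 0.5, "fine": 0.6}
vecs = {w: rng.normal(size=50) for w in lex}               # pretrained vectors (placeholder)
refined = refine(vecs, lex)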
TL;DR: OntoSenticNet is presented, a commonsense ontology for sentiment analysis based on SenticNet, a semantic network of 100,000 concepts based on conceptual primitives that has the capability of associating each concept with annotations contained in external resources.
Abstract: In this work, we present OntoSenticNet, a commonsense ontology for sentiment analysis based on SenticNet, a semantic network of 100,000 concepts based on conceptual primitives. The key characteristics of OntoSenticNet are: (i) the definition of precise conceptual hierarchy and properties associating concepts and sentiment values; (ii) the support for connecting external information (e.g., word embedding, domain information, and different polarity representations) to each individual defined within the ontology; and (iii) the capability of associating each concept with annotations contained in external resources (e.g., documents and multimodal resources).
18 Jun 2018
TL;DR: This paper investigates the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e. without having a curated Q&A forum such as Stack Overflow at hand.
Abstract: Searching over large code corpora can be a powerful productivity tool for both beginner and experienced developers because it helps them quickly find examples of code related to their intent. Code search becomes even more attractive if developers could express their intent in natural language, similar to the interaction that Stack Overflow supports. In this paper, we investigate the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e. without having a curated Q&A forum such as Stack Overflow at hand. Our experiments using a benchmark suite derived from Stack Overflow and GitHub repositories show promising results. We find that while a basic word–embedding based search procedure works acceptably, better results can be obtained by adding a layer of supervision, as well as by a customized ranking strategy.
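A hedged sketch of the unsupervised word-embedding baseline such systems start from: average the embeddings of query and code tokens and rank by cosine similarity. The toy corpus and vectors are placeholders; the supervised re-ranking layer mentioned in the abstract is not shown.

import numpy as np

def embed(tokens, vecs):
    """Average word embeddings of the tokens that have a vector (simple baseline)."""
    hits = [vecs[t] for t in tokens if t in vecs]
    return np.mean(hits, axis=0) if hits else np.zeros(next(iter(vecs.values())).shape)

def search(query_tokens, corpus, vecs, top_k=3):
    q = embed(query_tokens, vecs)
    scores = []
    for name, code_tokens in corpus.items():
        d = embed(code_tokens, vecs)
        denom = np.linalg.norm(q) * np.linalg.norm(d) + 1e-8
        scores.append((name, float(q @ d / denom)))
    return sorted(scores, key=lambda s: -s[1])[:top_k]

rng = np.random.default_rng(0)
vocab = ["read", "file", "close", "socket", "open", "lines", "connect"]
vecs = {w: rng.normal(size=50) for w in vocab}             # embeddings trained on code corpora
corpus = {
    "readFileLines": ["open", "file", "read", "lines", "close"],
    "connectSocket": ["socket", "connect", "close"],
}
print(search(["read", "file"], corpus, vecs))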
TL;DR: A stacked residual LSTM model is proposed to predict sentiment intensity for a given text; it outperforms lexicon- and regression-based methods proposed in previous studies and makes the deeper network easier to optimize.
Abstract: The sentiment intensity of a text indicates the strength of its association with positive sentiment, which is a continuous real-value between 0 and 1. Compared to polarity classification, predicting sentiment intensities for texts can achieve more fine-grained sentiment analyses. By introducing word embedding techniques, recent studies that use deep neural models have outperformed existing lexicon- and regression-based methods for sentiment intensity prediction. For better performance, a common way to improve a neural network is to add more layers in order to learn high-level features. However, when the depth increases, the network degrades and becomes more difficult to train, since the errors between layers accumulate and gradients vanish. To address this problem, this paper proposes a stacked residual LSTM model to predict sentiment intensity for a given text. By investigating the performances of shallow and deep architectures, we introduce a residual connection to every few LSTM layers to construct an 8-layer neural network. The residual connection can center layer gradients and propagate errors, and thus makes the deeper network easier to optimize. This approach enables us to stack more layers of LSTM successfully for this task, which can improve the prediction accuracy of existing methods. Experimental results show that the proposed method outperforms lexicon-, regression-, and conventional NN-based methods proposed in previous studies.
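A hedged Keras sketch of stacking LSTM layers with residual connections every couple of layers; the block count, widths, and output head are illustrative, not the paper's exact 8-layer configuration.

import tensorflow as tf
from tensorflow.keras import layers

vocab_size, seq_len, dim = 10000, 50, 128

inputs = layers.Input(shape=(seq_len,))
x = layers.Embedding(vocab_size, dim)(inputs)
for _ in range(3):                      # three residual blocks of two LSTM layers each
    h = layers.LSTM(dim, return_sequences=True)(x)
    h = layers.LSTM(dim, return_sequences=True)(h)
    x = layers.Add()([x, h])            # residual connection helps center gradients
x = layers.LSTM(dim)(x)
outputs = layers.Dense(1, activation="sigmoid")(x)   # sentiment intensity in [0, 1]

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")
model.summary()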