
Showing papers on "Word embedding" published in 2018


Posted Content
TL;DR: It is found that transfer learning using sentence embeddings tends to outperform word-level transfer, achieving surprisingly good performance with minimal amounts of supervised training data for a transfer task.
Abstract: We present models for encoding sentences into embedding vectors that specifically target transfer learning to other NLP tasks. The models are efficient and result in accurate performance on diverse transfer tasks. Two variants of the encoding models allow for trade-offs between accuracy and compute resources. For both variants, we investigate and report the relationship between model complexity, resource consumption, the availability of transfer task training data, and task performance. Comparisons are made with baselines that use word level transfer learning via pretrained word embeddings as well as baselines that do not use any transfer learning. We find that transfer learning using sentence embeddings tends to outperform word level transfer. With transfer learning via sentence embeddings, we observe surprisingly good performance with minimal amounts of supervised training data for a transfer task. We obtain encouraging results on Word Embedding Association Tests (WEAT) targeted at detecting model bias. Our pre-trained sentence encoding models are made freely available for download and on TF Hub.
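
As a quick illustration of how such pre-trained sentence encoders are typically consumed, here is a minimal sketch assuming the TF2-style `hub.load` API and the public `universal-sentence-encoder` handle on TF Hub (handles and versions may have moved since):

```python
# Minimal sketch: embedding sentences with a TF Hub sentence encoder.
import tensorflow_hub as hub

embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")
sentences = ["The movie was great.", "I did not enjoy the film at all."]
vectors = embed(sentences)   # one fixed-size vector per sentence, e.g. (2, 512)
print(vectors.shape)
```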

1,259 citations


Proceedings Article
01 Aug 2018
TL;DR: This paper proposes to leverage the internal states of a trained character language model to produce a novel type of word embedding, referred to as contextual string embeddings, which fundamentally model words as sequences of characters and are contextualized by their surrounding text.
Abstract: Recent advances in language modeling using recurrent neural networks have made it viable to model language as distributions over characters. By learning to predict the next character on the basis of previous characters, such models have been shown to automatically internalize linguistic concepts such as words, sentences, subclauses and even sentiment. In this paper, we propose to leverage the internal states of a trained character language model to produce a novel type of word embedding which we refer to as contextual string embeddings. Our proposed embeddings have the distinct properties that they (a) are trained without any explicit notion of words and thus fundamentally model words as sequences of characters, and (b) are contextualized by their surrounding text, meaning that the same word will have different embeddings depending on its contextual use. We conduct a comparative evaluation against previous embeddings and find that our embeddings are highly useful for downstream tasks: across four classic sequence labeling tasks we consistently outperform the previous state-of-the-art. In particular, we significantly outperform previous work on English and German named entity recognition (NER), allowing us to report new state-of-the-art F1-scores on the CoNLL03 shared task. We release all code and pre-trained language models in a simple-to-use framework to the research community, to enable reproduction of these experiments and application of our proposed embeddings to other tasks: https://github.com/zalandoresearch/flair
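
Since the framework linked above is public, a rough usage sketch (flair's API as of its early releases; names may differ in newer versions) looks like this:

```python
# Sketch: contextual string embeddings from a pre-trained character LM.
from flair.data import Sentence
from flair.embeddings import FlairEmbeddings

embedding = FlairEmbeddings("news-forward")   # forward character language model
sentence = Sentence("Berlin is a city .")
embedding.embed(sentence)                     # attaches a vector to each token

for token in sentence:
    # the same surface word would get a different vector in another context
    print(token.text, token.embedding.shape)
```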

1,152 citations


Proceedings ArticleDOI
01 Nov 2018
TL;DR: Transfer learning using sentence-level embeddings is shown to outperform models without transfer learning and often those that use only word-level transfer.
Abstract: We present easy-to-use TensorFlow Hub sentence embedding models having good task transfer performance. Model variants allow for trade-offs between accuracy and compute resources. We report the relationship between model complexity, resources, and transfer performance. Comparisons are made with baselines without transfer learning and to baselines that incorporate word-level transfer. Transfer learning using sentence-level embeddings is shown to outperform models without transfer learning and often those that use only word-level transfer. We show good transfer task performance with minimal training data and obtain encouraging results on word embedding association tests (WEAT) of model bias.

876 citations


Journal ArticleDOI
TL;DR: A framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States is developed.
Abstract: Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding helps to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 y of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts-e.g., the women's movement in the 1960s and Asian immigration into the United States-and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection between machine learning and quantitative social science.
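
The core measurement behind such analyses can be sketched in a few lines: score each occupation word by how much closer it sits to a female word list than to a male word list, per decade. The decade-specific embedding files and the short word lists below are hypothetical placeholders, and the score is a simplification of the paper's relative-norm measures:

```python
# Sketch: per-decade gender association of occupation words.
import numpy as np
from gensim.models import KeyedVectors

female = ["she", "her", "woman"]
male = ["he", "his", "man"]
occupations = ["nurse", "engineer", "librarian", "carpenter"]

def gender_association(kv, word):
    f = np.mean([kv.similarity(word, w) for w in female])
    m = np.mean([kv.similarity(word, w) for w in male])
    return f - m   # > 0: the word sits closer to the female word list

for decade in [1950, 1990]:
    kv = KeyedVectors.load(f"embeddings_{decade}.kv")  # hypothetical files
    print(decade, {w: round(gender_association(kv, w), 3) for w in occupations})
```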

728 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this article, a graph convolutional network (GCN) takes a semantic embedding for each node (representing a visual category) in a learned knowledge graph (KG) and predicts the visual classifiers of unseen categories; the approach is shown to be robust to noise in the KG.
Abstract: We consider the problem of zero-shot recognition: learning a visual classifier for a category with zero training examples, just using the word embedding of the category and its relationship to other categories for which visual data are provided. The key to dealing with the unfamiliar or novel category is to transfer knowledge obtained from familiar classes to describe the unfamiliar class. In this paper, we build upon the recently introduced Graph Convolutional Network (GCN) and propose an approach that uses both semantic embeddings and the categorical relationships to predict the classifiers. Given a learned knowledge graph (KG), our approach takes as input semantic embeddings for each node (representing a visual category). After a series of graph convolutions, we predict the visual classifier for each category. During training, the visual classifiers for a few categories are given to learn the GCN parameters. At test time, these filters are used to predict the visual classifiers of unseen categories. We show that our approach is robust to noise in the KG. More importantly, our approach provides significant improvement in performance compared to the current state-of-the-art results (from 2~3% on some metrics to a whopping 20% on a few).
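
A minimal numpy sketch of the propagation step involved, under illustrative dimensions and a toy adjacency matrix (the paper's actual depth, dimensions, and normalization details differ):

```python
# Sketch: stacked graph convolutions turn per-category word embeddings
# into predicted classifier weights by aggregating over KG neighbors.
import numpy as np

def gcn_layer(A, H, W):
    """One propagation step: self-loops, row-normalized averaging, ReLU."""
    A_hat = A + np.eye(A.shape[0])
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))
    return np.maximum(D_inv @ A_hat @ H @ W, 0.0)

n_nodes, d_word, d_cls = 5, 300, 2048
A = (np.random.rand(n_nodes, n_nodes) > 0.6).astype(float)  # toy KG adjacency
H = np.random.randn(n_nodes, d_word)                        # word embedding per category
W1 = np.random.randn(d_word, 512) * 0.01
W2 = np.random.randn(512, d_cls) * 0.01

classifiers = gcn_layer(A, gcn_layer(A, H, W1), W2)         # (n_nodes, d_cls)
```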

570 citations


Proceedings ArticleDOI
01 Jan 2018
TL;DR: There is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks, suggesting that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.
Abstract: Contextual word representations derived from pre-trained bidirectional language models (biLMs) have recently been shown to provide significant improvements to the state of the art for a wide range of NLP tasks. However, many questions remain as to how and why these models are so effective. In this paper, we present a detailed empirical study of how the choice of neural architecture (e.g. LSTM, CNN, or self attention) influences both end task accuracy and qualitative properties of the representations that are learned. We show there is a tradeoff between speed and accuracy, but all architectures learn high quality contextual representations that outperform word embeddings for four challenging NLP tasks. Additionally, all architectures learn representations that vary with network depth, from exclusively morphological at the word embedding layer, through local syntax in the lower contextual layers, to longer-range semantics such as coreference at the upper layers. Together, these results suggest that unsupervised biLMs, independent of architecture, are learning much more about the structure of language than previously appreciated.

351 citations


Proceedings ArticleDOI
01 Aug 2018
TL;DR: This article proposed a novel training procedure for learning gender-neutral word embeddings, which aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence.
Abstract: Word embedding models have become a fundamental component in a wide range of Natural Language Processing (NLP) applications. However, embeddings trained on human-generated corpora have been demonstrated to inherit strong gender stereotypes that reflect social constructs. To address this concern, in this paper, we propose a novel training procedure for learning gender-neutral word embeddings. Our approach aims to preserve gender information in certain dimensions of word vectors while compelling other dimensions to be free of gender influence. Based on the proposed method, we generate a Gender-Neutral variant of GloVe (GN-GloVe). Quantitative and qualitative experiments demonstrate that GN-GloVe successfully isolates gender information without sacrificing the functionality of the embedding model.
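
The representational idea, stripped of the training objective, can be pictured as reserving a block of coordinates for gender. The split below is a toy illustration, not GN-GloVe's actual training code:

```python
# Toy illustration: gender information is confined to reserved dimensions,
# so the remaining coordinates can serve as a gender-free representation.
import numpy as np

d, d_gender = 300, 1                  # illustrative sizes
v = np.random.randn(d)                # stands in for a GN-GloVe word vector

v_neutral = v[: d - d_gender]         # dimensions trained to be gender-free
v_gender = v[d - d_gender:]           # dimensions that carry gender information
```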

319 citations


Proceedings ArticleDOI
09 May 2018
TL;DR: This article showed that weak supervision from identical words enables more robust dictionary induction and established a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
Abstract: Unsupervised machine translation - i.e., not assuming any cross-lingual supervision signal, whether a dictionary, translations, or comparable corpora - seems impossible, but nevertheless, Lample et al. (2017) recently proposed a fully unsupervised machine translation (MT) model. The model relies heavily on an adversarial, unsupervised cross-lingual word embedding technique for bilingual dictionary induction (Conneau et al., 2017), which we examine here. Our results identify the limitations of current unsupervised MT: unsupervised bilingual dictionary induction performs much worse on morphologically rich languages that are not dependent marking, when monolingual corpora from different domains or different embedding algorithms are used. We show that a simple trick, exploiting a weak supervision signal from identical words, enables more robust induction and establish a near-perfect correlation between unsupervised bilingual dictionary induction performance and a previously unexplored graph similarity metric.
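
The "simple trick" lends itself to a compact sketch: take identically spelled words as a weak seed dictionary and solve orthogonal Procrustes on them. The `{word: vector}` dictionaries below are toy stand-ins:

```python
# Sketch: weak supervision from identical words + Procrustes alignment.
import numpy as np

def align_with_identical_words(src_vecs, tgt_vecs):
    """Orthogonal map fit on identically spelled seed words."""
    seeds = sorted(set(src_vecs) & set(tgt_vecs))
    X = np.stack([src_vecs[w] for w in seeds])
    Y = np.stack([tgt_vecs[w] for w in seeds])
    U, _, Vt = np.linalg.svd(X.T @ Y)   # W = argmin ||XW - Y||_F, W orthogonal
    return U @ Vt

# toy example: identically spelled words shared across two "languages"
rng = np.random.default_rng(0)
src = {w: rng.standard_normal(4) for w in ["radio", "hotel", "taxi", "haus"]}
tgt = {w: rng.standard_normal(4) for w in ["radio", "hotel", "taxi", "house"]}
W = align_with_identical_words(src, tgt)  # maps source space: src[w] @ W
```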

256 citations


Journal ArticleDOI
TL;DR: The authors demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods.
Abstract: We demonstrate the utility of a new methodological tool, neural-network word embedding models, for large-scale text analysis, revealing how these models produce richer insights into cultural associations and categories than possible with prior methods. Word embeddings represent semantic relations between words as geometric relationships between vectors in a high-dimensional space, operationalizing a relational model of meaning consistent with contemporary theories of identity and culture. We show that dimensions induced by word differences (e.g. man - woman, rich - poor, black - white, liberal - conservative) in these vector spaces closely correspond to dimensions of cultural meaning, and the projection of words onto these dimensions reflects widely shared cultural connotations when compared to surveyed responses and labeled historical data. We pilot a method for testing the stability of these associations, then demonstrate applications of word embeddings for macro-cultural investigation with a longitudinal analysis of the coevolution of gender and class associations in the United States over the 20th century and a comparative analysis of historic distinctions between markers of gender and class in the U.S. and Britain. We argue that the success of these high-dimensional models motivates a move towards "high-dimensional theorizing" of meanings, identities and cultural processes.
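
The dimension-induction method reduces to a short computation: average the difference vectors of antonym-like word pairs into an axis, then project words onto it. `kv` stands for a hypothetical pretrained gensim `KeyedVectors` model:

```python
# Sketch: cultural dimensions from averaged word-pair differences.
import numpy as np
from gensim.models import KeyedVectors

def build_axis(kv, pairs):
    """Average word-pair differences into a unit-length axis."""
    diffs = [kv[a] - kv[b] for a, b in pairs]
    axis = np.mean(diffs, axis=0)
    return axis / np.linalg.norm(axis)

def project(kv, word, axis):
    v = kv[word]
    return float(v @ axis / np.linalg.norm(v))   # cosine with the axis

# kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # hypothetical
# gender = build_axis(kv, [("man", "woman"), ("he", "she"), ("king", "queen")])
# compare project(kv, "nurse", gender) with project(kv, "carpenter", gender)
```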

240 citations


Proceedings Article
27 Apr 2018
TL;DR: A multi-step framework of linear transformations that generalizes a substantial body of previous work is proposed; it allows new insights into the behavior of existing methods, including the effectiveness of inverse regression, and the design of a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction.
Abstract: Using a dictionary to map independently trained word embeddings to a shared space has shown to be an effective approach to learn bilingual word embeddings. In this work, we propose a multi-step framework of linear transformations that generalizes a substantial body of previous work. The core step of the framework is an orthogonal transformation, and existing methods can be explained in terms of the additional normalization, whitening, re-weighting, de-whitening and dimensionality reduction steps. This allows us to gain new insights into the behavior of existing methods, including the effectiveness of inverse regression, and design a novel variant that obtains the best published results in zero-shot bilingual lexicon extraction. The corresponding software is released as an open source project.
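
A simplified sketch of the step structure on toy seed-dictionary matrices; the paper's configurations vary which normalization, whitening, re-weighting, and de-whitening steps are active, so this is only an indicative skeleton:

```python
# Sketch: normalization, whitening, then the orthogonal core step.
import numpy as np

def normalize(X):
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # unit length
    return X - X.mean(axis=0)                          # mean centering

def whitener(X):
    cov = X.T @ X / len(X)
    vals, vecs = np.linalg.eigh(cov)
    return vecs @ np.diag(vals ** -0.5) @ vecs.T       # ZCA whitening matrix

rng = np.random.default_rng(0)
X = normalize(rng.standard_normal((100, 50)))          # seed-pair source vectors
Y = normalize(rng.standard_normal((100, 50)))          # seed-pair target vectors

Wx, Wy = whitener(X), whitener(Y)
U, s, Vt = np.linalg.svd((X @ Wx).T @ (Y @ Wy))        # orthogonal core step
map_x, map_y = Wx @ U, Wy @ Vt.T                       # re-weighting/de-whitening omitted
# X @ map_x and Y @ map_y now live in a shared space
```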

237 citations


Proceedings ArticleDOI
08 Jul 2018
TL;DR: The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time.
Abstract: The recent growth of the Internet of Things (IoT) has resulted in a rise in IoT based DDoS attacks. This paper presents a solution to the detection of botnet activity within consumer IoT devices and networks. A novel application of Deep Learning is used to develop a detection model based on a Bidirectional Long Short Term Memory based Recurrent Neural Network (BLSTM-RNN). Word Embedding is used for text recognition and conversion of attack packets into tokenised integer format. The developed BLSTM-RNN detection model is compared to an LSTM-RNN for detecting four attack vectors used by the Mirai botnet, and evaluated for accuracy and loss. The paper demonstrates that although the bidirectional approach adds overhead to each epoch and increases processing time, it proves to be a better progressive model over time. A labelled dataset was generated as part of this research, and is available upon request.
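
A hypothetical Keras sketch of this kind of pipeline (tokenised packet text to a binary attack/benign score); layer sizes and sequence length are placeholders rather than the paper's configuration:

```python
# Sketch: embedding + bidirectional LSTM for binary attack detection.
from tensorflow.keras import Input, layers, models

model = models.Sequential([
    Input(shape=(100,), dtype="int32"),                 # 100 token ids per packet string
    layers.Embedding(input_dim=10000, output_dim=32),   # token id -> word vector
    layers.Bidirectional(layers.LSTM(64)),              # the "BLSTM" part
    layers.Dense(1, activation="sigmoid"),              # attack probability
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```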

Proceedings ArticleDOI
01 Jan 2018
TL;DR: This paper conducts a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models.
Abstract: Many deep learning architectures have been proposed to model the compositionality in text sequences, requiring a substantial number of parameters and expensive computations. However, there has not been a rigorous evaluation regarding the added value of sophisticated compositional functions. In this paper, we conduct a point-by-point comparative study between Simple Word-Embedding-based Models (SWEMs), consisting of parameter-free pooling operations, and word-embedding-based RNN/CNN models. Surprisingly, SWEMs exhibit comparable or even superior performance in the majority of cases considered. Based upon this understanding, we propose two additional pooling strategies over learned word embeddings: (i) a max-pooling operation for improved interpretability; and (ii) a hierarchical pooling operation, which preserves spatial (n-gram) information within text sequences. We present experiments on 17 datasets encompassing three tasks: (i) (long) document classification; (ii) text sequence matching; and (iii) short text tasks, including classification and tagging.
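
The pooling operations at the heart of SWEMs are genuinely parameter-free and fit in a few lines (toy dimensions below):

```python
# Sketch: the SWEM pooling variants on a toy sequence of word vectors.
import numpy as np

X = np.random.randn(12, 300)          # a sentence: 12 words, 300-d embeddings

swem_aver = X.mean(axis=0)            # average pooling
swem_max = X.max(axis=0)              # max pooling (aids interpretability)

def swem_hier(X, n=3):
    """Hierarchical pooling: average over n-gram windows, then max-pool."""
    windows = np.stack([X[i:i + n].mean(axis=0)
                        for i in range(len(X) - n + 1)])
    return windows.max(axis=0)
```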

Proceedings Article
09 Jun 2018
TL;DR: This paper surveys the current state of academic research related to diachronic word embeddings and semantic shifts detection, and proposes several axes along which these methods can be compared, and outlines the main challenges before this emerging subfield of NLP.
Abstract: Recent years have witnessed a surge of publications aimed at tracing temporal changes in lexical semantics using distributional methods, particularly prediction-based word embedding models. However, this vein of research lacks the cohesion, common terminology and shared practices of more established areas of natural language processing. In this paper, we survey the current state of academic research related to diachronic word embeddings and semantic shifts detection. We start with discussing the notion of semantic shifts, and then continue with an overview of the existing methods for tracing such time-related shifts with word embedding models. We propose several axes along which these methods can be compared, and outline the main challenges before this emerging subfield of NLP, as well as prospects and possible applications.

Journal ArticleDOI
TL;DR: Senti4SD is a classifier specifically trained to support sentiment analysis in developers' communication channels; it is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity.
Abstract: The role of sentiment analysis is increasingly emerging to study software developers' emotions by mining crowd-generated content within social software engineering tools. However, off-the-shelf sentiment analysis tools have been trained on non-technical domains and general-purpose social media, thus resulting in misclassifications of technical jargon and problem reports. Here, we present Senti4SD, a classifier specifically trained to support sentiment analysis in developers' communication channels. Senti4SD is trained and validated using a gold standard of Stack Overflow questions, answers, and comments manually annotated for sentiment polarity. It exploits a suite of both lexicon- and keyword-based features, as well as semantic features based on word embedding. With respect to a mainstream off-the-shelf tool, which we use as a baseline, Senti4SD reduces the misclassifications of neutral and positive posts as emotionally negative. To encourage replications, we release a lab package including the classifier, the word embedding space, and the gold standard with annotation guidelines.

Proceedings ArticleDOI
31 Aug 2018
TL;DR: The authors cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms, and exploit the Gromov-Wasserstein distance that measures how similarities between pairs of words relate across languages.
Abstract: Cross-lingual or cross-domain correspondences play key roles in tasks ranging from machine translation to transfer learning. Recently, purely unsupervised methods operating on monolingual embeddings have become effective alignment tools. Current state-of-the-art methods, however, involve multiple steps, including heuristic post-hoc refinement strategies. In this paper, we cast the correspondence problem directly as an optimal transport (OT) problem, building on the idea that word embeddings arise from metric recovery algorithms. Indeed, we exploit the Gromov-Wasserstein distance that measures how similarities between pairs of words relate across languages. We show that our OT objective can be estimated efficiently, requires little or no tuning, and results in performance comparable with the state-of-the-art in various unsupervised word translation tasks.
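
A sketch of the matching step using the POT optimal-transport library (assuming its `ot.gromov.gromov_wasserstein` solver; random matrices stand in for the two languages' embeddings):

```python
# Sketch: Gromov-Wasserstein matching of intra-language distance structure.
import numpy as np
import ot   # POT: Python Optimal Transport

def cosine_dist(Z):
    Zn = Z / np.linalg.norm(Z, axis=1, keepdims=True)
    return 1.0 - Zn @ Zn.T            # intra-language similarity structure

X = np.random.randn(50, 300)          # "source language" embeddings
Y = np.random.randn(50, 300)          # "target language" embeddings

C1, C2 = cosine_dist(X), cosine_dist(Y)
p, q = ot.unif(len(X)), ot.unif(len(Y))   # uniform word distributions
T = ot.gromov.gromov_wasserstein(C1, C2, p, q, "square_loss")
matches = T.argmax(axis=1)            # most likely counterpart for each word
```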

Journal ArticleDOI
TL;DR: A novel framework is proposed to predict the direction of stock prices using both financial news and a sentiment dictionary; it outperforms state-of-the-art models and is more efficient in dealing with financial datasets.
Abstract: Financial news has been proven to be a crucial factor which causes fluctuations in stock prices. However, previous studies heavily relied on analyzing shallow features and ignored the structural relation among words in a sentence. Several sentiment analysis studies have tried to point out the relationship between investors' reaction and news events. However, the sentiment dataset was usually constructed from a lingual dataset unrelated to the financial sector, which led to poor performance. This paper proposes a novel framework to predict the directions of stock prices by using both financial news and a sentiment dictionary. The original contributions of this paper include the proposal of a novel two-stream gated recurrent unit network and Stock2Vec, a sentiment word embedding trained on a financial news dataset and Harvard IV-4. Two main experiments are conducted: the first experiment predicts the S&P 500 index. The results show that: 1) the proposed framework outperforms state-of-the-art models; 2) Stock2Vec is more efficient in dealing with financial datasets; and 3) applying the model, a simulation scenario proves that our model is effective for the stock sector.

Book ChapterDOI
01 Jan 2018
TL;DR: This survey introduces the motivation and background of word embedding, the methods of text representation as preliminaries, as well as some existing word embedding approaches such as the Neural Network Language Model and Sparse Coding, along with their evaluation metrics.
Abstract: Word embedding, where semantic and syntactic features are captured from unlabeled text data, is a basic procedure in Natural Language Processing (NLP). The extracted features can thus be organized in a low-dimensional space. Some representative word embedding approaches include the Probability Language Model, the Neural Network Language Model, Sparse Coding, etc. State-of-the-art methods such as skip-gram negative sampling, noise-contrastive estimation, matrix factorization, and hierarchical structure regularizers are applied correspondingly to solve these models. Most of this literature works with observed counts and co-occurrence statistics to learn the word embeddings. The increasing scale of data, the sparsity of data representation, word position, and training speed are the main challenges in designing word embedding algorithms. In this survey, we first introduce the motivation and background of word embedding. Next we introduce the methods of text representation as preliminaries, as well as some existing word embedding approaches such as the Neural Network Language Model and the Sparse Coding approach, along with their evaluation metrics. In the end, we summarize the applications of word embedding and discuss its future directions.
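
For instance, one of the surveyed families, skip-gram with negative sampling, can be trained with gensim (4-style API) on a toy corpus as follows; hyperparameters are illustrative:

```python
# Sketch: training skip-gram with negative sampling via gensim.
from gensim.models import Word2Vec

corpus = [["the", "cat", "sat", "on", "the", "mat"],
          ["the", "dog", "lay", "on", "the", "rug"]]

# sg=1 selects skip-gram; negative=5 enables negative sampling
model = Word2Vec(corpus, vector_size=50, window=2,
                 sg=1, negative=5, min_count=1, epochs=50)
print(model.wv.most_similar("cat", topn=3))
```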

Proceedings ArticleDOI
18 Jun 2018
TL;DR: This paper investigates the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e. without having a curated Q&A forum such as Stack Overflow at hand.
Abstract: Searching over large code corpora can be a powerful productivity tool for both beginner and experienced developers because it helps them quickly find examples of code related to their intent. Code search becomes even more attractive if developers could express their intent in natural language, similar to the interaction that Stack Overflow supports. In this paper, we investigate the use of natural language processing and information retrieval techniques to carry out natural language search directly over source code, i.e. without having a curated Q&A forum such as Stack Overflow at hand. Our experiments using a benchmark suite derived from Stack Overflow and GitHub repositories show promising results. We find that while a basic word-embedding-based search procedure works acceptably, better results can be obtained by adding a layer of supervision, as well as by a customized ranking strategy.
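
The "basic word-embedding-based search procedure" can be sketched as average-embedding cosine ranking; `kv` is a hypothetical pretrained gensim `KeyedVectors` model over code and natural-language tokens:

```python
# Sketch: unsupervised embedding-based search over tokenized code snippets.
import numpy as np

def embed_tokens(kv, tokens):
    """Average the word vectors of the tokens found in the vocabulary."""
    vecs = [kv[t] for t in tokens if t in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

def search(kv, query_tokens, corpus):
    """Rank token lists in `corpus` by cosine similarity to the query."""
    q = embed_tokens(kv, query_tokens)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    scores = [cos(q, embed_tokens(kv, snippet)) for snippet in corpus]
    return np.argsort(scores)[::-1]    # indices of best-matching snippets first
```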

Proceedings ArticleDOI
01 Jul 2018
TL;DR: This paper restores interpretability to adversarial training methods by restricting the directions of perturbations toward the existing words in the input embedding space, so that each perturbed input can be straightforwardly reconstructed as an actual text by interpreting the perturbation as the replacement of words in a sentence, while maintaining or even improving the task performance.
Abstract: Following great success in the image processing field, the idea of adversarial training has been applied to tasks in the natural language processing (NLP) field. One promising approach directly applies adversarial training developed in the image processing field to the input word embedding space instead of the discrete input space of texts. However, this approach abandons the interpretability of generating adversarial texts, even as it significantly improves the performance of NLP tasks. This paper restores interpretability to such methods by restricting the directions of perturbations toward the existing words in the input embedding space. As a result, each input with perturbations can be straightforwardly reconstructed as an actual text by considering the perturbations to be the replacement of words in the sentence, while maintaining or even improving the task performance.
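
The restriction can be pictured as follows: the perturbation is built only from directions that point toward other vocabulary vectors, so the perturbed embedding can be read back as a word substitution. This is a toy numpy illustration of that idea, not the paper's exact objective:

```python
# Sketch: adversarial perturbation restricted to word-substitution directions.
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

V = np.random.randn(1000, 100)        # vocabulary embedding matrix
w = V[0]                              # embedding of the current input word
grad = np.random.randn(100)           # stands in for the loss gradient at w

dirs = V - w                          # directions toward existing words
alpha = softmax(dirs @ grad)          # favor directions aligned with the gradient
perturbed = w + 0.5 * (alpha[:, None] * dirs).sum(axis=0)

nearest = int(np.argmin(np.linalg.norm(V - perturbed, axis=1)))
# `nearest` reads the adversarial input back as an actual vocabulary word
```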

Proceedings Article
03 Dec 2018
TL;DR: In this article, the Pairwise Inner Product (PIP) loss is proposed to measure the dissimilarity between word embeddings and reveal a fundamental bias-variance trade-off in dimensionality selection.
Abstract: In this paper, we provide a theoretical understanding of word embedding and its dimensionality. Motivated by the unitary-invariance of word embedding, we propose the Pairwise Inner Product (PIP) loss, a novel metric on the dissimilarity between word embeddings. Using techniques from matrix perturbation theory, we reveal a fundamental bias-variance trade-off in dimensionality selection for word embeddings. This bias-variance trade-off sheds light on many empirical observations which were previously unexplained, for example the existence of an optimal dimensionality. Moreover, new insights and discoveries, like when and how word embeddings are robust to over-fitting, are revealed. By optimizing over the bias-variance trade-off of the PIP loss, we can explicitly answer the open question of dimensionality selection for word embedding.
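
The metric itself is compact enough to state directly: the Frobenius norm of the difference between the two embeddings' pairwise-inner-product matrices, which makes it invariant to unitary transformations of either embedding:

```python
# The PIP loss and a check of its unitary invariance on toy matrices.
import numpy as np

def pip_loss(E1, E2):
    return np.linalg.norm(E1 @ E1.T - E2 @ E2.T, ord="fro")

E = np.random.randn(500, 100)
Q, _ = np.linalg.qr(np.random.randn(100, 100))   # random orthogonal matrix
print(pip_loss(E, E @ Q))                        # ~0: rotation does not matter
```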

Proceedings ArticleDOI
10 Apr 2018
TL;DR: A semantics-assisted non-negative matrix factorization (SeaNMF) model is proposed to discover topics in short texts; it incorporates word-context semantic correlations into the factorization, where the semantic relationships between words and their contexts are learned from the skip-gram view of the corpus.
Abstract: Being a prevalent form of social communications on the Internet, billions of short texts are generated everyday. Discovering knowledge from them has gained a lot of interest from both industry and academia. The short texts have a limited contextual information, and they are sparse, noisy and ambiguous, and hence, automatically learning topics from them remains an important challenge. To tackle this problem, in this paper, we propose a semantics-assisted non-negative matrix factorization (SeaNMF) model to discover topics for the short texts. It effectively incorporates the word-context semantic correlations into the model, where the semantic relationships between the words and their contexts are learned from the skip-gram view of the corpus. The SeaNMF model is solved using a block coordinate descent algorithm. We also develop a sparse variant of the SeaNMF model which can achieve a better model interpretability. Extensive quantitative evaluations on various real-world short text datasets demonstrate the superior performance of the proposed models over several other state-of-the-art methods in terms of topic coherence and classification accuracy. The qualitative semantic analysis demonstrates the interpretability of our models by discovering meaningful and consistent topics. With a simple formulation and the superior performance, SeaNMF can be an effective standard topic model for short texts.

Journal ArticleDOI
TL;DR: A word vector refinement model is proposed to refine existing pretrained word vectors using real-valued sentiment intensity scores provided by sentiment lexicons to improve each word vector such that it can be closer in the lexicon to both semantically and sentimentally similar words.
Abstract: Word embeddings that provide continuous low-dimensional vector representations of words have been extensively used for various natural language processing tasks. However, existing context-based word embeddings such as Word2vec and GloVe typically fail to capture sufficient sentiment information, which may result in words with similar vector representations having an opposite sentiment polarity (e.g., good and bad ), thus degrading sentiment analysis performance. To tackle this problem, recent studies have suggested learning sentiment embeddings to incorporate the sentiment polarity (positive and negative) information from labeled corpora. This study adopts another strategy to learn sentiment embeddings. Instead of creating a new word embedding from labeled corpora, we propose a word vector refinement model to refine existing pretrained word vectors using real-valued sentiment intensity scores provided by sentiment lexicons. The idea of the refinement model is to improve each word vector such that it can be closer in the lexicon to both semantically and sentimentally similar words (i.e., those with similar intensity scores) and further away from sentimentally dissimilar words (i.e., those with dissimilar intensity scores). An obvious advantage of the proposed method is that it can be applied to any pretrained word embeddings. In addition, the intensity scores can provide more fine-grained (real-valued) sentiment information than binary polarity labels to guide the refinement process. Experimental results show that the proposed refinement model can improve both conventional word embeddings and previously proposed sentiment embeddings for binary, ternary, and fine-grained sentiment classification on the SemEval and Stanford Sentiment Treebank datasets.
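
A simplified sketch of the refinement loop: repeatedly nudge each lexicon word's vector toward neighbors ranked by sentiment-intensity similarity. The update rule and parameters below are illustrative, not the paper's exact formulation:

```python
# Sketch: refining pretrained vectors with lexicon intensity scores.
import numpy as np

def refine(vectors, lexicon, k=5, alpha=0.1, iters=10):
    """vectors: {word: np.ndarray}; lexicon: {word: intensity score}."""
    words = [w for w in lexicon if w in vectors]
    for _ in range(iters):
        for w in words:
            # nearest lexicon neighbors by intensity closeness
            ranked = sorted(words, key=lambda u: abs(lexicon[u] - lexicon[w]))
            nbrs = [u for u in ranked if u != w][:k]
            if not nbrs:
                continue
            target = np.mean([vectors[u] for u in nbrs], axis=0)
            vectors[w] = (1 - alpha) * vectors[w] + alpha * target
    return vectors
```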

Proceedings Article
27 Apr 2018
TL;DR: A hierarchical-LSTM (HLSTM) encoder-decoder model with visual content and word embedding for SLT exhibits promising performance in signer-independent tests with seen sentences and also outperforms the comparison algorithms on unseen sentences.
Abstract: Continuous Sign Language Translation (SLT) is a challenging task due to its specific linguistics under sequential gesture variation without word alignment. Current hybrid HMM and CTC (Connectionist temporal classification) based models are proposed to solve frame or word level alignment. They may fail to tackle the cases with messing word order corresponding to visual content in sentences. To solve the issue, this paper proposes a hierarchical-LSTM (HLSTM) encoder-decoder model with visual content and word embedding for SLT. It tackles different granularities by conveying spatio-temporal transitions among frames, clips and viseme units. It firstly explores spatio-temporal cues of video clips by 3D CNN and packs appropriate visemes by online key clip mining with adaptive variable-length. After pooling on recurrent outputs of the top layer of HLSTM, a temporal attention-aware weighting mechanism is proposed to balance the intrinsic relationship among viseme source positions. At last, another two LSTM layers are used to separately recurse viseme vectors and translate semantics. After preserving original visual content by 3D CNN and the top layer of HLSTM, it shortens the encoding time step of the bottom two LSTM layers with less computational complexity while attaining more nonlinearity. Our proposed model exhibits promising performance in signer-independent tests with seen sentences and also outperforms the comparison algorithms on unseen sentences.

Journal ArticleDOI
TL;DR: The successes of neural IR thus far are highlighted, obstacles to its wider adoption are cataloged, and potentially promising directions for future research are suggested.
Abstract: A recent “third wave” of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.

Journal ArticleDOI
TL;DR: OntoSenticNet is presented, a commonsense ontology for sentiment analysis based on SenticNet, a semantic network of 100,000 concepts based on conceptual primitives that has the capability of associating each concept with annotations contained in external resources.
Abstract: In this work, we present OntoSenticNet, a commonsense ontology for sentiment analysis based on SenticNet, a semantic network of 100,000 concepts based on conceptual primitives. The key characteristics of OntoSenticNet are: (i) the definition of precise conceptual hierarchy and properties associating concepts and sentiment values; (ii) the support for connecting external information (e.g., word embedding, domain information, and different polarity representations) to each individual defined within the ontology; and (iii) the capability of associating each concept with annotations contained in external resources (e.g., documents and multimodal resources).

Journal ArticleDOI
TL;DR: A stacked residual LSTM model to predict sentiment intensity for a given text that outperforms lexicon- and regression-based methods proposed in previous studies and makes the deeper network easier to optimize.