
Showing papers on "Word embedding published in 2014"


Proceedings ArticleDOI
01 Oct 2014
TL;DR: A new global log-bilinear regression model combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
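To make the training objective concrete, the following is a minimal sketch of the weighted least-squares loss described in the abstract, computed only over the nonzero co-occurrence entries; the weighting function and the parameter names (x_max, alpha, W, W_tilde) follow the commonly cited formulation and are illustrative rather than a reproduction of the authors' code.

```python
import numpy as np

def glove_loss(W, W_tilde, b, b_tilde, cooc, x_max=100.0, alpha=0.75):
    """Weighted least-squares loss over the nonzero co-occurrence entries.

    cooc: dict mapping (i, j) -> X_ij; only nonzero counts are stored,
    mirroring the point that training touches only nonzero cells.
    """
    loss = 0.0
    for (i, j), x_ij in cooc.items():
        weight = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0
        diff = W[i] @ W_tilde[j] + b[i] + b_tilde[j] - np.log(x_ij)
        loss += weight * diff ** 2
    return loss

# toy usage with random parameters
rng = np.random.default_rng(0)
V, d = 5, 3
W, W_t = rng.normal(size=(V, d)), rng.normal(size=(V, d))
b, b_t = np.zeros(V), np.zeros(V)
print(glove_loss(W, W_t, b, b_t, {(0, 1): 4.0, (2, 3): 1.0}))
```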

30,558 citations


Proceedings Article
08 Dec 2014
TL;DR: It is shown that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks; SGNS's remaining advantage on analogy questions is conjectured to stem from the weighted nature of its factorization.
Abstract: We analyze skip-gram with negative-sampling (SGNS), a word embedding method introduced by Mikolov et al., and show that it is implicitly factorizing a word-context matrix, whose cells are the pointwise mutual information (PMI) of the respective word and context pairs, shifted by a global constant. We find that another embedding method, NCE, is implicitly factorizing a similar matrix, where each cell is the (shifted) log conditional probability of a word given its context. We show that using a sparse Shifted Positive PMI word-context matrix to represent words improves results on two word similarity tasks and one of two analogy tasks. When dense low-dimensional vectors are preferred, exact factorization with SVD can achieve solutions that are at least as good as SGNS's solutions for word similarity tasks. On analogy questions SGNS remains superior to SVD. We conjecture that this stems from the weighted nature of SGNS's factorization.
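A minimal sketch of the construction described here, assuming a small dense word-context count matrix: build the shifted positive PMI matrix and factorize it with truncated SVD to obtain dense vectors. Function and variable names are mine; the shift k plays the role of the number of negative samples.

```python
import numpy as np

def sppmi_matrix(counts, k=5):
    """Shifted positive PMI matrix from a dense word-context count matrix.

    counts[i, j] = #(word i, context j); k is the shift.
    """
    total = counts.sum()
    p_w = counts.sum(axis=1, keepdims=True) / total
    p_c = counts.sum(axis=0, keepdims=True) / total
    p_wc = counts / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_wc / (p_w * p_c))
    pmi[~np.isfinite(pmi)] = 0.0          # zero out cells with no co-occurrence
    return np.maximum(pmi - np.log(k), 0.0)

def svd_embeddings(sppmi, dim=2):
    """Dense word vectors via truncated SVD of the SPPMI matrix (symmetric variant)."""
    U, S, _ = np.linalg.svd(sppmi, full_matrices=False)
    return U[:, :dim] * np.sqrt(S[:dim])

counts = np.array([[10., 2., 0.], [3., 8., 1.], [0., 1., 12.]])
print(svd_embeddings(sppmi_matrix(counts, k=2)))
```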

1,835 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: Three neural networks are developed to effectively incorporate the supervision from the sentiment polarity of text (e.g. sentences or tweets) in their loss functions, and the performance of SSWE is further improved by concatenating SSWE with an existing feature set.
Abstract: In this paper, we present a method that learns word embeddings for Twitter sentiment classification. Most existing algorithms for learning continuous word representations typically only model the syntactic context of words but ignore the sentiment of text. This is problematic for sentiment analysis, as they usually map words with similar syntactic context but opposite sentiment polarity, such as good and bad, to neighboring word vectors. We address this issue by learning sentiment-specific word embedding (SSWE), which encodes sentiment information in the continuous representation of words. Specifically, we develop three neural networks to effectively incorporate the supervision from the sentiment polarity of text (e.g. sentences or tweets) in their loss functions. To obtain large-scale training corpora, we learn the sentiment-specific word embedding from massive distant-supervised tweets collected by positive and negative emoticons. Experiments applying SSWE to a benchmark Twitter sentiment classification dataset in SemEval 2013 show that (1) the SSWE feature performs comparably with hand-crafted features in the top-performing system; (2) the performance is further improved by concatenating SSWE with an existing feature set.
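The abstract describes loss functions that mix context supervision with sentiment supervision. The sketch below illustrates one plausible form of such a hybrid ranking loss given precomputed network scores; the function names, the hinge form, and the weighting parameter alpha are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def hinge(x):
    return np.maximum(0.0, 1.0 - x)

def sswe_hybrid_loss(f_syn_true, f_syn_corrupt, f_sent_pos, f_sent_neg, alpha=0.5):
    """Hedged sketch of a hybrid ranking loss in the spirit of SSWE.

    f_syn_*  : network scores for the original vs. corrupted n-gram
               (the usual context/language-model ranking term).
    f_sent_* : network scores toward the gold vs. opposite sentiment polarity.
    alpha weights syntactic vs. sentiment supervision; all names here are
    illustrative, not the authors' exact formulation.
    """
    loss_syntactic = hinge(f_syn_true - f_syn_corrupt)
    loss_sentiment = hinge(f_sent_pos - f_sent_neg)
    return alpha * loss_syntactic + (1.0 - alpha) * loss_sentiment

print(sswe_hybrid_loss(2.1, 1.3, 0.4, 0.9))
```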

1,157 citations


Proceedings Article
01 Jan 2014
TL;DR: A simple method is proposed for constructing an image embedding system from any existing n-way image classifier and a semantic word embedding model that contains the n class labels in its vocabulary; it outperforms state-of-the-art methods on the ImageNet zero-shot learning task.
Abstract: Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and the image transformation into that space is then learned in a second stage. Proponents of these image embedding systems have stressed their advantages over the traditional n-way classification framing of image understanding, particularly in terms of the promise for zero-shot learning -- the ability to correctly annotate images of previously unseen object categories. In this paper, we propose a simple method for constructing an image embedding system from any existing n-way image classifier and a semantic word embedding model which contains the n class labels in its vocabulary. Our method maps images into the semantic embedding space via a convex combination of the class label embedding vectors, and requires no additional training. We show that this simple and direct method confers many of the advantages associated with more complex image embedding schemes, and indeed outperforms state-of-the-art methods on the ImageNet zero-shot learning task.
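A minimal sketch of the convex-combination idea: weight the embeddings of the classifier's top-scoring seen labels by their renormalized probabilities, then label an image with the nearest unseen class embedding. The top-T truncation and cosine nearest-neighbor step are standard choices here, not necessarily the authors' exact configuration.

```python
import numpy as np

def conse_embed(class_probs, label_vectors, top_t=3):
    """Map an image into the word-embedding space by a convex combination
    of class-label vectors, weighted by the classifier's top-T probabilities.
    """
    top = np.argsort(class_probs)[::-1][:top_t]
    weights = class_probs[top] / class_probs[top].sum()
    return weights @ label_vectors[top]

def zero_shot_predict(image_vec, unseen_label_vectors):
    """Nearest unseen class label by cosine similarity."""
    sims = unseen_label_vectors @ image_vec / (
        np.linalg.norm(unseen_label_vectors, axis=1) * np.linalg.norm(image_vec) + 1e-12)
    return int(np.argmax(sims))

rng = np.random.default_rng(1)
probs = np.array([0.6, 0.25, 0.1, 0.05])        # softmax output of a 4-way classifier
seen_vecs = rng.normal(size=(4, 8))             # embeddings of the 4 seen class labels
unseen_vecs = rng.normal(size=(6, 8))           # embeddings of unseen labels
img = conse_embed(probs, seen_vecs)
print(zero_shot_predict(img, unseen_vecs))
```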

853 citations


Posted Content
Xin Rong1
TL;DR: Detailed derivations and explanations of the parameter update equations of the word2vec models, including the original continuous bag-of-word (CBOW) and skip-gram (SG) models, as well as advanced optimization techniques, including hierarchical softmax and negative sampling are provided.
Abstract: The word2vec model and application by Mikolov et al. have attracted a great amount of attention in the past two years. The vector representations of words learned by word2vec models have been shown to carry semantic meanings and are useful in various NLP tasks. As an increasing number of researchers would like to experiment with word2vec or similar techniques, I have noticed a lack of material that comprehensively explains the parameter learning process of word embedding models in detail, which prevents researchers who are not experts in neural networks from understanding the working mechanism of such models. This note provides detailed derivations and explanations of the parameter update equations of the word2vec models, including the original continuous bag-of-words (CBOW) and skip-gram (SG) models, as well as advanced optimization techniques, including hierarchical softmax and negative sampling. Intuitive interpretations of the gradient equations are provided alongside the mathematical derivations. The appendix includes a review of the basics of neural networks and backpropagation. I have also created an interactive demo, wevi, to facilitate an intuitive understanding of the model.
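As an illustration of the kind of update equations the note derives, here is a sketch of a single skip-gram negative-sampling step in NumPy; variable names and the learning rate are mine, and the gradient follows the standard log-loss derivation (sigmoid of the score minus the 0/1 label).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sg_neg_sampling_step(v_in, W_out, center, context, negatives, lr=0.025):
    """One skip-gram negative-sampling update (illustrative variable names).

    v_in  : input (center-word) vectors, shape (V, d), updated in place
    W_out : output (context-word) vectors, shape (V, d), updated in place
    """
    v_c = v_in[center]
    grad_center = np.zeros_like(v_c)
    for j, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:
        score = sigmoid(v_c @ W_out[j])
        g = score - label                    # gradient of the log loss w.r.t. the score
        grad_center += g * W_out[j]          # accumulate gradient for the center word
        W_out[j] -= lr * g * v_c             # update the output vector
    v_in[center] -= lr * grad_center         # update the input vector

rng = np.random.default_rng(0)
V, d = 10, 4
v_in, W_out = rng.normal(0, 0.1, (V, d)), rng.normal(0, 0.1, (V, d))
sg_neg_sampling_step(v_in, W_out, center=2, context=5, negatives=[1, 7])
```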

598 citations


Proceedings ArticleDOI
Duyu Tang1, Furu Wei2, Bing Qin1, Ting Liu1, Ming Zhou1 
01 Aug 2014
TL;DR: A neural network with a hybrid loss function is developed to learn SSWE, which encodes the sentiment information of tweets in the continuous representation of words; the system can be easily re-implemented with the publicly available sentiment-specific word embeddings.
Abstract: In this paper, we develop a deep learning system for message-level Twitter sentiment classification. Among the 45 submitted systems including the SemEval 2013 participants, our system (Coooolll) is ranked 2nd on the Twitter2014 test set of SemEval 2014 Task 9. Coooolll is built in a supervised learning framework by concatenating the sentiment-specific word embedding (SSWE) features with state-of-the-art hand-crafted features. We develop a neural network with a hybrid loss function to learn SSWE, which encodes the sentiment information of tweets in the continuous representation of words. To obtain large-scale training corpora, we train SSWE from 10M tweets collected by positive and negative emoticons, without any manual annotation. Our system can be easily re-implemented with the publicly available sentiment-specific word embeddings.

228 citations


Book ChapterDOI
Jiang Bian1, Bin Gao1, Tie-Yan Liu1
15 Sep 2014
TL;DR: This study explores the capacity of leveraging morphological, syntactic, and semantic knowledge to achieve high-quality word embeddings, using these types of knowledge to define a new basis for word representation, provide additional input information, and serve as auxiliary supervision in deep learning.
Abstract: The basis of applying deep learning to natural language processing tasks is to obtain high-quality distributed representations of words, i.e., word embeddings, from large amounts of text data. However, text itself usually contains incomplete and ambiguous information, which makes it necessary to leverage extra knowledge to understand it. Fortunately, text already contains well-defined morphological and syntactic knowledge; moreover, the large amount of text on the Web enables the extraction of plenty of semantic knowledge. Therefore, it makes sense to design novel deep learning algorithms and systems that leverage this knowledge to compute more effective word embeddings. In this paper, we conduct an empirical study on the capacity of leveraging morphological, syntactic, and semantic knowledge to achieve high-quality word embeddings. Our study explores these types of knowledge to define a new basis for word representation, provide additional input information, and serve as auxiliary supervision in deep learning, respectively. Experiments on an analogical reasoning task, a word similarity task, and a word completion task all demonstrate that knowledge-powered deep learning can enhance the effectiveness of word embedding.

184 citations


Proceedings Article
01 Aug 2014
TL;DR: A probabilistic model for learning multi-prototype word embeddings is presented, which integrates word polysemy with the Skip-Gram model and learns multiple embedding vectors per word with an Expectation-Maximization algorithm.
Abstract: Distributed word representations have been widely used and proven to be useful in quite a few natural language processing and text mining tasks. Most existing word embedding models aim at generating only one embedding vector for each individual word, which, however, limits their effectiveness because a huge number of words are polysemous (such as bank and star). To address this problem, it is necessary to build multiple embedding vectors to represent the different meanings of a word. Some recent studies attempted to train multi-prototype word embeddings by clustering context window features of the word. However, due to the large number of parameters to train, these methods yield limited scalability and are inefficient to train with big data. In this paper, we introduce a much more efficient method for learning multiple embedding vectors for polysemous words. In particular, we first propose to model word polysemy from a probabilistic perspective and integrate it with the highly efficient continuous Skip-Gram model. Under this framework, we design an Expectation-Maximization algorithm to learn a word's multiple embedding vectors. With far fewer parameters to train, our model achieves comparable or even better results on word-similarity tasks than conventional methods.
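A hedged sketch of the EM idea for multi-prototype embeddings: the E-step soft-assigns an observed context to one of a word's sense vectors, and the M-step nudges each sense vector toward the contexts it is responsible for. This stands in for, and greatly simplifies, the paper's integration with Skip-Gram training; all names and the softmax likelihood are illustrative.

```python
import numpy as np

def e_step(prototypes, context_vec, priors):
    """Soft-assign a context to the prototypes (senses) of a word.

    A minimal stand-in for the E-step: posterior over senses given one
    averaged context vector, using a softmax over dot products.
    """
    scores = prototypes @ context_vec + np.log(priors)
    scores -= scores.max()
    post = np.exp(scores)
    return post / post.sum()

def m_step(prototypes, contexts, posteriors, lr=0.1):
    """Move each prototype toward the contexts it is responsible for."""
    for k in range(len(prototypes)):
        weights = posteriors[:, k:k + 1]
        target = (weights * contexts).sum(axis=0) / (weights.sum() + 1e-12)
        prototypes[k] += lr * (target - prototypes[k])
    return prototypes

rng = np.random.default_rng(0)
protos = rng.normal(size=(2, 4))                      # two senses of a polysemous word
ctxs = rng.normal(size=(5, 4))                        # averaged context vectors
post = np.stack([e_step(protos, c, np.array([0.5, 0.5])) for c in ctxs])
print(m_step(protos, ctxs, post))
```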

150 citations


Proceedings ArticleDOI
01 Oct 2014
TL;DR: Experiments on the task of named entity recognition show that each of the proposed approaches can better utilize the word embedding features, among which the distributional prototype approach performs the best.
Abstract: Recent work has shown success in using continuous word embeddings learned from unlabeled data as features to improve supervised NLP systems, which is regarded as a simple semi-supervised learning mechanism. However, fundamental problems remain in effectively incorporating the word embedding features within the framework of linear models. In this study, we investigate and analyze three different approaches, including a newly proposed distributional prototype approach, for utilizing the embedding features. The presented approaches can be integrated into most of the classical linear models in NLP. Experiments on the task of named entity recognition show that each of the proposed approaches can better utilize the word embedding features, among which the distributional prototype approach performs the best. Moreover, the combination of the approaches provides additive improvements, outperforming the dense and continuous embedding features by nearly 2 points of F1 score.
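One plausible way to turn embeddings into discrete features a linear model can use, in the spirit of the distributional prototype idea: fire a binary feature for each prototype word whose embedding is sufficiently similar to the current word's. The thresholding scheme and names below are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np

def prototype_features(word_vec, prototype_vecs, threshold=0.5):
    """Hedged sketch of prototype-style discrete features: fire a binary
    feature for every prototype word whose embedding is close enough to
    the current word, so a linear (e.g. CRF) model can consume it.
    """
    feats = {}
    for proto, vec in prototype_vecs.items():
        sim = word_vec @ vec / (np.linalg.norm(word_vec) * np.linalg.norm(vec) + 1e-12)
        if sim >= threshold:
            feats[f"PROTO={proto}"] = 1.0
    return feats

rng = np.random.default_rng(0)
emb = {"london": rng.normal(size=8), "monday": rng.normal(size=8)}
emb["paris"] = emb["london"] + rng.normal(scale=0.1, size=8)     # a similar city word
protos = {"paris": emb["paris"], "monday": emb["monday"]}
print(prototype_features(emb["london"], protos, threshold=0.5))  # fires PROTO=paris
```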

130 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: This paper systematically explores various ways to apply word embeddings and clustering on adapting feature-based relation extraction systems and shows the best adaptation improvement by combining word cluster and word embedding information.
Abstract: Relation extraction suffers from a performance loss when a model is applied to out-of-domain data. This has fostered the development of domain adaptation techniques for relation extraction. This paper evaluates word embeddings and clustering on adapting feature-based relation extraction systems. We systematically explore various ways to apply word embeddings and show the best adaptation improvement by combining word cluster and word embedding information. Finally, we demonstrate the effectiveness of regularization for the adaptability of relation extractors.

91 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: A word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers, which outperforms the feed-forward neural network-based model as well as the IBM Model 4 under Japanese-English and French-English word alignment tasks.
Abstract: This study proposes a word alignment model based on a recurrent neural network (RNN), in which an unlimited alignment history is represented by recurrently connected hidden layers. We perform unsupervised learning using noise-contrastive estimation (Gutmann and Hyvarinen, 2010; Mnih and Teh, 2012), which utilizes artificially generated negative samples. Our alignment model is directional, similar to the generative IBM models (Brown et al., 1993). To overcome this limitation, we encourage agreement between the two directional models by introducing a penalty function that ensures word embedding consistency across two directional models during training. The RNN-based model outperforms the feed-forward neural network-based model (Yang et al., 2013) as well as the IBM Model 4 under Japanese-English and French-English word alignment tasks, and achieves comparable translation performance to those baselines for Japanese-English and Chinese-English translation tasks.

Proceedings Article
Lin Qiu1, Yong Cao2, Zaiqing Nie2, Yong Yu1, Yong Rui2 
27 Jul 2014
TL;DR: Proximity-Ambiguity Sensitive (PAS) models are proposed to produce high quality distributed representations of words considering both word proximity and ambiguity, and the strength of pooling-structured neural networks in word representation learning is revealed.
Abstract: Distributed representations of words (aka word embedding) have proven helpful in solving natural language processing (NLP) tasks. Training distributed representations of words with neural networks has lately been a major focus of researchers in the field. Recent work on word embedding, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model, have produced particularly impressive results, significantly speeding up the training process to enable word representation learning from large-scale data. However, both CBOW and Skip-gram do not pay enough attention to word proximity in terms of model or word ambiguity in terms of linguistics. In this paper, we propose Proximity-Ambiguity Sensitive (PAS) models (i.e. PAS CBOW and PAS Skip-gram) to produce high quality distributed representations of words considering both word proximity and ambiguity. From the model perspective, we introduce proximity weights as parameters to be learned in PAS CBOW and used in PAS Skip-gram. By better modeling word proximity, we reveal the strength of pooling-structured neural networks in word representation learning. The proximity-sensitive pooling layer can also be applied to other neural network applications that employ pooling layers. From the linguistics perspective, we train multiple representation vectors per word. Each representation vector corresponds to a particular group of POS tags of the word. By using PAS models, we achieved a 16.9% increase in accuracy over state-of-the-art models.
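A minimal sketch of proximity-sensitive pooling as described here: replace CBOW's uniform average of context vectors with a weighted average whose per-position weights would be learned. The normalization and names are illustrative.

```python
import numpy as np

def pas_cbow_context(context_vecs, proximity_weights):
    """Proximity-weighted pooling of context vectors: instead of CBOW's
    uniform average, each window position gets a (learned) weight.
    proximity_weights has one entry per position in the window.
    """
    w = np.asarray(proximity_weights, dtype=float)
    w = w / w.sum()                                   # normalize to a convex combination
    return (w[:, None] * context_vecs).sum(axis=0)

rng = np.random.default_rng(0)
window = rng.normal(size=(4, 5))          # 4 context positions, 5-dim embeddings
print(pas_cbow_context(window, [0.5, 1.0, 1.0, 0.5]))
```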

Posted Content
TL;DR: This paper advocates for density-based distributed embeddings and presents a method for learning representations in the space of Gaussian distributions; it investigates the ability of these embeddings to model entailment and other asymmetric relationships and explores novel properties of the representation.
Abstract: Current work in lexical distributed representations maps each word to a point vector in low-dimensional space. Mapping instead to a density provides many interesting advantages, including better capturing uncertainty about a representation and its relationships, expressing asymmetries more naturally than dot product or cosine similarity, and enabling more expressive parameterization of decision boundaries. This paper advocates for density-based distributed embeddings and presents a method for learning representations in the space of Gaussian distributions. We compare performance on various word embedding benchmarks, investigate the ability of these embeddings to model entailment and other asymmetric relationships, and explore novel properties of the representation.
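To illustrate why densities give asymmetric similarities, here is the closed-form KL divergence between diagonal Gaussians; KL(specific || general) is small while the reverse is large, which is the property the abstract points to for modeling entailment. The dog/animal toy values are invented for illustration.

```python
import numpy as np

def kl_diag_gaussians(mu_p, var_p, mu_q, var_q):
    """KL(P || Q) for diagonal Gaussians; asymmetric, so it can encode
    directional relations such as entailment (specific -> general).
    """
    d = len(mu_p)
    diff = mu_q - mu_p
    return 0.5 * (np.sum(var_p / var_q)
                  + np.sum(diff * diff / var_q)
                  - d
                  + np.sum(np.log(var_q)) - np.sum(np.log(var_p)))

mu_dog, var_dog = np.array([1.0, 0.2]), np.array([0.1, 0.1])        # narrow (specific)
mu_animal, var_animal = np.array([0.8, 0.0]), np.array([1.0, 1.0])  # broad (general)
print(kl_diag_gaussians(mu_dog, var_dog, mu_animal, var_animal))    # small
print(kl_diag_gaussians(mu_animal, var_animal, mu_dog, var_dog))    # much larger
```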

Proceedings Article
Yong Luo1, Jian Tang1, Jun Yan2, Chao Xu1, Zheng Chen2 
27 Jul 2014
TL;DR: A two-side multimodal neural network is proposed to learn a robust word embedding from multiple data sources, including free text, user search queries and search click-through data, and it outperforms the state-of-the-art word embedding algorithm trained on each individual source.
Abstract: Word embedding aims to learn a continuous representation for each word. It attracts increasing attention due to its effectiveness in various tasks such as named entity recognition and language modeling. Most existing word embedding results are generally trained on one individual data source such as news pages or Wikipedia articles. However, when we apply them to other tasks such as web search, the performance suffers. To obtain a robust word embedding for different applications, multiple data sources could be leveraged. In this paper, we propose a two-side multimodal neural network to learn a robust word embedding from multiple data sources, including free text, user search queries and search click-through data. This framework takes the word embeddings learned from different data sources as pre-training, and then uses a two-side neural network to unify these embeddings. The pre-trained embeddings are obtained by adapting the recently proposed CBOW algorithm. Since the proposed neural network does not need to re-train word embeddings for a new task, it is highly scalable for real-world problem solving. Besides, the network allows different sources to be weighted differently for different application tasks. Experiments on two real-world applications, web search ranking and word similarity measurement, show that our neural network with multiple sources outperforms the state-of-the-art word embedding algorithm with each individual source. It also outperforms other competitive baselines that use multiple sources.

Posted Content
TL;DR: The details of the WordRep collection are described, along with how to use it in different types of machine learning research related to word embedding; new potential research topics that can be supported by WordRep are also discussed.
Abstract: WordRep is a benchmark collection for research on learning distributed word representations (or word embeddings), released by Microsoft Research. In this paper, we describe the details of the WordRep collection and show how to use it in different types of machine learning research related to word embedding. Specifically, we describe how the evaluation tasks in WordRep are selected, how the data are sampled, and how the evaluation tool is built. We then compare several state-of-the-art word representations on WordRep, report their evaluation performance, and discuss the results. After that, we discuss new potential research topics that can be supported by WordRep, in addition to algorithm comparison. We hope that this paper can help people gain a deeper understanding of WordRep and enable more interesting research on learning distributed word representations and related topics.

Posted Content
TL;DR: Recurrent neural networks trained with only raw features, using word embedding to automatically learn meaningful representations, outperform the best SVM-based systems reported in the EMNLP'14 Code-Switching Workshop by 1% in accuracy, or by 17% in error rate reduction.
Abstract: Mixed language data is one of the difficult yet less explored domains of natural language processing. Most research in fields like machine translation or sentiment analysis assumes monolingual input. However, people who are capable of using more than one language often communicate using multiple languages at the same time. Sociolinguists believe this "code-switching" phenomenon to be socially motivated, for example, to express solidarity or to establish authority. Most past work depends on external tools or resources, such as part-of-speech tagging, dictionary look-up, or named-entity recognizers, to extract rich features for training machine learning models. In this paper, we train recurrent neural networks with only raw features, and use word embedding to automatically learn meaningful representations. Using the same mixed-language Twitter corpus, our system is able to outperform the best SVM-based systems reported in the EMNLP'14 Code-Switching Workshop by 1% in accuracy, or by 17% in error rate reduction.

Proceedings ArticleDOI
04 May 2014
TL;DR: This work presents techniques to obtain task and domain specific word embeddings and shows their usefulness over those obtained from generic unsupervised data, and shows how they are transferred from one language to another enabling training of a multilingual spoken language understanding system.
Abstract: Models for statistical spoken language understanding (SLU) systems are conventionally trained using supervised discriminative training methods. In many cases, however, the labeled data necessary for these supervised techniques is not readily available, necessitating a laborious data collection and annotation effort. This often results in data sets that are not expansive enough to adequately cover all patterns of natural language phrases that occur in the target applications. Word embedding features alleviate data and feature sparsity issues by learning a mathematical representation of words and word associations in the continuous space. In this work, we present techniques to obtain task- and domain-specific word embeddings and show their usefulness over those obtained from generic unsupervised data. We also show how we transfer these embeddings from one language to another, enabling training of a multilingual spoken language understanding system.

Proceedings ArticleDOI
03 Jul 2014
TL;DR: A novel method is introduced to efficiently detect local reuses at the semantic level for large-scale problems, using continuous vector representations of words to capture semantic-level similarities between short text segments.
Abstract: Text reuse is a common phenomenon in a variety of user-generated content. Along with the quick expansion of social media, reuses of local text are occurring much more frequently than ever before. The task of detecting these local reuses serves as an essential step for many applications. It has attracted extensive attention in recent years. However, semantic-level similarities have not received consideration in most previous works. In this paper, we introduce a novel method to efficiently detect local reuses at the semantic level for large-scale problems. We propose to use continuous vector representations of words to capture the semantic-level similarities between short text segments. In order to handle tens of billions of documents, methods based on information geometry and hashing methods are introduced to aggregate and map text segments represented by word embeddings to binary hash codes. Experimental results demonstrate that the proposed methods achieve significantly better performance than state-of-the-art approaches in all six document collections belonging to four different categories. At some recall levels, the precisions of the proposed method are even 10 times higher than previous methods. Moreover, the efficiency of the proposed method is comparable to or better than that of some other hashing methods.
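The paper aggregates the word embeddings of a segment and maps them to binary codes. As a simplified stand-in (mean pooling plus random-hyperplane hashing rather than the paper's information-geometry-based aggregation), the sketch below shows how near-duplicate segments end up with nearby hash codes.

```python
import numpy as np

def segment_hash(word_vecs, hyperplanes):
    """Mean-pool word embeddings for a short segment, then map to a binary
    code with random hyperplanes (a SimHash-style stand-in for the paper's
    information-geometry aggregation and hashing).
    """
    pooled = np.mean(word_vecs, axis=0)
    return (hyperplanes @ pooled >= 0).astype(np.uint8)

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
dim, bits = 50, 16
planes = rng.normal(size=(bits, dim))
seg_a = rng.normal(size=(6, dim))
seg_b = seg_a + rng.normal(scale=0.05, size=(6, dim))   # near-duplicate segment
print(hamming(segment_hash(seg_a, planes), segment_hash(seg_b, planes)))  # small distance
```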

Book ChapterDOI
05 Dec 2014
TL;DR: A model to generate deep features, which describe the semantic relevance between short “text objects”, is designed and achieves state-of-the-art performance by combining shallow features and deep features.
Abstract: Semantic matching is widely used in many natural language processing tasks. In this paper, we focus on semantic matching between short texts and design a model to generate deep features, which describe the semantic relevance between short “text objects”. Furthermore, we design a method to combine shallow features of short texts (i.e., LSI, VSM and some other handcrafted features) with deep features of short texts (i.e., word embedding matching of short texts). Finally, a ranking model (i.e., RankSVM) is used to make the final judgment. To evaluate our method, we apply it to the task of matching posts and responses. Experimental results show that our method achieves state-of-the-art performance by using shallow features and deep features.

Proceedings ArticleDOI
11 Aug 2014
TL;DR: A Bilingual Sentiment Embedding model (BSE) is proposed to jointly embed the review texts in different languages into a joint sentimental semantic space and can outperform the state-of-the-art SCL method.
Abstract: Cross-lingual sentiment classification aims to leverage the rich sentiment resources in one language for sentiment classification in a different language. The biggest challenge of this task is how to eliminate the sentimental semantic gap between the two languages. The use of machine translation cannot address this challenge very well due to translation noise and the different expressions used in different languages. In this study, we propose a Bilingual Sentiment Embedding model (BSE) to jointly embed review texts in different languages into a joint sentimental semantic space. After embedding the review texts into this space, the review texts in different languages can be easily classified with a classifier. Moreover, for a given word, our proposed model can find words in both languages with similar or opposite sentiment orientation. Experimental results on a benchmark dataset show that our proposed model can outperform the state-of-the-art SCL method.

Proceedings ArticleDOI
Yinggong Zhao1, Shujian Huang1, Xinyu Dai1, Jianbing Zhang1, Jiajun Chen1 
04 Dec 2014
TL;DR: This work extends previous approaches by learning distributed representations from the dependency structure of a sentence, which can capture long-distance relations; such context learns better semantics for words, as demonstrated on the Semantic-Syntactic Word Relationship task.
Abstract: Continuous-space word representation has demonstrated its effectiveness in many natural language processing (NLP) tasks. The basic idea of embedding training is to update the embedding matrix based on a word's context. However, such context has been constrained to a fixed window of surrounding words, which we believe is not sufficient to represent the actual relations for a given center word. In this work, we extend previous approaches by learning distributed representations from the dependency structure of a sentence, which can capture long-distance relations. Such context learns better semantics for words, as demonstrated on the Semantic-Syntactic Word Relationship task. Besides, competitive results are also achieved for the dependency embeddings on the WordSim-353 task.

Posted Content
Qing Cui, Bin Gao, Jiang Bian, Siyu Qiu, Tie-Yan Liu 
TL;DR: A novel neural network architecture called KNET is introduced that leverages both contextual information and morphological word similarity, built from morphological knowledge, to learn word embeddings.
Abstract: Neural network techniques are widely applied to obtain high-quality distributed representations of words, i.e., word embeddings, to address text mining, information retrieval, and natural language processing tasks. Recently, efficient methods have been proposed to learn word embeddings from context that captures both semantic and syntactic relationships between words. However, it is challenging to handle unseen words or rare words with insufficient context. In this paper, inspired by the study on word recognition process in cognitive psychology, we propose to take advantage of seemingly less obvious but essentially important morphological knowledge to address these challenges. In particular, we introduce a novel neural network architecture called KNET that leverages both contextual information and morphological word similarity built based on morphological knowledge to learn word embeddings. Meanwhile, the learning architecture is also able to refine the pre-defined morphological knowledge and obtain more accurate word similarity. Experiments on an analogical reasoning task and a word similarity task both demonstrate that the proposed KNET framework can greatly enhance the effectiveness of word embeddings.

Proceedings Article
01 Aug 2014
TL;DR: A minimally supervised model for noun classification that uses symmetric patterns and an iterative variant of the k-Nearest Neighbors algorithm and obtains 82%-94% accuracy using as few as four labeled examples per category, emphasizing the effectiveness of simple search and representation techniques for this task.
Abstract: Classifying nouns into semantic categories (e.g., animals, food) is an important line of research in both cognitive science and natural language processing. We present a minimally supervised model for noun classification, which uses symmetric patterns (e.g., “X and Y”) and an iterative variant of the k-Nearest Neighbors algorithm. Unlike most previous works, we do not use a predefined set of symmetric patterns, but extract them automatically from plain text, in an unsupervised manner. We experiment with four semantic categories and show that symmetric patterns constitute much better classification features compared to leading word embedding methods. We further demonstrate that our simple k-Nearest Neighbors algorithm outperforms two state-of-the-art label propagation alternatives for this task. In experiments, our model obtains 82%-94% accuracy using as few as four labeled examples per category, emphasizing the effectiveness of simple search and representation techniques for this task.
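A toy sketch of the pipeline described here: harvest co-occurrences from a symmetric pattern and label a noun by a k-NN vote over labeled co-occurring nouns. The single hard-coded "X and Y" pattern and the one-shot vote are simplifications of the unsupervised pattern extraction and iterative k-NN in the paper.

```python
import re
from collections import defaultdict

def cooccurrence_from_pattern(text):
    """Collect noun pairs linked by the symmetric pattern 'X and Y' (one of
    the patterns the paper extracts automatically; hard-coded here purely
    for illustration).
    """
    pairs = defaultdict(int)
    for x, y in re.findall(r"\b(\w+) and (\w+)\b", text.lower()):
        pairs[(x, y)] += 1
        pairs[(y, x)] += 1           # symmetric: count both directions
    return pairs

def knn_label(word, labeled, pairs, k=2):
    """Label a word by majority vote of its k most strongly co-occurring
    labeled neighbors (one iteration of the iterative k-NN scheme)."""
    neighbors = sorted(((c, w) for (a, w), c in pairs.items()
                        if a == word and w in labeled), reverse=True)[:k]
    votes = defaultdict(int)
    for _, w in neighbors:
        votes[labeled[w]] += 1
    return max(votes, key=votes.get) if votes else None

text = "cats and dogs play. apples and pears ripen. dogs and wolves howl."
pairs = cooccurrence_from_pattern(text)
print(knn_label("cats", {"dogs": "animal", "pears": "food"}, pairs))
```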

Proceedings Article
01 Nov 2014
TL;DR: The word embeddings provide accurate and compact summaries of observed entity contexts, further described by topic clusters that are estimated in a non-parametric manner; a staleness measure is associated with each entity and topic cluster, dynamically estimating their temporal relevance.
Abstract: Identifying documents that contain timely and vital information for an entity of interest, a task known as vital filtering, has become increasingly important with the availability of large document collections. To efficiently filter such large text corpora in a streaming manner, we need to compactly represent previously observed entity contexts and quickly estimate whether a new document contains novel information. Existing approaches to modeling contexts, such as bag of words, latent semantic indexing, and topic models, are limited in several respects: they are unable to handle streaming data, do not model the underlying topic of each document, suffer from lexical sparsity, and/or do not accurately estimate temporal vitalness. In this paper, we introduce a word embedding-based non-parametric representation of entities that addresses the above limitations. The word embeddings provide accurate and compact summaries of observed entity contexts, further described by topic clusters that are estimated in a non-parametric manner. Additionally, we associate a staleness measure with each entity and topic cluster, dynamically estimating their temporal relevance. This approach of using word embeddings, non-parametric clustering, and staleness provides an efficient yet appropriate representation of entity contexts for the streaming setting, enabling accurate vital filtering.

01 Aug 2014
TL;DR: It is shown that the DSSM trained on a large body of text can produce meaningful word embedding vectors, as demonstrated on semantic word clustering and semantic word analogy tasks.
Abstract: Deep neural network (DNN) based natural language processing models rely on a word embedding matrix to transform raw words into vectors. Recently, a deep structured semantic model (DSSM) has been proposed to project raw text to a continuously-valued vector for Web search. In this technical report, we propose learning word embeddings using the DSSM. We show that the DSSM trained on a large body of text can produce meaningful word embedding vectors, as demonstrated on semantic word clustering and semantic word analogy tasks.

Posted Content
Qing Cui, Bin Gao, Jiang Bian, Siyu Qiu, Tie-Yan Liu 
07 Jul 2014
TL;DR: A novel neural network architecture is introduced that leverages both contextual information and morphological word similarity to learn word embeddings and is able to refine the pre-defined morphological knowledge and obtain more accurate word similarity.
Abstract: Deep learning techniques aim at obtaining high-quality distributed representations of words, i.e., word embeddings, to address text mining and natural language processing tasks. Recently, efficient methods have been proposed to learn word embeddings from context that captures both semantic and syntactic relationships between words. However, it is challenging to handle unseen words or rare words with insufficient context. In this paper, inspired by the study on word recognition process in cognitive psychology, we propose to take advantage of seemingly less obvious but essentially important morphological word similarity to address these challenges. In particular, we introduce a novel neural network architecture that leverages both contextual information and morphological word similarity to learn word embeddings. Meanwhile, the learning architecture is also able to refine the pre-defined morphological knowledge and obtain more accurate word similarity. Experiments on an analogical reasoning task and a word similarity task both demonstrate that the proposed method can greatly enhance the effectiveness of word embeddings.

Proceedings ArticleDOI
04 May 2014
TL;DR: Variants of Levenshtein alignment are proposed for merging an errorful utterance with a targeted rephrase of an error segment; ASR errors that might harm the alignment are addressed through phonetic matching, and a word embedding distance accounts for the use of synonyms outside the targeted segments.
Abstract: Clarification dialogs can help address ASR errors in speech-to-speech translation systems and other interactive applications. We propose to use variants of Levenshtein alignment for merging an errorful utterance with a targeted rephrase of an error segment. ASR errors that might harm the alignment are addressed through phonetic matching, and a word embedding distance is used to account for the use of synonyms outside the targeted segments. These features lead to a 30% relative improvement in word error rate on ASR output compared to not performing the clarification. Twice as many utterances are completely corrected compared to using basic word alignment. Furthermore, we generate a set of potential merges and train a neural network on crowd-sourced rephrases in order to select the best merge, leading to 24% more instances being completely corrected. The system is deployed in the framework of the BOLT project.
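A simplified sketch of merging via alignment with an embedding-aware substitution cost: a Levenshtein-style dynamic program in which substituting near-synonyms is cheap. The cosine-based cost and uniform insertion/deletion penalty are assumptions; the deployed system also uses phonetic matching, which is omitted here.

```python
import numpy as np

def embed_edit_distance(a, b, emb, ins_del=1.0):
    """Levenshtein-style alignment over word sequences where the substitution
    cost is the cosine distance between word embeddings, so synonyms in a
    rephrase align cheaply (a simplified stand-in for the paper's merger).
    """
    def sub_cost(w1, w2):
        if w1 == w2:
            return 0.0
        if w1 in emb and w2 in emb:
            v1, v2 = emb[w1], emb[w2]
            cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
            return 1.0 - max(cos, 0.0)
        return 1.0
    n, m = len(a), len(b)
    D = np.zeros((n + 1, m + 1))
    D[:, 0] = np.arange(n + 1) * ins_del
    D[0, :] = np.arange(m + 1) * ins_del
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = min(D[i - 1, j] + ins_del,
                          D[i, j - 1] + ins_del,
                          D[i - 1, j - 1] + sub_cost(a[i - 1], b[j - 1]))
    return D[n, m]

rng = np.random.default_rng(0)
emb = {"cab": rng.normal(size=8)}
emb["taxi"] = emb["cab"] + rng.normal(scale=0.01, size=8)   # near-synonyms
print(embed_edit_distance("call a cab".split(), "call a taxi".split(), emb))
```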

Book ChapterDOI
05 Dec 2014
TL;DR: A new NLP task, similar to the word expansion and word similarity tasks, is introduced: discovering words that share the same semantic components (feature sub-space) as seed words; a feature extraction method based on word embeddings is proposed for this problem.
Abstract: In this paper, we introduce a new NLP task similar to the word expansion task or word similarity task, which can discover words sharing the same semantic components (feature sub-space) as seed words. We also propose a feature extraction method based on word embeddings for this problem. We train word embeddings using state-of-the-art methods like word2vec and models supplied by the Stanford NLP Group. Prior statistical knowledge and negative sampling are proposed and utilized to help extract the feature sub-space. We evaluate our model on a WordNet synonym dictionary dataset and compare it to word2vec on synonymy mining and word similarity computing tasks, showing that our method outperforms other models or methods and can significantly help improve language understanding.

Posted Content
TL;DR: The proposed word embedding method achieves state-of-the-art results in multilingual dependency parsing; word embeddings, including more recent representations, are also compared in Named Entity Recognition, Chunking, and Dependency Parsing.
Abstract: We analyze a word embedding method in supervised tasks. It maps words onto a sphere such that words co-occurring in similar contexts lie close to each other. The similarity of contexts is measured by the distribution of substitutes that can fill them. We compared word embeddings, including more recent representations, in Named Entity Recognition (NER), Chunking, and Dependency Parsing. We examine our framework in multilingual dependency parsing as well. The results show that the proposed method achieves results as good as or better than the other word embeddings in the tasks we investigate. It achieves state-of-the-art results in multilingual dependency parsing. Word embeddings in 7 languages are available for public use.

Chen Tao, Lu Qin, Ruifeng Xu, Bin Liu, Jun Xu 
01 Jan 2014
TL;DR: Evaluations on the NLP&CC2013 Chinese micro-blog emotion classification dataset and the English Multi-Domain Sentiment Dataset version 2.0 show that the proposed oversampling approach noticeably improves imbalanced emotion/sentiment classification in both Chinese and English.
Abstract: Imbalanced training data is a persistent problem for supervised-learning-based emotion and sentiment classification. Several existing studies have shown that data sparseness and small disjuncts are the two major factors affecting the classification. Targeting these two problems, this paper presents a word embedding based oversampling method. Firstly, a large-scale text corpus is used to train a continuous skip-gram model to form word embeddings. A feature selection and linear combination algorithm is developed to construct a text representation vector from the word embeddings. Based on this, new minority-class training samples are generated by calculating the mean vector of two text representation vectors in the same class until the training samples for each class are equal in number, so that the classifiers can be trained on a fully balanced dataset. Evaluations on NLP&CC2013 Chinese micro-blog emotion classification (multi-label) and the English Multi-Domain Sentiment Dataset version 2.0 (single-label) show that the proposed oversampling approach noticeably improves imbalanced emotion/sentiment classification in Chinese (sentence level) and English (document level). Further analysis shows that our approach can reduce the effect of data sparseness and small disjuncts in imbalanced emotion and sentiment classification.
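A minimal sketch of the balancing step described in the abstract, assuming each text has already been reduced to a representation vector: new minority-class samples are the mean of two same-class vectors, generated until the class reaches the target size.

```python
import numpy as np

def oversample_minority(text_vecs, target_size, rng):
    """Generate synthetic minority-class samples as the mean of two text
    representation vectors from the same class, until the class reaches
    the target size (the balancing step described in the abstract).
    """
    samples = list(text_vecs)
    while len(samples) < target_size:
        i, j = rng.choice(len(samples), size=2, replace=False)
        samples.append((samples[i] + samples[j]) / 2.0)
    return np.stack(samples)

rng = np.random.default_rng(0)
minority = rng.normal(size=(3, 6))          # 3 minority-class text vectors
balanced = oversample_minority(minority, target_size=8, rng=rng)
print(balanced.shape)                        # (8, 6)
```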