Top 10 papers published in the topic of Word embedding in 2013

Proceedings Article•

Distributed Representations of Words and Phrases and their Compositionality

[...]

Tomas Mikolov¹, Ilya Sutskever¹, Kai Chen¹, Greg S. Corrado¹, Jeffrey Dean¹ - Show less +1 more•Institutions (1)

05 Dec 2013

TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.

...read moreread less

Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

...read moreread less

24,012 citations

Posted Content•

Efficient Estimation of Word Representations in Vector Space

[...]

Tomas Mikolov¹, Kai Chen², Greg S. Corrado³, Jeffrey Dean³•Institutions (3)

Brno University of Technology¹, Beijing University of Posts and Telecommunications², Google³

16 Jan 2013-arXiv: Computation and Language

TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets, and the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.

...read moreread less

Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

...read moreread less

20,077 citations

Posted Content•

Distributed Representations of Words and Phrases and their Compositionality

[...]

Tomas Mikolov¹, Ilya Sutskever¹, Kai Chen¹, Greg S. Corrado¹, Jeffrey Dean¹ - Show less +1 more•Institutions (1)

Google¹

16 Oct 2013-arXiv: Computation and Language

TL;DR: In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.

...read moreread less

Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

...read moreread less

11,343 citations

Proceedings Article•

Efficient Estimation of Word Representations in Vector Space

[...]

Tomas Mikolov¹, Kai Chen², Greg S. Corrado³, Jeffrey Dean³•Institutions (3)

Brno University of Technology¹, Beijing University of Posts and Telecommunications², Google³

16 Jan 2013

TL;DR: Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed and it is shown that these vectors provide state-of-the-art performance on the authors' test set for measuring syntactic and semantic word similarities.

...read moreread less

Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.

...read moreread less

9,270 citations

Proceedings Article•DOI•

Recurrent neural networks for language understanding.

[...]

Kaisheng Yao¹, Geoffrey Zweig, Mei-Yuh Hwang¹, Yangyang Shi², Dong Yu¹ - Show less +1 more•Institutions (2)

Microsoft¹, Delft University of Technology²

25 Aug 2013

TL;DR: This paper modify the architecture to perform Language Understanding, and advance the state-of-the-art for the widely used ATIS dataset.

...read moreread less

Abstract: Recurrent Neural Network Language Models (RNN-LMs) have recently shown exceptional performance across a variety of applications. In this paper, we modify the architecture to perform Language Understanding, and advance the state-of-the-art for the widely used ATIS dataset. The core of our approach is to take words as input as in a standard RNN-LM, and then to predict slot labels rather than words on the output side. We present several variations that differ in the amount of word context that is used on the input side, and in the use of non-lexical features. Remarkably, our simplest model produces state-of-the-art results, and we advance state-of-the-art through the use of bagof-words, word embedding, named-entity, syntactic, and wordclass features. Analysis indicates that the superior performance is attributable to the task-specific word representations learned by the RNN.

...read moreread less

328 citations

Posted Content•

Zero-Shot Learning by Convex Combination of Semantic Embeddings

[...]

Mohammad Norouzi¹, Tomas Mikolov², Samy Bengio², Yoram Singer², Jonathon Shlens², Andrea Frome², Greg S. Corrado², Jeffrey Dean² - Show less +4 more•Institutions (2)

University of Toronto¹, Google²

19 Dec 2013-arXiv: Learning

TL;DR: In this paper, a convex combination of the class label embedding vectors is used to map images into the semantic embedding space via convex combinations of the embeddings, which requires no additional training.

...read moreread less

Abstract: Several recent publications have proposed methods for mapping images into continuous semantic embedding spaces. In some cases the embedding space is trained jointly with the image transformation. In other cases the semantic embedding space is established by an independent natural language processing task, and then the image transformation into that space is learned in a second stage. Proponents of these image embedding systems have stressed their advantages over the traditional way{} classification framing of image understanding, particularly in terms of the promise for zero-shot learning -- the ability to correctly annotate images of previously unseen object categories. In this paper, we propose a simple method for constructing an image embedding system from any existing way{} image classifier and a semantic word embedding model, which contains the $ $ class labels in its vocabulary. Our method maps images into the semantic embedding space via convex combination of the class label embedding vectors, and requires no additional training. We show that this simple and direct method confers many of the advantages associated with more complex image embedding schemes, and indeed outperforms state of the art methods on the ImageNet zero-shot learning task.

...read moreread less

192 citations

Book Chapter•DOI•

Convolution Neural Network for Relation Extraction

[...]

Chunyang Liu, Wenbo Sun¹, Wenhan Chao¹, Wanxiang Che²•Institutions (2)

Beihang University¹, Harbin Institute of Technology²

14 Dec 2013

TL;DR: This paper proposes a novel convolution network, incorporating lexical features, applied to Relation Extraction, and compares the Convolution Neural Network CNN on relation extraction with the state-of-art tree kernel approach, including Typed Dependency Path Kernel and Shortest Dependency path Kernel and Context-Sensitive tree kernel.

...read moreread less

Abstract: Deep Neural Network has been applied to many Natural Language Processing tasks. Instead of building hand-craft features, DNN builds features by automatic learning, fitting different domains well. In this paper, we propose a novel convolution network, incorporating lexical features, applied to Relation Extraction. Since many current deep neural networks use word embedding by word table, which, however, neglects semantic meaning among words, we import a new coding method, which coding input words by synonym dictionary to integrate semantic knowledge into the neural network. We compared our Convolution Neural Network CNN on relation extraction with the state-of-art tree kernel approach, including Typed Dependency Path Kernel and Shortest Dependency Path Kernel and Context-Sensitive tree kernel, resulting in a 9% improvement competitive performance on ACE2005 data set. Also, we compared the synonym coding with the one-hot coding, and our approach got 1.6% improvement. Moreover, we also tried other coding method, such as hypernym coding, and give some discussion according the result.

...read moreread less

133 citations

Proceedings Article•

Word Alignment Modeling with Context Dependent Deep Neural Network

[...]

Nan Yang¹, Shujie Liu¹, Mu Li¹, Ming Zhou¹, Nenghai Yu² - Show less +1 more•Institutions (2)

Microsoft¹, University of Science and Technology of China²

01 Aug 2013

TL;DR: A novel bilingual word alignment approach based on DNN (Deep Neural Network) which outperforms the HMM and IBM model 4 baselines by 2 points in F-score and generates a very compact model with much fewer parameters.

...read moreread less

Abstract: In this paper, we explore a novel bilingual word alignment approach based on DNN (Deep Neural Network), which has been proven to be very effective in various machine learning tasks (Collobert et al., 2011). We describe in detail how we adapt and extend the CD-DNNHMM (Dahl et al., 2012) method introduced in speech recognition to the HMMbased word alignment model, in which bilingual word embedding is discriminatively learnt to capture lexical translation information, and surrounding words are leveraged to model context information in bilingual sentences. While being capable to model the rich bilingual correspondence, our method generates a very compact model with much fewer parameters. Experiments on a large scale EnglishChinese word alignment task show that the proposed method outperforms the HMM and IBM model 4 baselines by 2 points in F-score.

...read moreread less

86 citations

Proceedings Article•

Additive Neural Networks for Statistical Machine Translation

[...]

Lemao Liu¹, Taro Watanabe², Eiichiro Sumita², Tiejun Zhao¹•Institutions (2)

Harbin Institute of Technology¹, National Institute of Information and Communications Technology²

01 Aug 2013

TL;DR: The proposed variant of a neural network is employed as the input to the neural network, which encodes each word as a feature vector and outperforms the log-linear translation models with/without embedding features on Chinese- to-English and Japanese-to-English translation tasks.

...read moreread less

Abstract: Most statistical machine translation (SMT) systems are modeled using a loglinear framework. Although the log-linear model achieves success in SMT, it still suffers from some limitations: (1) the features are required to be linear with respect to the model itself; (2) features cannot be further interpreted to reach their potential. A neural network is a reasonable method to address these pitfalls. However, modeling SMT with a neural network is not trivial, especially when taking the decoding efficiency into consideration. In this paper, we propose a variant of a neural network, i.e. additive neural networks, for SMT to go beyond the log-linear translation model. In addition, word embedding is employed as the input to the neural network, which encodes each word as a feature vector. Our model outperforms the log-linear translation models with/without embedding features on Chinese-to-English and Japanese-to-English translation tasks.

...read moreread less

36 citations

An Empirical Investigation of Word Representations for Parsing the Web

[...]

Sorami Hisamoto, Kevin Duh, Yuji Matsumoto¹•Institutions (1)

Nara Institute of Science and Technology¹

01 Jan 2013

TL;DR: This work systematically examines two different kinds of word representations, namely Brown clustering and word embeddings induced from a neural language model on the task of dependency parsing on web text.

...read moreread less

Abstract: Parsing web text is progressively becoming important for many applications in natural language processing, such as machine translation, information retrieval, and sentiment analysis. Current syntactic parsing has been focused on canonical data such as newswires. When evaluated on standard benchmarks such as Wall Street Journal data set, current state-of-the-art parsers achieve accuracies well above 90%. However the accuracy drops dramatically when they are applied to new domains such as web data, barely over 80%. In order to make progress in many applications that rely on parsing, we need robust parsers that can handle such texts. One approach that is becoming popular recently is to use unsupervised word representations as extra features. Koo et al. [1] has shown that unsupervised clustering features are effective to improve dependency parsing. Turian et al. [2] examined clustering and unsupervised word embedding features on chunking and named entity recognition tasks. Unsupervised word embeddings are dense, low-dimensional and real-value vectors representing words, often induced by neural language models. They have shown that these word representation features lead to improvement in the performances. These word representations are induced by unsupervised methods, thus they are good for new domains such as the web, which has enormous amount of unlabeled data but little labeled data. In this paper we investigate the effect of unsupervised word representation features on dependency parsing with web texts. We consider two different kinds of word representations, namely Brown clustering and word embeddings induced from a neural language model. To the best of our knowledge, this is the first work that systematically examines these word representations on the task of dependency parsing on web text.

...read moreread less

10 citations

Showing papers on "Word embedding published in 2013"