scispace - formally typeset
Open AccessPosted Content

Distributed Representations of Words and Phrases and their Compositionality

TLDR
In this paper, the Skip-gram model is used to learn high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships and improve both the quality of the vectors and the training speed.
Abstract
The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.

read more

Citations
More filters
Posted Content

Semi-Supervised Classification with Graph Convolutional Networks

TL;DR: A scalable approach for semi-supervised learning on graph-structured data that is based on an efficient variant of convolutional neural networks which operate directly on graphs which outperforms related methods by a significant margin.
Posted Content

Inductive Representation Learning on Large Graphs

TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
Proceedings ArticleDOI

node2vec: Scalable Feature Learning for Networks

TL;DR: Node2vec as mentioned in this paper learns a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes by using a biased random walk procedure.
Posted Content

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks

TL;DR: This work introduces a class of CNNs called deep convolutional generative adversarial networks (DCGANs), that have certain architectural constraints, and demonstrates that they are a strong candidate for unsupervised learning.
Proceedings ArticleDOI

LINE: Large-scale Information Network Embedding

TL;DR: LINE as discussed by the authors proposes a network embedding method called LINE, which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted, and optimizes a carefully designed objective function that preserves both the local and global network structures.
References
More filters
Journal ArticleDOI

Learning representations by back-propagating errors

TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Posted Content

Efficient Estimation of Word Representations in Vector Space

TL;DR: This paper proposed two novel model architectures for computing continuous vector representations of words from very large data sets, and the quality of these representations is measured in a word similarity task and the results are compared to the previously best performing techniques based on different types of neural networks.
Journal ArticleDOI

A neural probabilistic language model

TL;DR: The authors propose to learn a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences, which can be expressed in terms of these representations.
Proceedings ArticleDOI

A unified architecture for natural language processing: deep neural networks with multitask learning

TL;DR: This work describes a single convolutional neural network architecture that, given a sentence, outputs a host of language processing predictions: part-of-speech tags, chunks, named entity tags, semantic roles, semantically similar words and the likelihood that the sentence makes sense using a language model.
Proceedings Article

Word Representations: A Simple and General Method for Semi-Supervised Learning

TL;DR: This work evaluates Brown clusters, Collobert and Weston (2008) embeddings, and HLBL (Mnih & Hinton, 2009) embeds of words on both NER and chunking, and finds that each of the three word representations improves the accuracy of these baselines.
Related Papers (5)