Proceedings ArticleDOI
Glove: Global Vectors for Word Representation
Jeffrey Pennington,Richard Socher,Christopher D. Manning +2 more
- pp 1532-1543
Reads0
Chats0
TLDR
A new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods and produces a vector space with meaningful substructure.Abstract:
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global logbilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word cooccurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.read more
Citations
More filters
Posted Content
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings ArticleDOI
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
TL;DR: BERT as mentioned in this paper pre-trains deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
Proceedings Article
Language Models are Few-Shot Learners
Tom B. Brown,Benjamin Mann,Nick Ryder,Melanie Subbiah,Jared Kaplan,Prafulla Dhariwal,Arvind Neelakantan,Pranav Shyam,Girish Sastry,Amanda Askell,Sandhini Agarwal,Ariel Herbert-Voss,Gretchen Krueger,Thomas Henighan,Rewon Child,Aditya Ramesh,Daniel M. Ziegler,Jeffrey Wu,Clemens Winter,Christopher Hesse,Mark Chen,Eric Sigler,Mateusz Litwin,Scott Gray,Benjamin Chess,Jack Clark,Christopher Berner,Samuel McCandlish,Alec Radford,Ilya Sutskever,Dario Amodei +30 more
TL;DR: GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic.
Posted Content
Inductive Representation Learning on Large Graphs
TL;DR: GraphSAGE is presented, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data and outperforms strong baselines on three inductive node-classification benchmarks.
Proceedings ArticleDOI
Deep contextualized word representations
Matthew E. Peters,Mark Neumann,Mohit Iyyer,Matt Gardner,Christopher Clark,Kenton Lee,Luke Zettlemoyer +6 more
TL;DR: This paper introduced a new type of deep contextualized word representation that models both complex characteristics of word use (e.g., syntax and semantics), and how these uses vary across linguistic contexts (i.e., to model polysemy).
References
More filters
Journal ArticleDOI
Producing high-dimensional semantic spaces from lexical co-occurrence
Kevin Lund,Curt Burgess +1 more
TL;DR: A procedure that processes a corpus of text and produces numeric vectors containing information about its meanings for each word, which provide the basis for a representational model of semantic memory, hyperspace analogue to language (HAL).
Journal ArticleDOI
Contextual correlates of semantic similarity
TL;DR: This article investigated the relationship between semantic and contextual similarity for pairs of nouns that vary from high to low semantic similarity and concluded that the more often two words can be substituted into the same contexts, the more similar they are judged to be.
Journal Article
Placing search in context: the concept revisited.
Lev Finkelstein,Evgeniy Gabrilovich,Yossi Matias,Ehud Rivlin,Zach Solan,Gadi Wolfman,Eytan Ruppin +6 more
TL;DR: A new conceptual paradigm for performing search in context is presented, that largely automates the search process, providing even non-professional users with highly relevant results.
Journal ArticleDOI
Contextual correlates of synonymy
TL;DR: The shapes of the functions indicate that similarity of context is reliable as criterion only for detecting pairs of words that are very similar in meaning.
Proceedings ArticleDOI
Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors
TL;DR: An extensive evaluation of context-predicting models with classic, count-vector-based distributional semantic approaches, on a wide range of lexical semantics tasks and across many parameter settings shows that the buzz around these models is fully justified.