Journal ArticleDOI

Distributional Models of Word Meaning

17 Jan 2018 - Annual Review of Linguistics (Annual Reviews) - Vol. 4, Iss: 1, pp 151-171
TL;DR: This review presents the state of the art in distributional semantics, focusing on its assets and limits as a model of meaning and as a method for semantic analysis.
Abstract: Distributional semantics is a usage-based model of meaning, based on the assumption that the statistical distribution of linguistic items in context plays a key role in characterizing their semantic behavior. Distributional models build semantic representations by extracting co-occurrences from corpora and have become a mainstream research paradigm in computational linguistics. In this review, I present the state of the art in distributional semantics, focusing on its assets and limits as a model of meaning and as a method for semantic analysis.
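To make the co-occurrence extraction described in the abstract concrete, here is a minimal sketch of a count-based model: a word-by-word co-occurrence matrix built from a toy corpus and compared with cosine similarity. The corpus, window size, and raw counts are illustrative assumptions, not the specific models surveyed in the review.

```python
import numpy as np

corpus = [
    "the dog barked at the cat",
    "the cat chased the mouse",
    "the dog chased the cat",
]
window = 2  # symmetric context window (assumed)

tokens = [sentence.split() for sentence in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# Word-by-word co-occurrence counts within the window.
M = np.zeros((len(vocab), len(vocab)))
for sent in tokens:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                M[idx[w], idx[sent[j]]] += 1.0

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

# Words with similar context profiles receive higher cosine similarity.
print(cosine(M[idx["dog"]], M[idx["cat"]]))
print(cosine(M[idx["dog"]], M[idx["mouse"]]))
```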
Citations
Posted Content
TL;DR: An extensive overview of the field of word embeddings evaluation is presented, highlighting main problems and proposing a typology of approaches to evaluation, summarizing 16 intrinsic methods and 12 extrinsic methods.
Abstract: Word embeddings are real-valued word representations able to capture lexical semantics and trained on natural language corpora. Models proposing these representations have gained popularity in the recent years, but the issue of the most adequate evaluation method still remains open. This paper presents an extensive overview of the field of word embeddings evaluation, highlighting main problems and proposing a typology of approaches to evaluation, summarizing 16 intrinsic methods and 12 extrinsic methods. I describe both widely-used and experimental methods, systematize information about evaluation datasets and discuss some key challenges.
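As a sketch of what an intrinsic evaluation typically looks like, one can correlate model similarities with human similarity ratings; the word pairs, gold ratings, and vectors below are made up for illustration and are not taken from the survey.

```python
import numpy as np
from scipy.stats import spearmanr

vectors = {
    "car":   np.array([0.9, 0.1, 0.0]),
    "auto":  np.array([0.8, 0.2, 0.1]),
    "fruit": np.array([0.1, 0.9, 0.3]),
    "apple": np.array([0.2, 0.8, 0.4]),
}
# (word1, word2, human similarity rating) -- toy gold standard.
gold = [("car", "auto", 9.0), ("fruit", "apple", 8.1), ("car", "fruit", 1.5)]

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

model_scores = [cosine(vectors[a], vectors[b]) for a, b, _ in gold]
human_scores = [rating for _, _, rating in gold]

rho, _ = spearmanr(model_scores, human_scores)
print(f"Spearman correlation with human ratings: {rho:.2f}")
```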

147 citations

Journal ArticleDOI
TL;DR: This article identifies common misconceptions that arise as a result of incomplete descriptions, outdated arguments, and unclear distinctions between theory and implementation of models of semantic representation, and it clarifies and amends these points to provide a theoretical basis for future research and discussions on vector models of semantic representation.
Abstract: Models that represent meaning as high-dimensional numerical vectors-such as latent semantic analysis (LSA), hyperspace analogue to language (HAL), bound encoding of the aggregate language environment (BEAGLE), topic models, global vectors (GloVe), and word2vec-have been introduced as extremely powerful machine-learning proxies for human semantic representations and have seen an explosive rise in popularity over the past 2 decades. However, despite their considerable advancements and spread in the cognitive sciences, one can observe problems associated with the adequate presentation and understanding of some of their features. Indeed, when these models are examined from a cognitive perspective, a number of unfounded arguments tend to appear in the psychological literature. In this article, we review the most common of these arguments and discuss (a) what exactly these models represent at the implementational level and their plausibility as a cognitive theory, (b) how they deal with various aspects of meaning such as polysemy or compositionality, and (c) how they relate to the debate on embodied and grounded cognition. We identify common misconceptions that arise as a result of incomplete descriptions, outdated arguments, and unclear distinctions between theory and implementation of the models. We clarify and amend these points to provide a theoretical basis for future research and discussions on vector models of semantic representation.
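One simple compositional scheme discussed in this literature is vector addition; the sketch below illustrates the idea with toy vectors and is not tied to any particular model reviewed in the article.

```python
import numpy as np

vec = {
    "red":   np.array([1.0, 0.2, 0.0]),
    "car":   np.array([0.1, 0.9, 0.3]),
    "truck": np.array([0.2, 0.8, 0.4]),
}

# Additive composition: the phrase vector is the sum of its word vectors.
red_car = vec["red"] + vec["car"]

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# The composed vector can then be compared to other word or phrase vectors.
print(cosine(red_car, vec["truck"]))
```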

132 citations


Cites background or methods from "Distributional Models of Word Meaning"

  • ...Excellent reviews in this respect are provided by Sahlgren (2006), Turney and Pantel (2010), Lenci (2008, 2018), Jones et al....


  • ...Alongside this, we describe how traditional, count-based DSMs such as LSA, HAL, or GloVe are typically implemented (for comprehensive overviews, see Lenci, 2018; Turney & Pantel, 2010) and what type of information their word vectors actually represent....


01 Jan 2016
Semantic Relations and the Lexicon: Antonymy, Synonymy and Other Paradigms

73 citations

01 Jan 1998
TL;DR: The aim of this paper is to introduce the framework of update semantics and to explain what kind of semantic phenomena may successfully be analysed in it and to give a detailed analysis of one such phenomenon: default reasoning.
Abstract: The aim of this paper is twofold: (i) to introduce the framework of update semantics and to explain what kind of semantic phenomena may successfully be analysed in it; (ii) to give a detailed analysis of one such phenomenon: default reasoning.

43 citations

Journal ArticleDOI
TL;DR: It is shown that word vectors can capture some types of perceptual and spatiotemporal information about concrete concepts and some relevant word categories, suggesting that language statistics can encode more perceptual knowledge than often expected.

33 citations

References
More filters
Proceedings ArticleDOI
01 Oct 2014
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Abstract: Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition.
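The core of the method, a weighted least-squares fit to the logarithm of the nonzero co-occurrence counts, can be sketched as follows. The weighting function is the one commonly associated with GloVe; the toy matrix, random initialization, and constants are assumptions for illustration, not the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 5, 3                                          # toy vocabulary size and vector dimension
X = rng.integers(0, 10, size=(V, V)).astype(float)   # toy co-occurrence counts

W  = rng.normal(scale=0.1, size=(V, d))   # word vectors
Wc = rng.normal(scale=0.1, size=(V, d))   # context vectors
b  = np.zeros(V)                          # word biases
bc = np.zeros(V)                          # context biases

def weight(x, x_max=100.0, alpha=0.75):
    # Down-weights rare co-occurrences and caps the influence of frequent ones.
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_cost(W, Wc, b, bc, X):
    cost = 0.0
    for i, j in zip(*np.nonzero(X)):      # only nonzero entries contribute
        err = W[i] @ Wc[j] + b[i] + bc[j] - np.log(X[i, j])
        cost += weight(X[i, j]) * err ** 2
    return cost

# In training, this cost would be minimized with stochastic gradient updates.
print(glove_cost(W, Wc, b, bc, X))
```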

30,558 citations


"Distributional Models of Word Meani..." refers background in this paper

  • ...al. (2007); High-dimensional explorer (HiDEx): generalization of HAL with a larger range of parameter settings, Shaoul & Westbury (2010); Global vectors (GloVe): word-by-word matrix reduced with weighted least-squares regression, Pennington et al. (2014). Abbreviation: SVD, singular value decomposition....


Proceedings Article
03 Jan 2001
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.
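The generative story summarized in the abstract can be sketched directly: draw topic proportions for each document from a Dirichlet, then draw a topic and a word for each token. The topic count, vocabulary size, and hyperparameters below are toy assumptions, and inference (variational or otherwise) is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, doc_len = 3, 8, 10                  # topics, vocabulary size, words per document
alpha = np.full(K, 0.5)                   # Dirichlet prior over topic proportions
beta = rng.dirichlet(np.ones(V), size=K)  # per-topic word distributions (K x V)

def generate_document():
    theta = rng.dirichlet(alpha)          # this document's topic mixture
    words = []
    for _ in range(doc_len):
        z = rng.choice(K, p=theta)        # topic assignment for this word
        w = rng.choice(V, p=beta[z])      # word drawn from that topic's distribution
        words.append(w)
    return words

print(generate_document())
```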

25,546 citations

Proceedings Article
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeffrey Dean
05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
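The phrase-finding method mentioned in the abstract scores bigrams by their discounted count relative to the unigram counts and merges high-scoring bigrams into single tokens. The sketch below follows that scoring scheme; the corpus, discount, and threshold values are illustrative assumptions.

```python
from collections import Counter

corpus = "we flew air canada to new york and then air canada back home".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def phrase_score(w1, w2, delta=1.0):
    # score(w1, w2) = (count(w1 w2) - delta) / (count(w1) * count(w2))
    return (bigrams[(w1, w2)] - delta) / (unigrams[w1] * unigrams[w2])

threshold = 0.1  # assumed here; tuned on real data in practice
phrases = [bg for bg in bigrams if phrase_score(*bg) > threshold]
print(phrases)   # bigrams that would be merged into single tokens
```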

24,012 citations


"Distributional Models of Word Meani..." refers background or methods in this paper

  • ...Various types of “linguistic regularities” have been claimed to be identifiable by neural embeddings (Mikolov et al. 2013c)....


  • ...The most popular neural DSM is the one implemented in the word2vec library, which uses the softmax function for predicting b given a (Mikolov et al. 2013a,b): (7) p(b | a) = exp(b · a) / Σ_{b′ ∈ C} exp(b′ · a), where C is the set of context words and b and a are the vector representations for the context…

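The softmax in the excerpt above can be evaluated numerically with a toy sketch; the vectors and the context-word set C below are made-up assumptions rather than trained word2vec parameters.

```python
import numpy as np

a = np.array([0.2, 0.4, 0.1])          # target-word vector
C = {                                  # context-word vectors
    "dog": np.array([0.3, 0.5, 0.0]),
    "cat": np.array([0.1, 0.2, 0.9]),
    "car": np.array([0.7, 0.1, 0.3]),
}

# p(b | a) = exp(b . a) / sum over b' in C of exp(b' . a)
scores = {b: np.exp(vec @ a) for b, vec in C.items()}
Z = sum(scores.values())
p = {b: s / Z for b, s in scores.items()}
print(p)  # probabilities over context words, summing to 1
```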

Posted Content
TL;DR: This paper proposes two novel model architectures for computing continuous vector representations of words from very large data sets; the quality of these representations is measured in a word similarity task, and the results are compared to the previously best-performing techniques based on different types of neural networks.
Abstract: We propose two novel model architectures for computing continuous vector representations of words from very large data sets. The quality of these representations is measured in a word similarity task, and the results are compared to the previously best performing techniques based on different types of neural networks. We observe large improvements in accuracy at much lower computational cost, i.e. it takes less than a day to learn high quality word vectors from a 1.6 billion words data set. Furthermore, we show that these vectors provide state-of-the-art performance on our test set for measuring syntactic and semantic word similarities.
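As a usage sketch, assuming the third-party gensim library rather than the original tool released with the paper, training and querying such vectors on a toy corpus might look like this; the corpus and hyperparameters are illustrative.

```python
from gensim.models import Word2Vec

sentences = [
    ["the", "dog", "barked", "at", "the", "cat"],
    ["the", "cat", "chased", "the", "mouse"],
    ["the", "dog", "chased", "the", "cat"],
]

model = Word2Vec(
    sentences=sentences,
    vector_size=50,   # dimensionality of the word vectors
    window=2,         # context window size
    min_count=1,      # keep every word in this tiny corpus
    sg=1,             # 1 = Skip-gram, 0 = CBOW
    epochs=50,
)

# Nearest neighbours in the learned vector space.
print(model.wv.most_similar("dog", topn=2))
```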

20,077 citations

Journal ArticleDOI
TL;DR: A new method for automatic indexing and retrieval is described that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents on the basis of terms found in queries.
Abstract: A new method for automatic indexing and retrieval is described. The approach is to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries. The particular technique used is singular-value decomposition, in which a large term-by-document matrix is decomposed into a set of ca. 100 orthogonal factors from which the original matrix can be approximated by linear combination. Documents are represented by ca. 100-item vectors of factor weights. Queries are represented as pseudo-document vectors formed from weighted combinations of terms, and documents with supra-threshold cosine values are returned. Initial tests find this completely automatic method for retrieval to be promising.
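A compact sketch of the retrieval pipeline described above: factor a toy term-by-document matrix with a truncated SVD, fold a query in as a pseudo-document, and rank documents by cosine similarity. The rank-2 truncation stands in for the roughly 100 factors reported, and the fold-in convention shown is one standard reading of the pseudo-document construction.

```python
import numpy as np

terms = ["dog", "cat", "car", "engine"]
# Term-by-document count matrix (rows = terms, columns = documents).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 2, 1],
    [0, 0, 1, 2],
], dtype=float)

k = 2  # number of retained factors (the paper uses ca. 100)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk = U[:, :k], np.diag(s[:k])
docs_k = Vt[:k, :].T              # each row: one document in the factor space

def fold_in(query_terms):
    # Map a query term vector into the same factor space: q^T U_k S_k^{-1}.
    q = np.array([1.0 if t in query_terms else 0.0 for t in terms])
    return q @ Uk @ np.linalg.inv(Sk)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

query = fold_in({"car"})
print([round(cosine(query, d), 3) for d in docs_k])  # query-document similarities
```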

12,443 citations
