Proceedings Article

Topical word embeddings

25 Jan 2015-Vol. 29, Iss: 1, pp 2418-2424
TL;DR: The experimental results show that the TWE models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification.
Abstract: Most word embedding models typically represent each word using a single vector, which makes these models indiscriminative for ubiquitous homonymy and polysemy. In order to enhance discriminativeness, we employ latent topic models to assign topics for each word in the text corpus, and learn topical word embeddings (TWE) based on both words and their topics. In this way, contextual word embeddings can be flexibly obtained to measure contextual word similarity. We can also build document representations, which are more expressive than some widely-used document models such as latent topic models. In the experiments, we evaluate the TWE models on two tasks, contextual word similarity and text classification. The experimental results show that our models outperform typical word embedding models including the multi-prototype version on contextual word similarity, and also exceed latent topic models and other representative document models on text classification. The source code of this paper can be obtained from https://github.com/largelymfs/topical_word_embeddings.
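
A rough illustration of the pairing step the abstract describes: one of the TWE variants treats each word token together with its LDA-assigned topic as a unit. The sketch below is not the authors' code; the toy corpus, the per-token topic ids, and the "word_topic" pseudo-word naming are assumptions made for illustration, with gensim standing in for the Skip-Gram implementation.

    # Illustrative sketch only: pair each token with its topic and train Skip-Gram
    # over the resulting pseudo-words, giving one vector per (word, topic) pair.
    from gensim.models import Word2Vec

    tokenized_docs = [["apple", "released", "phone"], ["apple", "pie", "recipe"]]
    topic_assignments = [[0, 0, 0], [1, 1, 1]]     # toy per-token topic ids (LDA would supply these)

    topical_docs = [
        [f"{w}_{z}" for w, z in zip(doc, zs)]      # e.g. "apple_0", "apple_1"
        for doc, zs in zip(tokenized_docs, topic_assignments)
    ]

    model = Word2Vec(topical_docs, vector_size=50, window=2, sg=1, min_count=1)
    print(model.wv["apple_0"])                     # "apple" under the tech-flavoured topic
    print(model.wv["apple_1"])                     # "apple" under the food-flavoured topic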


Citations
Proceedings ArticleDOI
10 Aug 2015
TL;DR: A semi-supervised representation learning method for text data, called predictive text embedding (PTE), which is comparable to or more effective than CNN-based supervised approaches while being much more efficient and having fewer parameters to tune.
Abstract: Unsupervised text embedding methods, such as Skip-gram and Paragraph Vector, have been attracting increasing attention due to their simplicity, scalability, and effectiveness. However, compared to sophisticated deep learning architectures such as convolutional neural networks, these methods usually yield inferior results when applied to particular machine learning tasks. One possible reason is that these text embedding methods learn the representation of text in a fully unsupervised way, without leveraging the labeled information available for the task. Although the low dimensional representations learned are applicable to many different tasks, they are not particularly tuned for any task. In this paper, we fill this gap by proposing a semi-supervised representation learning method for text data, which we call the predictive text embedding (PTE). Predictive text embedding utilizes both labeled and unlabeled data to learn the embedding of text. The labeled information and different levels of word co-occurrence information are first represented as a large-scale heterogeneous text network, which is then embedded into a low dimensional space through a principled and efficient algorithm. This low dimensional embedding not only preserves the semantic closeness of words and documents, but also has a strong predictive power for the particular task. Compared to recent supervised approaches based on convolutional neural networks, predictive text embedding is comparable or more effective, much more efficient, and has fewer parameters to tune.
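
As a concrete reading of the network described above, the sketch below builds the three bipartite edge sets (word-word co-occurrence, word-document, word-label) from a toy labeled corpus. Only the construction is shown; the joint low-dimensional embedding of the network is omitted, and the corpus, labels, and window size are illustrative assumptions, not the paper's setup.

    # Construct the heterogeneous text network used by PTE-style methods (sketch only).
    from collections import Counter

    docs   = [["cheap", "flights", "online"], ["book", "cheap", "flights"]]
    labels = ["travel", "travel"]                       # toy task labels

    ww_edges, wd_edges, wl_edges = Counter(), Counter(), Counter()
    window = 2                                          # co-occurrence window (assumed)
    for d, (doc, label) in enumerate(zip(docs, labels)):
        for i, w in enumerate(doc):
            wd_edges[(w, f"doc{d}")] += 1               # word-document edges
            wl_edges[(w, label)] += 1                   # word-label edges (the supervision)
            for j in range(i + 1, min(i + 1 + window, len(doc))):
                ww_edges[(w, doc[j])] += 1              # word-word co-occurrence edges

    for name, edges in (("word-word", ww_edges), ("word-doc", wd_edges), ("word-label", wl_edges)):
        print(name, dict(edges))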

703 citations


Cites result from "Topical word embeddings"

  • ...Similar results to ours are also reported in [14]....


Journal ArticleDOI
TL;DR: This article extended two Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus.
Abstract: Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.
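
As a rough, hedged sketch of the combination the abstract describes (the notation here is illustrative and reconstructed, not quoted from the article), the per-topic word distribution can be read as interpolating the usual Dirichlet-multinomial component with a latent-feature component built from a topic vector \tau_t and pre-trained word vectors \omega_w:

    P(w \mid t) \;=\; (1-\lambda)\,\mathrm{Mult}(w \mid \phi_t)
                 \;+\; \lambda\,\frac{\exp(\tau_t \cdot \omega_w)}{\sum_{w'} \exp(\tau_t \cdot \omega_{w'})}

Here \lambda balances the multinomial learnt on the small target corpus against the softmax term that injects information from word vectors trained on the large external corpora.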

276 citations

Proceedings ArticleDOI
04 Jun 2016
TL;DR: It is demonstrated that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus- and query-specific embeddings for retrieval tasks, suggesting that other tasks benefiting from global embeddings may also benefit from local embeddings.
Abstract: Continuous space word embeddings have received a great deal of attention in the natural language processing and machine learning communities for their ability to model term similarity and other relationships. We study the use of term relatedness in the context of query expansion for ad hoc information retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when trained globally, underperform corpus and query specific embeddings for retrieval tasks. These results suggest that other tasks benefiting from global embeddings may also benefit from local embeddings.
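
The "local embeddings" idea can be sketched as follows: train word2vec only on the top documents returned by a first-pass retriever for a given query, then expand the query with nearest neighbours of its terms. The retriever, corpus, and hyperparameters below are assumptions for illustration, not the authors' exact setup.

    # Query-specific ("local") embedding sketch for query expansion.
    from gensim.models import Word2Vec

    def expand_query(query_terms, top_k_docs, n_expansions=5):
        """top_k_docs: tokenized documents from a first-pass retriever (not shown here)."""
        local_model = Word2Vec(top_k_docs, vector_size=100, window=5, min_count=2, sg=1)
        expansions = []
        for term in query_terms:
            if term in local_model.wv:                  # term may be too rare to survive min_count
                expansions += [w for w, _ in local_model.wv.most_similar(term, topn=n_expansions)]
        return list(dict.fromkeys(query_terms + expansions))   # de-duplicated expanded query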

265 citations


Cites background from "Topical word embeddings"

  • ...The problem can be addressed by training a global model with multiple vector embeddings per word (Reisinger and Mooney, 2010a; Huang et al., 2012) or topic-specific embeddings (Liu et al., 2015)....



Posted Content
TL;DR: Two different Dirichlet multinomial topic models are extended by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus.
Abstract: Probabilistic topic models are widely used to discover latent topics in document collections, while latent feature vector representations of words have been used to obtain high performance in many NLP tasks. In this paper, we extend two different Dirichlet multinomial topic models by incorporating latent feature vector representations of words trained on very large corpora to improve the word-topic mapping learnt on a smaller corpus. Experimental results show that by using information from the external corpora, our new models produce significant improvements on topic coherence, document clustering and document classification tasks, especially on datasets with few or short documents.

251 citations


Cites background from "Topical word embeddings"

  • ...Recent approaches based on deep neural networks learn vectors by predicting words given their window-based context (Collobert and Weston, 2008; Mikolov et al., 2013; Pennington et al., 2014; Liu et al., 2015)....


  • ... counts. Recent approaches based on deep neural networks learn vectors by predicting words given their window-based context (Collobert and Weston, 2008; Mikolov et al., 2013; Pennington et al., 2014; Liu et al., 2015). Mikolov et al. (2013)’s method maximizes the log likelihood of each word given its context. Pennington et al. (2014) used back-propagation to minimize the squared error of a prediction of the logfre...


Journal ArticleDOI
TL;DR: BERTopic is presented, a topic model that extends the process of topic modeling by extracting coherent topic representations through the development of a class-based variation of TF-IDF.
Abstract: Topic models can be useful tools to discover latent topics in collections of documents. Recent studies have shown the feasibility of approaching topic modeling as a clustering task. We present BERTopic, a topic model that extends this process by extracting coherent topic representations through the development of a class-based variation of TF-IDF. More specifically, BERTopic generates document embeddings with pre-trained transformer-based language models, clusters these embeddings, and finally generates topic representations with the class-based TF-IDF procedure. BERTopic generates coherent topics and remains competitive across a variety of benchmarks involving classical models and those that follow the more recent clustering approach of topic modeling.
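
The pipeline described above (transformer-based document embeddings, clustering, then class-based TF-IDF) is packaged in the bertopic library; a minimal usage sketch, assuming the package is installed and docs is a reasonably large list of raw text strings, might look like this:

    # Minimal BERTopic sketch (pip install bertopic); `docs` here is only a placeholder list.
    from bertopic import BERTopic

    docs = ["the stock market fell sharply", "the team won the championship game", "..."]
    topic_model = BERTopic()                     # embeds, clusters, then applies class-based TF-IDF
    topics, probs = topic_model.fit_transform(docs)
    print(topic_model.get_topic_info().head())   # one row per discovered topic with its top terms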

222 citations

References
More filters
Journal ArticleDOI
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Abstract: We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.
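
For orientation, a quick LDA run with gensim is sketched below (gensim uses variational inference; the citing snippet further down notes that the "Topical word embeddings" pipeline instead assigns topics with collapsed Gibbs sampling). The toy corpus and topic count are illustrative.

    # Toy LDA run: learn per-topic word distributions and per-document topic mixtures.
    from gensim import corpora
    from gensim.models import LdaModel

    docs = [["apple", "phone", "screen"], ["apple", "pie", "sugar"], ["phone", "screen", "battery"]]
    dictionary = corpora.Dictionary(docs)
    bow = [dictionary.doc2bow(doc) for doc in docs]

    lda = LdaModel(bow, num_topics=2, id2word=dictionary, passes=10, random_state=0)
    print(lda.print_topics())                    # per-topic word distributions
    print(lda.get_document_topics(bow[0]))       # topic mixture for the first document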

30,570 citations


"Topical word embeddings" refers methods in this paper

  • ...We employ the widely used latent Dirichlet allocation (LDA) (Blei, Ng, and Jordan 2003) to obtain word topics, and perform collapsed Gibbs sampling (Griffiths and Steyvers 2004) to iteratively assign latent topics for each word token....


Proceedings Article
03 Jan 2001
TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Abstract: We propose a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams [6], and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI) [3]. In the context of text modeling, our model posits that each document is generated as a mixture of topics, where the continuous-valued mixture proportions are distributed as a latent Dirichlet random variable. Inference and learning are carried out efficiently via variational algorithms. We present empirical results on applications of this model to problems in text modeling, collaborative filtering, and text classification.

25,546 citations

Proceedings Article
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, Jeffrey Dean
05 Dec 2013
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Abstract: The recently introduced continuous Skip-gram model is an efficient method for learning high-quality distributed vector representations that capture a large number of precise syntactic and semantic word relationships. In this paper we present several extensions that improve both the quality of the vectors and the training speed. By subsampling of the frequent words we obtain significant speedup and also learn more regular word representations. We also describe a simple alternative to the hierarchical softmax called negative sampling. An inherent limitation of word representations is their indifference to word order and their inability to represent idiomatic phrases. For example, the meanings of "Canada" and "Air" cannot be easily combined to obtain "Air Canada". Motivated by this example, we present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
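
The ingredients listed in this abstract map directly onto standard word2vec settings; a hedged gensim sketch (toy sentences and thresholds, not the paper's configuration) is shown below: Skip-gram, negative sampling, subsampling of frequent words, and a phrase-detection pass so that collocations such as "air canada" become single tokens.

    # Skip-gram with negative sampling, frequent-word subsampling, and phrase detection (sketch).
    from gensim.models import Word2Vec
    from gensim.models.phrases import Phrases, Phraser

    sentences = [["air", "canada", "delayed", "the", "flight"],
                 ["she", "booked", "an", "air", "canada", "ticket"]]

    bigrams = Phraser(Phrases(sentences, min_count=1, threshold=0.1))   # toy thresholds
    phrased = [bigrams[s] for s in sentences]                           # e.g. "air_canada"

    model = Word2Vec(phrased, sg=1, negative=5, sample=1e-5,
                     vector_size=100, window=5, min_count=1)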

24,012 citations


"Topical word embeddings" refers methods in this paper

  • ...The training objective of CBOW is to combine the embeddings of context words to predict the target word; while Skip-Gram is to use the embedding of each target word to predict its context words (Mikolov et al. 2013)....


  • ...We extend Skip-Gram (Mikolov et al. 2013), the state-of-the-art word embedding model, to implement our TWE models....


  • ...In previous work, the task of word similarity is always used to evaluate the performance of word embedding methods (Mikolov et al. 2013; Baroni, Dinu, and Kruszewski 2014)....


  • ...In order to make the model efficient for learning, the techniques of hierarchical softmax and negative sampling are used when learning Skip-Gram (Mikolov et al. 2013)....


  • ...Skip-Gram is a well-known framework for learning word vectors (Mikolov et al. 2013), as shown in Fig....


Journal ArticleDOI
01 Jan 1988-Nature
TL;DR: Back-propagation repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector, which helps to represent important features of the task domain.
Abstract: We describe a new learning procedure, back-propagation, for networks of neurone-like units. The procedure repeatedly adjusts the weights of the connections in the network so as to minimize a measure of the difference between the actual output vector of the net and the desired output vector. As a result of the weight adjustments, internal ‘hidden’ units which are not part of the input or output come to represent important features of the task domain, and the regularities in the task are captured by the interactions of these units. The ability to create useful new features distinguishes back-propagation from earlier, simpler methods such as the perceptron-convergence procedure1.
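
A tiny numpy illustration of the procedure described above is given below (toy XOR task; the architecture, seed, and learning rate are arbitrary choices, not taken from the paper): the weights are repeatedly adjusted to shrink the squared difference between the actual and desired output vectors, with the error propagated back through a hidden layer.

    # Back-propagation on a one-hidden-layer network (illustrative sketch).
    import numpy as np

    rng = np.random.default_rng(0)
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

    W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))       # input -> hidden
    W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))       # hidden -> output
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    lr = 0.5

    for _ in range(5000):
        h = sigmoid(X @ W1 + b1)                  # hidden units
        out = sigmoid(h @ W2 + b2)                # actual output vector
        err = out - y                             # difference from the desired output
        grad_out = err * out * (1 - out)          # gradient through the output sigmoid
        grad_h = (grad_out @ W2.T) * h * (1 - h)  # propagated back through the hidden layer
        W2 -= lr * (h.T @ grad_out); b2 -= lr * grad_out.sum(axis=0, keepdims=True)
        W1 -= lr * (X.T @ grad_h);   b1 -= lr * grad_h.sum(axis=0, keepdims=True)

    print(out.round(2))   # typically close to [[0], [1], [1], [0]], depending on the random init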

23,814 citations


"Topical word embeddings" refers methods in this paper

  • ...Word embeddings, first proposed in (Rumelhart, Hinton, and Williams 1986), have been successfully used in language models (Bengio et al. 2006; Mnih and Hinton 2008) and many NLP tasks, such as named entity recognition (Turian, Ratinov, and Bengio 2010), disambiguation (Collobert et al. 2011) and…...


Journal ArticleDOI
TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.
Abstract: Because meaningful sentences are composed of meaningful words, any system that hopes to process natural languages as people do must have information about words and their meanings. This information is traditionally provided through dictionaries, and machine-readable dictionaries are now widely available. But dictionary entries evolved for the convenience of human readers, not for machines. WordNet provides a more effective combination of traditional lexicographic information and modern computing. WordNet is an online lexical database designed for use under program control. English nouns, verbs, adjectives, and adverbs are organized into sets of synonyms, each representing a lexicalized concept. Semantic relations link the synonym sets [4].
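
The homonymy and polysemy knowledge the citing snippet below refers to can be queried through NLTK's WordNet interface; a minimal lookup sketch, assuming nltk is installed and the wordnet corpus has been downloaded, is:

    # WordNet lookup via NLTK (assumes: pip install nltk; nltk.download("wordnet")).
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets("bank"):             # one synset per sense of "bank"
        print(synset.name(), "-", synset.definition())
    print(wn.synsets("bank")[0].hypernyms())      # semantic relations between synsets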

15,068 citations


"Topical word embeddings" refers background in this paper

  • ...• There are many knowledge bases available, such as WordNet (Miller 1995), containing rich linguistic knowledge of homonymy and polysemy....
