Proceedings ArticleDOI

Collaboratively Improving Topic Discovery and Word Embeddings by Coordinating Global and Local Contexts

TLDR
This paper empirically shows that, by incorporating both global and local context, the proposed collaborative model not only significantly improves topic discovery over baseline topic models, but also learns better word embeddings than baseline word embedding models.
Abstract
A text corpus typically contains two types of context information: global context and local context. Global context carries topical information that topic models can use to discover topic structures in the corpus, while local context can be used to train word embeddings that capture the semantic regularities reflected in the corpus. This motivates us to exploit the useful information in both types of context. In this paper, we propose a unified language model based on matrix factorization techniques which 1) takes the complementary global and local context information into consideration simultaneously, and 2) models topics and learns word embeddings collaboratively. We empirically show that by incorporating both global and local context, this collaborative model can not only significantly improve the performance of topic discovery over the baseline topic models, but also learn better word embeddings than the baseline word embedding models. We also provide a qualitative analysis that explains how the cooperation of global and local context information results in better topic structures and word embeddings.
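To make the idea concrete, here is a minimal, hypothetical sketch of the kind of joint factorization the abstract describes: a global document-word matrix and a local word co-occurrence matrix are factorized with a shared word factor, so topic structure and word embeddings are learned together. The matrix names, the squared-error loss, and the plain gradient-descent updates below are illustrative assumptions, not the paper's actual objective or optimizer.

```python
# Hypothetical sketch: jointly factorize a global document-word matrix D and a
# local word-word co-occurrence matrix C with a shared word factor W, so that
# topics (T, W) and word embeddings (W) are learned collaboratively.
# All names (D, C, T, W, lam, lr) and the loss are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, k = 30, 100, 8            # toy sizes; k = number of topics / embedding dims

D = rng.poisson(0.3, (n_docs, n_words)).astype(float)   # global context: document-word counts
C = rng.poisson(0.5, (n_words, n_words)).astype(float)  # local context: word co-occurrence counts
C = (C + C.T) / 2.0                                      # symmetrize co-occurrences

T = 0.1 * rng.random((n_docs, k))          # document-topic factor
W = 0.1 * rng.random((n_words, k))         # shared word factor (topic-word weights / embeddings)

lam, lr = 0.5, 1e-4                        # global/local trade-off, gradient step size
for _ in range(500):
    # Joint objective: ||D - T W^T||_F^2 + lam * ||C - W W^T||_F^2
    R_global = T @ W.T - D                 # residual of the global (topical) term
    R_local = W @ W.T - C                  # residual of the local (semantic) term
    T -= lr * (2.0 * R_global @ W)
    W -= lr * (2.0 * R_global.T @ T + lam * 4.0 * R_local @ W)

print("global residual:", np.linalg.norm(T @ W.T - D))
print("local residual: ", np.linalg.norm(W @ W.T - C))
```

Because W appears in both terms, topical signal from the document-word matrix and semantic signal from the co-occurrence matrix constrain the same word factor, which is the collaboration the abstract refers to.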


Citations
Proceedings ArticleDOI

Discriminative Topic Mining via Category-Name Guided Text Embedding

TL;DR: In this article, a new task, discriminative topic mining, is proposed, which leverages a set of user-provided category names to mine discriminative topics from text corpora, helping users clearly and distinctively understand the topics they are most interested in.
Proceedings ArticleDOI

Word2Sense: Sparse Interpretable Word Embeddings.

TL;DR: An unsupervised method to generate Word2Sense word embeddings that are interpretable: each dimension of the embedding space corresponds to a fine-grained sense, and the non-negative value of the embedding along the j-th dimension represents the relevance of the j-th sense to the word.
Journal ArticleDOI

MeSHProbeNet: a self-attentive probe net for MeSH indexing.

TL;DR: An end-to-end framework, MeSHProbeNet (formerly named xgx), which utilizes deep learning and self-attentive MeSH probes to index MeSH terms, and achieves the highest scores on all F-measures.
Proceedings ArticleDOI

Discriminative Topic Mining via Category-Name Guided Text Embedding

TL;DR: CatE is developed, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category-representative terms in an iterative manner.
Proceedings ArticleDOI

Correlation Networks for Extreme Multi-label Text Classification

TL;DR: The Correlation Networks (CorNet) architecture is developed for the extreme multi-label text classification (XMTC) task, where the objective is to tag an input text sequence with the most relevant subset of labels from an extremely large label set.
References
Journal ArticleDOI

Latent Dirichlet Allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
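For reference, the generative process that LDA assumes for each document d, with Dirichlet prior \alpha over topic proportions and per-topic word distributions \beta_k, can be written as:

```latex
% Standard LDA generative process for word n of document d
\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad
z_{d,n} \mid \theta_d \sim \mathrm{Multinomial}(\theta_d), \qquad
w_{d,n} \mid z_{d,n} \sim \mathrm{Multinomial}(\beta_{z_{d,n}})
```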
Proceedings ArticleDOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
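For reference, GloVe fits word vectors by minimizing a weighted least-squares objective over the logarithm of the word co-occurrence counts X_{ij}:

```latex
% GloVe objective over co-occurrence counts X_{ij}
% (V = vocabulary size, f = weighting function that caps very frequent pairs)
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```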
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
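For reference, the negative-sampling objective from this paper replaces the full softmax with a binary logistic objective for each (input word w_I, output word w_O) pair, contrasted against k noise words drawn from a noise distribution P_n(w):

```latex
% Skip-gram negative-sampling objective for one (w_I, w_O) training pair
\log \sigma\!\left( {v'_{w_O}}^{\top} v_{w_I} \right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma\!\left( -{v'_{w_i}}^{\top} v_{w_I} \right) \right]
```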
Journal ArticleDOI

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents based on the terms found in queries.