Proceedings ArticleDOI

Collaboratively Improving Topic Discovery and Word Embeddings by Coordinating Global and Local Contexts

TLDR
This paper empirically shows that, by incorporating both global and local context, the proposed collaborative model not only significantly improves topic discovery over baseline topic models, but also learns better word embeddings than baseline word embedding models.
Abstract
A text corpus typically contains two types of context information: global context and local context. Global context carries topical information that topic models can use to discover topic structures in the corpus, while local context can be used to train word embeddings that capture the semantic regularities reflected in the corpus. This motivates us to exploit the useful information in both types of context. In this paper, we propose a unified language model based on matrix factorization techniques which 1) takes the complementary global and local context information into consideration simultaneously, and 2) models topics and learns word embeddings collaboratively. We empirically show that by incorporating both global and local context, this collaborative model can not only significantly improve the performance of topic discovery over the baseline topic models, but also learn better word embeddings than the baseline word embedding models. We also provide a qualitative analysis that explains how the cooperation of global and local context information results in better topic structures and word embeddings.
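To make the idea concrete, here is a minimal, hypothetical sketch of the kind of joint factorization the abstract describes: a global document-word matrix and a local word co-occurrence matrix are factorized with a shared word factor, so topic structure and word embeddings are learned together. The matrix names, the squared-error loss, and the plain gradient-descent updates below are illustrative assumptions, not the paper's actual objective or optimizer.

```python
# Hypothetical sketch: jointly factorize a global document-word matrix D and a
# local word-word co-occurrence matrix C with a shared word factor W, so that
# topics (T, W) and word embeddings (W) are learned collaboratively.
# All names (D, C, T, W, lam, lr) and the loss are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n_docs, n_words, k = 30, 100, 8            # toy sizes; k = number of topics / embedding dims

D = rng.poisson(0.3, (n_docs, n_words)).astype(float)   # global context: document-word counts
C = rng.poisson(0.5, (n_words, n_words)).astype(float)  # local context: word co-occurrence counts
C = (C + C.T) / 2.0                                      # symmetrize co-occurrences

T = 0.1 * rng.random((n_docs, k))          # document-topic factor
W = 0.1 * rng.random((n_words, k))         # shared word factor (topic-word weights / embeddings)

lam, lr = 0.5, 1e-4                        # global/local trade-off, gradient step size
for _ in range(500):
    # Joint objective: ||D - T W^T||_F^2 + lam * ||C - W W^T||_F^2
    R_global = T @ W.T - D                 # residual of the global (topical) term
    R_local = W @ W.T - C                  # residual of the local (semantic) term
    T -= lr * (2.0 * R_global @ W)
    W -= lr * (2.0 * R_global.T @ T + lam * 4.0 * R_local @ W)

print("global residual:", np.linalg.norm(T @ W.T - D))
print("local residual: ", np.linalg.norm(W @ W.T - C))
```

Because W appears in both terms, topical signal from the document-word matrix and semantic signal from the co-occurrence matrix constrain the same word factor, which is the collaboration the abstract refers to.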


Citations
Proceedings ArticleDOI

Discriminative Topic Mining via Category-Name Guided Text Embedding

TL;DR: In this article, a new task, discriminative topic mining, is proposed, which leverages a set of user-provided category names to mine discriminative topics from text corpora, helping users clearly and distinctively understand the topics they are most interested in.
Proceedings ArticleDOI

Word2Sense: Sparse Interpretable Word Embeddings.

TL;DR: An unsupervised method to generate Word2Sense word embeddings that are interpretable: each dimension of the embedding space corresponds to a fine-grained sense, and the non-negative value of the embedding along the j-th dimension represents the relevance of the j-th sense to the word.
Journal ArticleDOI

MeSHProbeNet: a self-attentive probe net for MeSH indexing.

TL;DR: An end-to-end framework, MeSHProbeNet (formerly named xgx), which utilizes deep learning and self-attentive MeSH probes to index MeSH terms, and achieves the highest scores on all F-measures.
Proceedings ArticleDOI

Discriminative Topic Mining via Category-Name Guided Text Embedding

TL;DR: CatE is developed, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category-representative terms in an iterative manner.
Proceedings ArticleDOI

Correlation Networks for Extreme Multi-label Text Classification

TL;DR: The Correlation Networks (CorNet) architecture is developed for the extreme multi-label text classification (XMTC) task, where the objective is to tag an input text sequence with the most relevant subset of labels from an extremely large label set.
References
Journal ArticleDOI

Latent Dirichlet Allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
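For reference, the generative process that LDA assumes for each document d, with Dirichlet prior \alpha over topic proportions and per-topic word distributions \beta_k, can be written as:

```latex
% Standard LDA generative process for word n of document d
\theta_d \sim \mathrm{Dirichlet}(\alpha), \qquad
z_{d,n} \mid \theta_d \sim \mathrm{Multinomial}(\theta_d), \qquad
w_{d,n} \mid z_{d,n} \sim \mathrm{Multinomial}(\beta_{z_{d,n}})
```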
Proceedings ArticleDOI

Glove: Global Vectors for Word Representation

TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
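For reference, GloVe fits word vectors by minimizing a weighted least-squares objective over the logarithm of the word co-occurrence counts X_{ij}:

```latex
% GloVe objective over co-occurrence counts X_{ij}
% (V = vocabulary size, f = weighting function that caps very frequent pairs)
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^{\top} \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
```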
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Proceedings Article

Distributed Representations of Words and Phrases and their Compositionality

TL;DR: This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
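For reference, the negative-sampling objective from this paper replaces the full softmax with a binary logistic objective for each (input word w_I, output word w_O) pair, contrasted against k noise words drawn from a noise distribution P_n(w):

```latex
% Skip-gram negative-sampling objective for one (w_I, w_O) training pair
\log \sigma\!\left( {v'_{w_O}}^{\top} v_{w_I} \right)
  + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)} \left[ \log \sigma\!\left( -{v'_{w_i}}^{\top} v_{w_I} \right) \right]
```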
Journal ArticleDOI

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval that takes advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) to improve the detection of relevant documents based on the terms found in queries.