Proceedings ArticleDOI
Collaboratively Improving Topic Discovery and Word Embeddings by Coordinating Global and Local Contexts
Guangxu Xun, Yaliang Li, Jing Gao, Aidong Zhang, et al.
pp. 535–543
TL;DR: This paper empirically shows that by incorporating both global and local context, the proposed collaborative model not only significantly improves the performance of topic discovery over baseline topic models, but also learns better word embeddings than baseline word embedding models.
Abstract: A text corpus typically contains two types of context information: global context and local context. Global context carries topical information that topic models can use to discover topic structures in the corpus, while local context can train word embeddings to capture the semantic regularities reflected in the corpus. This motivates us to exploit the useful information in both types of context. In this paper, we propose a unified language model based on matrix factorization techniques that 1) takes the complementary global and local context information into consideration simultaneously, and 2) models topics and learns word embeddings collaboratively. We empirically show that by incorporating both global and local context, this collaborative model not only significantly improves the performance of topic discovery over baseline topic models, but also learns better word embeddings than baseline word embedding models. We also provide a qualitative analysis that explains how the cooperation of global and local context information results in better topic structures and word embeddings.
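The core idea of the abstract, coupling a topic-side factorization of global context with an embedding-side factorization of local context through a shared word factor, can be sketched as a toy coupled matrix factorization. This is an illustrative sketch only: the toy matrices, the squared-error objective, the learning rate, and the alternating gradient updates below are assumptions for demonstration, not the paper's actual model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy corpus statistics (not from the paper):
# D: document-word count matrix (global context, as used by topic models)
# M: word-word co-occurrence matrix (local context, as used by embeddings)
n_docs, n_words, k = 6, 10, 3
D = rng.poisson(1.0, size=(n_docs, n_words)).astype(float)
M = D.T @ D                  # crude co-occurrence proxy for the sketch
np.fill_diagonal(M, 0.0)

# A shared word factor W couples the two factorizations:
#   D ~ T @ W.T  (topic side)      M ~ W @ C.T  (embedding side)
T = rng.normal(scale=0.1, size=(n_docs, k))   # document-topic factors
W = rng.normal(scale=0.1, size=(n_words, k))  # shared word embeddings
C = rng.normal(scale=0.1, size=(n_words, k))  # context-word factors

def loss():
    # Sum of the two squared-error reconstruction objectives
    return np.sum((D - T @ W.T) ** 2) + np.sum((M - W @ C.T) ** 2)

lr = 2e-4
before = loss()
for _ in range(500):
    E1 = T @ W.T - D          # topic-side residual
    E2 = W @ C.T - M          # embedding-side residual
    T -= lr * (E1 @ W)        # gradient step on document factors
    C -= lr * (E2.T @ W)      # gradient step on context factors
    W -= lr * (E1.T @ T + E2 @ C)  # W receives gradients from BOTH contexts
after = loss()
```

Because `W` appears in both reconstruction terms, each gradient step pushes the word embeddings to agree with global document-level structure and local co-occurrence structure at once, which is the collaboration the abstract describes.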
Citations
Proceedings ArticleDOI
Discriminative Topic Mining via Category-Name Guided Text Embedding
TL;DR: In this article, a new task, discriminative topic mining, is proposed: it leverages a set of user-provided category names to mine discriminative topics from text corpora, helping users understand clearly and distinctively the topics they are most interested in.
Proceedings ArticleDOI
Word2Sense: Sparse Interpretable Word Embeddings.
TL;DR: An unsupervised method to generate Word2Sense word embeddings that are interpretable: each dimension of the embedding space corresponds to a fine-grained sense, and the non-negative value of the embedding along the j-th dimension represents the relevance of the j-th sense to the word.
Journal ArticleDOI
MeSHProbeNet: a self-attentive probe net for MeSH indexing.
TL;DR: An end-to-end framework, MeSHProbeNet (formerly named xgx), which utilizes deep learning and self-attentive MeSH probes to index MeSH terms and achieves the highest scores on all F-measures.
Proceedings ArticleDOI
Discriminative Topic Mining via Category-Name Guided Text Embedding
TL;DR: CatE is developed, a novel category-name guided text embedding method for discriminative topic mining, which effectively leverages minimal user guidance to learn a discriminative embedding space and discover category-representative terms in an iterative manner.
Proceedings ArticleDOI
Correlation Networks for Extreme Multi-label Text Classification
TL;DR: The Correlation Networks (CorNet) architecture for the extreme multi-label text classification (XMTC) task, where the objective is to tag an input text sequence with the most relevant subset of labels from an extremely large label set, is developed.
References
Journal ArticleDOI
Latent Dirichlet Allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings ArticleDOI
Glove: Global Vectors for Word Representation
TL;DR: A new global log-bilinear regression model that combines the advantages of the two major model families in the literature, global matrix factorization and local context window methods, and produces a vector space with meaningful substructure.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Proceedings Article
Distributed Representations of Words and Phrases and their Compositionality
TL;DR: This paper presents a simple method for finding phrases in text, and shows that learning good vector representations for millions of phrases is possible and describes a simple alternative to the hierarchical softmax called negative sampling.
Journal ArticleDOI
Indexing by Latent Semantic Analysis
TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.