Open Access · Proceedings Article
Best Topic Word Selection for Topic Labelling
Jey Han Lau, David Newman, Sarvnaz Karimi, Timothy Baldwin, +3 more
pp. 605–613
TL;DR
This paper proposes a number of features intended to capture the best topic word, and shows that, in combination as inputs to a reranking model, they are able to consistently achieve results above the baseline of simply selecting the highest-ranked topic word.
Abstract
This paper presents the novel task of best topic word selection, that is, the selection of the topic word that is the best label for a given topic, as a means of enhancing the interpretation and visualisation of topic models. We propose a number of features intended to capture the best topic word, and show that, in combination as inputs to a reranking model, we are able to consistently achieve results above the baseline of simply selecting the highest-ranked topic word. This is the case both when training in-domain over other labelled topics for that topic model, and cross-domain, using only labellings from independent topic models learned over document collections from different domains and genres.
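The setup the abstract describes can be sketched in code: the baseline label is the topic model's highest-ranked word, while the proposed approach rescores candidate words with a feature-based reranker. The feature names and weights below are purely illustrative assumptions, not the paper's actual feature set.

```python
# Hypothetical sketch of best topic word selection.
# Baseline: take the top-ranked word. Reranker: rescore every
# candidate word with a linear model over its features.

def baseline_label(topic_words):
    """Baseline: pick the word ranked highest by the topic model."""
    return topic_words[0]

def rerank_label(topic_words, feature_fn, weights):
    """Pick the candidate word with the highest linear feature score."""
    def score(rank, word):
        feats = feature_fn(word, rank, topic_words)
        return sum(weights.get(name, 0.0) * val for name, val in feats.items())
    return max(enumerate(topic_words), key=lambda rw: score(*rw))[1]

# Toy features (illustrative only): original rank and word length.
def toy_features(word, rank, topic_words):
    return {"inv_rank": 1.0 / (rank + 1), "length": float(len(word))}

topic = ["cell", "protein", "gene", "expression", "dna"]
print(baseline_label(topic))                                # cell
print(rerank_label(topic, toy_features, {"length": 1.0}))   # expression
```

With a weight only on word length, the reranker overrides the baseline's top-ranked word; in the paper, the weights would instead be learned from labelled topics, either in-domain or cross-domain.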
Citations
Proceedings Article
Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality
TL;DR: This work explores the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and provides recommendations on the best strategy for performing the two tasks, in addition to providing an open-source toolkit for topic and topic model evaluation.
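Automatic topic evaluation of the kind this citing work describes is commonly done with co-occurrence-based coherence scores such as NPMI over a reference corpus. The following is a minimal sketch of that idea, assuming the standard NPMI formulation with document-level co-occurrence (the exact measure and corpus used by the paper are not specified here).

```python
import math
from itertools import combinations

def npmi_coherence(topic_words, documents):
    """Average NPMI over all pairs of topic words, estimating word
    probabilities from document co-occurrence in a reference corpus."""
    n_docs = len(documents)
    doc_sets = [set(d) for d in documents]

    def p(*words):
        # Fraction of documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI
            continue
        pmi = math.log(p12 / (p1 * p2))
        scores.append(pmi / -math.log(p12))  # normalise PMI into [-1, 1]
    return sum(scores) / len(scores)
```

A topic whose words always co-occur scores 1.0; one whose words never co-occur scores -1.0, so higher coherence tracks more interpretable topics.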
Proceedings Article
Automatic Labelling of Topic Models
TL;DR: This work proposes a method for automatically labelling topics learned via LDA topic models using a combination of association measures and lexical features, optionally fed into a supervised ranking model.
Book
Applications of Topic Models
TL;DR: Applications of Topic Models describes the recent academic and industrial applications of topic models and reviews their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts.
Proceedings Article
Latent topic feedback for information retrieval
David Andrzejewski, David Buttler, +1 more
TL;DR: This work proposes to augment standard keyword search with user feedback on latent topics that are automatically learned from the corpus in an unsupervised manner and presented alongside search results.
Book Chapter
Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements
Edoardo M. Airoldi, David M. Blei, Elena A. Erosheva, Stephen E. Fienberg, Jordan Boyd-Graber, David Mimno, David Newman, +6 more
TL;DR: A chapter on problems, diagnostics, and improvements for topic models, appearing in the Handbook of Mixed Membership Models and Their Applications.
References
Journal Article
Latent Dirichlet Allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article
Finding scientific topics
TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
Proceedings Article
Training linear SVMs in linear time
TL;DR: A cutting-plane algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems, and that is several orders of magnitude faster than decomposition methods like SVMlight for large datasets.
Proceedings Article
Reading Tea Leaves: How Humans Interpret Topic Models
TL;DR: New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.