Open Access Proceedings Article

Best Topic Word Selection for Topic Labelling

TL;DR
This paper proposes a number of features intended to capture the best topic word, and shows that, in combination as inputs to a reranking model, they are able to consistently achieve results above the baseline of simply selecting the highest-ranked topic word.
Abstract
This paper presents the novel task of best topic word selection, that is, the selection of the topic word that is the best label for a given topic, as a means of enhancing the interpretation and visualisation of topic models. We propose a number of features intended to capture the best topic word, and show that, in combination as inputs to a reranking model, we are able to consistently achieve results above the baseline of simply selecting the highest-ranked topic word. This is the case both when training in-domain over other labelled topics for that topic model, and cross-domain, using only labellings from independent topic models learned over document collections from different domains and genres.
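The abstract's core idea can be sketched in code: instead of labelling a topic with its highest-probability word, score each candidate word with a weighted combination of features and rerank. This is a minimal illustrative sketch, not the paper's method; the feature vectors and weights below are invented toy values standing in for the paper's learned reranking model.

```python
# Hypothetical sketch of best-topic-word reranking. The feature names and
# values are illustrative assumptions, not the paper's actual feature set.
import numpy as np

def rank_topic_words(words, features, weights):
    """Rerank a topic's candidate words by a weighted feature score.

    words:    candidate topic words, ordered by topic probability
    features: dict mapping word -> feature vector (e.g. word generality,
              similarity to the topic's other top words)
    weights:  feature weights, standing in for a learned ranking model
    """
    scores = {w: float(np.dot(features[w], weights)) for w in words}
    return sorted(words, key=lambda w: scores[w], reverse=True)

# Baseline: simply take the highest-probability topic word.
words = ["space", "nasa", "launch", "orbit"]   # already probability-ranked
baseline_label = words[0]

# Reranked: toy feature vectors push a more label-like word to the top.
features = {
    "space":  np.array([0.5, 0.2]),
    "nasa":   np.array([0.9, 0.8]),
    "launch": np.array([0.4, 0.3]),
    "orbit":  np.array([0.3, 0.1]),
}
weights = np.array([1.0, 1.0])
reranked_label = rank_topic_words(words, features, weights)[0]
```

With these toy values the baseline picks "space" while the reranker surfaces "nasa", illustrating how the two can disagree.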


Citations
Proceedings Article (DOI)

Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality

TL;DR: This work explores the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and provides recommendations on the best strategy for performing the two tasks, in addition to providing an open-source toolkit for topic and topic model evaluation.
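Coherence measures of the kind this line of work evaluates typically score a topic by how often its top words co-occur. A minimal sketch, assuming a simple document co-occurrence estimate and average pairwise PMI (one common family of coherence measures, not the toolkit's exact implementation):

```python
# Illustrative PMI-style topic coherence from document co-occurrence.
# This is a simplified stand-in, not the evaluated toolkit's code.
import math
from itertools import combinations

def pmi_coherence(topic_words, documents, eps=1e-12):
    """Average pairwise PMI of the top topic words, where probabilities
    are estimated as document-frequency fractions."""
    doc_sets = [set(d) for d in documents]
    n_docs = len(doc_sets)

    def p(*ws):
        # Fraction of documents containing all the given words.
        return sum(all(w in s for w in ws) for s in doc_sets) / n_docs

    pairs = list(combinations(topic_words, 2))
    total = 0.0
    for w1, w2 in pairs:
        total += math.log((p(w1, w2) + eps) / (p(w1) * p(w2) + eps))
    return total / len(pairs)

docs = [["space", "nasa", "launch"],
        ["space", "nasa"],
        ["economy", "market"],
        ["market", "trade"]]
coherent_score = pmi_coherence(["space", "nasa"], docs)
incoherent_score = pmi_coherence(["space", "market"], docs)
```

Words that occur together ("space", "nasa") score positively; words that never co-occur ("space", "market") score far lower, which is the intuition coherence-based evaluation builds on.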
Proceedings Article

Automatic Labelling of Topic Models

TL;DR: This work proposes a method for automatically labelling topics learned via LDA topic models using a combination of association measures and lexical features, optionally fed into a supervised ranking model.
Book

Applications of Topic Models

TL;DR: Applications of Topic Models describes the recent academic and industrial applications of topic models and reviews their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts.
Proceedings Article (DOI)

Latent topic feedback for information retrieval

TL;DR: This work proposes to augment standard keyword search with user feedback on latent topics that are automatically learned from the corpus in an unsupervised manner and presented alongside search results.
Book Chapter (DOI)

Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements

TL;DR: A chapter in the Handbook of Mixed Membership Models and Their Applications covering problems, diagnostics, and improvements for topic models.
References
Journal Article (DOI)

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article (DOI)

Finding scientific topics

TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
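The Markov chain Monte Carlo inference this reference describes is usually a collapsed Gibbs sampler: each word's topic assignment is resampled from its full conditional given all other assignments. A compact sketch in that style (a simplified stand-in written for illustration, not the paper's code), with documents as lists of integer word ids:

```python
# Compact collapsed Gibbs sampler for LDA, in the style of the
# Griffiths & Steyvers sampler. Simplified illustrative sketch.
import numpy as np

def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iters=50, seed=0):
    rng = np.random.default_rng(seed)
    ndk = np.zeros((len(docs), n_topics))    # doc-topic counts
    nkw = np.zeros((n_topics, vocab_size))   # topic-word counts
    nk = np.zeros(n_topics)                  # words per topic
    z = []                                   # z[d][i]: topic of word i in doc d
    for d, doc in enumerate(docs):           # random initial assignments
        zd = rng.integers(n_topics, size=len(doc))
        z.append(zd)
        for w, k in zip(doc, zd):
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                  # remove current assignment
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                # Full conditional: p(k) ∝ (ndk+α) · (nkw+β)/(nk+Vβ)
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) \
                    / (nk + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k                  # record new assignment
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
    return ndk, nkw

# Toy corpus over a 4-word vocabulary: two clearly separated word groups.
docs = [[0, 1, 0, 1], [0, 1, 1], [2, 3, 2], [3, 2, 3, 3]]
ndk, nkw = gibbs_lda(docs, n_topics=2, vocab_size=4)
```

The count matrices `ndk` and `nkw` are sufficient statistics: normalising them (with the smoothing priors) recovers the document-topic and topic-word distributions.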
Proceedings Article (DOI)

Training linear SVMs in linear time

TL;DR: A cutting-plane algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems, and is several orders of magnitude faster than decomposition methods like SVM-Light for large datasets.
Proceedings Article

Reading Tea Leaves: How Humans Interpret Topic Models

TL;DR: New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.