Open Access · Proceedings Article
Best Topic Word Selection for Topic Labelling
Jey Han Lau, David Newman, Sarvnaz Karimi, Timothy Baldwin, +3 more
pp. 605–613
TL;DR
This paper proposes a number of features intended to capture the best topic word, and shows that, in combination as inputs to a reranking model, they are able to consistently achieve results above the baseline of simply selecting the highest-ranked topic word.
Abstract
This paper presents the novel task of best topic word selection, that is, the selection of the topic word that is the best label for a given topic, as a means of enhancing the interpretation and visualisation of topic models. We propose a number of features intended to capture the best topic word, and show that, in combination as inputs to a reranking model, we are able to consistently achieve results above the baseline of simply selecting the highest-ranked topic word. This is the case both when training in-domain over other labelled topics for that topic model, and cross-domain, using only labellings from independent topic models learned over document collections from different domains and genres.
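The setup the abstract describes can be sketched in code: the baseline label is the topic model's highest-ranked word, while the proposed approach rescores candidate words with a feature-based reranker. The feature names and weights below are purely illustrative assumptions, not the paper's actual feature set.

```python
# Hypothetical sketch of best topic word selection.
# Baseline: take the top-ranked word. Reranker: rescore every
# candidate word with a linear model over its features.

def baseline_label(topic_words):
    """Baseline: pick the word ranked highest by the topic model."""
    return topic_words[0]

def rerank_label(topic_words, feature_fn, weights):
    """Pick the candidate word with the highest linear feature score."""
    def score(rank, word):
        feats = feature_fn(word, rank, topic_words)
        return sum(weights.get(name, 0.0) * val for name, val in feats.items())
    return max(enumerate(topic_words), key=lambda rw: score(*rw))[1]

# Toy features (illustrative only): original rank and word length.
def toy_features(word, rank, topic_words):
    return {"inv_rank": 1.0 / (rank + 1), "length": float(len(word))}

topic = ["cell", "protein", "gene", "expression", "dna"]
print(baseline_label(topic))                                # cell
print(rerank_label(topic, toy_features, {"length": 1.0}))   # expression
```

With a weight only on word length, the reranker overrides the baseline's top-ranked word; in the paper, the weights would instead be learned from labelled topics, either in-domain or cross-domain.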
Citations
Proceedings Article
Machine Reading Tea Leaves: Automatically Evaluating Topic Coherence and Topic Model Quality
TL;DR: This work explores the two tasks of automatic evaluation of single topics and automatic evaluation of whole topic models, and provides recommendations on the best strategy for performing the two tasks, in addition to providing an open-source toolkit for topic and topic model evaluation.
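Automatic topic evaluation of the kind this citing work describes is commonly done with co-occurrence-based coherence scores such as NPMI over a reference corpus. The following is a minimal sketch of that idea, assuming the standard NPMI formulation with document-level co-occurrence (the exact measure and corpus used by the paper are not specified here).

```python
import math
from itertools import combinations

def npmi_coherence(topic_words, documents):
    """Average NPMI over all pairs of topic words, estimating word
    probabilities from document co-occurrence in a reference corpus."""
    n_docs = len(documents)
    doc_sets = [set(d) for d in documents]

    def p(*words):
        # Fraction of documents containing all the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = p(w1), p(w2), p(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: minimum NPMI
            continue
        pmi = math.log(p12 / (p1 * p2))
        scores.append(pmi / -math.log(p12))  # normalise PMI into [-1, 1]
    return sum(scores) / len(scores)
```

A topic whose words always co-occur scores 1.0; one whose words never co-occur scores -1.0, so higher coherence tracks more interpretable topics.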
Proceedings Article
Automatic Labelling of Topic Models
TL;DR: This work proposes a method for automatically labelling topics learned via LDA topic models using a combination of association measures and lexical features, optionally fed into a supervised ranking model.
Book
Applications of Topic Models
TL;DR: Applications of Topic Models describes the recent academic and industrial applications of topic models and reviews their successful use by researchers to help understand fiction, non-fiction, scientific publications, and political texts.
Proceedings Article
Latent topic feedback for information retrieval
David Andrzejewski, David Buttler, +1 more
TL;DR: This work proposes to augment standard keyword search with user feedback on latent topics that are automatically learned from the corpus in an unsupervised manner and presented alongside search results.
Book Chapter
Care and Feeding of Topic Models: Problems, Diagnostics, and Improvements
Edoardo M. Airoldi, David M. Blei, Elena A. Erosheva, Stephen E. Fienberg, Jordan Boyd-Graber, David Mimno, David Newman, +6 more
TL;DR: A chapter on problems, diagnostics, and improvements for topic models, appearing in the Handbook of Mixed Membership Models and Their Applications.
References
Journal Article
Latent Dirichlet Allocation
TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article
Latent Dirichlet Allocation
TL;DR: This paper proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models, including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article
Finding scientific topics
TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
Proceedings Article
Training linear SVMs in linear time
TL;DR: A cutting-plane algorithm for training linear SVMs that provably has training time O(sn) for classification problems and O(sn log(n)) for ordinal regression problems, and that is several orders of magnitude faster than decomposition methods like SVMlight for large datasets.
Proceedings Article
Reading Tea Leaves: How Humans Interpret Topic Models
TL;DR: New quantitative methods for measuring semantic meaning in inferred topics are presented, showing that they capture aspects of the model that are undetected by previous measures of model quality based on held-out likelihood.