Pachinko allocation: DAG-structured mixture models of topic correlations

doi:10.1145/1143844.1143917

Proceedings Article

Wei Li et al.

pp. 577-584

TL;DR: PAM shows improved performance in document classification, likelihood of held-out data, support for finer-grained topics, and topical keyword coherence.

Abstract:

Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. However, LDA does not capture correlations between topics. In this paper, we introduce the pachinko allocation model (PAM), which captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). The leaves of the DAG represent individual words in the vocabulary, while each interior node represents a correlation among its children, which may be words or other interior nodes (topics). PAM provides a flexible alternative to recent work by Blei and Lafferty (2006), which captures correlations only between pairs of topics. Using text data from newsgroups, historic NIPS proceedings and other research paper corpora, we show improved performance of PAM in document classification, likelihood of held-out data, the ability to support finer-grained topics, and topical keyword coherence.
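To make the DAG structure concrete, the sketch below samples one document from a four-level special case of PAM (root, super-topics, sub-topics, words) in which every super-topic is connected to every sub-topic. It is a minimal illustration under stated assumptions, not the paper's implementation: the function and parameter names (`generate_document`, `alpha_root`, `alpha_super`, `word_dists`) are hypothetical, the sub-topic word distributions are supplied as fixed inputs rather than learned, and inference is not shown. Each document draws its own multinomial at the root and at each super-topic from Dirichlet priors, and each word then follows a path root → super-topic → sub-topic → word. Because sub-topics are shared across super-topics, the structure is a DAG rather than a tree, and a super-topic's multinomial encodes which sub-topics tend to co-occur.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_document(n_words, alpha_root, alpha_super, word_dists, rng):
    """Hypothetical sketch of a four-level PAM generative process.

    alpha_root:  Dirichlet parameters over S super-topics (children of the root).
    alpha_super: S lists of Dirichlet parameters, each over the K shared sub-topics.
    word_dists:  K fixed multinomials over the vocabulary (the leaves of the DAG).
    """
    # Per-document multinomials: one at the root, one at each super-topic.
    theta_root = sample_dirichlet(alpha_root, rng)
    theta_super = [sample_dirichlet(a, rng) for a in alpha_super]

    words, paths = [], []
    for _ in range(n_words):
        s = sample_categorical(theta_root, rng)      # root -> super-topic
        k = sample_categorical(theta_super[s], rng)  # super-topic -> sub-topic
        w = sample_categorical(word_dists[k], rng)   # sub-topic -> word id
        words.append(w)
        paths.append((s, k))
    return words, paths


# Toy usage with made-up parameters: 2 super-topics, 3 sub-topics, 4-word vocabulary.
rng = random.Random(0)
alpha_root = [1.0, 1.0]
alpha_super = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
word_dists = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
words, paths = generate_document(10, alpha_root, alpha_super, word_dists, rng)
```

Setting a single super-topic reduces the sketch to an LDA-like model with one document-level mixture over topics; adding super-topics lets different subsets of sub-topics co-occur with different strengths.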

Citations

Journal Article

Data clustering: 50 years beyond K-means

Anil K. Jain

TL;DR: A brief overview of clustering is provided, well-known clustering methods are summarized, the major challenges and key issues in designing clustering algorithms are discussed, and some emerging and useful research directions are pointed out.

Journal Article

Probabilistic topic models

David M. Blei

01 Apr 2012

Communications of the ACM

TL;DR: Surveys a suite of algorithms that offer a solution to managing large document archives and suggests they are well suited to handling large amounts of data.

Book Chapter

Data Clustering: 50 Years Beyond K-means

Anil K. Jain

TL;DR: Cluster analysis is the formal study of algorithms and methods for grouping objects according to measured or perceived intrinsic characteristics; it is one of the most fundamental modes of understanding and learning.

Proceedings Article

Labeled LDA: A supervised topic model for credit attribution in multi-labeled corpora

Daniel Ramage et al.

TL;DR: Introduces Labeled LDA, a topic model that constrains Latent Dirichlet Allocation by defining a one-to-one correspondence between LDA's latent topics and user tags, which allows Labeled LDA to learn word-tag correspondences directly.

Proceedings Article

LDA-based document models for ad-hoc retrieval

Xing Wei et al.

TL;DR: Proposes an LDA-based document model within the language-modeling framework, evaluates it on several TREC collections, and shows that improvements over retrieval using cluster-based models can be obtained with reasonable efficiency.

Michal Rosen-Zvi et al.

TL;DR: Introduces the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation to include authorship information, and demonstrates applications to computing the similarity between authors and the entropy of author output.