
Showing papers on "Hierarchical Dirichlet process published in 2008"


Journal ArticleDOI
TL;DR: In this paper, Markov chain Monte Carlo (MCMC) algorithms are proposed that sample from the exact posterior distribution of quantities of interest, avoiding finite approximations of the Dirichlet process via retrospective sampling.
Abstract: Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorized into marginal and conditional methods. The former integrate out analytically the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional methods impute the Dirichlet process and update it as a component of the Gibbs sampler. Since this requires imputation of an infinite-dimensional process, implementation of the conditional method has relied on finite approximations. In this paper, we show how to avoid such approximations by designing two novel Markov chain Monte Carlo algorithms which sample from the exact posterior distribution of quantities of interest. The approximations are avoided by the new technique of retrospective sampling. We also show how the algorithms can obtain samples from functionals of the Dirichlet process. The marginal and the conditional methods are compared and a careful simulation study is included, which involves a non-conjugate model, different datasets and prior specifications.

406 citations
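As an aside on the marginal methods this abstract contrasts: integrating out the DP leaves the Chinese restaurant process as the prior over partitions, which can be simulated in a few lines. The sketch below is purely illustrative (the function name and parameters are our choices, not the paper's), and it draws partitions from the CRP prior rather than implementing the paper's retrospective sampler.

```python
import random

def crp_sample(n, alpha, seed=0):
    """Draw a random partition of n items from the Chinese restaurant
    process with concentration alpha (the DP's marginal over partitions)."""
    rng = random.Random(seed)
    counts = []       # customers currently seated at each table
    assignments = []  # table index chosen by each customer
    for i in range(n):
        # an existing table k is chosen with probability prop. to counts[k];
        # a new table is opened with probability prop. to alpha
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        k = len(counts)  # default: open a new table
        for j, c in enumerate(counts):
            acc += c
            if r <= acc:
                k = j
                break
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

parts, counts = crp_sample(100, alpha=1.0, seed=1)
```

Larger alpha tends to open more tables; the rich-get-richer effect is visible in how early tables accumulate most customers.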


Journal ArticleDOI
TL;DR: In this article, the problem of nonparametrically modeling outcome distributions in multicenter studies is addressed, borrowing information across centers while also allowing centers to be clustered, and an efficient Markov chain Monte Carlo algorithm is developed for computation.
Abstract: In multicenter studies, subjects in different centers may have different outcome distributions. This article is motivated by the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to be clustered. Starting with a stick-breaking representation of the Dirichlet process (DP), we replace the random atoms with random probability measures drawn from a DP. This results in a nested DP prior, which can be placed on the collection of distributions for the different centers, with centers drawn from the same DP component automatically clustered together. Theoretical properties are discussed, and an efficient Markov chain Monte Carlo algorithm is developed for computation. The methods are illustrated using a simulation study and an application to quality of care in U.S. hospitals.

320 citations
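The stick-breaking representation this construction starts from is easy to sketch. The truncated version below is a generic illustration (the truncation level and function name are our assumptions); the nested DP itself, which replaces each atom with another DP draw, is not reproduced here.

```python
import random

def stick_breaking(alpha, truncation, seed=0):
    """Truncated stick-breaking weights for a DP(alpha):
    b_k ~ Beta(1, alpha), pi_k = b_k * prod_{j<k} (1 - b_j)."""
    rng = random.Random(seed)
    remaining = 1.0  # length of stick not yet broken off
    weights = []
    for _ in range(truncation - 1):
        b = rng.betavariate(1.0, alpha)
        weights.append(remaining * b)
        remaining *= (1.0 - b)
    weights.append(remaining)  # leftover mass goes to the last atom
    return weights

w = stick_breaking(alpha=1.0, truncation=50, seed=1)
```

Each weight would then be paired with an atom; in the nested DP of this paper, that atom is itself a random probability measure drawn from another DP.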


Proceedings ArticleDOI
05 Jul 2008
TL;DR: A sampling algorithm is developed that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates and demonstrating the advantages of the sticky extension, and the utility of the HDP-HMM in real-world applications.
Abstract: The hierarchical Dirichlet process hidden Markov model (HDP-HMM) is a flexible, nonparametric model which allows state spaces of unknown size to be learned from data. We demonstrate some limitations of the original HDP-HMM formulation (Teh et al., 2006), and propose a sticky extension which allows more robust learning of smoothly varying dynamics. Using DP mixtures, this formulation also allows learning of more complex, multimodal emission distributions. We further develop a sampling algorithm that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates. Via extensive experiments with synthetic data and the NIST speaker diarization database, we demonstrate the advantages of our sticky extension, and the utility of the HDP-HMM in real-world applications.

313 citations
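A minimal sketch of the "sticky" idea under a finite truncation: each state's transition row is drawn from a Dirichlet whose parameter vector adds extra self-transition mass kappa on the diagonal, discouraging rapid state switching. The function, its arguments, and the fixed base measure beta are illustrative assumptions, not the paper's code (in the full model beta is itself inferred).

```python
import random

def sticky_transitions(beta, alpha, kappa, seed=0):
    """Sample HMM transition rows pi_j ~ Dir(alpha*beta + kappa*delta_j).
    beta: truncated shared base measure over states; kappa > 0 inflates
    each state's self-transition mass (the 'sticky' bias)."""
    rng = random.Random(seed)
    K = len(beta)
    P = []
    for j in range(K):
        # Dirichlet parameter for row j, with kappa added on the diagonal
        params = [alpha * beta[k] + (kappa if k == j else 0.0) for k in range(K)]
        # Dirichlet draw via normalized independent Gamma variates
        draws = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in params]
        s = sum(draws)
        P.append([d / s for d in draws])
    return P

beta = [0.25, 0.25, 0.25, 0.25]  # e.g. a symmetric truncated base measure
P = sticky_transitions(beta, alpha=2.0, kappa=5.0, seed=1)
```

Setting kappa = 0 recovers the original (non-sticky) HDP-HMM transition prior.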


Proceedings Article
08 Dec 2008
TL;DR: This work develops a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences in an unknown number of persistent, smooth dynamical modes.
Abstract: Many nonlinear dynamical phenomena can be effectively modeled by a system that switches among a set of conditionally linear dynamical modes. We consider two such models: the switching linear dynamical system (SLDS) and the switching vector autoregressive (VAR) process. Our nonparametric Bayesian approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes. We develop a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences. The utility and flexibility of our model are demonstrated on synthetic data, sequences of dancing honey bees, and the IBOVESPA stock index.

221 citations


Journal ArticleDOI
TL;DR: This work develops hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them and proposes nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene.
Abstract: We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.

195 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets, and a relatively simple Markov Chain Monte Carlo sampler is developed.
Abstract: The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets. The data collected at any time point are represented via a mixture associated with an appropriate underlying model, in the framework of HDP. The statistical properties of data collected at consecutive time points are linked via a random parameter that controls their probabilistic similarity. The sharing mechanisms of the time-evolving data are derived, and a relatively simple Markov Chain Monte Carlo sampler is developed. Experimental results are presented to demonstrate the model.

126 citations


Journal ArticleDOI
TL;DR: This paper proposes a model called the multinomial generalized Dirichlet distribution (MGDD) that is the composition of the generalized Dirichlet distribution and the multinomial, in the same way that the MDD is the composition of the Dirichlet and the multinomial.
Abstract: In this paper, we examine the problem of count data clustering. We analyze this problem using finite mixtures of distributions. The multinomial distribution and the multinomial Dirichlet distribution (MDD) are widely accepted to model count data. We show that these two distributions cannot be the best choice in all the applications, and we propose another model called the multinomial generalized Dirichlet distribution (MGDD) that is the composition of the generalized Dirichlet distribution and the multinomial, in the same way that the MDD is the composition of the Dirichlet and the multinomial. The estimation of the parameters and the determination of the number of components in our model are based on the deterministic annealing expectation-maximization (DAEM) approach and the minimum description length (MDL) criterion, respectively. We compare our method to standard approaches such as multinomial and multinomial Dirichlet mixtures to show its merits. The comparison involves different applications such as spatial color image databases indexing, handwritten digit recognition, and text document clustering.

97 citations


Proceedings Article
01 Jan 2008
TL;DR: A method is developed for discovering the latent structure in MFCC feature data using the Hierarchical Dirichlet Process (HDP) and computing timbral similarity between recorded songs, which is faster than previous approaches that compare single Gaussian distributions directly.
Abstract: We develop a method for discovering the latent structure in MFCC feature data using the Hierarchical Dirichlet Process (HDP). Based on this structure, we compute timbral similarity between recorded songs. The HDP is a nonparametric Bayesian model. Like the Gaussian Mixture Model (GMM), it represents each song as a mixture of some number of multivariate Gaussian distributions. However, the number of mixture components is not fixed in the HDP, but is determined as part of the posterior inference process. Moreover, in the HDP the same set of Gaussians is used to model all songs, with only the mixture weights varying from song to song. We compute the similarity of songs based on these weights, which is faster than previous approaches that compare single Gaussian distributions directly. Experimental results on a genre-based retrieval task illustrate that our HDP-based method is both faster and produces better retrieval quality than such previous approaches.

74 citations
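Because all songs share one global set of Gaussians, song-to-song comparison reduces to comparing mixture-weight vectors. A hedged sketch of that idea, using cosine similarity as an illustrative metric (not necessarily the distance used in the paper):

```python
import math

def weight_similarity(w1, w2):
    """Cosine similarity between two songs' shared-component mixture
    weights. Comparing weight vectors is cheap because the Gaussians
    themselves are common to all songs."""
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    return dot / (n1 * n2)

# two songs using the same components equally are maximally similar
same = weight_similarity([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
# songs concentrated on disjoint components are dissimilar
disjoint = weight_similarity([1.0, 0.0], [0.0, 1.0])
```

This is what makes the approach faster than comparing per-song Gaussians directly: the per-pair cost is linear in the number of shared components.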


Journal ArticleDOI
TL;DR: This work addresses the problem of unusual-event detection in a video sequence using an infinite hidden Markov model (iHMM), which is trained using "normal"/"typical" video; evaluation of posterior distributions is achieved via Markov chain Monte Carlo and a variational Bayes formulation.
Abstract: We address the problem of unusual-event detection in a video sequence. Invariant subspace analysis (ISA) is used to extract features from the video, and the time-evolving properties of these features are modeled via an infinite hidden Markov model (iHMM), which is trained using "normal"/"typical" video. The iHMM retains a full posterior density function on all model parameters, including the number of underlying HMM states. Anomalies (unusual events) are detected subsequently if a low likelihood is observed when associated sequential features are submitted to the trained iHMM. A hierarchical Dirichlet process framework is employed in the formulation of the iHMM. The evaluation of posterior distributions for the iHMM is achieved in two ways: via Markov chain Monte Carlo and using a variational Bayes formulation. Comparisons are made to modeling based on conventional maximum-likelihood-based HMMs, as well as to Dirichlet-process-based Gaussian-mixture models.

72 citations


Proceedings ArticleDOI
15 Dec 2008
TL;DR: The HDP-HTM model substantially advances the literature on evolutionary clustering: not only does it perform better than existing methods, but, more importantly, it is capable of automatically learning the cluster numbers and structures while explicitly addressing the correspondence issue during the evolution.
Abstract: This paper studies evolutionary clustering, a topic of much recent interest with many important applications, notably in social network analysis. Based on the recent literature on the Hierarchical Dirichlet Process (HDP) and the Hidden Markov Model (HMM), we have developed a statistical model, HDP-HTM, that combines HDP with a Hierarchical Transition Matrix (HTM) based on the proposed Infinite Hierarchical Hidden Markov State model (iH2MS) as an effective solution to this problem. The HDP-HTM model substantially advances the literature on evolutionary clustering: not only does it perform better than existing methods, but, more importantly, it is capable of automatically learning the cluster numbers and structures while explicitly addressing the correspondence issue during the evolution. Extensive evaluations have demonstrated the effectiveness and promise of this solution against the state-of-the-art literature.

64 citations


Proceedings ArticleDOI
24 Aug 2008
TL;DR: A nonparametric Bayesian model which generalizes the multi-modal latent Dirichlet allocation model (MoM-LDA) used for similar problems in the past, and which performs as well as or better than MoM-LDA (regardless of the choice of the number of clusters) for predicting labels of objects in images containing multiple objects.
Abstract: Many applications call for learning to label individual objects in an image where the only information available to the learner is a dataset of images with their associated captions, i.e., words that describe the image content without specifically labeling the individual objects. We address this problem using a multi-modal hierarchical Dirichlet process model (MoM-HDP) - a nonparametric Bayesian model which provides a generalization of the multi-modal latent Dirichlet allocation model (MoM-LDA) used for similar problems in the past. We apply this model for predicting labels of objects in images containing multiple objects. During training, the model has access to an un-segmented image and its caption, but not the labels for each object in the image. The trained model is used to predict the label for each region of interest in a segmented image. MoM-HDP generalizes the multi-modal latent Dirichlet allocation model in that it allows the number of components of the mixture model to adapt to the data. The model parameters are efficiently estimated using variational inference. Our experiments show that MoM-HDP performs as well as or better than the MoM-LDA model (regardless of the choice of the number of clusters in the MoM-LDA model).

Proceedings ArticleDOI
21 Nov 2008
TL;DR: A learning approach based on both the Dirichlet process and the Dirichlet distribution, which provide a flexible nonparametric Bayesian framework for non-Gaussian data clustering; the approach relies on estimating the posterior distribution of clusterings using a Gibbs sampler.
Abstract: A significant problem in clustering is the determination of the number of classes which best describes the data. This paper proposes a learning approach based on both the Dirichlet process and the Dirichlet distribution, which provide a flexible nonparametric Bayesian framework for non-Gaussian data clustering. Our approach is Bayesian and relies on the estimation of the posterior distribution of clusterings using a Gibbs sampler. The experimental results involve data classification and image model prediction, and show the merits of our approach.

Journal ArticleDOI
TL;DR: The analytical definitions of the Chernoff, Bhattacharyya and Jeffreys-Matusita probabilistic distances between two Dirichlet distributions and two Beta distributions are given and their inappropriateness is shown in the analytical case.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A new latent Dirichlet language model (LDLM) is presented for modeling word sequences by merging Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events, and a new Bayesian framework is introduced.
Abstract: Latent Dirichlet allocation (LDA) has been successfully applied to document modeling and classification. LDA calculates the document probability based on a bag-of-words scheme without considering the sequence of words. This model discovers the topic structure at the document level, which is different from the concern of word prediction in speech recognition. In this paper, we present a new latent Dirichlet language model (LDLM) for modeling word sequences. A new Bayesian framework is introduced by merging Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events. The robust topic-based language model is established accordingly. In the experiments, we implement the LDLM for continuous speech recognition and obtain better performance than a probabilistic latent semantic analysis (PLSA) based language model.
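To illustrate only the simplest way a Dirichlet prior enters n-gram modeling (the LDLM's topic-conditioned machinery is not reproduced here), a symmetric Dirichlet prior on each bigram distribution yields the familiar additive-smoothing predictive rule. Names and constants below are our illustrative choices.

```python
from collections import Counter

def dirichlet_bigram_prob(corpus, vocab_size, alpha=0.5):
    """Bigram predictive probability under a symmetric Dirichlet prior:
    P(w | h) = (c(h, w) + alpha) / (c(h) + alpha * V)."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus[:-1])  # counts of histories h
    def prob(h, w):
        return (bigrams[(h, w)] + alpha) / (unigrams[h] + alpha * vocab_size)
    return prob

p = dirichlet_bigram_prob("a b a b a c".split(), vocab_size=3)
```

The predictive probabilities for any fixed history sum to one over the vocabulary, which is what makes the rule a proper conditional distribution.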

Posted Content
TL;DR: This work introduces an alternative, interaction component model for communities (ICMc), where the whole network is a bag of links, stemming from different components, which assumes assortativity and finds community-like structures like the earlier methods motivated by physics.
Abstract: Being among the easiest ways to find meaningful structure in discrete data, Latent Dirichlet Allocation (LDA) and related component models have been applied widely. They are simple, computationally fast and scalable, interpretable, and admit nonparametric priors. In the currently popular field of network modeling, relatively little work has taken uncertainty of data seriously in the Bayesian sense, and component models have been introduced to the field only recently, by treating each node as a bag of out-going links. We introduce an alternative, interaction component model for communities (ICMc), where the whole network is a bag of links, stemming from different components. The former finds both disassortative and assortative structure, while the alternative assumes assortativity and finds community-like structures like the earlier methods motivated by physics. With Dirichlet Process priors and an efficient implementation the models are highly scalable, as demonstrated with a social network from the Last.fm web site, with 670,000 nodes and 1.89 million links.

Journal ArticleDOI
TL;DR: A new haplotype inference program, Haploi, is presented, which makes use of individual ethnic information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competent and sometimes superior speed and accuracy compared to the state-of-the-art programs.
Abstract: The perennial problem of "how many clusters?" remains an issue of substantial interest in data mining and machine learning communities, and becomes particularly salient in large data sets such as populational genomic data where the number of clusters needs to be relatively large and open-ended. This problem gets further complicated in a co-clustering scenario in which one needs to solve multiple clustering problems simultaneously because of the presence of common centroids (e.g., ancestors) shared by clusters (e.g., possible descents from a certain ancestor) from different multiple-cluster samples (e.g., different human subpopulations). In this paper we present a hierarchical nonparametric Bayesian model to address this problem in the context of multi-population haplotype inference. Uncovering the haplotypes of single nucleotide polymorphisms is essential for many biological and medical applications. While it is not uncommon for the genotype data to be pooled from multiple ethnically distinct populations, few existing programs have explicitly leveraged the individual ethnic information for haplotype inference. In this paper we present a new haplotype inference program, Haploi, which makes use of such information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competent and sometimes superior speed and accuracy compared to the state-of-the-art programs. Underlying Haploi is a new haplotype distribution model based on a nonparametric Bayesian formalism known as the hierarchical Dirichlet process, which represents a tractable surrogate to the coalescent process. The proposed model is exchangeable, unbounded, and capable of coupling demographic information of different populations.

Proceedings ArticleDOI
01 Apr 2008
TL;DR: A nonparametric Bayesian method for clustering graphs and selecting salient patterns at the same time, and Variational inference is adopted here, because sampling is not applicable due to extremely high dimensionality.
Abstract: Graph data such as chemical compounds and XML documents are getting more common in many application domains. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraph patterns, the dimensionality gets too large for usual statistical methods. We propose a nonparametric Bayesian method for clustering graphs and selecting salient patterns at the same time. Variational inference is adopted here, because sampling is not applicable due to extremely high dimensionality. The feature set minimizing the free energy is efficiently collected with the DFS code tree, where the generation of useless subgraphs is suppressed by a tree pruning condition. In experiments, our method is compared with a simpler approach based on frequent subgraph mining, and graph kernels.

16 Sep 2008
TL;DR: A method to add human supervision to Dirichlet Process Mixture Models in order to influence the solution with respect to some prior knowledge; the evaluation highlights the benefits of the chosen method compared to previously used clustering approaches.
Abstract: In this work we apply Dirichlet Process Mixture Models to a learning task in natural language processing (NLP): lexical-semantic verb clustering. We assess the performance on a dataset based on Levin's (1993) verb classes using the recently introduced V-measure metric. We also present a method to add human supervision to the model in order to influence the solution with respect to some prior knowledge. The quantitative evaluation performed highlights the benefits of the chosen method compared to previously used clustering approaches.

29 May 2008
TL;DR: The Flexible Dirichlet model, a new class of simplex distributions containing the Dirichlet as a special case, allows the means and (part of) the variance-covariance matrix to be modeled separately.
Abstract: The Dirichlet family owes its privileged status within simplex distributions to its ease of interpretation and good mathematical properties. In particular, we recall fundamental properties for the analysis of compositional data such as closure under amalgamation and subcomposition. From a probabilistic point of view, it is characterised (uniquely) by a variety of independence relationships which makes it indisputably the reference model for expressing the nontrivial idea of substantial independence for compositions. Indeed, its well known inadequacy as a general model for compositional data stems from such an independence structure together with the poorness of its parametrisation. In this paper a new class of distributions (called Flexible Dirichlet) capable of handling various dependence structures and containing the Dirichlet as a special case is presented. The new model exhibits a considerably richer parametrisation which, for example, allows one to model the means and (part of) the variance-covariance matrix separately. Moreover, such a model preserves some good mathematical properties of the Dirichlet, i.e. closure under amalgamation and subcomposition with new parameters simply related to the parent composition parameters. Furthermore, the joint and conditional distributions of subcompositions and relative totals can be expressed as simple mixtures of two Flexible Dirichlet distributions. The basis generating the Flexible Dirichlet, though keeping compositional invariance, shows a dependence structure which allows various forms of partitional dependence to be contemplated by the model (e.g. non-neutrality, subcompositional dependence and subcompositional non-invariance), independence cases being identified by suitable parameter configurations. In particular, within this model substantial independence among subsets of components of the composition naturally occurs when the subsets have a Dirichlet distribution.

Proceedings Article
01 Jan 2008
TL;DR: Experiments on a large meeting corpus of more than 70 hours of speech data show consistent and significant improvements in terms of word error rate for language model adaptation based on the topic and role information.
Abstract: We continue our previous work on the modeling of topic and role information from multiparty meetings using a hierarchical Dirichlet process (HDP), in the context of language model adaptation. In this paper we focus on three problems: 1) an empirical analysis of the HDP as a nonparametric topic model; 2) the mismatch problem of vocabularies of the baseline n-gram model and the HDP; and 3) an automatic speech recognition experiment to further verify the effectiveness of our adaptation framework. Experiments on a large meeting corpus of more than 70 hours of speech data show consistent and significant improvements in terms of word error rate for language model adaptation based on the topic and role information.

Proceedings ArticleDOI
Xi Li, Weiming Hu1, Zhongfei Zhang, Xiaoqin Zhang1, Guan Luo1 
01 Jan 2008
TL;DR: A trajectory-based video retrieval framework using Dirichlet process mixture models that has a nice scalability and adaptability in the sense that when new cluster data are presented, the framework automatically identifies the new cluster information without having to redo the training.
Abstract: In this paper, we present a trajectory-based video retrieval framework using Dirichlet process mixture models. The main contribution of this framework is four-fold. (1) We apply a Dirichlet process mixture model (DPMM) to unsupervised trajectory learning; DPMM is a countably infinite mixture model whose number of components grows with the data. (2) We employ a time-sensitive Dirichlet process mixture model (tDPMM) to learn trajectories' time-series characteristics; furthermore, a novel likelihood estimation algorithm for the tDPMM is proposed for the first time. (3) We develop a tDPMM-based probabilistic model matching scheme, which is empirically shown to be more error-tolerant and able to deliver higher retrieval accuracy than peer methods in the literature. (4) The framework has good scalability and adaptability in the sense that when new cluster data are presented, the framework automatically identifies the new cluster information without having to redo the training. Theoretical analysis and experimental evaluations against the state-of-the-art methods demonstrate the promise and effectiveness of the framework.

Book ChapterDOI
08 Sep 2008
TL;DR: This paper presents a modeling framework for topic and role on the AMI Meeting Corpus, and illustrates the effectiveness of the approach in the context of adapting a baseline language model in a large-vocabulary automatic speech recognition system for multiparty meetings.
Abstract: In this paper, we address the modeling of topic and role information in multiparty meetings, via a nonparametric Bayesian model called the hierarchical Dirichlet process. This model provides a powerful solution to topic modeling and a flexible framework for the incorporation of other cues such as speaker role information. We present our modeling framework for topic and role on the AMI Meeting Corpus, and illustrate the effectiveness of the approach in the context of adapting a baseline language model in a large-vocabulary automatic speech recognition system for multiparty meetings. The adapted LM produces significant improvements in terms of both perplexity and word error rate.

Proceedings ArticleDOI
22 Sep 2008
TL;DR: This paper incorporates relevant written materials, e.g., textbooks for lectures, which convey semantic relationships, in particular domain-specific relationships, between words, and trains latent Dirichlet allocation models on these materials and measures the similarity between slides and transcripts in the acquired hidden-topic space.
Abstract: This paper studies automatic detection of topic transitions for recorded presentations. This can be achieved by matching slide content with presentation transcripts directly with some similarity metrics. Such literal matching, however, misses domain-specific knowledge and is sensitive to speech recognition errors. In this paper, we incorporate relevant written materials, e.g., textbooks for lectures, which convey semantic relationships, in particular domain-specific relationships, between words. To this end, we train latent Dirichlet allocation (LDA) models on these materials and measure the similarity between slides and transcripts in the acquired hidden-topic space. This similarity is then combined with literal matchings. Experiments show that the proposed approach reduces the errors in slide transition detection by 17-41% on manual transcripts and 27-37% on automatic transcripts.

Proceedings Article
01 Jan 2008
TL;DR: Three latent variable systems which achieve state-of-the-art results in domains previously dominated by fully supervised systems, including a grammar induction technique which begins with coarse syntactic structures and iteratively refines them in an unsupervised fashion.
Abstract: Given the abundance of text data, unsupervised approaches are very appealing for natural language processing. We present three latent variable systems which achieve state-of-the-art results in domains previously dominated by fully supervised systems. For syntactic parsing, we describe a grammar induction technique which begins with coarse syntactic structures and iteratively refines them in an unsupervised fashion. The resulting coarse-to-fine grammars admit efficient coarse-to-fine inference schemes and have produced the best parsing results in a variety of languages. For coreference resolution, we describe a discourse model in which entities are shared across documents using a hierarchical Dirichlet process. In each document, entities are repeatedly rendered into mention strings by a sequential model of attentional state and anaphoric constraint. Despite being fully unsupervised, this approach is competitive with the best supervised approaches. Finally, for machine translation, we present a model which learns translation lexicons from non-parallel corpora. Alignments between word types are modeled by a prior over matchings. Given any fixed alignment, a joint density over word vectors derives from probabilistic canonical correlation analysis. This approach is capable of discovering high-precision translations, even when the underlying corpora and languages are divergent.

Posted Content
03 Jan 2008
TL;DR: This article investigated the predictive probabilities that underlie the Dirichlet and Pitman-Yor processes and the implicit "rich-get-richer" characteristic of the resulting partitions.
Abstract: Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering -- the uniform process -- for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
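The contrast the abstract draws can be stated in one function: the CRP weights an existing cluster by its size (rich-get-richer), while the uniform process gives every existing cluster equal weight. An illustrative sketch (function name and interface are ours, not the paper's):

```python
def predictive_probs(counts, alpha, process="crp"):
    """Predictive probabilities over (existing clusters..., new cluster).
    'crp': P(join cluster k) prop. to its size n_k (rich-get-richer);
    'uniform': every existing cluster gets equal mass regardless of size.
    In both cases a new cluster has mass prop. to alpha."""
    K = len(counts)
    if process == "crp":
        weights = list(counts) + [alpha]
    else:  # uniform process
        weights = [1.0] * K + [alpha]
    z = sum(weights)
    return [w / z for w in weights]

p_crp = predictive_probs([5, 1], alpha=1.0, process="crp")
p_uni = predictive_probs([5, 1], alpha=1.0, process="uniform")
```

Under the CRP the size-5 cluster is five times more likely to gain the next item than the size-1 cluster; under the uniform process they are equally likely, which is the property exploited here, at the cost of exchangeability over orderings.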

Proceedings ArticleDOI
24 Aug 2008
TL;DR: A model based on nonparametric Bayesian modeling is developed for the automatic discovery of semantic relationships between words in a corpus; evaluation demonstrates that it outperforms other baseline models.
Abstract: We developed a model based on nonparametric Bayesian modeling for automatic discovery of semantic relationships between words taken from a corpus. It is aimed at discovering semantic knowledge about words in particular domains, which has become increasingly important with the growing use of text mining, information retrieval, and speech recognition. The subject-predicate structure is taken as a syntactic structure with the noun as the subject and the verb as the predicate. This structure is regarded as a graph structure. The generation of this graph can be modeled using the hierarchical Dirichlet process and the Pitman-Yor process. The probabilistic generative model we developed for this graph structure consists of subject-predicate structures extracted from a corpus. Evaluation of this model by measuring the performance of graph clustering based on WordNet similarities demonstrated that it outperforms other baseline models.

Proceedings Article
01 Jan 2008
TL;DR: This work assigns categories to Kana characters via latent Dirichlet allocation, uses the categories to compose additional features for conditional random fields (CRF), and compares the automatically derived categories with manually prepared ones by their efficiency in Kana sequence segmentation.
Abstract: We propose an efficient Kana sequence segmentation as a component of faster and easier interfaces for e-learning systems. We assign categories to Kana characters via latent Dirichlet allocation (LDA) and use the categories to compose additional features for conditional random fields (CRF). We compare the categories our method gives with those manually prepared by their efficiency in Kana sequence segmentation.
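As a rough illustration of the LDA step (not the paper's implementation), a toy collapsed Gibbs sampler can assign each character a category, here taken to be its most frequent sampled topic in the final sweep. All names, hyperparameters, and the tiny example corpus are illustrative.

```python
import random
from collections import defaultdict

def lda_categories(docs, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Toy collapsed Gibbs sampler for LDA over character sequences;
    returns each symbol's category = its most frequent final topic."""
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    nwt = defaultdict(int)   # (symbol, topic) counts
    nt = defaultdict(int)    # topic counts
    ndt = defaultdict(int)   # (doc, topic) counts
    z = []
    for di, d in enumerate(docs):           # random initial assignments
        zs = []
        for w in d:
            t = rng.randrange(K)
            zs.append(t)
            nwt[w, t] += 1
            nt[t] += 1
            ndt[di, t] += 1
        z.append(zs)
    for _ in range(iters):                  # Gibbs sweeps
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]               # remove current assignment
                nwt[w, t] -= 1
                nt[t] -= 1
                ndt[di, t] -= 1
                weights = [(ndt[di, k] + alpha) * (nwt[w, k] + beta)
                           / (nt[k] + beta * len(vocab)) for k in range(K)]
                r = rng.random() * sum(weights)
                for k, wt in enumerate(weights):
                    r -= wt
                    if r < 0:
                        break
                z[di][wi] = k               # resample and restore counts
                nwt[w, k] += 1
                nt[k] += 1
                ndt[di, k] += 1
    votes = defaultdict(lambda: defaultdict(int))
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            votes[w][z[di][wi]] += 1
    return {w: max(c, key=c.get) for w, c in votes.items()}

docs = [list("ababba"), list("ccdccd"), list("abdcab")]
cats = lda_categories(docs, K=2)
```

The resulting `cats` mapping is the kind of character-to-category table that could then feed extra features into a CRF segmenter.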

Posted Content
TL;DR: In this article, the atomic structure of the neutral diffusion model is studied and a finite-dimensional particle process is derived from the time-dependent random measure for any chosen population size.
Abstract: Fleming-Viot processes are a wide class of probability-measure-valued diffusions which often arise as large population limits of so-called particle processes. Here we invert the procedure and show that a countable population process can be derived directly from the neutral diffusion model, with no arbitrary assumptions. We study the atomic structure of the neutral diffusion model, and elicit a finite dimensional particle process from the time-dependent random measure, for any chosen population size. The static properties are consequences of the fact that its stationary distribution is the Dirichlet process, and rely on a new representation for it. The dynamics are derived directly from the transition function of the neutral diffusion model.

Book ChapterDOI
05 Nov 2008
TL;DR: A Bayesian mixture model is proposed that introduces a context variable with a Dirichlet prior to model multiple topics in text, yielding a novel unsupervised learning algorithm for clustering large-scale web data.
Abstract: In this paper, we propose a Bayesian mixture model that introduces a context variable with a Dirichlet prior to model multiple topics in text and then cluster documents. It is a novel unsupervised text-learning algorithm for clustering large-scale web data. For parameter estimation, we adopt maximum likelihood (ML) via the EM algorithm, and we employ the BIC principle to determine the number of clusters. Experimental results show that the proposed method distinctly outperforms baseline algorithms.
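The ML/EM estimation and BIC-based choice of the number of clusters can be sketched on a one-dimensional Gaussian mixture, a deliberately simplified stand-in for the paper's text model; every detail here (initialization, variance floor, data) is an illustrative assumption.

```python
import math
import random

def gmm_em_1d(xs, K, iters=200, var_floor=0.05):
    """Fit a K-component 1-D Gaussian mixture with EM; return the log-likelihood."""
    n = len(xs)
    srt = sorted(xs)
    mus = [srt[int((i + 0.5) * n / K)] for i in range(K)]   # quantile init
    mean = sum(xs) / n
    sig2 = [max(sum((x - mean) ** 2 for x in xs) / n, var_floor)] * K
    pis = [1.0 / K] * K
    loglik = 0.0
    for _ in range(iters):
        resp, loglik = [], 0.0
        for x in xs:   # E-step: posterior responsibility of each component
            ps = [pis[k] / math.sqrt(2 * math.pi * sig2[k])
                  * math.exp(-(x - mus[k]) ** 2 / (2 * sig2[k])) for k in range(K)]
            s = sum(ps) or 1e-300
            loglik += math.log(s)
            resp.append([p / s for p in ps])
        for k in range(K):   # M-step: responsibility-weighted moment updates
            nk = sum(r[k] for r in resp) or 1e-12
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            sig2[k] = max(sum(r[k] * (x - mus[k]) ** 2
                              for r, x in zip(resp, xs)) / nk, var_floor)
            pis[k] = nk / n
    return loglik

def bic(loglik, K, n):
    """BIC = -2 log L + p log n, with p = 3K - 1 free parameters in 1-D."""
    return -2.0 * loglik + (3 * K - 1) * math.log(n)

rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(150)]
        + [rng.gauss(10.0, 1.0) for _ in range(150)])
bics = {K: bic(gmm_em_1d(data, K), K, len(data)) for K in (1, 2, 3)}
best_K = min(bics, key=bics.get)
```

On this two-cluster toy data the BIC penalty outweighs the marginal likelihood gain of a third component, so the selected `best_K` matches the true number of clusters.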

Proceedings Article
20 Feb 2008
TL;DR: A nonparametric Bayesian approach is used for the problem of learning from two related data sets using a Dirichlet process mixture model of probabilistic canonical correlation analysers to allow the flexibility of the mappings from shared feature to data spaces to be automatically determined from the data.
Abstract: A nonparametric Bayesian approach is used for the problem of learning from two related data sets. We model the shared structure between two data sets using a Dirichlet process mixture model of probabilistic canonical correlation analysers, which allows the flexibility of the mappings from shared feature to data spaces to be automatically determined from the data.
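The Dirichlet process mixture underlying this model can be illustrated with the standard truncated stick-breaking construction of its mixing weights (a generic sketch, not the authors' inference code; the truncation level is an assumption of the approximation):

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking construction of Dirichlet process mixture
    weights: break off a Beta(1, alpha) fraction of the remaining stick
    for each successive component."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        b = rng.betavariate(1.0, alpha)
        weights.append(b * remaining)
        remaining *= 1.0 - b
    weights.append(remaining)   # leftover mass lumped into a final atom
    return weights

w = stick_breaking_weights(1.0, 50, random.Random(0))
```

Each weight would govern one probabilistic CCA component in the mixture; because the weights decay stochastically, the effective number of components is determined from the data rather than fixed in advance.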