
Showing papers on "Hierarchical Dirichlet process published in 2008"


Journal ArticleDOI
TL;DR: In this paper, Markov chain Monte Carlo (MCMC) algorithms are proposed that sample from the exact posterior distribution of quantities of interest, avoiding finite approximations of the Dirichlet process via retrospective sampling.
Abstract: Inference for Dirichlet process hierarchical models is typically performed using Markov chain Monte Carlo methods, which can be roughly categorized into marginal and conditional methods. The former integrate out analytically the infinite-dimensional component of the hierarchical model and sample from the marginal distribution of the remaining variables using the Gibbs sampler. Conditional methods impute the Dirichlet process and update it as a component of the Gibbs sampler. Since this requires imputation of an infinite-dimensional process, implementation of the conditional method has relied on finite approximations. In this paper, we show how to avoid such approximations by designing two novel Markov chain Monte Carlo algorithms which sample from the exact posterior distribution of quantities of interest. The approximations are avoided by the new technique of retrospective sampling. We also show how the algorithms can obtain samples from functionals of the Dirichlet process. The marginal and the conditional methods are compared and a careful simulation study is included, which involves a non-conjugate model, different datasets and prior specifications.

406 citations
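As an aside on the marginal methods this abstract contrasts: integrating out the DP leaves the Chinese restaurant process as the prior over partitions, which can be simulated in a few lines. The sketch below is purely illustrative (the function name and parameters are our choices, not the paper's), and it draws partitions from the CRP prior rather than implementing the paper's retrospective sampler.

```python
import random

def crp_sample(n, alpha, seed=0):
    """Draw a random partition of n items from the Chinese restaurant
    process with concentration alpha (the DP's marginal over partitions)."""
    rng = random.Random(seed)
    counts = []       # customers currently seated at each table
    assignments = []  # table index chosen by each customer
    for i in range(n):
        # an existing table k is chosen with probability prop. to counts[k];
        # a new table is opened with probability prop. to alpha
        r = rng.uniform(0.0, i + alpha)
        acc = 0.0
        k = len(counts)  # default: open a new table
        for j, c in enumerate(counts):
            acc += c
            if r <= acc:
                k = j
                break
        if k == len(counts):
            counts.append(1)
        else:
            counts[k] += 1
        assignments.append(k)
    return assignments, counts

parts, counts = crp_sample(100, alpha=1.0, seed=1)
```

Larger alpha tends to open more tables; the rich-get-richer effect is visible in how early tables accumulate most customers.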


Journal ArticleDOI
TL;DR: In this article, the problem of nonparametrically modeling outcome distributions in multicenter studies is addressed, borrowing information across centers while also allowing centers to be clustered, and an efficient Markov chain Monte Carlo algorithm is developed for computation.
Abstract: In multicenter studies, subjects in different centers may have different outcome distributions. This article is motivated by the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to be clustered. Starting with a stick-breaking representation of the Dirichlet process (DP), we replace the random atoms with random probability measures drawn from a DP. This results in a nested DP prior, which can be placed on the collection of distributions for the different centers, with centers drawn from the same DP component automatically clustered together. Theoretical properties are discussed, and an efficient Markov chain Monte Carlo algorithm is developed for computation. The methods are illustrated using a simulation study and an application to quality of care in U.S. hospitals.

320 citations
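The stick-breaking representation this construction starts from is easy to sketch. The truncated version below is a generic illustration (the truncation level and function name are our assumptions); the nested DP itself, which replaces each atom with another DP draw, is not reproduced here.

```python
import random

def stick_breaking(alpha, truncation, seed=0):
    """Truncated stick-breaking weights for a DP(alpha):
    b_k ~ Beta(1, alpha), pi_k = b_k * prod_{j<k} (1 - b_j)."""
    rng = random.Random(seed)
    remaining = 1.0  # length of stick not yet broken off
    weights = []
    for _ in range(truncation - 1):
        b = rng.betavariate(1.0, alpha)
        weights.append(remaining * b)
        remaining *= (1.0 - b)
    weights.append(remaining)  # leftover mass goes to the last atom
    return weights

w = stick_breaking(alpha=1.0, truncation=50, seed=1)
```

Each weight would then be paired with an atom; in the nested DP of this paper, that atom is itself a random probability measure drawn from another DP.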


Proceedings ArticleDOI
05 Jul 2008
TL;DR: A sampling algorithm is developed that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates and demonstrating the advantages of the sticky extension, and the utility of the HDP-HMM in real-world applications.
Abstract: The hierarchical Dirichlet process hidden Markov model (HDP-HMM) is a flexible, nonparametric model which allows state spaces of unknown size to be learned from data. We demonstrate some limitations of the original HDP-HMM formulation (Teh et al., 2006), and propose a sticky extension which allows more robust learning of smoothly varying dynamics. Using DP mixtures, this formulation also allows learning of more complex, multimodal emission distributions. We further develop a sampling algorithm that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates. Via extensive experiments with synthetic data and the NIST speaker diarization database, we demonstrate the advantages of our sticky extension, and the utility of the HDP-HMM in real-world applications.

313 citations
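A minimal sketch of the "sticky" idea under a finite truncation: each state's transition row is drawn from a Dirichlet whose parameter vector adds extra self-transition mass kappa on the diagonal, discouraging rapid state switching. The function, its arguments, and the fixed base measure beta are illustrative assumptions, not the paper's code (in the full model beta is itself inferred).

```python
import random

def sticky_transitions(beta, alpha, kappa, seed=0):
    """Sample HMM transition rows pi_j ~ Dir(alpha*beta + kappa*delta_j).
    beta: truncated shared base measure over states; kappa > 0 inflates
    each state's self-transition mass (the 'sticky' bias)."""
    rng = random.Random(seed)
    K = len(beta)
    P = []
    for j in range(K):
        # Dirichlet parameter for row j, with kappa added on the diagonal
        params = [alpha * beta[k] + (kappa if k == j else 0.0) for k in range(K)]
        # Dirichlet draw via normalized independent Gamma variates
        draws = [rng.gammavariate(a, 1.0) if a > 0 else 0.0 for a in params]
        s = sum(draws)
        P.append([d / s for d in draws])
    return P

beta = [0.25, 0.25, 0.25, 0.25]  # e.g. a symmetric truncated base measure
P = sticky_transitions(beta, alpha=2.0, kappa=5.0, seed=1)
```

Setting kappa = 0 recovers the original (non-sticky) HDP-HMM transition prior.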


Proceedings Article
08 Dec 2008
TL;DR: This work develops a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences in an unknown number of persistent, smooth dynamical modes.
Abstract: Many nonlinear dynamical phenomena can be effectively modeled by a system that switches among a set of conditionally linear dynamical modes. We consider two such models: the switching linear dynamical system (SLDS) and the switching vector autoregressive (VAR) process. Our nonparametric Bayesian approach utilizes a hierarchical Dirichlet process prior to learn an unknown number of persistent, smooth dynamical modes. We develop a sampling algorithm that combines a truncated approximation to the Dirichlet process with efficient joint sampling of the mode and state sequences. The utility and flexibility of our model are demonstrated on synthetic data, sequences of dancing honey bees, and the IBOVESPA stock index.

221 citations


Journal ArticleDOI
TL;DR: This work develops hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them and proposes nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene.
Abstract: We develop hierarchical, probabilistic models for objects, the parts composing them, and the visual scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves detection accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. The resulting transformed Dirichlet process (TDP) leads to Monte Carlo algorithms which simultaneously segment and recognize objects in street and office scenes.

195 citations


Proceedings ArticleDOI
05 Jul 2008
TL;DR: The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets, and a relatively simple Markov Chain Monte Carlo sampler is developed.
Abstract: The dynamic hierarchical Dirichlet process (dHDP) is developed to model the time-evolving statistical properties of sequential data sets. The data collected at any time point are represented via a mixture associated with an appropriate underlying model, in the framework of HDP. The statistical properties of data collected at consecutive time points are linked via a random parameter that controls their probabilistic similarity. The sharing mechanisms of the time-evolving data are derived, and a relatively simple Markov Chain Monte Carlo sampler is developed. Experimental results are presented to demonstrate the model.

126 citations


Journal ArticleDOI
TL;DR: This paper proposes a model called the multinomial generalized Dirichlet distribution (MGDD) that is the composition of the generalized Dirichlet distribution and the multinomial, in the same way that the MDD is the composition of the Dirichlet and the multinomial.
Abstract: In this paper, we examine the problem of count data clustering. We analyze this problem using finite mixtures of distributions. The multinomial distribution and the multinomial Dirichlet distribution (MDD) are widely accepted to model count data. We show that these two distributions cannot be the best choice in all the applications, and we propose another model called the multinomial generalized Dirichlet distribution (MGDD) that is the composition of the generalized Dirichlet distribution and the multinomial, in the same way that the MDD is the composition of the Dirichlet and the multinomial. The estimation of the parameters and the determination of the number of components in our model are based on the deterministic annealing expectation-maximization (DAEM) approach and the minimum description length (MDL) criterion, respectively. We compare our method to standard approaches such as multinomial and multinomial Dirichlet mixtures to show its merits. The comparison involves different applications such as spatial color image databases indexing, handwritten digit recognition, and text document clustering.

97 citations


Proceedings Article
01 Jan 2008
TL;DR: A method is developed for discovering the latent structure in MFCC feature data using the Hierarchical Dirichlet Process (HDP) and computing timbral similarity between recorded songs, which is faster than previous approaches that compare single Gaussian distributions directly.
Abstract: We develop a method for discovering the latent structure in MFCC feature data using the Hierarchical Dirichlet Process (HDP). Based on this structure, we compute timbral similarity between recorded songs. The HDP is a nonparametric Bayesian model. Like the Gaussian Mixture Model (GMM), it represents each song as a mixture of some number of multivariate Gaussian distributions. However, the number of mixture components is not fixed in the HDP, but is determined as part of the posterior inference process. Moreover, in the HDP the same set of Gaussians is used to model all songs, with only the mixture weights varying from song to song. We compute the similarity of songs based on these weights, which is faster than previous approaches that compare single Gaussian distributions directly. Experimental results on a genre-based retrieval task illustrate that our HDP-based method is both faster and produces better retrieval quality than such previous approaches.

74 citations
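Because all songs share one global set of Gaussians, song-to-song comparison reduces to comparing mixture-weight vectors. A hedged sketch of that idea, using cosine similarity as an illustrative metric (not necessarily the distance used in the paper):

```python
import math

def weight_similarity(w1, w2):
    """Cosine similarity between two songs' shared-component mixture
    weights. Comparing weight vectors is cheap because the Gaussians
    themselves are common to all songs."""
    dot = sum(a * b for a, b in zip(w1, w2))
    n1 = math.sqrt(sum(a * a for a in w1))
    n2 = math.sqrt(sum(b * b for b in w2))
    return dot / (n1 * n2)

# two songs using the same components equally are maximally similar
same = weight_similarity([0.5, 0.3, 0.2], [0.5, 0.3, 0.2])
# songs concentrated on disjoint components are dissimilar
disjoint = weight_similarity([1.0, 0.0], [0.0, 1.0])
```

This is what makes the approach faster than comparing per-song Gaussians directly: the per-pair cost is linear in the number of shared components.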


Journal ArticleDOI
TL;DR: This work addresses the problem of unusual-event detection in a video sequence using an infinite hidden Markov model (iHMM), which is trained using "normal"/"typical" video; evaluation of posterior distributions is achieved via Markov chain Monte Carlo and a variational Bayes formulation.
Abstract: We address the problem of unusual-event detection in a video sequence. Invariant subspace analysis (ISA) is used to extract features from the video, and the time-evolving properties of these features are modeled via an infinite hidden Markov model (iHMM), which is trained using "normal"/"typical" video. The iHMM retains a full posterior density function on all model parameters, including the number of underlying HMM states. Anomalies (unusual events) are detected subsequently if a low likelihood is observed when associated sequential features are submitted to the trained iHMM. A hierarchical Dirichlet process framework is employed in the formulation of the iHMM. The evaluation of posterior distributions for the iHMM is achieved in two ways: via Markov chain Monte Carlo and using a variational Bayes formulation. Comparisons are made to modeling based on conventional maximum-likelihood-based HMMs, as well as to Dirichlet-process-based Gaussian-mixture models.

72 citations


Proceedings ArticleDOI
15 Dec 2008
TL;DR: The HDP-HTM model substantially advances the literature on evolutionary clustering: not only does it perform better than existing methods, but, more importantly, it is capable of automatically learning the cluster numbers and structures while explicitly addressing the correspondence issue during the evolution.
Abstract: This paper studies evolutionary clustering, a topic of much recent interest with many important applications, notably in social network analysis. Based on the recent literature on the Hierarchical Dirichlet Process (HDP) and the Hidden Markov Model (HMM), we have developed a statistical model, HDP-HTM, that combines HDP with a Hierarchical Transition Matrix (HTM) based on the proposed Infinite Hierarchical Hidden Markov State model (iH2MS) as an effective solution to this problem. The HDP-HTM model substantially advances the literature on evolutionary clustering: not only does it perform better than existing methods, but, more importantly, it is capable of automatically learning the cluster numbers and structures while explicitly addressing the correspondence issue during the evolution. Extensive evaluations have demonstrated the effectiveness and promise of this solution against the state-of-the-art literature.

64 citations


Proceedings ArticleDOI
24 Aug 2008
TL;DR: A nonparametric Bayesian model which generalizes the multi-modal latent Dirichlet allocation model (MoM-LDA) used for similar problems in the past, and which performs as well as or better than MoM-LDA (regardless of the choice of the number of clusters) for predicting labels of objects in images containing multiple objects.
Abstract: Many applications call for learning to label individual objects in an image where the only information available to the learner is a dataset of images with their associated captions, i.e., words that describe the image content without specifically labeling the individual objects. We address this problem using a multi-modal hierarchical Dirichlet process model (MoM-HDP) - a nonparametric Bayesian model which provides a generalization of the multi-modal latent Dirichlet allocation model (MoM-LDA) used for similar problems in the past. We apply this model for predicting labels of objects in images containing multiple objects. During training, the model has access to an un-segmented image and its caption, but not the labels for each object in the image. The trained model is used to predict the label for each region of interest in a segmented image. MoM-HDP generalizes the multi-modal latent Dirichlet allocation model in that it allows the number of components of the mixture model to adapt to the data. The model parameters are efficiently estimated using variational inference. Our experiments show that MoM-HDP performs as well as or better than the MoM-LDA model (regardless of the choice of the number of clusters in the MoM-LDA model).

Proceedings ArticleDOI
21 Nov 2008
TL;DR: A learning approach based on both the Dirichlet process and the Dirichlet distribution, which provide a flexible nonparametric Bayesian framework for non-Gaussian data clustering; the approach relies on estimating the posterior distribution of clusterings using a Gibbs sampler.
Abstract: A significant problem in clustering is the determination of the number of classes which best describes the data. This paper proposes a learning approach based on both the Dirichlet process and the Dirichlet distribution, which provide a flexible nonparametric Bayesian framework for non-Gaussian data clustering. Our approach is Bayesian and relies on the estimation of the posterior distribution of clusterings using a Gibbs sampler. The experimental results involve data classification and image model prediction, and show the merits of our approach.

Journal ArticleDOI
TL;DR: The analytical definitions of the Chernoff, Bhattacharyya and Jeffreys-Matusita probabilistic distances between two Dirichlet distributions and two Beta distributions are given and their inappropriateness is shown in the analytical case.

Proceedings ArticleDOI
01 Dec 2008
TL;DR: A new latent Dirichlet language model (LDLM) is presented for modeling word sequences by merging Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events, and a new Bayesian framework is introduced.
Abstract: Latent Dirichlet allocation (LDA) has been successfully applied to document modeling and classification. LDA calculates the document probability based on a bag-of-words scheme without considering the sequence of words. This model discovers the topic structure at the document level, which is different from the concern of word prediction in speech recognition. In this paper, we present a new latent Dirichlet language model (LDLM) for modeling word sequences. A new Bayesian framework is introduced by merging Dirichlet priors to characterize the uncertainty of the latent topics of n-gram events. The robust topic-based language model is established accordingly. In the experiments, we implement the LDLM for continuous speech recognition and obtain better performance than a probabilistic latent semantic analysis (PLSA) based language model.
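To illustrate only the simplest way a Dirichlet prior enters n-gram modeling (the LDLM's topic-conditioned machinery is not reproduced here), a symmetric Dirichlet prior on each bigram distribution yields the familiar additive-smoothing predictive rule. Names and constants below are our illustrative choices.

```python
from collections import Counter

def dirichlet_bigram_prob(corpus, vocab_size, alpha=0.5):
    """Bigram predictive probability under a symmetric Dirichlet prior:
    P(w | h) = (c(h, w) + alpha) / (c(h) + alpha * V)."""
    bigrams = Counter(zip(corpus, corpus[1:]))
    unigrams = Counter(corpus[:-1])  # counts of histories h
    def prob(h, w):
        return (bigrams[(h, w)] + alpha) / (unigrams[h] + alpha * vocab_size)
    return prob

p = dirichlet_bigram_prob("a b a b a c".split(), vocab_size=3)
```

The predictive probabilities for any fixed history sum to one over the vocabulary, which is what makes the rule a proper conditional distribution.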

Posted Content
TL;DR: This work introduces an alternative, interaction component model for communities (ICMc), where the whole network is a bag of links, stemming from different components, which assumes assortativity and finds community-like structures like the earlier methods motivated by physics.
Abstract: Being among the easiest ways to find meaningful structure in discrete data, Latent Dirichlet Allocation (LDA) and related component models have been applied widely. They are simple, computationally fast and scalable, interpretable, and admit nonparametric priors. In the currently popular field of network modeling, relatively little work has taken uncertainty of data seriously in the Bayesian sense, and component models have been introduced to the field only recently, by treating each node as a bag of out-going links. We introduce an alternative, interaction component model for communities (ICMc), where the whole network is a bag of links, stemming from different components. The former finds both disassortative and assortative structure, while the alternative assumes assortativity and finds community-like structures like the earlier methods motivated by physics. With Dirichlet Process priors and an efficient implementation the models are highly scalable, as demonstrated with a social network from the Last.fm web site, with 670,000 nodes and 1.89 million links.

Journal ArticleDOI
TL;DR: A new haplotype inference program, Haploi, is presented, which makes use of individual ethnic information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competent and sometimes superior speed and accuracy compared to the state-of-the-art programs.
Abstract: The perennial problem of "how many clusters?" remains an issue of substantial interest in data mining and machine learning communities, and becomes particularly salient in large data sets such as populational genomic data where the number of clusters needs to be relatively large and open-ended. This problem gets further complicated in a co-clustering scenario in which one needs to solve multiple clustering problems simultaneously because of the presence of common centroids (e.g., ancestors) shared by clusters (e.g., possible descents from a certain ancestor) from different multiple-cluster samples (e.g., different human subpopulations). In this paper we present a hierarchical nonparametric Bayesian model to address this problem in the context of multi-population haplotype inference. Uncovering the haplotypes of single nucleotide polymorphisms is essential for many biological and medical applications. While it is not uncommon for the genotype data to be pooled from multiple ethnically distinct populations, few existing programs have explicitly leveraged the individual ethnic information for haplotype inference. In this paper we present a new haplotype inference program, Haploi, which makes use of such information and is readily applicable to genotype sequences with thousands of SNPs from heterogeneous populations, with competent and sometimes superior speed and accuracy compared to the state-of-the-art programs. Underlying Haploi is a new haplotype distribution model based on a nonparametric Bayesian formalism known as the hierarchical Dirichlet process, which represents a tractable surrogate to the coalescent process. The proposed model is exchangeable, unbounded, and capable of coupling demographic information of different populations.

Proceedings ArticleDOI
01 Apr 2008
TL;DR: A nonparametric Bayesian method for clustering graphs and selecting salient patterns at the same time, and Variational inference is adopted here, because sampling is not applicable due to extremely high dimensionality.
Abstract: Graph data such as chemical compounds and XML documents are getting more common in many application domains. A main difficulty of graph data processing lies in the intrinsic high dimensionality of graphs, namely, when a graph is represented as a binary feature vector of indicators of all possible subgraph patterns, the dimensionality gets too large for usual statistical methods. We propose a nonparametric Bayesian method for clustering graphs and selecting salient patterns at the same time. Variational inference is adopted here, because sampling is not applicable due to extremely high dimensionality. The feature set minimizing the free energy is efficiently collected with the DFS code tree, where the generation of useless subgraphs is suppressed by a tree pruning condition. In experiments, our method is compared with a simpler approach based on frequent subgraph mining, and graph kernels.

16 Sep 2008
TL;DR: A method to add human supervision to Dirichlet Process Mixture Models in order to influence the solution with respect to some prior knowledge; the evaluation highlights the benefits of the chosen method compared to previously used clustering approaches.
Abstract: In this work we apply Dirichlet Process Mixture Models to a learning task in natural language processing (NLP): lexical-semantic verb clustering. We assess the performance on a dataset based on Levin's (1993) verb classes using the recently introduced V-measure metric. We also present a method to add human supervision to the model in order to influence the solution with respect to some prior knowledge. The quantitative evaluation performed highlights the benefits of the chosen method compared to previously used clustering approaches.

29 May 2008
TL;DR: The Flexible Dirichlet model, a new class of simplex distributions containing the Dirichlet as a special case, allows the means and (part of) the variance-covariance matrix to be modeled separately.
Abstract: The Dirichlet family owes its privileged status within simplex distributions to its ease of interpretation and good mathematical properties. In particular, we recall fundamental properties for the analysis of compositional data such as closure under amalgamation and subcomposition. From a probabilistic point of view, it is characterised (uniquely) by a variety of independence relationships which makes it indisputably the reference model for expressing the nontrivial idea of substantial independence for compositions. Indeed, its well known inadequacy as a general model for compositional data stems from such an independence structure together with the poorness of its parametrisation. In this paper a new class of distributions (called Flexible Dirichlet) capable of handling various dependence structures and containing the Dirichlet as a special case is presented. The new model exhibits a considerably richer parametrisation which, for example, allows one to model the means and (part of) the variance-covariance matrix separately. Moreover, such a model preserves some good mathematical properties of the Dirichlet, i.e. closure under amalgamation and subcomposition with new parameters simply related to the parent composition parameters. Furthermore, the joint and conditional distributions of subcompositions and relative totals can be expressed as simple mixtures of two Flexible Dirichlet distributions. The basis generating the Flexible Dirichlet, though keeping compositional invariance, shows a dependence structure which allows various forms of partitional dependence to be contemplated by the model (e.g. non-neutrality, subcompositional dependence and subcompositional non-invariance), independence cases being identified by suitable parameter configurations. In particular, within this model substantial independence among subsets of components of the composition naturally occurs when the subsets have a Dirichlet distribution.

Proceedings Article
01 Jan 2008
TL;DR: Experiments on a large meeting corpus of more than 70 hours of speech data show consistent and significant improvements in terms of word error rate for language model adaptation based on the topic and role information.
Abstract: We continue our previous work on the modeling of topic and role information from multiparty meetings using a hierarchical Dirichlet process (HDP), in the context of language model adaptation. In this paper we focus on three problems: 1) an empirical analysis of the HDP as a nonparametric topic model; 2) the mismatch problem of vocabularies of the baseline n-gram model and the HDP; and 3) an automatic speech recognition experiment to further verify the effectiveness of our adaptation framework. Experiments on a large meeting corpus of more than 70 hours of speech data show consistent and significant improvements in terms of word error rate for language model adaptation based on the topic and role information.

Proceedings ArticleDOI
Xi Li, Weiming Hu1, Zhongfei Zhang, Xiaoqin Zhang1, Guan Luo1 
01 Jan 2008
TL;DR: A trajectory-based video retrieval framework using Dirichlet process mixture models that has a nice scalability and adaptability in the sense that when new cluster data are presented, the framework automatically identifies the new cluster information without having to redo the training.
Abstract: In this paper, we present a trajectory-based video retrieval framework using Dirichlet process mixture models. The main contribution of this framework is four-fold. (1) We apply a Dirichlet process mixture model (DPMM) to unsupervised trajectory learning; DPMM is a countably infinite mixture model whose number of components grows with the data. (2) We employ a time-sensitive Dirichlet process mixture model (tDPMM) to learn trajectories' time-series characteristics; furthermore, a novel likelihood estimation algorithm for the tDPMM is proposed for the first time. (3) We develop a tDPMM-based probabilistic model matching scheme, which is empirically shown to be more error-tolerant and able to deliver higher retrieval accuracy than peer methods in the literature. (4) The framework has good scalability and adaptability in the sense that when new cluster data are presented, the framework automatically identifies the new cluster information without having to redo the training. Theoretical analysis and experimental evaluations against the state-of-the-art methods demonstrate the promise and effectiveness of the framework.

Book ChapterDOI
08 Sep 2008
TL;DR: This paper presents a modeling framework for topic and role on the AMI Meeting Corpus, and illustrates the effectiveness of the approach in the context of adapting a baseline language model in a large-vocabulary automatic speech recognition system for multiparty meetings.
Abstract: In this paper, we address the modeling of topic and role information in multiparty meetings, via a nonparametric Bayesian model called the hierarchical Dirichlet process. This model provides a powerful solution to topic modeling and a flexible framework for the incorporation of other cues such as speaker role information. We present our modeling framework for topic and role on the AMI Meeting Corpus, and illustrate the effectiveness of the approach in the context of adapting a baseline language model in a large-vocabulary automatic speech recognition system for multiparty meetings. The adapted LM produces significant improvements in terms of both perplexity and word error rate.

Proceedings ArticleDOI
22 Sep 2008
TL;DR: This paper incorporates relevant written materials, e.g., textbooks for lectures, which convey semantic relationships, in particular domain-specific relationships, between words, and trains latent Dirichlet allocation models on these materials and measures the similarity between slides and transcripts in the acquired hidden-topic space.
Abstract: This paper studies automatic detection of topic transitions for recorded presentations. This can be achieved by matching slide content with presentation transcripts directly with some similarity metrics. Such literal matching, however, misses domain-specific knowledge and is sensitive to speech recognition errors. In this paper, we incorporate relevant written materials, e.g., textbooks for lectures, which convey semantic relationships, in particular domain-specific relationships, between words. To this end, we train latent Dirichlet allocation (LDA) models on these materials and measure the similarity between slides and transcripts in the acquired hidden-topic space. This similarity is then combined with literal matchings. Experiments show that the proposed approach reduces the errors in slide transition detection by 17-41% on manual transcripts and 27-37% on automatic transcripts.

Proceedings Article
01 Jan 2008
TL;DR: Three latent variable systems which achieve state-of-the-art results in domains previously dominated by fully supervised systems, including a grammar induction technique which begins with coarse syntactic structures and iteratively refines them in an unsupervised fashion.
Abstract: Given the abundance of text data, unsupervised approaches are very appealing for natural language processing. We present three latent variable systems which achieve state-of-the-art results in domains previously dominated by fully supervised systems. For syntactic parsing, we describe a grammar induction technique which begins with coarse syntactic structures and iteratively refines them in an unsupervised fashion. The resulting coarse-to-fine grammars admit efficient coarse-to-fine inference schemes and have produced the best parsing results in a variety of languages. For coreference resolution, we describe a discourse model in which entities are shared across documents using a hierarchical Dirichlet process. In each document, entities are repeatedly rendered into mention strings by a sequential model of attentional state and anaphoric constraint. Despite being fully unsupervised, this approach is competitive with the best supervised approaches. Finally, for machine translation, we present a model which learns translation lexicons from non-parallel corpora. Alignments between word types are modeled by a prior over matchings. Given any fixed alignment, a joint density over word vectors derives from probabilistic canonical correlation analysis. This approach is capable of discovering high-precision translations, even when the underlying corpora and languages are divergent.

Posted Content
03 Jan 2008
TL;DR: This article investigated the predictive probabilities that underlie the Dirichlet and Pitman-Yor processes and the implicit "rich-get-richer" characteristic of the resulting partitions.
Abstract: Prior distributions play a crucial role in Bayesian approaches to clustering. Two commonly-used prior distributions are the Dirichlet and Pitman-Yor processes. In this paper, we investigate the predictive probabilities that underlie these processes, and the implicit "rich-get-richer" characteristic of the resulting partitions. We explore an alternative prior for nonparametric Bayesian clustering -- the uniform process -- for applications where the "rich-get-richer" property is undesirable. We also explore the cost of this process: partitions are no longer exchangeable with respect to the ordering of variables. We present new asymptotic and simulation-based results for the clustering characteristics of the uniform process and compare these with known results for the Dirichlet and Pitman-Yor processes. We compare performance on a real document clustering task, demonstrating the practical advantage of the uniform process despite its lack of exchangeability over orderings.
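The contrast the abstract draws can be stated in one function: the CRP weights an existing cluster by its size (rich-get-richer), while the uniform process gives every existing cluster equal weight. An illustrative sketch (function name and interface are ours, not the paper's):

```python
def predictive_probs(counts, alpha, process="crp"):
    """Predictive probabilities over (existing clusters..., new cluster).
    'crp': P(join cluster k) prop. to its size n_k (rich-get-richer);
    'uniform': every existing cluster gets equal mass regardless of size.
    In both cases a new cluster has mass prop. to alpha."""
    K = len(counts)
    if process == "crp":
        weights = list(counts) + [alpha]
    else:  # uniform process
        weights = [1.0] * K + [alpha]
    z = sum(weights)
    return [w / z for w in weights]

p_crp = predictive_probs([5, 1], alpha=1.0, process="crp")
p_uni = predictive_probs([5, 1], alpha=1.0, process="uniform")
```

Under the CRP the size-5 cluster is five times more likely to gain the next item than the size-1 cluster; under the uniform process they are equally likely, which is the property exploited here, at the cost of exchangeability over orderings.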

Proceedings ArticleDOI
24 Aug 2008
TL;DR: A model based on nonparametric Bayesian modeling is developed for the automatic discovery of semantic relationships between words in a corpus; evaluation demonstrates that it outperforms other baseline models.
Abstract: We developed a model based on nonparametric Bayesian modeling for automatic discovery of semantic relationships between words taken from a corpus. It is aimed at discovering semantic knowledge about words in particular domains, which has become increasingly important with the growing use of text mining, information retrieval, and speech recognition. The subject-predicate structure is taken as a syntactic structure with the noun as the subject and the verb as the predicate. This structure is regarded as a graph structure. The generation of this graph can be modeled using the hierarchical Dirichlet process and the Pitman-Yor process. The probabilistic generative model we developed for this graph structure consists of subject-predicate structures extracted from a corpus. Evaluation of this model by measuring the performance of graph clustering based on WordNet similarities demonstrated that it outperforms other baseline models.

Proceedings Article
01 Jan 2008
TL;DR: This work assigns categories to Kana characters via latent Dirichlet allocation, uses the categories to compose additional features for conditional random fields (CRF), and compares the automatically derived categories with manually prepared ones by their efficiency in Kana sequence segmentation.
Abstract: We propose an efficient Kana sequence segmentation as a component of faster and easier interfaces for e-learning systems. We assign categories to Kana characters via latent Dirichlet allocation (LDA) and use the categories to compose additional features for conditional random fields (CRF). We compare the categories our method gives with those manually prepared by their efficiency in Kana sequence segmentation.
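As a rough illustration of the LDA step (not the paper's implementation), a toy collapsed Gibbs sampler can assign each character a category, here taken to be its most frequent sampled topic in the final sweep. All names, hyperparameters, and the tiny example corpus are illustrative.

```python
import random
from collections import defaultdict

def lda_categories(docs, K, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Toy collapsed Gibbs sampler for LDA over character sequences;
    returns each symbol's category = its most frequent final topic."""
    rng = random.Random(seed)
    vocab = {w for d in docs for w in d}
    nwt = defaultdict(int)   # (symbol, topic) counts
    nt = defaultdict(int)    # topic counts
    ndt = defaultdict(int)   # (doc, topic) counts
    z = []
    for di, d in enumerate(docs):           # random initial assignments
        zs = []
        for w in d:
            t = rng.randrange(K)
            zs.append(t)
            nwt[w, t] += 1
            nt[t] += 1
            ndt[di, t] += 1
        z.append(zs)
    for _ in range(iters):                  # Gibbs sweeps
        for di, d in enumerate(docs):
            for wi, w in enumerate(d):
                t = z[di][wi]               # remove current assignment
                nwt[w, t] -= 1
                nt[t] -= 1
                ndt[di, t] -= 1
                weights = [(ndt[di, k] + alpha) * (nwt[w, k] + beta)
                           / (nt[k] + beta * len(vocab)) for k in range(K)]
                r = rng.random() * sum(weights)
                for k, wt in enumerate(weights):
                    r -= wt
                    if r < 0:
                        break
                z[di][wi] = k               # resample and restore counts
                nwt[w, k] += 1
                nt[k] += 1
                ndt[di, k] += 1
    votes = defaultdict(lambda: defaultdict(int))
    for di, d in enumerate(docs):
        for wi, w in enumerate(d):
            votes[w][z[di][wi]] += 1
    return {w: max(c, key=c.get) for w, c in votes.items()}

docs = [list("ababba"), list("ccdccd"), list("abdcab")]
cats = lda_categories(docs, K=2)
```

The resulting `cats` mapping is the kind of character-to-category table that could then feed extra features into a CRF segmenter.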

Posted Content
TL;DR: In this article, the atomic structure of the neutral diffusion model is studied and a finite-dimensional particle process is derived from the time-dependent random measure for any chosen population size.
Abstract: Fleming-Viot processes are a wide class of probability-measure-valued diffusions which often arise as large population limits of so-called particle processes. Here we invert the procedure and show that a countable population process can be derived directly from the neutral diffusion model, with no arbitrary assumptions. We study the atomic structure of the neutral diffusion model, and elicit a finite dimensional particle process from the time-dependent random measure, for any chosen population size. The static properties are consequences of the fact that its stationary distribution is the Dirichlet process, and rely on a new representation for it. The dynamics are derived directly from the transition function of the neutral diffusion model.

Book ChapterDOI
05 Nov 2008
TL;DR: A Bayesian mixture model is proposed that introduces a context variable with a Dirichlet prior to model multiple topics in text, yielding a novel unsupervised learning algorithm for clustering large-scale web data.
Abstract: In this paper, we propose a Bayesian mixture model that introduces a context variable with a Dirichlet prior to model multiple topics in text and then cluster documents. It is a novel unsupervised text-learning algorithm for clustering large-scale web data. For parameter estimation, we adopt maximum likelihood (ML) via the EM algorithm, and we employ the BIC principle to determine the number of clusters. Experimental results show that the proposed method distinctly outperforms baseline algorithms.
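The ML/EM estimation and BIC-based choice of the number of clusters can be sketched on a one-dimensional Gaussian mixture, a deliberately simplified stand-in for the paper's text model; every detail here (initialization, variance floor, data) is an illustrative assumption.

```python
import math
import random

def gmm_em_1d(xs, K, iters=200, var_floor=0.05):
    """Fit a K-component 1-D Gaussian mixture with EM; return the log-likelihood."""
    n = len(xs)
    srt = sorted(xs)
    mus = [srt[int((i + 0.5) * n / K)] for i in range(K)]   # quantile init
    mean = sum(xs) / n
    sig2 = [max(sum((x - mean) ** 2 for x in xs) / n, var_floor)] * K
    pis = [1.0 / K] * K
    loglik = 0.0
    for _ in range(iters):
        resp, loglik = [], 0.0
        for x in xs:   # E-step: posterior responsibility of each component
            ps = [pis[k] / math.sqrt(2 * math.pi * sig2[k])
                  * math.exp(-(x - mus[k]) ** 2 / (2 * sig2[k])) for k in range(K)]
            s = sum(ps) or 1e-300
            loglik += math.log(s)
            resp.append([p / s for p in ps])
        for k in range(K):   # M-step: responsibility-weighted moment updates
            nk = sum(r[k] for r in resp) or 1e-12
            mus[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            sig2[k] = max(sum(r[k] * (x - mus[k]) ** 2
                              for r, x in zip(resp, xs)) / nk, var_floor)
            pis[k] = nk / n
    return loglik

def bic(loglik, K, n):
    """BIC = -2 log L + p log n, with p = 3K - 1 free parameters in 1-D."""
    return -2.0 * loglik + (3 * K - 1) * math.log(n)

rng = random.Random(0)
data = ([rng.gauss(0.0, 1.0) for _ in range(150)]
        + [rng.gauss(10.0, 1.0) for _ in range(150)])
bics = {K: bic(gmm_em_1d(data, K), K, len(data)) for K in (1, 2, 3)}
best_K = min(bics, key=bics.get)
```

On this two-cluster toy data the BIC penalty outweighs the marginal likelihood gain of a third component, so the selected `best_K` matches the true number of clusters.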

Proceedings Article
20 Feb 2008
TL;DR: A nonparametric Bayesian approach is used for the problem of learning from two related data sets using a Dirichlet process mixture model of probabilistic canonical correlation analysers to allow the flexibility of the mappings from shared feature to data spaces to be automatically determined from the data.
Abstract: A nonparametric Bayesian approach is used for the problem of learning from two related data sets. We model the shared structure between two data sets using a Dirichlet process mixture model of probabilistic canonical correlation analysers, which allows the flexibility of the mappings from shared feature to data spaces to be automatically determined from the data.
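The Dirichlet process mixture underlying this model can be illustrated with the standard truncated stick-breaking construction of its mixing weights (a generic sketch, not the authors' inference code; the truncation level is an assumption of the approximation):

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking construction of Dirichlet process mixture
    weights: break off a Beta(1, alpha) fraction of the remaining stick
    for each successive component."""
    weights, remaining = [], 1.0
    for _ in range(truncation):
        b = rng.betavariate(1.0, alpha)
        weights.append(b * remaining)
        remaining *= 1.0 - b
    weights.append(remaining)   # leftover mass lumped into a final atom
    return weights

w = stick_breaking_weights(1.0, 50, random.Random(0))
```

Each weight would govern one probabilistic CCA component in the mixture; because the weights decay stochastically, the effective number of components is determined from the data rather than fixed in advance.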