Showing papers on "Hierarchical Dirichlet process published in 2007"

PDF

Open Access

Journal Article•DOI•

[...]

01 Jun 2007-The Annals of Applied Statistics

TL;DR: The correlated topic model (CTM) as mentioned in this paper uses the logistic normal distribution to model the topic proportions, which is a variant of the Dirichlet distribution used in LDA.

...read moreread less

Abstract: Topic models, such as latent Dirichlet allocation (LDA), can be useful tools for the statistical analysis of document collections and other discrete data. The LDA model assumes that the words of each document arise from a mixture of topics, each of which is a distribution over the vocabulary. A limitation of LDA is the inability to model topic correlation even though, for example, a document about genetics is more likely to also be about disease than X-ray astronomy. This limitation stems from the use of the Dirichlet distribution to model the variability among the topic proportions. In this paper we develop the correlated topic model (CTM), where the topic proportions exhibit correlation via the logistic normal distribution [J. Roy. Statist. Soc. Ser. B 44 (1982) 139–177]. We derive a fast variational inference algorithm for approximate posterior inference in this model, which is complicated by the fact that the logistic normal is not conjugate to the multinomial. We apply the CTM to the articles from Science published from 1990–1999, a data set that comprises 57M words. The CTM gives a better fit of the data than LDA, and we demonstrate its use as an exploratory tool of large document collections.

...read moreread less

1,053 citations

Journal Article•

Multi-Task Learning for Classification with Dirichlet Process Priors

[...]

Ya Xue¹, Xuejun Liao¹, Lawrence Carin¹, Balaji Krishnapuram²•Institutions (2)

Duke University¹, Siemens²

01 May 2007-Journal of Machine Learning Research

TL;DR: Experimental results on two real life MTL problems indicate that the proposed algorithms automatically identify subgroups of related tasks whose training data appear to be drawn from similar distributions are more accurate than simpler approaches such as single-task learning, pooling of data across all tasks, and simplified approximations to DP.

...read moreread less

Abstract: Consider the problem of learning logistic-regression models for multiple classification tasks, where the training data set for each task is not drawn from the same statistical distribution. In such a multi-task learning (MTL) scenario, it is necessary to identify groups of similar tasks that should be learned jointly. Relying on a Dirichlet process (DP) based statistical model to learn the extent of similarity between classification tasks, we develop computationally efficient algorithms for two different forms of the MTL problem. First, we consider a symmetric multi-task learning (SMTL) situation in which classifiers for multiple tasks are learned jointly using a variational Bayesian (VB) algorithm. Second, we consider an asymmetric multi-task learning (AMTL) formulation in which the posterior density function from the SMTL model parameters (from previous tasks) is used as a prior for a new task: this approach has the significant advantage of not requiring storage and use of all previous data from prior tasks. The AMTL formulation is solved with a simple Markov Chain Monte Carlo (MCMC) construction. Experimental results on two real life MTL problems indicate that the proposed algorithms: (a) automatically identify subgroups of related tasks whose training data appear to be drawn from similar distributions; and (b) are more accurate than simpler approaches such as single-task learning, pooling of data across all tasks, and simplified approximations to DP.

...read moreread less

582 citations

Journal Article•DOI•

Sampling the Dirichlet Mixture Model with Slices

[...]

Stephen G. Walker¹•Institutions (1)

University of Kent¹

29 May 2007-Communications in Statistics - Simulation and Computation

TL;DR: The key to the algorithm detailed in this article, which also keeps the random distribution functions, is the introduction of a latent variable which allows a finite number of objects to be sampled within each iteration of a Gibbs sampler.

...read moreread less

Abstract: We provide a new approach to the sampling of the well known mixture of Dirichlet process model. Recent attention has focused on retention of the random distribution function in the model, but sampling algorithms have then suffered from the countably infinite representation these distributions have. The key to the algorithm detailed in this article, which also keeps the random distribution functions, is the introduction of a latent variable which allows a finite number, which is known, of objects to be sampled within each iteration of a Gibbs sampler.

...read moreread less

482 citations

Proceedings Article•DOI•

Unsupervised Activity Perception by Hierarchical Bayesian Models

[...]

Xiaogang Wang¹, Xiaoxu Ma¹, Eric Grimson¹•Institutions (1)

Massachusetts Institute of Technology¹

17 Jun 2007

TL;DR: A novel unsupervised learning framework for activity perception to understand activities in complicated scenes from visual data using a hierarchical Bayesian model to connect three elements: low-level visual features, simple "atomic" activities, and multi-agent interactions.

...read moreread less

Abstract: We propose a novel unsupervised learning framework for activity perception. To understand activities in complicated scenes from visual data, we propose a hierarchical Bayesian model to connect three elements: low-level visual features, simple "atomic" activities, and multi-agent interactions. Atomic activities are modeled as distributions over low-level visual features, and interactions are modeled as distributions over atomic activities. Our models improve existing language models such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP) by modeling interactions without supervision. Our data sets are challenging video sequences from crowded traffic scenes with many kinds of activities co-occurring. Our approach provides a summary of typical atomic activities and interactions in the scene. Unusual activities and interactions are found, with natural probabilistic explanations. Our method supports flexible high-level queries on activities and interactions using atomic activities as components.

...read moreread less

350 citations

Journal Article•DOI•

Controlling the reinforcement in Bayesian non-parametric mixture models

[...]

Antonio Lijoi¹, Ramsés H. Mena², Igor Prünster³•Institutions (3)

University of Pavia¹, National Autonomous University of Mexico², University of Turin³

01 Sep 2007-Journal of The Royal Statistical Society Series B-statistical Methodology

TL;DR: A Bayesian non‐parametric approach is taken and adopt a hierarchical model with a suitable non-parametric prior obtained from a generalized gamma process to solve the problem of determining the number of components in a mixture model.

...read moreread less

Abstract: Summary. The paper deals with the problem of determining the number of components in a mixture model. We take a Bayesian non-parametric approach and adopt a hierarchical model with a suitable non-parametric prior for the latent structure. A commonly used model for such a problem is the mixture of Dirichlet process model. Here, we replace the Dirichlet process with a more general non-parametric prior obtained from a generalized gamma process. The basic feature of this model is that it yields a partition structure for the latent variables which is of Gibbs type. This relates to the well-known (exchangeable) product partition models. If compared with the usual mixture of Dirichlet process model the advantage of the generalization that we are examining relies on the availability of an additional parameter σ belonging to the interval (0,1): it is shown that such a parameter greatly influences the clustering behaviour of the model. A value of σ that is close to 1 generates a large number of clusters, most of which are of small size. Then, a reinforcement mechanism which is driven by σ acts on the mass allocation by penalizing clusters of small size and favouring those few groups containing a large number of elements. These features turn out to be very useful in the context of mixture modelling. Since it is difficult to specify a priori the reinforcement rate, it is reasonable to specify a prior for σ. Hence, the strength of the reinforcement mechanism is controlled by the data.

...read moreread less

230 citations

Proceedings Article•

The Infinite PCFG Using Hierarchical Dirichlet Processes

[...]

Percy Liang, Slav Petrov, Michael I. Jordan, Dan Klein¹•Institutions (1)

University of California, Berkeley¹

01 Jun 2007

TL;DR: This work presents a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet process (HDP) and develops an efficient variational inference procedure that can be applied to full-scale parsing applications.

...read moreread less

Abstract: We present a nonparametric Bayesian model of tree structures based on the hierarchical Dirichlet process (HDP). Our HDP-PCFG model allows the complexity of the grammar to grow as more training data is available. In addition to presenting a fully Bayesian model for the PCFG, we also develop an efficient variational inference procedure. On synthetic data, we recover the correct grammar without having to specify its complexity in advance. We also show that our techniques can be applied to full-scale parsing applications by demonstrating its effectiveness in learning state-split grammars.

...read moreread less

197 citations

Journal Article•DOI•

Generalized spatial dirichlet process models

[...]

Jason A. Duan¹, Michele Guindani², Alan E. Gelfand³•Institutions (3)

Yale University¹, University of New Mexico², Duke University³

01 Dec 2007-Biometrika

TL;DR: In this article, a generalized spatial Dirichlet process is proposed for point-referenced data, which allows different surface selection at different sites, and the marginal distribution of the effect at each site still comes from a Gaussian process.

...read moreread less

Abstract: SUMMARY Many models for the study of point-referenced data explicitly introduce spatial random effects to capture residual spatial association. These spatial effects are customarily modelled as a zeromean stationary Gaussian process. The spatial Dirichlet process introduced by Gelfand et al. (2005) produces a random spatial process which is neither Gaussian nor stationary. Rather, it varies about a process that is assumed to be stationary and Gaussian. The spatial Dirichlet process arises as a probability-weighted collection of random surfaces. This can be limiting for modelling and inferential purposes since it insists that a process realization must be one of these surfaces. We introduce a random distribution for the spatial effects that allows different surface selection at different sites. Moreover, we can specify the model so that the marginal distribution of the effect at each site still comes from a Dirichlet process. The development is offered constructively, providing a multivariate extension of the stick-breaking representation of the weights. We then introduce mixing using this generalized spatial Dirichlet process. We illustrate with a simulated dataset of independent replications and note that we can embed the generalized process within a dynamic model specification to eliminate the independence assumption.

...read moreread less

188 citations

Proceedings Article•

Collapsed Variational Inference for HDP

[...]

Yee Whye Teh¹, Kenichi Kurihara², Max Welling³•Institutions (3)

University College London¹, Tokyo Institute of Technology², University of California, Irvine³

03 Dec 2007

TL;DR: This work obtains the first variational algorithm to deal with the hierarchical Dirichlet process and with hyperparameters ofDirichlet variables, and shows a significant improvement in accuracy.

...read moreread less

Abstract: A wide variety of Dirichlet-multinomial 'topic' models have found interesting applications in recent years. While Gibbs sampling remains an important method of inference in such models, variational techniques have certain advantages such as easy assessment of convergence, easy optimization without the need to maintain detailed balance, a bound on the marginal likelihood, and side-stepping of issues with topic-identifiability. The most accurate variational technique thus far, namely collapsed variational latent Dirichlet allocation, did not deal with model selection nor did it include inference for hyperparameters. We address both issues by generalizing the technique, obtaining the first variational algorithm to deal with the hierarchical Dirichlet process and to deal with hyperparameters of Dirichlet variables. Experiments show a significant improvement in accuracy.

...read moreread less

178 citations

Journal Article•DOI•

High-Dimensional Unsupervised Selection and Estimation of a Finite Generalized Dirichlet Mixture Model Based on Minimum Message Length

[...]

Nizar Bouguila¹, Djemel Ziou²•Institutions (2)

Concordia University¹, Université de Sherbrooke²

01 Oct 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: This work considers the application of the minimum message length (MML) principle to determine the number of clusters in a finite mixture model based on the generalized Dirichlet distribution.

...read moreread less

Abstract: We consider the problem of determining the structure of high-dimensional data without prior knowledge of the number of clusters. Data are represented by a finite mixture model based on the generalized Dirichlet distribution. The generalized Dirichlet distribution has a more general covariance structure than the Dirichlet distribution and offers high flexibility and ease of use for the approximation of both symmetric and asymmetric distributions. This makes the generalized Dirichlet distribution more practical and useful. An important problem in mixture modeling is the determination of the number of clusters. Indeed, a mixture with too many or too few components may not be appropriate to approximate the true model. Here, we consider the application of the minimum message length (MML) principle to determine the number of clusters. The MML is derived so as to choose the number of clusters in the mixture model that best describes the data. A comparison with other selection criteria is performed. The validation involves synthetic data, real data clustering, and two interesting real applications: classification of Web pages, and texture database summarization for efficient retrieval.

...read moreread less

156 citations

A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation

[...]

Bernhard Schölkopf, John Platt, Thomas Hofmann

01 Jan 2007

TL;DR: In this article, collapsed variational Bayes and Gibbs sampling have been used for LDA, and showed that it is computationally efficient, easy to implement and significantly more accurate than standard variational bayesian inference.

...read moreread less

Abstract: Latent Dirichlet allocation (LDA) is a Bayesian network that has recently gained much popularity in applications ranging from document modeling to computer vision Due to the large scale nature of these applications, current inference procedures like variational Bayes and Gibbs sampling have been found lacking In this paper we propose the collapsed variational Bayesian inference algorithm for LDA, and show that it is computationally efficient, easy to implement and significantly more accurate than standard variational Bayesian inference for LDA

...read moreread less

127 citations

Proceedings Article•DOI•

Learning Multiscale Representations of Natural Scenes Using Dirichlet Processes

[...]

Jyri Kivinen¹, Erik B. Sudderth², Michael I. Jordan²•Institutions (2)

Helsinki University of Technology¹, University of California, Berkeley²

26 Dec 2007

TL;DR: This work develops nonparametric Bayesian models for multiscale representations of images depicting natural scene categories that capture interesting qualitative structure in natural scenes, and more accurately categorize novel images than models which ignore spatial relationships among features.

...read moreread less

Abstract: We develop nonparametric Bayesian models for multiscale representations of images depicting natural scene categories. Individual features or wavelet coefficients are marginally described by Dirichlet process (DP) mixtures, yielding the heavy-tailed marginal distributions characteristic of natural images. Dependencies between features are then captured with a hidden Markov tree, and Markov chain Monte Carlo methods used to learn models whose latent state space grows in complexity as more images are observed. By truncating the potentially infinite set of hidden states, we are able to exploit efficient belief propagation methods when learning these hierarchical Dirichlet process hidden Markov trees (HDP-HMTs) from data. We show that our generative models capture interesting qualitative structure in natural scenes, and more accurately categorize novel images than models which ignore spatial relationships among features.

...read moreread less

Unifying Rational Models of Categorization via the Hierarchical Dirichlet Process - eScholarship

[...]

Thomas L. Griffiths, Kevin Robert Canini, Adam N. Sanborn, Daniel J. Navarro

01 Jan 2007

TL;DR: The authors show that existing rational models of categorization are spe- cial cases of a statistical model called the hierarchical Dirichlet process, which can be used to automatically infer a represen- tation of the appropriate complexity for a given category.

...read moreread less

Abstract: Unifying Rational Models of Categorization via the Hierarchical Dirichlet Process Thomas L. Griffiths (tom griffiths@berkeley.edu) Department of Psychology, University of California, Berkeley, Berkeley, CA 94720-1650 USA Kevin R. Canini (kevin@cs.berkeley.edu) Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720-1776 USA Adam N. Sanborn (asanborn@indiana.edu) Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA Daniel J. Navarro (daniel.navarro@adelaide.edu.au) School of Psychology, University of Adelaide, Adelaide SA 5005, Australia gory learning leave a number of questions open. In particular, many categorization experiments have explored whether peo- ple represent categories with exemplars or prototypes. One desideratum for a rational account of category learning might be that it can indicate when a learner should choose to use one of these forms of representation over the other. The greater flexibility of nonparametric density estimation has motivated the claim that exemplar models are to be preferred as ratio- nal models of category learning (Nosofsky, 1998). However, nonparametric and parametric methods have different advan- tages and disadvantages: the greater flexibility of nonpara- metric methods comes at a cost of requiring more data to es- timate a distribution. The decision as to which representation scheme to use should be determined by the stimuli presented to the learner, and existing rational analyses do not indicate how this decision should be made (although a similar argu- ment is made by Briscoe & Feldman, 2006). Abstract Models of categorization make different representational as- sumptions, with categories being represented by prototypes, sets of exemplars, and everything in between. Rational mod- els of categorization justify these representational assumptions in terms of different schemes for estimating probability distri- butions. However, they do not answer the question of which scheme should be used in representing a given category. We show that existing rational models of categorization are spe- cial cases of a statistical model called the hierarchical Dirichlet process, which can be used to automatically infer a represen- tation of the appropriate complexity for a given category. Keywords: rational analysis, categorization, Dirichlet process Rational models of cognition aim to explain human be- havior as an optimal solution to the computational problems posed by our environment (Anderson, 1990). Examining these computational problems provides a deeper understand- ing of the assumptions behind successful models of human cognition, and can lead to new models. In this paper, we pursue a rational analysis of category learning: inferring the structure of categories from a set of stimuli labeled as be- longing to those categories. The knowledge acquired through this process can ultimately be used to make decisions about how to categorize new stimuli. Existing rational analyses of category learning (Anderson, 1990; Ashby & Alfonso-Reese, 1995; Rosseel, 2002) agree that the computational problem involved is one of density estimation: determining the proba- bility distributions over stimuli associated with different cat- egory labels. Viewing category learning as density estimation helps to clarify the assumptions behind the two main classes of psy- chological models: exemplar models and prototype models. Exemplar models assume that a category is represented by a set of stored exemplars, and categorization involves compar- ing new stimuli to the set of exemplars in each category (e.g., Medin & Schaffer, 1978; Nosofsky, 1986). Prototype models assume that a category is associated with a single prototype and categorization involves comparing new stimuli to these prototypes (e.g., Reed, 1972). These approaches to category learning correspond to different strategies for density estima- tion, being nonparametric and parametric density estimation respectively (Ashby & Alfonso-Reese, 1995). Despite providing insight into the assumptions behind models of categorization, existing rational analyses of cate- The question of how to represent categories is complicated by the fact that prototype and exemplar models are not the only options. A number of models have recently explored possibilities between these extremes, representing categories using clusters of several exemplars (Anderson, 1990; Van- paemel, Storms, & Ons, 2005; Rosseel, 2002; Love, Medin, & Gureckis, 2004). The range of representations possible in these models emphasizes the importance of being able to identify an appropriate representation for a category from the stimuli themselves: with more options for the representation of categories, it becomes more important to be able to say which option a learner should choose. Our goal in this paper is to build on previous rational analy- ses of category learning to provide not just a unifying frame- work which can be used to understand the assumptions be- hind existing models of categorization, but a unifying model of which these models are special cases. This model goes beyond previous unifying models of category learning (e.g., Rosseel, 2002; Vanpaemel et al., 2005) by providing a ratio- nal solution to the question of which representation should be chosen based purely on the structure of a category. These results are achieved by identifying connections between mod- els of human category learning and ideas from nonparametric Bayesian statistics. In particular, we show that all of the mod- els mentioned above can be viewed as variants of a stochastic

...read moreread less

Proceedings Article•

Nonparametric Bayes pachinko allocation

[...]

Wei Li¹, David M. Blei², Andrew McCallum¹•Institutions (2)

University of Massachusetts Amherst¹, Princeton University²

19 Jul 2007

TL;DR: This paper proposed a nonparametric Bayesian prior for PAM based on a variant of the hierarchical Dirichlet process (HDP), which can capture topic correlations defined by nested data structure, but it does not automatically discover such correlations from unstructured data.

...read moreread less

Abstract: Recent advances in topic models have explored complicated structured distributions to represent topic correlation. For example, the pachinko allocation model (PAM) captures arbitrary, nested, and possibly sparse correlations between topics using a directed acyclic graph (DAG). While PAM provides more flexibility and greater expressive power than previous models like latent Dirichlet allocation (LDA), it is also more difficult to determine the appropriate topic structure for a specific dataset. In this paper, we propose a nonparametric Bayesian prior for PAM based on a variant of the hierarchical Dirichlet process (HDP). Although the HDP can capture topic correlations defined by nested data structure, it does not automatically discover such correlations from unstructured data. By assuming an HDP-based prior for PAM, we are able to learn both the number of topics and how the topics are correlated. We evaluate our model on synthetic and real-world text datasets, and show that nonparametric PAM achieves performance matching the best of PAM without manually tuning the number of topics.

...read moreread less

Book•DOI•

Stochastic classiﬁcation models

[...]

Peter McCullagh, Jie Yang

15 May 2007

Proceedings Article•DOI•

Hierarchical Dirichlet processes for tracking maneuvering targets

[...]

Emily B. Fox¹, Erik B. Sudderth², Alan S. Willsky¹•Institutions (2)

Massachusetts Institute of Technology¹, University of California, Berkeley²

09 Jul 2007

TL;DR: A learning algorithm and computational results are presented that demonstrate the utility of the HDP for tracking, and show that it efficiently learns typical dynamics from noisy data.

...read moreread less

Abstract: We consider the problem of state estimation for a dynamic system driven by unobserved, correlated inputs. We model these inputs via an uncertain set of temporally correlated dynamic models, where this uncertainty includes the number of modes, their associated statistics, and the rate of mode transitions. The dynamic system is formulated via two interacting graphs: a hidden Markov model (HMM) and a linear-Gaussian state space model. The HMM's state space indexes system modes, while its outputs are the unobserved inputs to the linear dynamical system. This Markovian structure accounts for temporal persistence of input regimes, but avoids rigid assumptions about their detailed dynamics. Via a hierarchical Dirichlet process (HDP) prior, the complexity of our infinite state space robustly adapts to new observations. We present a learning algorithm and computational results that demonstrate the utility of the HDP for tracking, and show that it efficiently learns typical dynamics from noisy data.

...read moreread less

Proceedings Article•

Fast search for Dirichlet process mixture models

[...]

Hal Daumé

11 Mar 2007

TL;DR: Experimental results show that using these techniques is it possible to apply DP mixture models to very large data sets and that search algorithms provide a practical alternative to expensive MCMC and variational techniques.

...read moreread less

Abstract: Dirichlet process (DP) mixture models provide a flexible Bayesian framework for density estimation. Unfortunately, their flexibility comes at a cost: inference in DP mixture models is computationally expensive, even when conjugate distributions are used. In the common case when one seeks only a maximum a posteriori assignment of data points to clusters, we show that search algorithms provide a practical alternative to expensive MCMC and variational techniques. When a true posterior sample is desired, the solution found by search can serve as a good initializer for MCMC. Experimental results show that using these techniques is it possible to apply DP mixture models to very large data sets.

...read moreread less

Proceedings Article•

Efficient Bayesian task-level transfer learning

[...]

Daniel M. Roy¹, Leslie Pack Kaelbling¹•Institutions (1)

Massachusetts Institute of Technology¹

06 Jan 2007

TL;DR: A hierarchical extension of the classic Naive Bayes classifier that couples multiple NaiveBayes classifiers by placing a Dirichlet Process prior over their parameters is presented and how recent advances in approximate inference in the DirICHlet Process mixture model enable efficient inference is shown.

...read moreread less

Abstract: In this paper, we show how using the Dirichlet Process mixture model as a generative model of data sets provides a simple and effective method for transfer learning. In particular, we present a hierarchical extension of the classic Naive Bayes classifier that couples multiple Naive Bayes classifiers by placing a Dirichlet Process prior over their parameters and show how recent advances in approximate inference in the Dirichlet Process mixture model enable efficient inference. We evaluate the resulting model in a meeting domain, in which the system decides, based on a learned model of the user's behavior, whether to accept or reject the request on his or her behalf. The extended model outperforms the standard Naive Bayes model by using data from other users to influence its predictions.

...read moreread less

Book Chapter•DOI•

Large sample asymptotics for the two-parameter Poisson--Dirichlet process

[...]

Lancelot F. James

31 Aug 2007-arXiv: Probability

TL;DR: In this paper, the authors explore the consistency and weak convergence of the two-parameter Poisson-Dirichlet posterior process in the context of estimating an unknown probability measure, viewing this process as a natural extension of the Dirichlet process.

...read moreread less

Abstract: This paper explores large sample properties of the two-parameter $(\alpha,\theta)$ Poisson--Dirichlet Process in two contexts. In a Bayesian context of estimating an unknown probability measure, viewing this process as a natural extension of the Dirichlet process, we explore the consistency and weak convergence of the the two-parameter Poisson--Dirichlet posterior process. We also establish the weak convergence of properly centered two-parameter Poisson--Dirichlet processes for large $\theta+n\alpha.$ This latter result complements large $\theta$ results for the Dirichlet process and Poisson--Dirichlet sequences, and complements a recent result on large deviation principles for the two-parameter Poisson--Dirichlet process. A crucial component of our results is the use of distributional identities that may be useful in other contexts.

...read moreread less

Proceedings Article•DOI•

Multi-task learning for sequential data via iHMMs and the nested Dirichlet process

[...]

Kai Ni¹, Lawrence Carin¹, David B. Dunson²•Institutions (2)

Duke University¹, National Institutes of Health²

20 Jun 2007

TL;DR: A new hierarchical nonparametric Bayesian model is proposed for the problem of multitask learning (MTL) with sequential data that allows us to perform task-level clustering and data- level clustering simultaneously, with which the learning for individual iHMMs is enhanced and between-task similarities are learned.

...read moreread less

Abstract: A new hierarchical nonparametric Bayesian model is proposed for the problem of multitask learning (MTL) with sequential data. Sequential data are typically modeled with a hidden Markov model (HMM), for which one often must choose an appropriate model structure (number of states) before learning. Here we model sequential data from each task with an infinite hidden Markov model (iHMM), avoiding the problem of model selection. The MTL for iHMMs is implemented by imposing a nested Dirichlet process (nDP) prior on the base distributions of the iHMMs. The nDP-iHMM MTL method allows us to perform task-level clustering and data-level clustering simultaneously, with which the learning for individual iHMMs is enhanced and between-task similarities are learned. Learning and inference for the nDP-iHMM MTL are based on a Gibbs sampler. The effectiveness of the framework is demonstrated using synthetic data as well as real music data.

...read moreread less

Proceedings Article•DOI•

Image Denoising with Nonparametric Hidden Markov Trees

[...]

Jyri Kivinen¹, Erik B. Sudderth², Michael I. Jordan²•Institutions (2)

Helsinki University of Technology¹, University of California, Berkeley²

12 Nov 2007

TL;DR: A hierarchical, nonparametric statistical model for wavelet representations of natural images that automatically adapts to the complexity of different images and wavelet bases through a Monte Carlo learning algorithm.

...read moreread less

Abstract: We develop a hierarchical, nonparametric statistical model for wavelet representations of natural images. Extending previous work on Gaussian scale mixtures, wavelet coefficients are marginally distributed according to infinite, Dirichlet process mixtures. A hidden Markov tree is then used to couple the mixture assignments at neighboring nodes. Via a Monte Carlo learning algorithm, the resulting hierarchical Dirichlet process hidden Markov tree (HDP-HMT) model automatically adapts to the complexity of different images and wavelet bases. Image denoising results demonstrate the effectiveness of this learning process.

...read moreread less

Book Chapter•DOI•

High level group analysis of FMRI data based on dirichlet process mixture models

[...]

Bertrand Thirion¹, Alan Tucholka², Merlin Keller¹, Philippe Pinel³, Alexis Roche², Jean-François Mangin², Jean-Baptiste Poline² - Show less +3 more•Institutions (3)

French Institute for Research in Computer Science and Automation¹, IBM², French Institute of Health and Medical Research³

02 Jul 2007

TL;DR: A new procedure is developed that extracts structures individually and compares them at the group level and uses a Dirichlet Process Mixture Model for inference about spatial locations of interest.

...read moreread less

Abstract: Inferring the position of functionally active regions from a multi-subject fMRI dataset involves the comparison of the individual data and the inference of a common activity model. While voxel-based analyzes, e.g. Random Effect statistics, are widely used, they do not model each individual activation pattern. Here, we develop a new procedure that extracts structures individually and compares them at the group level. For inference about spatial locations of interest, a Dirichlet Process Mixture Model is used. Finally, inter-subject correspondences are computed with Bayesian Network models. We show the power of the technique on both simulated and real datasets and compare it with standard inference techniques.

...read moreread less

Journal Article•DOI•

The Dirichlet Distribution and Process through Neutralities

[...]

Konstancja Bobecka¹, Jacek Wesołowski¹•Institutions (1)

Warsaw University of Technology¹

26 Apr 2007-Journal of Theoretical Probability

TL;DR: In this paper, a new characterization of the Dirichlet distribution based on the notion of complete neutrality and a regression version of neutrality is derived, which unifies earlier characterizations by James and Mosimann (Ann. Stat. 8, 183−189, 1980) and by Seshadri and Wesolowski (Sankhyā, A 65, 248−291, 2003).

...read moreread less

Abstract: A new characterization of the Dirichlet distribution, based on the notion of complete neutrality and a regression version of neutrality, is derived. It unifies earlier characterizations by James and Mosimann (Ann. Stat. 8, 183–189, 1980) and by Seshadri and Wesolowski (Sankhyā, A 65, 248–291, 2003). Also new results on identification of the Dirichlet process in the class of neutral-to-the-right processes are obtained. The proof of the main result makes an extensive use of the method of moments.

...read moreread less

Proceedings Article•DOI•

Dirichlet Process HMM Mixture Models with Application to Music Analysis

[...]

Yuting Qi¹, John Paisley¹, Lawrence Carin¹•Institutions (1)

Duke University¹

15 Apr 2007

TL;DR: This work focuses on exploring music similarities as an important application, highlighting the effectiveness of the HMM mixture model, and evaluation of posterior distributions for all model parameters is achieved via a variational Bayes formulation.

...read moreread less

Abstract: A hidden Markov mixture model is developed using a Dirichlet process (DP) prior, to represent the statistics of sequential data for which a single hidden Markov model (HMM) may not be sufficient. The DP prior has an intrinsic clustering property that encourages parameter sharing, naturally revealing the proper number of mixture components. The evaluation of posterior distributions for all model parameters is achieved via a variational Bayes formulation. We focus on exploring music similarities as an important application, highlighting the effectiveness of the HMM mixture model. Experimental results are presented from classical music clips.

...read moreread less

Posted Content•

New Dirichlet mean identities

[...]

Lancelot F. James

04 Aug 2007-arXiv: Probability

TL;DR: In this paper, the authors introduce two distributional operations, which consist of multiplying a mean functional by an independent beta random variable and an operation involving an exponential change of measure, which identify relationships between different means and their densities.

...read moreread less

Abstract: An important line of research is the investigation of the laws of random variables known as Dirichlet means as discussed in Cifarelli and Regazzini(1990). However there is not much information on inter-relationships between different Dirichlet means. Here we introduce two distributional operations, which consist of multiplying a mean functional by an independent beta random variable and an operation involving an exponential change of measure. These operations identify relationships between different means and their densities. This allows one to use the often considerable analytic work to obtain results for one Dirichlet mean to obtain results for an entire family of otherwise seemingly unrelated Dirichlet means. Additionally, it allows one to obtain explicit densities for the related class of random variables that have generalized gamma convolution distributions, and the finite-dimensional distribution of their associated L\'evy processes. This has implications in, for instance, the explicit description of Bayesian nonparametric prior and posterior models, and more generally in a variety of applications in probability and statistics involving Levy processes.

...read moreread less

Proceedings Article•DOI•

Multi-Aspect Target Classification and Detection via the Infinite Hidden Markov Model

[...]

Kai Ni¹, Yuting Qi¹, Lawrence Carin¹•Institutions (1)

Duke University¹

15 Apr 2007

TL;DR: A new multi-aspect target detection method is presented based on the infinite hidden Markov model (iHMM), where the scattering of waves from multiple targets is modeled as an iHMM with the number of underlying states treated as infinite.

...read moreread less

Abstract: A new multi-aspect target detection method is presented based on the infinite hidden Markov model (iHMM). The scattering of waves from multiple targets is modeled as an iHMM with the number of underlying states treated as infinite, from which a full posterior distribution on the number of states associated with the targets is inferred and the target-dependent states are learned collectively. A set of Dirichlet processes (DPs) are used to define the rows of the HMM transition matrix and these DPs are linked and shared via a hierarchical Dirichlet process (HDP). Learning and inference for the iHMM are based on an effective Gibbs sampler. The framework is demonstrated using measured acoustic scattering data.

...read moreread less

Journal Issue•DOI•

Topic-based language models using Dirichlet Mixtures

[...]

Sadamitsu Kugatsu¹, Takuya Mishina², Mikio Yamamoto¹•Institutions (2)

University of Tsukuba¹, IBM²

01 Nov 2007-Systems and Computers in Japan

TL;DR: A generative text model using Dirichlet Mixtures as a distribution for parameters of a multinomial distribution, whose compound distribution is Polya Mixtures, is proposed and it is shown that the model exhibits high performance in application to statistical language models.

...read moreread less

Abstract: We propose a generative text model using Dirichlet Mixtures as a distribution for parameters of a multinomial distribution, whose compound distribution is Polya Mixtures, and show that the model exhibits high performance in application to statistical language models. In this paper, we discuss some methods for estimating parameters of Dirichlet Mixtures and for estimating the expectation values of the a posteriori distribution needed for adaptation, and then compare them with two previous text models. The first conventional model is the Mixture of Unigrams, which is often used for incorporating topics into statistical language models. The second one is LDA (Latent Dirichlet Allocation), a typical generative text model. In an experiment using document probabilities and dynamic adaptation of n-gram models for newspaper articles, we show that the proposed model, in comparison with the two previous models, can achieve a lower perplexity at low mixture numbers. © 2007 Wiley Periodicals, Inc. Syst Comp Jpn, 38(12): 76– 85, 2007; Published online in Wiley InterScience (). DOI 10.1002-scj.20629

...read moreread less

Journal Article•DOI•

Multiaspect target detection via the infinite hidden Markov model.

[...]

Kai Ni¹, Yuting Qi, Lawrence Carin•Institutions (1)

Duke University¹

04 May 2007-Journal of the Acoustical Society of America

TL;DR: A new multiaspect target detection method is presented based on the infinite hidden Markov model (iHMM), where the scattering of waves from a target is modeled as an iHMM with the number of underlying states treated as infinite.

...read moreread less

Abstract: A new multiaspect target detection method is presented based on the infinite hidden Markov model (iHMM). The scattering of waves from a target is modeled as an iHMM with the number of underlying states treated as infinite, from which a full posterior distribution on the number of states associated with the targets is inferred and the target-dependent states are learned collectively. A set of Dirichlet processes (DPs) are used to define the rows of the HMM transition matrix and these DPs are linked and shared via a hierarchical Dirichlet process. Learning and inference for the iHMM are based on a Gibbs sampler. The basic framework is applied to a detailed analysis of measured acoustic scattering data.

...read moreread less

Hierarchical Dirichlet Process-Based Models For Discovery of Cross-species Mammalian Gene Expression

[...]

Georg K. Gerber, Robin D. Dowell, Tommi S. Jaakkola, David K. Gifford¹•Institutions (1)

Massachusetts Institute of Technology¹

06 Jul 2007

TL;DR: GeneProgram is a new unsupervised computational framework that uses expression data to simultaneously organize genes into overlapping programs and tissues into groups to produce maps of inter-species expression programs, which are sorted by generality scores that exploit the automatically learned groupings.

...read moreread less

Abstract: An important research problem in computational biology is the identification of expression programs, sets of co-activated genes orchestrating physiological processes, and the characterization of the functional breadth of these programs. The use of mammalian expression data compendia for discovery of such programs presents several challenges, including: 1) cellular inhomogeneity within samples, 2) genetic and environmental variation across samples, and 3) uncertainty in the numbers of programs and sample populations. We developed GeneProgram, a new unsupervised computational framework that uses expression data to simultaneously organize genes into overlapping programs and tissues into groups to produce maps of inter-species expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Our method addresses each of the above challenges by using a probabilistic model that: 1) allocates mRNA to different expression programs that may be shared across tissues, 2) is hierarchical, treating each tissue as a sample from a population of related tissues, and 3) uses Dirichlet Processes, a non-parametric Bayesian method that provides prior distributions over numbers of sets while penalizing model complexity. Using real gene expression data, we show that GeneProgram outperforms several popular expression analysis methods in recovering biologically interpretable gene sets. From a large compendium of mouse and human expression data, GeneProgram discovers 19 tissue groups and 100 expression programs active in mammalian tissues. Our method automatically constructs a comprehensive, body-wide map of expression programs and characterizes their functional generality. This map can be used for guiding future biological experiments, such as discovery of genes for new drug targets that exhibit minimal “cross-talk” with unintended organs, or genes that maintain general physiological responses that go awry in disease states. Further, our method is general, and can be applied readily to novel compendia of biological data.

...read moreread less

Journal Article•DOI•

Half Dirichlet Problems and Decompositions of Poisson Kernels

[...]

Richard Delanghe¹, Tao Qian²•Institutions (2)

Ghent University¹, University of Macau²

09 Jun 2007-Advances in Applied Clifford Algebras

TL;DR: In this paper, it was shown that the only domains for which the half-Dirichlet problems are solvable in the same pattern are balls and half-spaces, and the solutions further lead to decompositions of the Poisson kernels and the fact that the classical Dirichlet problem may be solved merely by using Cauchy transformation in the respective two contexts.

...read moreread less

Abstract: Following the previous study on the unit ball of Delanghe et al, half-Dirichlet problems for the upper-half space are presented and solved. The solutions further lead to decompositions of the Poisson kernels, and the fact that the classical Dirichlet problems may be solved merely by using Cauchy transformation in the respective two contexts. We show that the only domains for which the half-Dirichlet problems are solvable in the same pattern are balls and half-spaces.

...read moreread less

Proceedings Article•DOI•

Infinite Hidden Markov Models and ISA Features for Unusual-Event Detection in Video

[...]

Iulian Pruteanu-Malinici¹, Lawrence Carin¹•Institutions (1)

Duke University¹

12 Nov 2007

TL;DR: This work addresses the problem of unusual-event detection in a video sequence by modeling time-evolving properties of features modeled via an infinite hidden Markov model (iHMM), which is trained using "normal"/"typical" video data.

...read moreread less

Abstract: We address the problem of unusual-event detection in a video sequence. Invariant subspace analysis (ISA) is used to extract features from the video, and the time-evolving properties of these features are modeled via an infinite hidden Markov model (iHMM), which is trained using "normal"/"typical" video data. The iHMM automatically determines the proper number of HMM states, and it retains a full posterior density function on all model parameters. Anomalies (unusual events) are detected subsequently if a low likelihood is observed when associated sequential features are submitted to the trained iHMM. A hierarchical Dirichlet process (HDP) framework is employed in the formulation of the iHMM. The evaluation of posterior distributions for the iHMM is achieved in two ways: via MCMC and using a variational Bayes (VB) formulation.

...read moreread less