Showing papers on "Hierarchical Dirichlet process" published in 2016


Proceedings ArticleDOI
01 Aug 2016
TL;DR: This paper proposes using the von Mises-Fisher distribution to model the density of words over a unit sphere, with a Hierarchical Dirichlet Process as the base topic model and an efficient inference algorithm based on stochastic variational inference.
Abstract: Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.

89 citations
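To make the directional likelihood in the entry above concrete, here is a minimal sketch of the von Mises-Fisher log-density. It is restricted to the 3-dimensional unit sphere, where the normalizing constant has the closed form κ / (4π sinh κ); word embeddings are much higher-dimensional and would need a modified Bessel function instead. The function name `vmf_logpdf_3d` is an illustrative assumption, not something taken from the paper.

```python
import math

def vmf_logpdf_3d(x, mu, kappa):
    """Log-density of a von Mises-Fisher distribution on the unit sphere in R^3.

    x and mu are unit-length 3-vectors, kappa > 0 is the concentration.
    Uses the closed-form normalizer C_3(kappa) = kappa / (4 * pi * sinh(kappa)).
    """
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    dot = sum(a * b for a, b in zip(x, mu))  # cosine similarity for unit vectors
    # log(sinh(kappa)) written in a form that does not overflow for large kappa
    log_sinh = kappa + math.log(-math.expm1(-2.0 * kappa)) - math.log(2.0)
    return math.log(kappa) - math.log(4.0 * math.pi) - log_sinh + kappa * dot

mu = (0.0, 0.0, 1.0)
aligned = vmf_logpdf_3d((0.0, 0.0, 1.0), mu, 5.0)
opposed = vmf_logpdf_3d((0.0, 0.0, -1.0), mu, 5.0)
```

The density depends on `x` only through its cosine similarity with `mu`, which is why this family suits embeddings compared by direction rather than by Euclidean distance.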


Journal ArticleDOI
TL;DR: The problem of classifying human activities occurring in depth image sequences is addressed and a two level hierarchical Hidden Markov Model, with independent Markov chains for the joint positions and depth image pattern, is used to model the features.

48 citations


Journal ArticleDOI
TL;DR: A variational Bayesian learning framework for the infinite generalized Dirichlet mixture model that has proven its capability to model complex multidimensional data and integrates a “feature selection” approach to highlight the features that are most informative in order to construct an appropriate model in terms of clustering accuracy.
Abstract: We developed a variational Bayesian learning framework for the infinite generalized Dirichlet mixture model (i.e. a weighted mixture of Dirichlet process priors based on the generalized inverted Dirichlet distribution) that has proven its capability to model complex multidimensional data. We also integrate a "feature selection" approach to highlight the features that are most informative in order to construct an appropriate model in terms of clustering accuracy. Experiments on synthetic data as well as real data generated from visual scenes and handwritten digits datasets illustrate and validate the proposed approach.

38 citations


Proceedings ArticleDOI
27 Feb 2016
TL;DR: A new Stochastic Variational Dual Hierarchical Dirichlet Process (SV-DHDP) model is presented, based on finding latent Path Patterns in both real and simulated data in order to analyze and compare them.
Abstract: Crowd simulation has been an active and important area of research in the field of interactive 3D graphics for several decades. However, only recently has there been an increased focus on evaluating the fidelity of the results with respect to real-world situations. The focus to date has been on analyzing the properties of low-level features such as pedestrian trajectories, or global features such as crowd densities. We propose a new approach based on finding latent Path Patterns in both real and simulated data in order to analyze and compare them. Unsupervised clustering by non-parametric Bayesian inference is used to learn the patterns, which themselves provide a rich visualization of the crowd's behaviour. To this end, we present a new Stochastic Variational Dual Hierarchical Dirichlet Process (SV-DHDP) model. The fidelity of the patterns is then computed with respect to a reference, thus allowing the outputs of different algorithms to be compared with each other and/or with real data accordingly.

37 citations


Journal ArticleDOI
TL;DR: A novel hypergraph-based vertex-reinforced random walk framework for multi-document summarization that exploits the Hierarchical Dirichlet Process (HDP) topic model to learn a word-topic probability distribution in sentences; a time-variant random walk algorithm for hypergraphs is developed to rank sentences, ensuring sentence diversity in summaries through vertex reinforcement.
Abstract: We propose a novel hybrid method to capture the group relations of sentences. We cluster sentences with a KL-divergence based on the word-topic distribution. We propose a vertex-reinforced random walk process in a hypergraph model. The process simultaneously considers the query similarity, the centrality, and the diversity of sentences. We implement our framework and verify improvement over appropriate baselines. General graph random walk has been successfully applied in multi-document summarization, but it has some limitations when processing documents in this way. In this paper, we propose a novel hypergraph-based vertex-reinforced random walk framework for multi-document summarization. The framework first exploits the Hierarchical Dirichlet Process (HDP) topic model to learn a word-topic probability distribution in sentences. Then the hypergraph is used to capture both the cluster relationship based on the word-topic probability distribution and the pairwise similarity among sentences. Finally, a time-variant random walk algorithm for hypergraphs is developed to rank sentences, which ensures sentence diversity in summaries through vertex reinforcement. Experimental results on a publicly available dataset demonstrate the effectiveness of our framework.

34 citations
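The entry above clusters sentences using a KL-divergence over word-topic distributions. A minimal sketch of that divergence between two discrete distributions follows; the function name, the `eps` floor for zero entries, and the toy distributions are illustrative assumptions, not details from the paper.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute nothing; q_i is floored at eps (an
    assumption made here) so that a zero in q does not divide by zero.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            total += pi * math.log(pi / max(qi, eps))
    return total

# Two hypothetical sentence-level topic distributions over three topics.
sentence_a = [0.7, 0.2, 0.1]
sentence_b = [0.1, 0.2, 0.7]
divergence = kl_divergence(sentence_a, sentence_b)
```

KL-divergence is asymmetric, so a clustering procedure usually has to fix a direction or symmetrize it; which choice the paper makes is not stated in the abstract.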


Journal ArticleDOI
TL;DR: A probabilistic generative model that integrates LM and AM, i.e., HDP-HLM, is developed; an inference procedure is derived using the blocked Gibbs sampler; and the NPB-DAA is shown to discover words directly from continuous human speech signals in an unsupervised manner.
Abstract: Human infants can discover words directly from unsegmented speech signals without any explicitly labeled data. Current machine learning methods cannot efficiently estimate a language model (LM) and an acoustic model (AM) and discover words directly from continuous human speech signals in an unsupervised manner. To solve this problem, we propose an integrative generative model that combines an LM and an AM into a single generative model called the hierarchical Dirichlet process hidden LM (HDP-HLM). The HDP-HLM is obtained by extending the hierarchical Dirichlet process hidden semi-Markov model (HDP-HSMM) proposed by Johnson et al. An inference procedure for the HDP-HLM is derived using the blocked Gibbs sampler originally proposed for the HDP-HSMM. This procedure enables the simultaneous and direct inference of LM and AM from continuous speech signals. Based on the HDP-HLM and its inference procedure, we develop a novel machine learning method called the nonparametric Bayesian double articulation analyzer (NPB-DAA) that can directly acquire LM and AM from observed continuous speech signals. By assuming HDP-HLM as a generative model of observed time series data, and by inferring latent variables of the model, the method can analyze latent double articulation structure, i.e., hierarchically organized latent words and phonemes, of the data in an unsupervised manner. We also carried out two evaluation experiments using synthetic data and actual human continuous speech signals representing Japanese vowel sequences. In the word acquisition and phoneme categorization tasks, the NPB-DAA outperformed a conventional double articulation analyzer and a baseline automatic speech recognition system whose AM was trained in a supervised manner.
The main contributions of this paper are as follows: 1) we develop a probabilistic generative model that integrates LM and AM, i.e., HDP-HLM; 2) we derive an inference method for this, and propose the NPB-DAA; and 3) we show that the NPB-DAA can discover words directly from continuous human speech signals in an unsupervised manner.

27 citations


Posted Content
TL;DR: This paper uses a Hierarchical Dirichlet Process for the base topic model and proposes an efficient inference algorithm based on Stochastic Variational Inference that enables it to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics.
Abstract: Traditional topic models do not account for semantic regularities in language. Recent distributional representations of words exhibit semantic consistency over directional metrics such as cosine similarity. However, neither categorical nor Gaussian observational distributions used in existing topic models are appropriate to leverage such correlations. In this paper, we propose to use the von Mises-Fisher distribution to model the density of words over a unit sphere. Such a representation is well-suited for directional data. We use a Hierarchical Dirichlet Process for our base topic model and propose an efficient inference algorithm based on Stochastic Variational Inference. This model enables us to naturally exploit the semantic structures of word embeddings while flexibly discovering the number of topics. Experiments demonstrate that our method outperforms competitive approaches in terms of topic coherence on two different text corpora while offering efficient inference.

23 citations


Journal ArticleDOI
Xianghua Fu1, Jianqiang Li1, Kun Yang1, Laizhong Cui1, Lei Yang1 
TL;DR: Compared with other related topic models on Chinese social media dataset Tianya-80299, the experiment results show that DOHDP model provides the best performance for discovering the evolutionary topics of Chinese social texts.

22 citations


Journal ArticleDOI
TL;DR: A novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically-coherent disease topics, and evaluates the proposed models on two real-world medical datasets.
Abstract: Electronic Medical Record (EMR) has established itself as a valuable resource for large scale analysis of health data. A hospital EMR dataset typically consists of medical records of hospitalized patients. A medical record contains diagnostic information (diagnosis codes), procedures performed (procedure codes) and admission details. Traditional topic models, such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet process (HDP), can be employed to discover disease topics from EMR data by treating patients as documents and diagnosis codes as words. This topic modeling helps to understand the constitution of patient diseases and offers a tool for better planning of treatment. In this paper, we propose a novel and flexible hierarchical Bayesian nonparametric model, the word distance dependent Chinese restaurant franchise (wddCRF), which incorporates word-to-word distances to discover semantically-coherent disease topics. We are motivated by the fact that diagnosis codes are connected in the form of ICD-10 tree structure which presents semantic relationships between codes. We exploit a decay function to incorporate distances between words at the bottom level of wddCRF. Efficient inference is derived for the wddCRF by using MCMC technique. Furthermore, since procedure codes are often correlated with diagnosis codes, we develop the correspondence wddCRF (Corr-wddCRF) to explore conditional relationships of procedure codes for a given disease pattern. Efficient collapsed Gibbs sampling is derived for the Corr-wddCRF. We evaluate the proposed models on two real-world medical datasets - PolyVascular disease and Acute Myocardial Infarction disease. We demonstrate that the Corr-wddCRF model discovers more coherent topics than the Corr-HDP. We also use disease topic proportions as new features and show that using features from the Corr-wddCRF outperforms the baselines on 14-days readmission prediction. 
Besides these, the prediction of procedure codes based on the Corr-wddCRF also shows considerable accuracy.

20 citations


Proceedings Article
01 Jan 2016
TL;DR: Two new models that extend LDA in a simple and intuitive fashion by directly expressing a distribution over the number of topics are proposed, along with a new online Bayesian moment matching technique to learn the parameters and the number of topics of those models from streaming data.
Abstract: Latent Dirichlet Allocation (LDA) is a very popular model for topic modeling as well as many other problems with latent groups. It is both simple and effective. When the number of topics (or latent groups) is unknown, the Hierarchical Dirichlet Process (HDP) provides an elegant non-parametric extension; however, it is a complex model and it is difficult to incorporate prior knowledge since the distribution over topics is implicit. We propose two new models that extend LDA in a simple and intuitive fashion by directly expressing a distribution over the number of topics. We also propose a new online Bayesian moment matching technique to learn the parameters and the number of topics of those models based on streaming data. The approach achieves higher log-likelihood than batch and online HDP with fixed hyperparameters on several corpora.

16 citations
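The entry above notes that under a Dirichlet-process-style prior the distribution over the number of topics is only implicit. A minimal sketch of why: simulating a single-level Chinese restaurant process shows the number of clusters emerging from the seating rule rather than from an explicit parameter. This is a generic illustration, not the model proposed in the paper; the function name and the concentration value are assumptions.

```python
import random

def crp_table_sizes(n_customers, alpha, rng):
    """Simulate a Chinese restaurant process and return the table sizes.

    Customer n+1 joins existing table k with probability size_k / (n + alpha)
    and opens a new table with probability alpha / (n + alpha).
    """
    tables = []
    for _ in range(n_customers):
        weights = tables + [alpha]  # last entry stands for "open a new table"
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)
        else:
            tables[k] += 1
    return tables

sizes = crp_table_sizes(200, alpha=1.0, rng=random.Random(0))
n_clusters = len(sizes)  # random, and tends to grow slowly with n_customers
```

Repeating the simulation with different seeds gives an empirical picture of the implied distribution over cluster counts, which is the quantity the paper's models express directly.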


Journal ArticleDOI
01 Mar 2016
TL;DR: The proposed clustering approach tackles the problem of modelling grouped data where observations are organized into groups that are allowed to remain statistically linked by sharing mixture components and is learned using a principled variational Bayes inference-based algorithm that is developed.
Abstract: Data clustering is a fundamental unsupervised learning task in several domains such as data mining, computer vision, information retrieval, and pattern recognition. In this paper, we propose and analyze a new clustering approach based on both hierarchical Dirichlet processes and the generalized Dirichlet distribution, which leads to an interesting statistical framework for data analysis and modelling. Our approach can be viewed as a hierarchical extension of the infinite generalized Dirichlet mixture model previously proposed in Bouguila and Ziou (IEEE Trans Neural Netw 21(1):107-122, 2010). The proposed clustering approach tackles the problem of modelling grouped data where observations are organized into groups that we allow to remain statistically linked by sharing mixture components. The resulting clustering model is learned using a principled variational Bayes inference-based algorithm that we have developed. Extensive experiments and simulations, based on two challenging applications, namely image categorization and web service intrusion detection, demonstrate our model's usefulness and merits.

Journal ArticleDOI
TL;DR: An unsupervised language identification approach based on Latent Dirichlet Allocation where the raw n-gram count is taken as features without any smoothing, pruning or interpolation is proposed.

Journal ArticleDOI
TL;DR: In this paper, a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain is developed.
Abstract: This article develops a method for simultaneous estimation of density functions for a collection of populations of protein backbone angle pairs using a data-driven, shared basis that is constructed by bivariate spline functions defined on a triangulation of the bivariate domain. The circular nature of angular data is taken into account by imposing appropriate smoothness constraints across boundaries of the triangles. Maximum penalized likelihood is used to fit the model and an alternating blockwise Newton-type algorithm is developed for computation. A simulation study shows that the collective estimation approach is statistically more efficient than estimating the densities individually. The proposed method was used to estimate neighbor-dependent distributions of protein backbone dihedral angles (i.e., Ramachandran distributions). The estimated distributions were applied to protein loop modeling, one of the most challenging open problems in protein structure prediction, by feeding them into an angular-sampling-based loop structure prediction framework. Our estimated distributions compared favorably to the Ramachandran distributions estimated by fitting a hierarchical Dirichlet process model; and in particular, our distributions showed significant improvements on the hard cases where existing methods do not work well.

Journal ArticleDOI
TL;DR: A model based on HDPHMM that shares data points between states, models non-ergodic structures, and models non-emitting states is introduced; it produces a 20% relative reduction in error rate for phoneme classification and an 18% relative reduction on a speech recognition task on the TIMIT Corpus compared to a baseline system consisting of a parametric HMM.
Abstract: Nonparametric Bayesian models use a Bayesian framework to learn model complexity automatically from the data, eliminating the need for a complex model selection process. A Hierarchical Dirichlet Process Hidden Markov Model (HDPHMM) is the nonparametric Bayesian equivalent of a hidden Markov model (HMM), but is restricted to an ergodic topology that uses a Dirichlet Process Model to achieve a mixture distribution-like model. For applications involving ordered sequences (e.g., speech recognition), it is desirable to impose a left-to-right structure on the model. In this paper, we introduce a model based on HDPHMM that: 1) shares data points between states, 2) models non-ergodic structures, and 3) models non-emitting states. The first point is particularly important because Gaussian mixture models, which support such sharing, have been very effective at modeling modalities in a signal (e.g., speaker variability). Further, sharing data points allows models to be estimated more accurately, an important consideration for applications such as speech recognition in which some mixture components occur infrequently. We demonstrate that this new model produces a 20% relative reduction in error rate for phoneme classification and an 18% relative reduction on a speech recognition task on the TIMIT Corpus compared to a baseline system consisting of a parametric HMM.

Journal ArticleDOI
TL;DR: This is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena: topic splitting and speciation and topic convergence and merging, in addition to the more widely recognized emergence and disappearance and gradual evolution.
Abstract: The rapidly expanding corpus of medical research literature presents major challenges in the understanding of previous work, the extraction of maximum information from collected data, and the identification of promising research directions. We present a case for the use of advanced machine learning techniques as an aide in this task and introduce a novel methodology that is shown to be capable of extracting meaningful information from large longitudinal corpora and of tracking complex temporal changes within it. Our framework is based on (i) the discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes. More specifically, this is the first work that discusses and distinguishes between two groups of particularly challenging topic evolution phenomena: topic splitting and speciation and topic convergence and merging, in addition to the more widely recognized emergence and disappearance and gradual evolution. The proposed framework is evaluated on a public medical literature corpus.

Journal ArticleDOI
TL;DR: In this article, the construction of pairwise dependence between m random density functions, each of which is modeled as a mixture of Dirichlet processes, is considered, and the key to this is how to create dependencies between random DPs.

Journal ArticleDOI
TL;DR: These are shown to belong to classes of finite mixtures of Dirichlet processes and gamma random measures for the two models respectively, yielding conjugacy of these classes to the type of data the authors consider, and explicit algorithms are provided to recursively compute the parameters of the mixtures.
Abstract: We extend classic characterisations of posterior distributions under Dirichlet process and gamma random measures priors to a dynamic framework. We consider the problem of learning, from indirect observations, two families of time-dependent processes of interest in Bayesian nonparametrics: the first is a dependent Dirichlet process driven by a Fleming-Viot model, and the data are random samples from the process state at discrete times; the second is a collection of dependent gamma random measures driven by a Dawson-Watanabe model, and the data are collected according to a Poisson point process with intensity given by the process state at discrete times. Both driving processes are diffusions taking values in the space of discrete measures whose support varies with time, and are stationary and reversible with respect to Dirichlet and gamma priors respectively. A common methodology is developed to obtain in closed form the time-marginal posteriors given past and present data. These are shown to belong to classes of finite mixtures of Dirichlet processes and gamma random measures for the two models respectively, yielding conjugacy of these classes to the type of data we consider. We provide explicit results on the parameters of the mixture components and on the mixing weights, which are time-varying and drive the mixtures towards the respective priors in absence of further data. Explicit algorithms are provided to recursively compute the parameters of the mixtures. Our results are based on the projective properties of the signals and on certain duality properties of their projections.

Posted Content
TL;DR: The generative model is used to synthesize both time-independent and time-dependent behaviors by relying on the principles of shared and autonomous control and yields a scalable online sequence clustering algorithm that is non-parametric in the number of clusters and the subspace dimension of each cluster.
Abstract: Small variance asymptotics is emerging as a useful technique for inference in large scale Bayesian non-parametric mixture models. This paper analyses the online learning of robot manipulation tasks with Bayesian non-parametric mixture models under small variance asymptotics. The analysis yields a scalable online sequence clustering (SOSC) algorithm that is non-parametric in the number of clusters and the subspace dimension of each cluster. SOSC groups the new datapoint in its low dimensional subspace by online inference in a non-parametric mixture of probabilistic principal component analyzers (MPPCA) based on Dirichlet process, and captures the state transition and state duration information online in a hidden semi-Markov model (HSMM) based on hierarchical Dirichlet process. A task-parameterized formulation of our approach autonomously adapts the model to changing environmental situations during manipulation. We apply the algorithm in a teleoperation setting to recognize the intention of the operator and remotely adjust the movement of the robot using the learned model. The generative model is used to synthesize both time-independent and time-dependent behaviours by relying on the principles of shared and autonomous control. Experiments with the Baxter robot yield parsimonious clusters that adapt online with new demonstrations and assist the operator in performing remote manipulation tasks.

Journal ArticleDOI
Xunan Zhang1, Shiji Song1, Lei Zhu, Keyou You1, Cheng Wu1 
TL;DR: This study extends a finite mixture model to the infinite case by considering Dirichlet process mixtures and computes the posterior distributions using the variational Bayesian expectation maximization algorithm, which optimizes the evidence lower bound on the complete-data log marginal likelihood.
Abstract: This study presents a novel approach to unsupervised learning for clustering with missing data. We first extend a finite mixture model to the infinite case by considering Dirichlet process mixtures, which can automatically determine the number of mixture components or clusters. Furthermore, we view the missing features as latent variables and compute the posterior distributions using the variational Bayesian expectation maximization algorithm, which optimizes the evidence lower bound on the complete-data log marginal likelihood. We demonstrate the performance on several artificial data sets with missing values. The experimental results indicate that the proposed method outperforms some classic imputation methods. We finally present an application to seabed hydrothermal sulfide color images analysis problem.

Journal ArticleDOI
Wenjun Cheng, Luyao Ma, Tiejun Yang, Jiali Liang1, Yan Zhang1 
09 Sep 2016-PLOS ONE
TL;DR: This paper presents a novel framework that jointly segments multiple lung computed tomography (CT) images via hierarchical Dirichlet process (HDP) based on the assumption that lung CT images from different patients share similar image structure (organ sets and relative positioning).
Abstract: Accurate lung CT image segmentation is of great clinical value, especially when delineating pathological regions such as lung tumors. In this paper, we present a novel framework that jointly segments multiple lung computed tomography (CT) images via hierarchical Dirichlet process (HDP). Specifically, based on the assumption that lung CT images from different patients share similar image structure (organ sets and relative positioning), we derive a mathematical model to segment them simultaneously so that shared information across patients can be utilized to regularize each individual segmentation. Moreover, compared to many conventional models, the algorithm requires little manual involvement due to the nonparametric nature of the Dirichlet process (DP). We validated the proposed model on clinical data consisting of healthy and abnormal (lung cancer) patients. We demonstrate that, because of the joint segmentation, more accurate and consistent segmentations can be obtained.

Journal ArticleDOI
TL;DR: This paper proposes a nonparametric Bayesian framework called VariScan for simultaneous clustering, variable selection, and prediction in high-throughput regression settings and demonstrates that VariScan often outperforms several well-known statistical methods.
Abstract: This paper proposes a nonparametric Bayesian framework called VariScan for simultaneous clustering, variable selection, and prediction in high-throughput regression settings. Poisson-Dirichlet processes are utilized to detect lower-dimensional latent clusters of covariates. An adaptive nonlinear prediction model is constructed for the response, achieving a balance between model parsimony and flexibility. Contrary to conventional belief, cluster detection is shown to be a posteriori consistent for a general class of models as the number of covariates and subjects grows. Simulation studies and data analyses demonstrate that VariScan often outperforms several well-known statistical methods.

Journal ArticleDOI
17 May 2016-PLOS ONE
TL;DR: This work developed a method of classifying and subsequently generating couple dynamics using a Hierarchical Dirichlet Process Hidden semi-Markov Model (HDP-HSMM), and reviews how this unsupervised learning technique generates plausible dyadic sequences that are sensitive to relationship quality and provide a natural mechanism for computational models of behavioral and affective micro-social processes.
Abstract: Sequential affect dynamics generated during the interaction of intimate dyads, such as married couples, are associated with a cascade of effects, some good and some bad, on each partner, close family members, and other social contacts. Although the effects are well documented, the probabilistic structures associated with micro-social processes connected to the varied outcomes remain enigmatic. Using extant data, we developed a method of classifying and subsequently generating couple dynamics using a Hierarchical Dirichlet Process Hidden semi-Markov Model (HDP-HSMM). Our findings indicate that several key aspects of existing models of marital interaction are inadequate: affect state emissions and their durations, along with the expected variability differences between distressed and nondistressed couples, are present but highly nuanced; and, most surprisingly, heterogeneity among highly satisfied couples necessitates that they be divided into subgroups. We review how this unsupervised learning technique generates plausible dyadic sequences that are sensitive to relationship quality and provides a natural mechanism for computational models of behavioral and affective micro-social processes.
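The entry above uses a hidden semi-Markov model to generate state sequences with explicit durations. A minimal sketch of that generative step follows, assuming fixed, hand-written parameters rather than an HDP-HSMM posterior learned from data; the function name, the duration tables, and the two-state example are all hypothetical.

```python
import random

def sample_hsmm_states(n_steps, init, trans, dur_pmf, rng):
    """Sample a state sequence from an explicit-duration hidden semi-Markov model.

    init:    initial state weights
    trans:   trans[i][j] = weight of moving from state i to state j (i != j)
    dur_pmf: dur_pmf[i][d - 1] = weight of staying d steps in state i
    """
    states = list(range(len(init)))
    seq = []
    state = rng.choices(states, weights=init)[0]
    while len(seq) < n_steps:
        durations = list(range(1, len(dur_pmf[state]) + 1))
        d = rng.choices(durations, weights=dur_pmf[state])[0]
        seq.extend([state] * d)  # hold the state for its sampled duration
        state = rng.choices(states, weights=trans[state])[0]
    return seq[:n_steps]  # the final segment may be truncated

# Hypothetical two-state example: state 0 always lasts 2 steps, state 1 lasts 3.
seq = sample_hsmm_states(
    10,
    init=[1.0, 0.0],
    trans=[[0.0, 1.0], [1.0, 0.0]],
    dur_pmf=[[0.0, 1.0], [0.0, 0.0, 1.0]],
    rng=random.Random(0),
)
```

Unlike a plain hidden Markov model, where dwell times are implicitly geometric, the duration of each visit here is drawn from its own distribution, which is the feature that lets such models capture how long an affect state persists.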

Journal ArticleDOI
TL;DR: This work proposes a variation of non-parametric Bayesian modeling for supervised clustering as a mixture of Gaussians with the constraint of encouraging clusters of points with the same label to estimate the number of clusters.

Journal ArticleDOI
TL;DR: A novel feature that has a strong correlation with the movement intensity is computed and the hierarchical Dirichlet process (HDP) model is used to detect the activity levels from this feature.

Journal ArticleDOI
TL;DR: Experimental results show that the best strategy is to choose the largest value calculated from the statistics in a row, and that the parameter estimation method can efficiently solve for the parameters of generalized Dirichlet priors to significantly improve the performance of the multinomial naive Bayesian classifier.

Book ChapterDOI
19 Sep 2016
TL;DR: The authors proposed a nonparametric Bayesian mixture model that simultaneously optimizes the topic extraction and group clustering while allowing all topics to be shared by all clusters for grouped data, and formulated the model so that it can use a closed-form variational Bayesian method to approximately calculate the posterior distribution.
Abstract: We propose a nonparametric Bayesian mixture model that simultaneously optimizes the topic extraction and group clustering while allowing all topics to be shared by all clusters for grouped data. In addition, in order to enhance the computational efficiency on par with today’s large-scale data, we formulate our model so that it can use a closed-form variational Bayesian method to approximately calculate the posterior distribution. Experimental results with corpus data show that our model has a better performance than existing models, achieving a 22% improvement over a state-of-the-art model. Moreover, an experiment with location data from mobile phones shows that our model performs well in the field of big data analysis.

Proceedings ArticleDOI
01 Oct 2016
TL;DR: An unsupervised on-line learning algorithm that uses Bayesian nonparametrics for categorizing multimodal sensory signals such as audio, visual, and haptic information for robots and significantly improves both speech recognition and multi-modal categorization performances.
Abstract: One of the biggest challenges in intelligent robotics is to build robots that can learn to use language. To this end, we think that the practical long-term on-line concept/word learning algorithm for robots is a key issue to be addressed. In this paper, we develop an unsupervised on-line learning algorithm that uses Bayesian nonparametrics for categorizing multimodal sensory signals such as audio, visual, and haptic information for robots. The robot uses its physical body to grasp and observe an object from various viewpoints as well as listen to the sound during the observation. The most important property of the proposed framework is to learn multimodal concepts and the language model simultaneously. This mutual learning framework of concepts and language significantly improves both speech recognition and multimodal categorization performances. We conducted a long-term experiment where a human subject interacted with a real robot over 100 hours using 499 objects. Some interesting results of the experiment are discussed in this paper.

Journal ArticleDOI
TL;DR: Experimental results based on a challenging problem namely visual scenes categorization demonstrate the merits of the proposed statistical framework for data clustering which uses hierarchical Dirichlet processes and Beta-Liouville distributions.
Abstract: In this work, we develop a statistical framework for data clustering which uses hierarchical Dirichlet processes and Beta-Liouville distributions. The parameters of this framework are learned using two variational Bayes approaches. The first one considers batch settings and the second one takes into account the dynamic nature of real data. Experimental results based on a challenging problem, namely visual scenes categorization, demonstrate the merits of the proposed framework.

Proceedings ArticleDOI
06 Jun 2016
TL;DR: This paper proposes a novel topic model named sequential correspondence hierarchical Dirichlet Processes (Seq-cHDP) to learn the hidden structure within video data and demonstrates that the model outperforms other baseline models.
Abstract: Multimedia data mining based on topic models as an emerging technique has become a very popular research topic in recent years. In this paper, we propose a novel topic model named sequential correspondence hierarchical Dirichlet Processes (Seq-cHDP) to learn the hidden structure within video data. The Seq-cHDP model can be considered as an extended hierarchical Dirichlet processes (HDP) model containing two important features: one is the time-dependency mechanism that connects neighboring video frames on the basis of a time-dependent Markovian assumption, and the other is the data correspondence mechanism that provides a solution for dealing with multimodal data such as the mixture of visual words and speech words extracted from video files. We present a comprehensive evaluation of Seq-cHDP through experimentation and finally demonstrate that our model outperforms other baseline models.

Proceedings ArticleDOI
20 Mar 2016
TL;DR: The results showed that the frequencies of appearance of certain clusters are statistically significantly different across the two groups, indicating that certain clusters may relate to pathological fetal heart rate patterns.
Abstract: In this paper, we propose to analyze fetal heart rate (FHR) signals by hierarchical Dirichlet process (HDP) mixture models. We investigate whether the clustering results of real-world FHR time series obtained by these models are informative in terms of determining the health status of a fetus. The FHR signals are divided into two groups, healthy and unhealthy, according to the umbilical arterial blood pH values of the fetuses. We computed the frequencies of clusters appearing in each of the groups, and applied the Mann-Whitney U test to compare the frequencies. The results showed that the frequencies of appearance of certain clusters are statistically significantly different across the two groups. This indicates that certain clusters may relate to pathological fetal heart rate patterns.
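The comparison step in the entry above relies on the Mann-Whitney U test. A minimal sketch of the U statistic with average ranks for ties follows, together with a two-sided p-value from the plain normal approximation (no tie or continuity correction, which is a simplification assumed here). The function names and the cluster-frequency values are made up for illustration and are not data from the paper.

```python
import math

def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using average ranks for ties."""
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    rank_sum_x = 0.0
    i, n = 0, len(pooled)
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    return u_x, n1 * n2 - u_x

def normal_approx_p(u, n1, n2):
    """Two-sided p-value from the normal approximation (no tie correction)."""
    mean = n1 * n2 / 2.0
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return math.erfc(abs(u - mean) / sd / math.sqrt(2.0))

# Hypothetical per-recording frequencies of one cluster in each group.
healthy = [0.10, 0.12, 0.08, 0.11, 0.09]
unhealthy = [0.20, 0.18, 0.25, 0.15, 0.22]
u_healthy, u_unhealthy = mann_whitney_u(healthy, unhealthy)
p_value = normal_approx_p(u_healthy, len(healthy), len(unhealthy))
```

For small samples an exact test is preferable to the normal approximation, and a study comparing many clusters at once would also need to account for multiple comparisons.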