Showing papers on "Hierarchical Dirichlet process" published in 2018

Journal Article•DOI•

Extracting Traffic Primitives Directly From Naturalistically Logged Data for Self-Driving Applications

Wenshuo Wang¹, Ding Zhao²•Institutions (2)

University of California, Berkeley¹, University of Michigan²

17 Jan 2018

TL;DR: In this article, a sticky hierarchical Dirichlet process hidden Markov model is proposed to automatically extract primitives from multidimensional traffic data without prior knowledge of the primitive settings.

Abstract: Developing an automated vehicle that can handle complicated driving scenarios and appropriately interact with other road users requires the ability to semantically learn and understand the driving environment, often by analyzing massive amounts of naturalistic driving data. An important paradigm that allows automated vehicles to both learn from human drivers and gain insights is understanding the principal compositions of the entire traffic, termed traffic primitives. However, the explosive growth of data presents a great challenge in extracting primitives from high-dimensional time-series traffic data with various types of road users engaged. Therefore, automatically extracting primitives is becoming one of the cost-efficient ways to help autonomous vehicles understand and predict complex traffic scenarios. In addition, the primitives extracted from raw data should 1) be appropriate for automated driving applications and 2) be easily used to generate new traffic scenarios. However, existing literature does not provide a method to automatically learn these primitives from large-scale traffic data. The contribution of this letter is twofold. First, we propose a new framework to generate new traffic scenarios from a limited amount of traffic data. Second, we introduce a nonparametric Bayesian learning method, a sticky hierarchical Dirichlet process hidden Markov model, to automatically extract primitives from multidimensional traffic data without prior knowledge of the primitive settings. The developed method is then validated using one day of naturalistic driving data. Experimental results show that the nonparametric Bayesian learning method is able to extract primitives from traffic scenarios where both binary and continuous events coexist.
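
Illustrative sketch (not taken from the paper): the "sticky" part of a sticky HDP-HMM is commonly written as a transition prior of the form pi_j ~ Dir(alpha * beta + kappa * e_j), where kappa adds extra mass to self-transitions so that states persist. The snippet below draws such transition rows under a finite (weak-limit) truncation. The function names, the truncation level, and the hyperparameter values are assumptions chosen for illustration only.

```python
import random


def dirichlet(shapes, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    # The tiny floor is a numerical guard (an assumption): gammavariate needs shape > 0.
    draws = [rng.gammavariate(max(a, 1e-12), 1.0) for a in shapes]
    total = sum(draws)
    return [d / total for d in draws]


def sample_sticky_transitions(num_states, gamma, alpha, kappa, rng):
    """Weak-limit draw of the global weights and sticky transition rows."""
    # Shared global state weights: beta ~ Dir(gamma/L, ..., gamma/L).
    beta = dirichlet([gamma / num_states] * num_states, rng)
    rows = []
    for j in range(num_states):
        # Row j: pi_j ~ Dir(alpha * beta + kappa * e_j); kappa boosts the self-transition.
        shapes = [alpha * b + (kappa if k == j else 0.0) for k, b in enumerate(beta)]
        rows.append(dirichlet(shapes, rng))
    return beta, rows


# Hypothetical settings: 6 truncated states and a strong self-transition bias.
beta, P = sample_sticky_transitions(6, gamma=2.0, alpha=1.0, kappa=50.0, rng=random.Random(0))
```

With a large kappa relative to alpha, each row concentrates on its diagonal entry, which is what discourages rapid switching between primitives.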

58 citations

Journal Article•DOI•

Integrating Driving Behavior and Traffic Context Through Signal Symbolization for Data Reduction and Risky Lane Change Detection

Ekim Yurtsever¹, Suguru Yamazaki², Chiyomi Miyajima¹, Kazuya Takeda¹, Masataka Mori², Kentarou Hitomi², Masumi Egawa²•Institutions (2)

Nagoya University¹, Denso²

01 Jun 2018

TL;DR: This symbolization framework is proposed as a data reduction method for naturalistic driving studies and co-occurrence chunking with clustering provided the best risky lane change detection.

Abstract: A novel method for integrating driving behavior and traffic context through signal symbolization is presented in this paper. This symbolization framework is proposed as a data reduction method for naturalistic driving studies. Continuous sensor signals have been converted and reduced into sequences of symbols (chunks) using a sticky hierarchical Dirichlet process hidden Markov model and a nested Pitman–Yor language model. Then, co-occurrence chunking (COOC), the proposed integration method, has been applied to the driver behavior and the traffic context chunks. After the integration, COOC chunks have been associated with prototype driving scenes by using latent Dirichlet allocation. Finally, the translated sequence of chunks has been clustered into groups. Risky lane change detection experiments have been conducted with the symbolized data for evaluation purposes. A dataset comprised of 988 lane change scenes has been utilized for this process. Co-occurrence chunking with clustering provided the best risky lane change detection.

29 citations

Proceedings Article•DOI•

A Stochastic Hybrid Framework for Driver Behavior Modeling Based on Hierarchical Dirichlet Process

Hossein Nourkhiz Mahjoub¹, Behrad Toghi¹, Yaser P. Fallah¹•Institutions (1)

University of Central Florida¹

01 Aug 2018

TL;DR: In this article, a stochastic hybrid modeling framework based on a non-parametric Bayesian inference method is investigated to solve the scalability problem for real-world V2V network realization.

Abstract: Scalability is one of the major issues for real-world Vehicle-to-Vehicle network realization. To tackle this challenge, a stochastic hybrid modeling framework based on a non-parametric Bayesian inference method, i.e., hierarchical Dirichlet process (HDP), is investigated in this paper. This framework is able to jointly model driver/vehicle behavior through forecasting the vehicle dynamical time-series. This modeling framework could be merged with the notion of model-based information networking, which was recently proposed in the vehicular literature, to overcome the scalability challenges in dense vehicular networks by broadcasting behavioral models instead of disseminating raw information. This modeling approach has been applied to several scenarios from the realistic Safety Pilot Model Deployment (SPMD) driving data set, and the results show a higher performance of this model in comparison with the zero-hold method as the baseline.
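
Illustrative sketch (not taken from the paper): a zero-hold baseline is usually understood as holding the most recently received sample constant until the next one arrives. The snippet below shows that idea and the resulting tracking error. The broadcast period, the sample values, and the function names are hypothetical.

```python
def zero_hold(samples, period):
    """Hold the most recently broadcast sample until the next broadcast."""
    # A sample is assumed to be broadcast at every index that is a multiple of `period`.
    return [samples[(i // period) * period] for i in range(len(samples))]


def mean_abs_error(truth, estimate):
    """Average absolute gap between the true series and its reconstruction."""
    return sum(abs(t - e) for t, e in zip(truth, estimate)) / len(truth)


# Hypothetical speed trace with one broadcast every 3 steps.
speeds = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
held = zero_hold(speeds, period=3)
error = mean_abs_error(speeds, held)
```

A model-based scheme would replace the held value with a forecast from the shared behavioral model, so a lower error than this baseline is the quantity of interest.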

24 citations

Journal Article•DOI•

Discovering topic structures of a temporally evolving document corpus

Adham Beykikhoshk¹, Ognjen Arandjelovic², Dinh Phung¹, Svetha Venkatesh¹•Institutions (2)

Deakin University¹, University of St Andrews²

01 Jun 2018-Knowledge and Information Systems

TL;DR: A novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension is described and the algorithm is shown to capture well the actual developments in these fields.

Abstract: In this paper we describe a novel framework for the discovery of the topical content of a data corpus, and the tracking of its complex structural changes across the temporal dimension. In contrast to previous work, our model does not impose a prior on the rate at which documents are added to the corpus, nor does it adopt the Markovian assumption, which overly restricts the type of changes that the model can capture. Our key technical contribution is a framework based on (i) discretization of time into epochs, (ii) epoch-wise topic discovery using a hierarchical Dirichlet process-based model, and (iii) a temporal similarity graph which allows for the modelling of complex topic changes: emergence and disappearance, evolution, splitting, and merging. The power of the proposed framework is demonstrated on two medical literature corpora concerned with the autism spectrum disorder (ASD) and the metabolic syndrome (MetS), both increasingly important research subjects with significant social and healthcare consequences. In addition to the collected ASD and metabolic syndrome literature corpora, which we made freely available, our contribution also includes an extensive empirical analysis of the proposed framework. We describe a detailed and careful examination of the effects that our algorithm's free parameters have on its output and discuss the significance of the findings both in the context of the practical application of our algorithm and in the context of the existing body of work on temporal topic analysis. Our quantitative analysis is followed by several qualitative case studies highly relevant to the current research on ASD and MetS, on which our algorithm is shown to capture well the actual developments in these fields.

22 citations

Journal Article•DOI•

AdOn HDP-HMM: An Adaptive Online Model for Segmentation and Classification of Sequential Data

Ava Bargi¹, Richard Yi Da Xu¹, Massimo Piccardi¹•Institutions (1)

University of Technology, Sydney¹

01 Sep 2018-IEEE Transactions on Neural Networks

TL;DR: This paper presents a principled solution for the automated classification of sequential data based on an adaptive online system leveraging Markov switching models and hierarchical Dirichlet process priors, capable of classifying the sequential data over an unlimited number of classes while meeting the memory and delay constraints typical of streaming contexts.

Abstract: Recent years have witnessed an increasing need for the automated classification of sequential data, such as activities of daily living, social media interactions, financial series, and others. With the continuous flow of new data, it is critical to classify the observations on-the-fly and without being limited by a predetermined number of classes. In addition, a model should be able to update its parameters in response to a possible evolution in the distributions of the classes. This compelling problem, however, does not seem to have been adequately addressed in the literature, since most studies focus on offline classification over predefined class sets. In this paper, we present a principled solution for this problem based on an adaptive online system leveraging Markov switching models and hierarchical Dirichlet process priors. This adaptive online approach is capable of classifying the sequential data over an unlimited number of classes while meeting the memory and delay constraints typical of streaming contexts. In this paper, we introduce an adaptive “learning rate” that is responsible for balancing the extent to which the model retains its previous parameters or adapts to new observations. Experimental results on stationary and evolving synthetic data and two video data sets, TUM Assistive Kitchen and collated Weizmann, show a remarkable performance in terms of segmentation and classification, particularly for sequences from evolutionary distributions and/or those containing previously unseen classes.

20 citations

Journal Article•DOI•

Latent Dirichlet mixture model

Jen-Tzung Chien, Chao Hsi Lee¹, Zheng-Hua Tan²•Institutions (2)

National Chiao Tung University¹, Aalborg University²

22 Feb 2018-Neurocomputing

TL;DR: This paper proposes a new latent variable model where latent topics and their proportions are learned by incorporating a prior based on the Dirichlet mixture model, and carries out inference for LDMM according to variational Bayes and collapsed variational Bayes.

20 citations

Journal Article•DOI•

Multiple Hierarchical Dirichlet Processes for anomaly detection in traffic

Vagia Kaltsa¹, Alexia Briassouli, Ioannis Kompatsiaris, Michael G. Strintzis¹•Institutions (1)

Aristotle University of Thessaloniki¹

01 Apr 2018-Computer Vision and Image Understanding

TL;DR: Experiments on benchmark datasets containing various scenarios in traffic scenes prove the method’s efficacy and generality, leading to higher accuracy than the current State of the Art (SoA), and at a lower computational cost.

16 citations

Journal Article•DOI•

Supervised Topic Modeling Using Hierarchical Dirichlet Process-Based Inverse Regression: Experiments on E-Commerce Applications

Weifeng Li¹, Junming Yin¹, Hsinchsun Chen¹•Institutions (1)

University of Arizona¹

01 Jun 2018-IEEE Transactions on Knowledge and Data Engineering

TL;DR: A novel supervised topic model, Hierarchical Dirichlet Process-based Inverse Regression (HDP-IR) is proposed, which characterizes the corpus with a flexible number of topics, which prove to retain as much predictive information as the original corpus.

Abstract: The proliferation of e-commerce calls for mining consumer preferences and opinions from user-generated text. To this end, topic models have been widely adopted to discover the underlying semantic themes (i.e., topics). Supervised topic models have emerged to leverage discovered topics for predicting the response of interest (e.g., product quality and sales). However, supervised topic modeling remains a challenging problem because of the need to prespecify the number of topics, the lack of predictive information in topics, and limited scalability. In this paper, we propose a novel supervised topic model, Hierarchical Dirichlet Process-based Inverse Regression (HDP-IR). HDP-IR characterizes the corpus with a flexible number of topics, which prove to retain as much predictive information as the original corpus. Moreover, we develop an efficient inference algorithm capable of examining large-scale corpora (millions of documents or more). Three experiments were conducted to evaluate the predictive performance over major e-commerce benchmark testbeds of online reviews. Overall, HDP-IR outperformed existing state-of-the-art supervised topic models. Particularly, retaining sufficient predictive information improved predictive R-squared by over 17.6 percent; having topic structure flexibility contributed to predictive R-squared by at least 4.1 percent. HDP-IR provides an important step for future study on user-generated texts from a topic perspective.

16 citations

Posted Content•

Understanding V2V Driving Scenarios through Traffic Primitives

Wenshuo Wang, Weiyang Zhang, Ding Zhao

27 Jul 2018-arXiv: Learning

TL;DR: This paper presents a framework for analyzing various encountering behaviors by decomposing driving encounter data into small building blocks, called driving primitives, using nonparametric Bayesian learning (NPBL) approaches, which offer a flexible way to gain insight into complex driving encounters without any prerequisite knowledge.

Abstract: Semantically understanding drivers' complex encountering behavior, wherein two or more vehicles are spatially close to each other, can potentially benefit the decision-making design of autonomous cars. This paper presents a framework for analyzing various encountering behaviors by decomposing driving encounter data into small building blocks, called driving primitives, using nonparametric Bayesian learning (NPBL) approaches, which offer a flexible way to gain insight into complex driving encounters without any prerequisite knowledge. The effectiveness of our proposed primitive-based framework is validated on 976 naturalistic driving encounters, from which more than 4000 driving primitives are learned using NPBL, namely a sticky HDP-HMM, which combines a hidden Markov model (HMM) with a hierarchical Dirichlet process (HDP). A dynamic time warping method integrated with k-means clustering is then developed to cluster all the extracted driving primitives into groups. Experimental results show that there exist 20 kinds of driving primitives capable of representing the basic components of driving encounters in our database. This primitive-based analysis methodology potentially reveals underlying information about vehicle-vehicle encounters for self-driving applications.
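
Illustrative sketch (not taken from the paper): dynamic time warping (DTW) compares two sequences of different lengths by finding the cheapest monotone alignment between them, which is why it suits primitives of varying duration. The snippet below is a textbook DTW distance for scalar sequences with an absolute-difference cost. The paper's primitives are multidimensional, so the scalar cost and the function name are simplifying assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two scalar sequences."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment cost of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the best of: advance in a only, advance in b only, or advance in both.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Two hypothetical primitives with the same shape but different durations.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted = dtw_distance([0, 0], [1, 1])
```

A pairwise distance of this kind can then feed a clustering step. How the paper couples DTW with k-means is not specified in the abstract, so that coupling is left out here.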

14 citations

Journal Article•DOI•

Dual Sticky Hierarchical Dirichlet Process Hidden Markov Model and Its Application to Natural Language Description of Motions

Weiming Hu¹, Guodong Tian¹, Yongxin Kang¹, Chunfeng Yuan¹, Stephen J. Maybank²•Institutions (2)

Chinese Academy of Sciences¹, Birkbeck, University of London²

01 Oct 2018-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The dual sticky hierarchical Dirichlet process hidden Markov model is proposed for mining activities from a collection of time series data such as trajectories and a Bayesian inference method is proposed to decompose a given trajectory into a sequence of atomic activities.

Abstract: In this paper, a new nonparametric Bayesian model called the dual sticky hierarchical Dirichlet process hidden Markov model (HDP-HMM) is proposed for mining activities from a collection of time series data such as trajectories. All the time series data are clustered. Each cluster of time series data, corresponding to a motion pattern, is modeled by an HMM. Our model postulates a set of HMMs that share a common set of states (topics in an analogy with topic models for document processing), but have unique transition distributions. The number of HMMs and the number of topics are both automatically determined. The sticky prior avoids redundant states and makes our HDP-HMM more effective to model multimodal observations. For the application to motion trajectory modeling, topics correspond to motion activities. The learnt topics are clustered into atomic activities which are assigned predicates. We propose a Bayesian inference method to decompose a given trajectory into a sequence of atomic activities. The sources and sinks in the scene are learnt by clustering endpoints (origins and destinations) of trajectories. The semantic motion regions are learnt using the points in trajectories. On combining the learnt sources and sinks, the learnt semantic motion regions, and the learnt sequence of atomic activities, the action represented by a trajectory can be described in natural language in as automatic a way as possible. The effectiveness of our dual sticky HDP-HMM is validated on several trajectory datasets. The effectiveness of the natural language descriptions for motions is demonstrated on the vehicle trajectories extracted from a traffic scene.

14 citations

Journal Article•DOI•

Video Event Recognition and Anomaly Detection by Combining Gaussian Process and Hierarchical Dirichlet Process Models

Michael Ying Yang, Wentong Liao, Yanpeng Cao, Bodo Rosenhahn

01 Apr 2018-Photogrammetric Engineering and Remote Sensing

TL;DR: An unsupervised learning framework is presented for analyzing activities and interactions in surveillance videos, in which three levels of video events are connected by a Hierarchical Dirichlet Process (HDP) model: low-level visual features, simple atomic activities, and multi-agent interactions.

Abstract: In this paper, we present an unsupervised learning framework for analyzing activities and interactions in surveillance videos. In our framework, three levels of video events are connected by a Hierarchical Dirichlet Process (HDP) model: low-level visual features, simple atomic activities, and multi-agent interactions. Atomic activities are represented as distributions of low-level features, while complicated interactions are represented as distributions of atomic activities. This learning process is unsupervised. Given a training video sequence, low-level visual features are extracted based on optic flow and then clustered into different atomic activities, and video clips are clustered into different interactions. The HDP model automatically decides the number of clusters, i.e., the categories of atomic activities and interactions. Based on the learned atomic activities and interactions, a training dataset is generated to train the Gaussian Process (GP) classifier. Then, the trained GP models are applied to newly captured video to classify interactions and detect abnormal events in real time. Furthermore, the temporal dependencies between video events learned by HDP-Hidden Markov Models (HMM) are effectively integrated into the GP classifier to enhance the accuracy of the classification in newly captured videos. Our framework couples the benefits of the generative model (HDP) with the discriminative model (GP). We provide detailed experiments showing that our framework enjoys favorable performance in real-time video event classification in a crowded traffic scene.

Journal Article•DOI•

Multimodal Hierarchical Dirichlet Process-Based Active Perception by a Robot.

Tadahiro Taniguchi¹, Ryo Yoshino¹, Toshiaki Takano²•Institutions (2)

Ritsumeikan University¹, Shizuoka Institute of Science and Technology²

22 May 2018-Frontiers in Neurorobotics

TL;DR: The experimental results show that the method enables the robot to select a set of actions that allow it to recognize target objects quickly and accurately, and can work appropriately even when the number of actions is large and the set of target objects involves objects categorized into multiple classes.

Abstract: In this paper, we propose an active perception method for recognizing object categories based on the multimodal hierarchical Dirichlet process (MHDP). The MHDP enables a robot to form object categories using multimodal information, e.g., visual, auditory, and haptic information, which can be observed by performing actions on an object. However, performing many actions on a target object requires a long time. In a real-time scenario, i.e., when the time is limited, the robot has to determine the set of actions that is most effective for recognizing a target object. We propose an active perception method for MHDP that uses the information gain (IG) maximization criterion and the lazy greedy algorithm. We show that the IG maximization criterion is optimal in the sense that the criterion is equivalent to a minimization of the expected Kullback-Leibler divergence between a final recognition state and the recognition state after the next set of actions. However, a straightforward calculation of IG is practically impossible. Therefore, we derive a Monte Carlo approximation method for IG by making use of a property of the MHDP. We also show that the IG has submodular and non-decreasing properties as a set function because of the structure of the graphical model of the MHDP. Therefore, the IG maximization problem is reduced to a submodular maximization problem. This means that greedy and lazy greedy algorithms are effective and have a theoretical justification for their performance. We conducted an experiment using an upper-torso humanoid robot and a second one using synthetic data. The experimental results show that the method enables the robot to select a set of actions that allow it to recognize target objects quickly and accurately. The numerical experiment using the synthetic data shows that the proposed method can work appropriately even when the number of actions is large and a set of target objects involves objects categorized into multiple classes. The results support our theoretical outcomes.
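
Illustrative sketch (not taken from the paper): lazy greedy selection exploits submodularity, since an item's marginal gain can only shrink as the selected set grows, so a stale gain is a valid upper bound and most candidates never need re-evaluation. In the snippet below, a simple set-coverage gain stands in for the information gain, which the paper estimates by Monte Carlo. The action names, the coverage sets, and the function names are hypothetical.

```python
import heapq


def lazy_greedy(candidates, marginal_gain, budget):
    """Pick up to `budget` items for a monotone submodular objective."""
    selected = []
    # Heap entries: (-gain bound, tie-break index, item, selection size when evaluated).
    heap = [(-marginal_gain(c, selected), i, c, 0) for i, c in enumerate(candidates)]
    heapq.heapify(heap)
    while heap and len(selected) < budget:
        _, i, item, stamp = heapq.heappop(heap)
        if stamp == len(selected):
            # The bound is fresh, and every other entry is an upper bound, so this is the best.
            selected.append(item)
        else:
            # Stale bound: re-evaluate against the current selection and push back.
            heapq.heappush(heap, (-marginal_gain(item, selected), i, item, len(selected)))
    return selected


# Hypothetical actions and the feature ids each one reveals (a stand-in for IG).
COVERAGE = {"look": {1, 2, 3}, "grasp": {3, 4}, "shake": {4, 5, 6, 7}, "knock": {1, 2}}


def coverage_gain(action, selected):
    """Number of new feature ids the action adds to what is already covered."""
    covered = set().union(*(COVERAGE[s] for s in selected))
    return len(COVERAGE[action] - covered)


chosen = lazy_greedy(list(COVERAGE), coverage_gain, budget=2)
```

In this toy run, "shake" is picked first because it reveals the most features, and "look" follows because its gain is unaffected by that first choice.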

Journal Article•DOI•

Bayesian Nonparametric Learning for Hierarchical and Sparse Topics

Jen-Tzung Chien¹•Institutions (1)

National Chiao Tung University¹

01 Feb 2018-IEEE Transactions on Audio, Speech, and Language Processing

TL;DR: The proposed nIBP reduces the error rate of nCRP and nhDP by 18% and 8% on Reuters task for document classification, respectively and improves the variety of topic representation for heterogeneous documents.

Abstract: This paper presents the Bayesian nonparametric (BNP) learning for hierarchical and sparse topics from natural language. Traditionally, the Indian buffet process provides the BNP prior on a binary matrix for an infinite latent feature model consisting of a flat layer of topics. The nested model paves an avenue to construct a tree model instead of a flat-layer model. This paper presents the nested Indian buffet process (nIBP) to achieve the sparsity and flexibility in topic model where the model complexity and topic hierarchy are learned from the groups of words. The mixed membership modeling is conducted by representing a document using the tree nodes or dishes that a document or a customer chooses according to the nIBP scenario. A tree stick-breaking process is implemented to select topic weights from a subtree for flexible topic modeling. Such an nIBP relaxes the constraint of adopting a single tree path in the nested Chinese restaurant process (nCRP) and, therefore, improves the variety of topic representation for heterogeneous documents. A Gibbs sampling procedure is developed to infer the nIBP topic model. Compared to the nested hierarchical Dirichlet process (nhDP), the compactness of the estimated topics in a tree using nIBP is improved. Experimental results show that the proposed nIBP reduces the error rate of nCRP and nhDP by 18% and 8% on Reuters task for document classification, respectively.

Journal Article•DOI•

Game Changers: Detecting Shifts in Overdispersed Count Data

Matthew Blackwell¹•Institutions (1)

Harvard University¹

01 Apr 2018-Political Analysis

TL;DR: A Bayesian model for detecting changepoints in a time series of overdispersed counts, such as contributions to candidates over the course of a campaign or counts of terrorist violence, which incorporates a hierarchical Dirichlet process prior to estimate the number of changepoints as well as their location.

Abstract: In this paper, I introduce a Bayesian model for detecting changepoints in a time series of overdispersed counts, such as contributions to candidates over the course of a campaign or counts of terrorist violence. To avoid having to specify the number of changepoints ex ante, this model incorporates a hierarchical Dirichlet process prior to estimate the number of changepoints as well as their location. This allows researchers to discover salient structural breaks and perform inference on the number of such breaks in a given time series. I demonstrate the usefulness of the model with applications to campaign contributions in the 2012 U.S. Republican presidential primary and incidences of global terrorism from 1970 to 2015.

Journal Article•DOI•

Bayesian nonparametric inference beyond the Gibbs-type framework

Federico Camerlenghi¹,²,³, Antonio Lijoi¹,³, Igor Prünster³•Institutions (3)

Collegio Carlo Alberto¹, University of Milano-Bicocca², Bocconi University³

01 Dec 2018-Scandinavian Journal of Statistics

TL;DR: This paper presents a nonparametric hierarchical structure based on transformations of completely random measures, which extends the popular hierarchical Dirichlet process, derives the induced partition structure and the prediction rules, and characterizes the posterior distribution.

Abstract: The definition and investigation of general classes of nonparametric priors has recently been an active research line in Bayesian statistics. Among the various proposals, the Gibbs-type family, which includes the Dirichlet process as a special case, stands out as the most tractable class of nonparametric priors for exchangeable sequences of observations. This is the consequence of a key simplifying assumption on the learning mechanism, which, however, has no justification except that of ensuring mathematical tractability. In this paper, we remove such an assumption and investigate a general class of random probability measures going beyond the Gibbs-type framework. More specifically, we present a nonparametric hierarchical structure based on transformations of completely random measures, which extends the popular hierarchical Dirichlet process. This class of priors preserves a good degree of tractability, given that we are able to determine the fundamental quantities for Bayesian inference. In particular, we derive the induced partition structure and the prediction rules and characterize the posterior distribution. These theoretical results are also crucial to devise both a marginal and a conditional algorithm for posterior inference. An illustration concerning prediction in genomic sequencing is also provided.
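
Illustrative sketch (not taken from the paper): for the Dirichlet process, the special case of the Gibbs-type family mentioned above, the prediction rule has a well-known closed form. After n observations, the next one joins an existing cluster of size n_j with probability n_j / (theta + n) and starts a new cluster with probability theta / (theta + n). The snippet below evaluates that rule only. It does not implement the more general hierarchical class studied in the paper, and the function name and example counts are hypothetical.

```python
def dp_predictive(cluster_sizes, theta):
    """Dirichlet process prediction rule given current cluster sizes."""
    n = sum(cluster_sizes)
    # Probability of joining each existing cluster is proportional to its size.
    existing = [size / (theta + n) for size in cluster_sizes]
    # Probability of a previously unseen value is proportional to the concentration theta.
    new = theta / (theta + n)
    return existing, new


# Hypothetical state: 4 observations split into clusters of sizes 3 and 1, with theta = 1.
existing, new = dp_predictive([3, 1], theta=1.0)
```

Note that the probability of a new cluster here depends only on n, not on how many clusters have already been seen, which is the kind of restriction on the learning mechanism that more general priors relax.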

Journal Article•DOI•

Hyperspectral Image Restoration under Complex Multi-Band Noises

Zongsheng Yue, Deyu Meng, Yongqing Sun, Qian Zhao

14 Oct 2018-Remote Sensing

TL;DR: This study elaborately constructs a new HSI restoration technique, aimed at more faithfully and comprehensively taking such noise characteristics into account, through a two-level hierarchical Dirichlet process (HDP) to model the HSI noise structure.

Abstract: Hyperspectral images (HSIs) are always corrupted by complicated forms of noise during the acquisition process, such as Gaussian noise, impulse noise, stripes, deadlines and so on. Specifically, different bands of the practical HSIs generally contain different noises of evidently distinct type and extent. While current HSI restoration methods give less consideration to such band-noise-distinctness issues, this study elaborately constructs a new HSI restoration technique, aimed at more faithfully and comprehensively taking such noise characteristics into account. Particularly, through a two-level hierarchical Dirichlet process (HDP) to model the HSI noise structure, the noise of each band is depicted by a Dirichlet process Gaussian mixture model (DP-GMM), in which its complexity can be flexibly adapted in an automatic manner. Besides, the DP-GMM of each band comes from a higher level DP-GMM that relates the noise of different bands. The variational Bayes algorithm is also designed to solve this model, and closed-form updating equations for all involved parameters are deduced. The experiment indicates that, in terms of the mean peak signal-to-noise ratio (MPSNR), the proposed method is on average 1 dB higher compared with the existing state-of-the-art methods, as well as performing better in terms of the mean structural similarity index (MSSIM) and Erreur Relative Globale Adimensionnelle de Synthese (ERGAS).
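
Illustrative sketch (not taken from the paper): the mean peak signal-to-noise ratio (MPSNR) is commonly computed by averaging the per-band PSNR, where PSNR = 10 * log10(peak^2 / MSE). The snippet below follows that common definition on flattened bands. The peak value of 1.0, the flattened-list representation, and the function names are assumptions.

```python
import math


def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for one flattened band."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    # A perfect reconstruction has zero error, reported here as infinite PSNR.
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def mean_psnr(reference_bands, estimate_bands, peak=1.0):
    """Average the per-band PSNR over all spectral bands."""
    scores = [psnr(r, e, peak) for r, e in zip(reference_bands, estimate_bands)]
    return sum(scores) / len(scores)


# Two hypothetical bands with different noise levels.
clean = [[0.0, 0.0], [0.0, 0.0]]
restored = [[0.1, 0.1], [0.01, 0.01]]
score = mean_psnr(clean, restored)
```

Because the average is taken per band, a gain of about 1 dB in MPSNR reflects an improvement spread across bands rather than in a single pooled error.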

Proceedings Article•DOI•

WDMTI: Wireless Device Manufacturer and Type Identification Using Hierarchical Dirichlet Process

Lingjing Yu¹, Tao Liu, Zhaoyu Zhou¹, Yujia Zhu, Qingyun Liu¹, Jianlong Tan¹•Institutions (1)

Chinese Academy of Sciences¹

01 Oct 2018

TL;DR: A wireless device manufacturer and type identification (WDMTI) system that is both scalable and accurate, and capable of adapting to unknown types of devices on-the-fly is presented.

Abstract: Wireless devices have been widely adopted across all domains. With the convenience brought by wireless communication technology, an increasing number of conventional (wired) devices are evolving to become wireless. However, significant security issues arise with the popularity of wireless devices. To start an attack, the adversary usually performs network reconnaissance to discover exposed devices, identify device manufacturers and types, and then scan for vulnerabilities. On the defense side, network administrators are expected to identify potential vulnerabilities and risks and enforce Network Access Control (or Network Admission Control, NAC) on all connecting devices. To do this, it is essential to accurately identify the make, model, and type of each device that attempts to connect to the network, e.g., MacBooks, Samsung smartphones (Android), Amazon Kindles, DLink surveillance cameras, TP-Link smart plugs, etc. In this paper, we present a novel approach, namely WDMTI, for the identification of wireless device manufacturer and type. We tackle the challenge from two aspects: the features and the classification model. First, we claim that it is critical to discover the device manufacturer and type as soon as the device requests to join the WLAN, and that it is unrealistic to make other assumptions about the status of the device, e.g., assuming that the device is booting up or initializing a new connection to corresponding servers/clouds. We primarily depend on the features extracted from the network connection phase, while features from device booting are considered a "bonus". In particular, we propose to utilize features from the raw DHCP packets, which are shown to be sufficient for device manufacturer and type recognition with high accuracy. Meanwhile, in the WDMTI system, we employ the Hierarchical Dirichlet Process (HDP), which is a nonparametric Bayesian model for grouped data. HDP allows new groups to be introduced as new data are added, i.e., when previously unknown devices connect to the network and the extracted features receive new labels. The WDMTI mechanism is dynamically retrained online, instead of requiring a time-consuming offline retraining process. Our experiments show that WDMTI identifies known types of devices with an average accuracy of 0.89, and new types of devices with an average accuracy of 0.96, both of which are higher than those of state-of-the-art approaches. In summary, we present a wireless device manufacturer and type identification (WDMTI) system that is both scalable and accurate, and capable of adapting to unknown types of devices on the fly.
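The idea that new groups can appear as new data arrive can be illustrated with the Chinese restaurant process (CRP), the single-level building block that underlies the HDP. The sketch below is not the WDMTI implementation: the function name `crp_assignments`, the concentration parameter `alpha`, and the fixed seed are assumptions made for illustration, and a full HDP would additionally share clusters across groups.

```python
import random


def crp_assignments(n_items, alpha, seed=0):
    """Sample cluster labels for n_items from a Chinese restaurant process.

    Illustrative only: `alpha` (> 0) is an assumed concentration parameter.
    """
    rng = random.Random(seed)
    counts = []  # counts[k] = number of items already assigned to cluster k
    labels = []
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k];
        # a brand-new cluster is chosen with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster for this item
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts
```

The number of clusters is not fixed in advance; it grows with the data, which is the behavior the abstract relies on when previously unseen device types join the network.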

Book Chapter•DOI•

Unsupervised Bioacoustic Segmentation by Hierarchical Dirichlet Process Hidden Markov Model

Vincent Roger¹, Marius Bartcus¹, Faicel Chamroukhi², Hervé Glotin¹•Institutions (2)

Aix-Marseille University¹, University of Caen Lower Normandy²

01 Mar 2018

TL;DR: An automatic segmentation model for real-world bioacoustic scenes is proposed that infers hidden states, referred to as song units, using the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM), a Bayesian nonparametric (BNP) model.

Abstract: Bioacoustics is a powerful tool for monitoring biodiversity. In this paper, we investigate an automatic segmentation model for real-world bioacoustic scenes in order to infer hidden states referred to as song units. However, the number of these acoustic units is often unknown, unlike in human speech recognition. Hence, we propose a bioacoustic segmentation based on the Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM), a Bayesian nonparametric (BNP) model, to tackle this challenging problem. We focus our approach on unsupervised learning from bioacoustic sequences. It consists of simultaneously finding the structure of hidden song units and automatically inferring the unknown number of hidden states. We investigate two real bioacoustic scenes: whale songs and multi-species bird songs. We learn the models using Markov Chain Monte Carlo (MCMC) sampling techniques on Mel Frequency Cepstral Coefficients (MFCC). Our results, scored by a bioacoustic expert, show that the model generates correct song unit segmentation. This study provides new insights for the unsupervised analysis of complex soundscapes and illustrates its potential for chunking non-human animal signals into structured units. This can lead to new representations of the calls of a target species, and also to the structuring of inter-species calls. It gives experts a tractable approach for efficient bioacoustic research, as requested in Kershenbaum et al. (Biol Rev 91(1):13–52, 2016).

Proceedings Article•DOI•

Sequence Pattern Extraction by Segmenting Time Series Data Using GP-HSMM with Hierarchical Dirichlet Process

Masatoshi Nagano¹, Tomoaki Nakamura¹, Takayuki Nagai¹, Daichi Mochihashi, Ichiro Kobayashi², Masahide Kaneko¹•Institutions (2)

University of Electro-Communications¹, Ochanomizu University²

01 Oct 2018

TL;DR: The proposed method for dividing continuous time-series data into segments in an unsupervised manner based on a hidden semi-Markov model with Gaussian process (GP-HSMM) is extended to a nonparametric Bayesian model by introducing a hierarchical Dirichlet process (HDP), and the parameters of the proposed HDP-GP-HSMM are estimated by applying slice sampling.

Abstract: Humans recognize perceived continuous information by dividing it into significant segments such as words and unit motions. We believe that such unsupervised segmentation is also an important ability that robots need to learn topics such as language and motions. Hence, in this paper, we propose a method for dividing continuous time-series data into segments in an unsupervised manner. To this end, we proposed a method based on a hidden semi-Markov model with Gaussian process (GP-HSMM). If Gaussian processes, which are nonparametric models, are used, unit motion patterns can be extracted from complicated continuous motion. However, this approach requires the number of classes of segments in the time-series data in advance. To overcome this problem, in this paper, we extend GP-HSMM to a nonparametric Bayesian model by introducing a hierarchical Dirichlet process (HDP) and propose the hierarchical Dirichlet processes-Gaussian process-hidden semi-Markov model (HDP-GP-HSMM). In the nonparametric Bayesian model, an infinite number of classes is assumed and it becomes difficult to estimate the parameters naively. Instead, the parameters of the proposed HDP-GP-HSMM are estimated by applying slice sampling. In the experiments, we use various synthetic and motion-capture data to show that our proposed model can estimate a more correct number of classes and achieve more accurate segmentation than baseline methods.

Book Chapter•DOI•

Generation of Author Topic Models Using LDA

G. S. Mahalakshmi¹, G. Muthu Selvi¹, S. Sendhilkumar¹•Institutions (1)

Anna University¹

01 Jan 2018

TL;DR: This paper proposes generating Author Topic Models (ATMs) by applying Latent Dirichlet Allocation (LDA), representing research ideas as topic distributions to support the measurement of author contributions.

Abstract: When it comes to copyright and ownership of research ideas, it is often unclear which author should receive the credit. Mining author contributions has to be approached more semantically to solve this issue. Representing research ideas using topic distributions substantiates the measurement of author contributions. Author Topic Models (ATMs) are generally obtained by applying topic modeling approaches to an author's research articles. ATMs form the blueprints of an author. Given a research paper and the blueprints of its authors, identifying the contribution of every author to the article becomes easy. This paper proposes the generation of ATMs by applying Latent Dirichlet Allocation (LDA).

Proceedings Article•DOI•

Analysis of the Health Status of Railway Vehicle Bearings Based on Improved HDP-HMM

Zaidong Sun¹, Ning Zhang¹•Institutions (1)

Beijing Jiaotong University¹

01 Nov 2018

TL;DR: A bearing health status analysis model based on an improved HDP-HMM that uses the nonparametric properties of the hierarchical Dirichlet process to infer the number of hidden states, compensates for the defects of the HMM, and utilizes Bayesian optimization and the Mann-Kendall criterion to optimize its hyperparameters.

Abstract: To address the problems that the number of states of a hidden Markov model (HMM) must be specified in advance and that the convergence result of the HDP-HMM is sensitive to hyperparameters, a bearing health status analysis model based on an improved HDP-HMM is proposed in this paper. Based on the Hierarchical Dirichlet Process (HDP) and hidden Markov models, the model uses the nonparametric properties of the hierarchical Dirichlet process to infer the number of hidden states, compensates for the defects of the HMM, and utilizes Bayesian optimization and the Mann-Kendall criterion to optimize its hyperparameters. At the same time, considering that the ergodic topology of the traditional HDP-HMM is not suitable for the time-ordered monitoring data of the bearings, we convert the HDP-HMM topology into a left-to-right mode, which is better suited to the needs of health status analysis. In addition, taking the nonlinear characteristics of the performance degradation process of railway vehicle bearings into consideration, we use GKPCA (greedy kernel principal component analysis) to extract features of bearing degradation. Finally, the model is verified using monitoring data collected while the train is running, such as bearing temperatures. The experimental results show that the proposed HDP-HMM can effectively identify multiple health states of railway vehicle bearings and has reliable performance. It provides an important basis for the state repair of railway vehicle bearings under actual conditions.
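The difference between an ergodic and a left-to-right topology can be sketched as a constraint on the transition matrix: in left-to-right mode a state may only stay where it is or move to a later state. The code below is a minimal illustration under that assumption, not the authors' procedure; the function name `to_left_to_right` and the choice to renormalize each row after masking are hypothetical.

```python
def to_left_to_right(transition):
    """Zero out backward transitions (j < i) and renormalize each row.

    `transition` is a square list of lists whose rows sum to 1.
    """
    n = len(transition)
    result = []
    for i, row in enumerate(transition):
        # Keep only self-transitions and transitions to later states.
        kept = [p if j >= i else 0.0 for j, p in enumerate(row)]
        total = sum(kept)
        if total == 0.0:
            # No forward mass left: treat the state as absorbing.
            kept = [1.0 if j == i else 0.0 for j in range(n)]
            total = 1.0
        result.append([p / total for p in kept])
    return result
```

With this constraint the last state becomes absorbing, which matches the intuition that a degradation process does not return to an earlier health state.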

Proceedings Article•DOI•

Homogeneity-Based Transmissive Process to Model True and False News in Social Networks

Jooyeon Kim¹, Dongkwan Kim¹, Alice Oh¹•Institutions (1)

KAIST¹

16 Nov 2018-arXiv: Computers and Society

TL;DR: A novel Bayesian nonparametric model that incorporates homogeneity of news stories as the key component that regulates the topical similarity between the posting and sharing users' topical interests and is trained on a real-world social network dataset.

Abstract: An overwhelming number of true and false news stories are posted and shared in social networks, and users diffuse the stories based on multiple factors. Diffusion of news stories from one user to another depends not only on the stories' content and genuineness but also on the alignment of topical interests between the users. In this paper, we propose a novel Bayesian nonparametric model that incorporates homogeneity of news stories as the key component that regulates the similarity between the posting and sharing users' topical interests. Our model extends the hierarchical Dirichlet process to model the topics of the news stories and incorporates a Bayesian Gaussian process latent variable model to discover the homogeneity values. We train our model on a real-world social network dataset and find homogeneity values of news stories that strongly relate to their labels of genuineness and their contents. Finally, we show that the supervised version of our model predicts the labels of news stories better than state-of-the-art neural network and Bayesian models.

Patent•

Method for Dynamic Simulation Parameter Calibration by Machine Learning

Moon Il Chul

07 Jun 2018

TL;DR: In this article, a method for calibrating a dynamic simulation parameter based on machine learning performed by a computing device is presented, which comprises the steps of: generating a set of N parameter hypotheses; obtaining result values corresponding to each of the N hypotheses; calculating likelihoods corresponding to the result values; applying a Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM); and obtaining regime search result values.

Abstract: The present invention relates to a method for calibrating a dynamic simulation parameter based on machine learning performed by a computing device. The method comprises the steps of: generating a set of N parameter hypotheses; obtaining result values corresponding to each of the N parameter hypotheses; calculating likelihoods corresponding to each of the result values; applying a Hierarchical Dirichlet Process Hidden Semi-Markov Model (HDP-HSMM); obtaining regime search result values; obtaining maximum likelihood estimation data for each regime by applying a maximum likelihood estimation method based on the regime search result values; and determining a maximum likelihood parameter based on the maximum likelihood estimation data.
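The per-regime selection step described above can be sketched as follows. This is a simplified illustration, not the patented method: the regime labels are passed in directly rather than inferred by an HDP-HSMM, a Gaussian likelihood with unit variance is assumed, and the names `gaussian_loglik`, `calibrate_by_regime`, and `simulate` are hypothetical.

```python
import math


def gaussian_loglik(simulated, observed, sigma=1.0):
    """Log-likelihood of observations under Gaussian noise around the simulation."""
    const = math.log(sigma * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((s - o) / sigma) ** 2 - const
               for s, o in zip(simulated, observed))


def calibrate_by_regime(observed, regimes, simulate, hypotheses):
    """Pick, for each regime, the hypothesis with the highest log-likelihood.

    observed   : list of observed values, one per time step
    regimes    : list of regime labels, one per time step (assumed given here;
                 in the source they come from an HDP-HSMM regime search)
    simulate   : callable (hypothesis, n_steps) -> list of simulated values
    hypotheses : list of hashable candidate parameter values
    """
    # Run the simulation once per hypothesis.
    runs = {h: simulate(h, len(observed)) for h in hypotheses}
    best = {}
    for regime in sorted(set(regimes)):
        steps = [t for t, label in enumerate(regimes) if label == regime]

        def score(h):
            return gaussian_loglik([runs[h][t] for t in steps],
                                   [observed[t] for t in steps])

        best[regime] = max(hypotheses, key=score)
    return best
```

For example, with a toy simulator that outputs the hypothesis value at every step, observations that sit at 1.0 in the first regime and 3.0 in the second yield a different best hypothesis for each regime.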

Proceedings Article•DOI•

Data Extraction and Integration for Scholar Recommendation System

Jaydeep Chakraborty¹, Gurusrikar Thopugunta¹, Srividya K. Bansal•Institutions (1)

Arizona State University¹

01 Jan 2018

TL;DR: A content-based mining approach that goes through all relevant resources, extracts the required information, and uses it to recommend a list of scholars based on a student's area of interest, together with a comparative analysis of topic model algorithms (LDA, HDP, LSA) and clustering techniques (k-means and hierarchical clustering).

Abstract: Recommendation systems have been an integral part of massive open online courses (MOOCs). With the large amount of data and resources available, recommending scholars and professors through general reviews and academic advisor applications has become a tiresome job. Finding professors and scholars relevant to a student's area of interest involves a combination of multiple factors, such as field of study, depth of research area, research background of professors, ongoing research opportunities, etc. Because recommending scholars and professors involves so many different factors, it is very complex and unreliable when done manually. In this paper, we present a content-based mining approach to go through all relevant resources, extract the required information, and use it to recommend a list of scholars based on a student's area of interest. For our experimental model, we gathered information about a number of professors at our institution from various web resources such as IEEE, Springer, ACM, ScienceDirect, arXiv, and the department website. We use topic modeling and clustering algorithms in our content-based mining approach. We present a comparative analysis of the following topic model algorithms: latent Dirichlet allocation (LDA), hierarchical Dirichlet process (HDP), and latent semantic analysis (LSA), and of the following clustering techniques: k-means and hierarchical clustering, in determining the most accurate recommendation list of professors or scholars.

Proceedings Article•DOI•

Data-Driven Automatic Calibration for Validation of Agent-Based Social Simulations

Il-Chul Moon¹, Dongjun Kim¹, Tae-Sub Yun¹, Jang Won Bae², Dong-Oh Kang², Euihyun Paik²•Institutions (2)

KAIST¹, Electronics and Telecommunications Research Institute²

07 Oct 2018

TL;DR: This paper presents a novel framework that augments agent-based models with machine learning techniques for better calibration and validation, and that is generally usable in any agent-based model with temporal macro parameters, which could be true of many existing models.

Abstract: Though agent-based models are used in many domains, their usage has been limited to either very abstract models for conceptual experiments or very detailed models requiring huge engineering efforts in modeling and calibration. One reason for this limited usage is the difficulty of calibrating and validating the model with observed data, because the models are highly generative in nature and have many hand-picked parameters. This paper presents a novel framework that augments agent-based models with machine learning techniques for better calibration and validation. The framework identifies periods of deviation between the simulation and the observation with a hierarchical Dirichlet process hidden Markov model (HDP-HMM), and it automatically calibrates the temporal macro parameters by searching regions of the parameter space with a higher likelihood of validation. After iterations of this framework, our experiments demonstrated successful validations on a hypothetical simple segregation model as well as a real-world real estate model. This framework is generally usable in any agent-based model with temporal macro parameters, which could be true of many existing models.

Posted Content•

Hierarchical Dirichlet Process-based Open Set Recognition.

Chuanxing Geng, Songcan Chen

29 Jun 2018

TL;DR: This paper proposes a novel hierarchical Dirichlet process-based classification framework for open set recognition (HDP-OSR), where samples of new categories unseen in training appear during testing, and reconsiders this problem from the perspective of a generative model.

Abstract: In this paper, we propose a novel hierarchical Dirichlet process-based classification framework for open set recognition (HDP-OSR), where samples of new categories unseen in training appear during testing. Unlike existing methods, which deal with this problem from the perspective of a discriminative model, we reconsider it from the perspective of a generative model. We model the data of each known class in the training set as a group in a hierarchical Dirichlet process (HDP), treat the testing set as a whole in the same way, and then co-cluster all the groups under the HDP framework. Based on the properties of the HDP, our HDP-OSR does not depend overly on training samples and can adapt as the data change. More precisely, HDP-OSR can automatically reserve space for unknown categories while also discovering new categories, meaning it naturally adapts to the open set recognition scenario. Furthermore, treating the testing set as a whole allows our framework to take the correlations among the testing samples into account, whereas existing methods ignore this information. Experimental results on a set of benchmark data sets indicate the validity of our learning framework.

Book Chapter•DOI•

Bayesian Complex Network Community Detection Using Nonparametric Topic Model

Ruimin Zhu¹, Wenxin Jiang¹•Institutions (1)

Northwestern University¹

11 Dec 2018

TL;DR: This work conducts random walks on the network and applies the Hierarchical Dirichlet Process topic model on the random walk data to explore the community structure of the network.

Abstract: Network community detection is an important area of research. In this work, we propose a novel nonparametric probabilistic model for this task. We conduct random walks on the network and apply the Hierarchical Dirichlet Process topic model on the random walk data to explore the community structure of the network. Our work is among the very few endeavors in nonparametric probabilistic modeling in complex networks. Our proposed model is highly flexible. The nonparametric nature allows it to automatically detect the number of communities without prior knowledge. Our model is also quite powerful. It demonstrates significant improvements compared to other models in several experiments.
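The first step of this approach, turning a network into random walk data, can be sketched as below. Treating each walk as a "document" and each visited node as a "word" is an assumption about how the walks would be fed to a topic model; the function name `random_walk_corpus` and its parameters are hypothetical, and the HDP topic model itself is not implemented here.

```python
import random


def random_walk_corpus(adjacency, walks_per_node, walk_length, seed=0):
    """Generate uniform random walks over a graph.

    adjacency : dict mapping each node to a list of its neighbors
    Returns a list of walks; each walk is a list of nodes that could be
    treated as one document for a topic model.
    """
    rng = random.Random(seed)
    corpus = []
    for start in adjacency:
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            corpus.append(walk)
    return corpus
```

Walks that start inside a densely connected group tend to stay there, so nodes from the same community co-occur in the same walks, which is the signal a topic model can pick up.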

Proceedings Article•DOI•

Modelling of Topic from Hindi Corpus using Word2Vec

Sabitra Sankalp Panigrahi¹, Narayan Panigrahi, Biswajit Paul•Institutions (1)

Indian Institutes of Information Technology¹

01 Sep 2018

TL;DR: A novel method for detecting topics in a Hindi corpus is designed and applied; it discovers topics by clustering the semantic space generated using word2vec word embeddings, and the results obtained are encouraging.

Abstract: Detecting the theme word or keyword that describes a collection of words is an important text processing method in natural language processing, known as topic detection (TD). The accuracy of a topic detection method depends on the quality of the topic modelling technique. Several topic modelling techniques have been implemented successfully; prominent examples are LSA, LDA, Hierarchical Dirichlet Process, and Non-Negative Matrix Factorization. Most topic modelling and detection techniques have been applied successfully to English corpora, but very little work is available for Indian languages. In this paper, we design and apply a novel method for detecting topics in a Hindi corpus. The proposed method discovers topics by clustering the semantic space generated using word2vec word embeddings. The results obtained are encouraging.

Book Chapter•DOI•

Bayesian Nonparametric Sparse Vector Autoregressive Models

Monica Billio, Roberto Casarin, Luca Rossini

01 Jan 2018

TL;DR: A hierarchical Dirichlet process prior (DPP) is proposed for SUR models, which allows shrinkage of coefficients toward multiple locations and allows for clustering of the coefficients.

Abstract: Seemingly unrelated regression (SUR) models are useful in studying the interactions among economic variables. In a high-dimensional setting, these models require a large number of parameters to be estimated and suffer from inferential problems. To avoid overparametrization issues, we propose a hierarchical Dirichlet process prior (DPP) for SUR models, which allows shrinkage of coefficients toward multiple locations. We propose a two-stage hierarchical prior distribution, where the first stage of the hierarchy consists of a conditionally independent lasso prior of the Normal-Gamma family for the coefficients. The second stage is given by a random mixture distribution, which allows for parameter parsimony through two components: the first is a random Dirac point-mass distribution, which induces sparsity in the coefficients; the second is a DPP, which allows for clustering of the coefficients.

Posted Content•

Video Event Recognition and Anomaly Detection by Combining Gaussian Process and Hierarchical Dirichlet Process Models

Michael Ying Yang, Wentong Liao, Yanpeng Cao, Bodo Rosenhahn

09 Feb 2018-arXiv: Computer Vision and Pattern Recognition

TL;DR: This framework couples the benefits of the generative model (HDP) with the discriminant model (GP) and enjoys favorable performance in video event classification in real-time in a crowded traffic scene.

Abstract: In this paper, we present an unsupervised learning framework for analyzing activities and interactions in surveillance videos. In our framework, three levels of video events are connected by a Hierarchical Dirichlet Process (HDP) model: low-level visual features, simple atomic activities, and multi-agent interactions. Atomic activities are represented as distributions of low-level features, while complicated interactions are represented as distributions of atomic activities. This learning process is unsupervised. Given a training video sequence, low-level visual features are extracted based on optical flow and then clustered into different atomic activities, and video clips are clustered into different interactions. The HDP model automatically decides the number of clusters, i.e., the categories of atomic activities and interactions. Based on the learned atomic activities and interactions, a training dataset is generated to train the Gaussian Process (GP) classifier. The trained GP models are then applied to newly captured video to classify interactions and detect abnormal events in real time. Furthermore, the temporal dependencies between video events learned by HDP-Hidden Markov Models (HMM) are effectively integrated into the GP classifier to enhance the accuracy of the classification in newly captured videos. Our framework couples the benefits of the generative model (HDP) with the discriminative model (GP). We provide detailed experiments showing that our framework enjoys favorable performance in real-time video event classification in a crowded traffic scene.
