
Showing papers on "Markov chain published in 2009"


Journal ArticleDOI
TL;DR: This work considers approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters, and paired with non-Gaussian response variables; it shows that very accurate approximations to the posterior marginals can be computed directly.
Abstract: Structured additive regression models are perhaps the most commonly used class of models in statistical applications. It includes, among others, (generalized) linear models, (generalized) additive models, smoothing spline models, state space models, semiparametric regression, spatial and spatiotemporal models, log-Gaussian Cox processes and geostatistical and geoadditive models. We consider approximate Bayesian inference in a popular subset of structured additive regression models, latent Gaussian models, where the latent field is Gaussian, controlled by a few hyperparameters and with non-Gaussian response variables. The posterior marginals are not available in closed form owing to the non-Gaussian response variables. For such models, Markov chain Monte Carlo methods can be implemented, but they are not without problems, in terms of both convergence and computational time. In some practical applications, the extent of these problems is such that Markov chain Monte Carlo sampling is simply not an appropriate tool for routine analysis. We show that, by using an integrated nested Laplace approximation and its simplified version, we can directly compute very accurate approximations to the posterior marginals. The main benefit of these approximations is computational: where Markov chain Monte Carlo algorithms need hours or days to run, our approximations provide more precise estimates in seconds or minutes. Another advantage with our approach is its generality, which makes it possible to perform Bayesian analysis in an automatic, streamlined way, and to compute model comparison criteria and various predictive measures so that models can be compared and the model under study can be challenged.
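The Laplace-approximation idea at the heart of the method can be illustrated on a toy example: approximate a posterior by a Gaussian centered at its mode, with variance taken from the curvature there. A minimal sketch, not the INLA implementation — the Beta(3, 5) "posterior" is an arbitrary stand-in:

```python
import math

# Toy posterior: Beta(a, b), log-density up to an additive constant.
a, b = 3.0, 5.0
logpost = lambda x: (a - 1) * math.log(x) + (b - 1) * math.log(1 - x)

# Find the mode by Newton's method on the log-posterior.
x = 0.5
for _ in range(50):
    g = (a - 1) / x - (b - 1) / (1 - x)            # first derivative
    h = -(a - 1) / x**2 - (b - 1) / (1 - x)**2     # second derivative
    x -= g / h

# Laplace (Gaussian) approximation: mean = mode, variance = -1 / h.
h = -(a - 1) / x**2 - (b - 1) / (1 - x)**2
mode, var = x, -1.0 / h
print(mode, var)
```

Here the mode and variance come out as 1/3 and 1/27; in the latent Gaussian setting the same matching is carried out marginal by marginal, nested inside an approximation over the hyperparameters.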

4,164 citations


Proceedings Article
15 Apr 2009
TL;DR: A new learning algorithm for Boltzmann machines that contain many layers of hidden variables; learning is made more efficient by a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottom-up pass.
Abstract: We present a new learning algorithm for Boltzmann machines that contain many layers of hidden variables. Data-dependent expectations are estimated using a variational approximation that tends to focus on a single mode, and data-independent expectations are approximated using persistent Markov chains. The use of two quite different techniques for estimating the two types of expectation that enter into the gradient of the log-likelihood makes it practical to learn Boltzmann machines with multiple hidden layers and millions of parameters. The learning can be made more efficient by using a layer-by-layer “pre-training” phase that allows variational inference to be initialized with a single bottom-up pass. We present results on the MNIST and NORB datasets showing that deep Boltzmann machines learn good generative models and perform well on handwritten digit and visual object recognition tasks.

2,221 citations


Journal ArticleDOI
TL;DR: Results indicate that IS-NMF correctly captures the semantics of audio and is better suited to the representation of music signals than NMF with the usual Euclidean and KL costs.
Abstract: This letter presents theoretical, algorithmic, and experimental results about nonnegative matrix factorization (NMF) with the Itakura-Saito (IS) divergence. We describe how IS-NMF is underlaid by a well-defined statistical model of superimposed gaussian components and is equivalent to maximum likelihood estimation of variance parameters. This setting can accommodate regularization constraints on the factors through Bayesian priors. In particular, inverse-gamma and gamma Markov chain priors are considered in this work. Estimation can be carried out using a space-alternating generalized expectation-maximization (SAGE) algorithm; this leads to a novel type of NMF algorithm, whose convergence to a stationary point of the IS cost function is guaranteed. We also discuss the links between the IS divergence and other cost functions used in NMF, in particular, the Euclidean distance and the generalized Kullback-Leibler (KL) divergence. As such, we describe how IS-NMF can also be performed using a gradient multiplicative algorithm (a standard algorithm structure in NMF) whose convergence is observed in practice, though not proven. Finally, we report a furnished experimental comparative study of Euclidean-NMF, KL-NMF, and IS-NMF algorithms applied to the power spectrogram of a short piano sequence recorded in real conditions, with various initializations and model orders. Then we show how IS-NMF can successfully be employed for denoising and upmix (mono to stereo conversion) of an original piece of early jazz music. These experiments indicate that IS-NMF correctly captures the semantics of audio and is better suited to the representation of music signals than NMF with the usual Euclidean and KL costs.
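The gradient multiplicative IS-NMF updates mentioned above are easy to sketch. A minimal pure-Python version on a tiny rank-1 factorization — the data values are arbitrary, and, as the letter notes, the decrease of the cost is observed in practice rather than proven:

```python
import math

V = [[1.0, 0.5, 2.0],
     [0.3, 1.2, 0.8]]                  # toy "power spectrogram"
K = 1                                  # model order
W = [[0.7], [0.4]]
H = [[0.9, 0.6, 1.1]]

def wh(i, j):
    return sum(W[i][k] * H[k][j] for k in range(K))

def is_div():
    # Itakura-Saito divergence d_IS(V | WH) = sum v/vh - log(v/vh) - 1
    return sum(V[i][j] / wh(i, j) - math.log(V[i][j] / wh(i, j)) - 1.0
               for i in range(2) for j in range(3))

d0 = is_div()
for _ in range(100):
    # Multiplicative updates: W <- W * (((WH)^-2 * V) H^T) / ((WH)^-1 H^T)
    for i in range(2):
        for k in range(K):
            num = sum(H[k][j] * V[i][j] / wh(i, j) ** 2 for j in range(3))
            den = sum(H[k][j] / wh(i, j) for j in range(3))
            W[i][k] *= num / den
    for k in range(K):
        for j in range(3):
            num = sum(W[i][k] * V[i][j] / wh(i, j) ** 2 for i in range(2))
            den = sum(W[i][k] / wh(i, j) for i in range(2))
            H[k][j] *= num / den
d1 = is_div()
print(d0, d1)   # the divergence decreases in practice
```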

1,200 citations


Journal ArticleDOI
TL;DR: Computer simulations indicate that adaptive MCMC algorithms, which automatically tune the Markov chain parameters during a run, perform very well compared to nonadaptive algorithms, even in high dimension.
Abstract: We investigate the use of adaptive MCMC algorithms to automatically tune the Markov chain parameters during a run. Examples include the Adaptive Metropolis (AM) multivariate algorithm of Haario, Saksman, and Tamminen (2001), Metropolis-within-Gibbs algorithms for nonconjugate hierarchical models, regionally adjusted Metropolis algorithms, and logarithmic scalings. Computer simulations indicate that the algorithms perform very well compared to nonadaptive algorithms, even in high dimension.
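A minimal sketch of the single-coordinate adaptive idea with a logarithmic scaling: nudge the log proposal scale after each accept or reject, with diminishing adaptation so ergodicity is preserved. The target and tuning constants below are illustrative, not taken from the paper:

```python
import math, random

random.seed(1)
logtarget = lambda x: -0.5 * x * x          # standard normal target

x, ls, accepts = 0.0, 0.0, 0                # ls = log proposal std. dev.
n = 5000
for t in range(1, n + 1):
    delta = min(0.01, 1.0 / math.sqrt(t))   # diminishing adaptation
    y = x + random.gauss(0.0, math.exp(ls))
    if random.random() < math.exp(min(0.0, logtarget(y) - logtarget(x))):
        x, accepts = y, accepts + 1
        ls += delta                         # accepting a lot: widen proposal
    else:
        ls -= delta                         # rejecting a lot: narrow proposal
acc = accepts / n
print(acc, math.exp(ls))
```

The symmetric up/down adjustment drives the empirical acceptance rate toward roughly one half; batch-based variants target other rates, such as 0.44 for one-dimensional updates.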

1,054 citations


Journal ArticleDOI
TL;DR: The DREAM scheme significantly enhances the applicability of MCMC simulation to complex, multimodal search problems; ergodicity of the algorithm is proved, and various examples involving nonlinearity, high dimensionality, and multimodality show that DREAM is generally superior to other adaptive MCMC sampling approaches.
Abstract: Markov chain Monte Carlo (MCMC) methods have found widespread use in many fields of study to estimate the average properties of complex systems, and for posterior inference in a Bayesian framework. Existing theory and experiments prove convergence of well-constructed MCMC schemes to the appropriate limiting distribution under a variety of different conditions. In practice, however, this convergence is often observed to be disturbingly slow. This is frequently caused by an inappropriate selection of the proposal distribution used to generate trial moves in the Markov chain. Here we show that significant improvements to the efficiency of MCMC simulation can be made by using a self-adaptive Differential Evolution learning strategy within a population-based evolutionary framework. This scheme, entitled DiffeRential Evolution Adaptive Metropolis or DREAM, runs multiple different chains simultaneously for global exploration, and automatically tunes the scale and orientation of the proposal distribution in randomized subspaces during the search. Ergodicity of the algorithm is proved, and various examples involving nonlinearity, high-dimensionality, and multimodality show that DREAM is generally superior to other adaptive MCMC sampling approaches. The DREAM scheme significantly enhances the applicability of MCMC simulation to complex, multi-modal search problems.
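The core differential-evolution proposal used in such samplers is simple to sketch: jump along the difference of two other chains' states, scaled by the standard factor gamma ≈ 2.38/sqrt(2d). The chain states below are toy values, and this is only the proposal step, not the full DREAM algorithm with its subspace sampling and outlier handling:

```python
import random

random.seed(0)

def de_proposal(chains, i, gamma):
    """Differential-evolution jump: x_i + gamma * (x_a - x_b) + small noise."""
    a, b = random.sample([j for j in range(len(chains)) if j != i], 2)
    d = len(chains[i])
    return [chains[i][k] + gamma * (chains[a][k] - chains[b][k])
            + random.uniform(-1e-6, 1e-6) for k in range(d)]

chains = [[0.0, 0.0], [1.0, 2.0], [3.0, 1.0], [2.0, 4.0]]
gamma = 2.38 / (2 * 2) ** 0.5      # 2.38 / sqrt(2 d) with d = 2
prop = de_proposal(chains, 0, gamma)
print(prop)
```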

1,004 citations


Journal ArticleDOI
TL;DR: The influence of the network characteristics on the virus spread is analyzed in a new model, the N-intertwined Markov chain model, whose only approximation lies in the application of mean field theory.
Abstract: The influence of the network characteristics on the virus spread is analyzed in a new model, the N-intertwined Markov chain model, whose only approximation lies in the application of mean field theory. The mean field approximation is quantified in detail. The N-intertwined model has been compared with the exact 2^N-state Markov model and with previously proposed "homogeneous" or "local" models. The sharp epidemic threshold τ_c, which is a consequence of mean field theory, is rigorously shown to be equal to τ_c = 1/λ_max(A), where λ_max(A) is the largest eigenvalue (the spectral radius) of the adjacency matrix A. A continued fraction expansion of the steady-state infection probability at node j is presented, as well as several upper bounds.
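The threshold is straightforward to compute for a concrete graph. A pure-Python power-iteration sketch on the complete graph K3, where the largest adjacency eigenvalue is 2 and hence the threshold is 1/2:

```python
# Epidemic threshold tau_c = 1 / lambda_max(A) via power iteration,
# here on the complete graph K3 (lambda_max = 2, so tau_c = 0.5).
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]
v = [1.0, 1.0, 1.0]
lam = 0.0
for _ in range(100):
    w = [sum(A[i][j] * v[j] for j in range(3)) for i in range(3)]
    lam = max(abs(x) for x in w)       # Rayleigh-style scale estimate
    v = [x / lam for x in w]           # renormalize the iterate
tau_c = 1.0 / lam
print(lam, tau_c)
```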

1,000 citations


Proceedings Article
07 Dec 2009
TL;DR: A unified framework for establishing consistency and convergence rates for regularized M-estimators under high-dimensional scaling is provided; one main theorem is stated and shown to re-derive several existing results and to yield several new ones.
Abstract: High-dimensional statistical inference deals with models in which the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M-estimator) which combines a loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. The goal of this paper is to provide a unified framework for establishing consistency and convergence rates for such regularized M-estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M-estimators have fast convergence rates.
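As a concrete instance of this setting, an l1-regularized least-squares estimator (the lasso) can be solved by proximal gradient steps; the soft-threshold operator is exactly where the decomposability of the regularizer enters. A toy one-parameter sketch with invented data and regularization weight:

```python
# Lasso via proximal gradient: minimize (1/2n) sum (beta*x - y)^2 + lam*|beta|.
def soft(z, t):
    """Soft-threshold operator: the proximal map of t * |.|"""
    return max(z - t, 0.0) if z > 0 else min(z + t, 0.0)

xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.1]           # roughly y = 2x, with noise
lam, step, beta = 0.5, 0.05, 0.0
for _ in range(500):
    grad = sum((beta * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    beta = soft(beta - step * grad, step * lam)
print(beta)                    # shrunk below the least-squares fit
```

The fixed point satisfies the stationarity condition (1/n) Σ (βx − y)x + λ·sign(β) = 0, which here gives β = 26.7/14 ≈ 1.907, slightly shrunk from the unpenalized value.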

974 citations


Book
28 Apr 2009
TL;DR: A book-length treatment of hidden Markov models covering model structure and properties, likelihood evaluation, parameter estimation by maximum likelihood and the EM algorithm, model selection and checking, Bayesian inference, extensions, and a range of applications; exercises appear at the end of most chapters.
Abstract: MODEL STRUCTURE, PROPERTIES, AND METHODS Mixture Distributions and Markov Chains Introduction Independent mixture models Markov chains Hidden Markov Models: Definition and Properties A simple hidden Markov model The basics The likelihood Estimation by Direct Maximization of the Likelihood Introduction Scaling the likelihood computation Maximization subject to constraints Other problems Example: earthquakes Standard errors and confidence intervals Example: parametric bootstrap Estimation by the EM Algorithm Forward and backward probabilities The EM algorithm Examples of EM applied to Poisson HMMs Discussion Forecasting, Decoding, and State Prediction Conditional distributions Forecast distributions Decoding State prediction Model Selection and Checking Model selection by AIC and BIC Model checking with pseudo-residuals Examples Discussion Bayesian Inference for Poisson HMMs Applying the Gibbs sampler to Poisson HMMs Bayesian estimation of the number of states Example: earthquakes Discussion Extensions of the Basic Hidden Markov Model Introduction HMMs with general univariate state-dependent distribution HMMs based on a second-order Markov chain HMMs for multivariate series Series which depend on covariates Models with additional dependencies APPLICATIONS Epileptic Seizures Introduction Models fitted Model checking by pseudo-residuals Eruptions of the Old Faithful Geyser Introduction Binary time series of short and long eruptions Normal HMMs for durations and waiting times Bivariate model for durations and waiting times Drosophila Speed and Change of Direction Introduction Von Mises distributions Von Mises HMMs for the two subjects Circular autocorrelation functions Bivariate model Wind Direction at Koeberg Introduction Wind direction as classified into 16 categories Wind direction as a circular variable Models for Financial Series Thinly traded shares Multivariate HMM for returns on four shares Stochastic volatility models Births at Edendale Hospital 
Introduction Models for the proportion Caesarean Models for the total number of deliveries Conclusion Cape Town Homicides and Suicides Introduction Firearm homicides as a proportion of all homicides, suicides, and legal intervention homicides The number of firearm homicides Firearm homicide and suicide proportions Proportion in each of the five categories Animal-Behavior Model with Feedback Introduction The model Likelihood evaluation Parameter estimation by maximum likelihood Model checking Inferring the underlying state Models for a heterogeneous group of subjects Other modifications or extensions Application to caterpillar feeding behavior Discussion Appendix A: Examples of R code Stationary Poisson HMM, numerical maximization More on Poisson HMMs, including EM Bivariate normal state-dependent distributions Categorical HMM, constrained optimization Appendix B: Some Proofs Factorization needed for forward probabilities Two results for backward probabilities Conditional independence of X(1:t) and X(t+1:T) References Author Index Subject Index Exercises appear at the end of most chapters.

876 citations


Journal ArticleDOI
TL;DR: It is shown that the PSPACE upper bounds cannot be substantially improved without a breakthrough on long-standing open problems: the square-root sum problem and an arithmetic circuit decision problem that captures P-time on the unit-cost rational arithmetic RAM model.
Abstract: We define Recursive Markov Chains (RMCs), a class of finitely presented denumerable Markov chains, and we study algorithms for their analysis. Informally, an RMC consists of a collection of finite-state Markov chains with the ability to invoke each other in a potentially recursive manner. RMCs offer a natural abstract model for probabilistic programs with procedures. They generalize, in a precise sense, a number of well-studied stochastic models, including Stochastic Context-Free Grammars (SCFG) and Multi-Type Branching Processes (MT-BP). We focus on algorithms for reachability and termination analysis for RMCs: what is the probability that an RMC started from a given state reaches another target state, or that it terminates? These probabilities are in general irrational, and they arise as (least) fixed point solutions to certain (monotone) systems of nonlinear equations associated with RMCs. We address both the qualitative problem of determining whether the probabilities are 0, 1 or in-between, and the quantitative problems of comparing the probabilities with a given bound, or approximating them to desired precision. We show that all these problems can be solved in PSPACE using a decision procedure for the Existential Theory of Reals. We provide a more practical algorithm, based on a decomposed version of multi-variate Newton's method, and prove that it always converges monotonically to the desired probabilities. We show this method applies more generally to any monotone polynomial system. We obtain polynomial-time algorithms for various special subclasses of RMCs.
Among these: for SCFGs and MT-BPs (equivalently, for 1-exit RMCs) the qualitative problem can be solved in P-time; for linearly recursive RMCs the probabilities are rational and can be computed exactly in P-time. We show that our PSPACE upper bounds cannot be substantially improved without a breakthrough on long-standing open problems: the square-root sum problem and an arithmetic circuit decision problem that captures P-time on the unit-cost rational arithmetic RAM model. We show that these problems reduce to the qualitative problem and to the approximation problem (to within any nontrivial error) for termination probabilities of general RMCs, and to the quantitative decision problem for termination (extinction) of SCFGs (MT-BPs).
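The fixed-point structure can be seen on the simplest 1-exit case: a branching process where each individual has two children with probability p and none otherwise. The termination (extinction) probability is the least nonnegative root of x = (1-p) + p x^2, and the decomposed Newton's method reduces here to scalar Newton iteration (p = 0.6 is chosen for illustration):

```python
# Least fixed point of the monotone system x = (1 - p) + p * x^2,
# computed by Newton's method started below the fixed point.
p = 0.6
f = lambda x: (1 - p) + p * x * x
df = lambda x: 2 * p * x

x = 0.0                          # monotone start at 0
for _ in range(50):
    x = x - (f(x) - x) / (df(x) - 1)
print(x)                         # least fixed point (1 - p) / p = 2/3
```

Started at 0, the iterates increase monotonically toward 2/3 and never reach the other root at 1, matching the least-fixed-point characterization.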

632 citations


Book
02 Dec 2009
TL;DR: A textbook on Bayesian analysis for the social sciences, moving from Bayesian analysis of simple models through simulation-based methods (Monte Carlo, Markov chains, and Markov chain Monte Carlo) to advanced applications such as hierarchical modeling.
Abstract: List of Figures. List of Tables. Preface. Acknowledgments. Introduction. Part I: Introducing Bayesian Analysis. 1. The foundations of Bayesian inference. 1.1 What is probability? 1.2 Subjective probability in Bayesian statistics. 1.3 Bayes theorem, discrete case. 1.4 Bayes theorem, continuous parameter. 1.5 Parameters as random variables, beliefs as distributions. 1.6 Communicating the results of a Bayesian analysis. 1.7 Asymptotic properties of posterior distributions. 1.8 Bayesian hypothesis testing. 1.9 From subjective beliefs to parameters and models. 1.10 Historical note. 2. Getting started: Bayesian analysis for simple models. 2.1 Learning about probabilities, rates and proportions. 2.2 Associations between binary variables. 2.3 Learning from counts. 2.4 Learning about a normal mean and variance. 2.5 Regression models. 2.6 Further reading. Part II: Simulation Based Bayesian Analysis. 3. Monte Carlo methods. 3.1 Simulation consistency. 3.2 Inference for functions of parameters. 3.3 Marginalization via Monte Carlo integration. 3.4 Sampling algorithms. 3.5 Further reading. 4. Markov chains. 4.1 Notation and definitions. 4.2 Properties of Markov chains. 4.3 Convergence of Markov chains. 4.4 Limit theorems for Markov chains. 4.5 Further reading. 5. Markov chain Monte Carlo. 5.1 Metropolis-Hastings algorithm. 5.2 Gibbs sampling. 6. Implementing Markov chain Monte Carlo. 6.1 Software for Markov chain Monte Carlo. 6.2 Assessing convergence and run-length. 6.3 Working with BUGS/JAGS from R. 6.4 Tricks of the trade. 6.5 Other examples. 6.6 Further reading. Part III: Advanced Applications in the Social Sciences. 7. Hierarchical Statistical Models. 7.1 Data and parameters that vary by groups: the case for hierarchical modeling. 7.2 ANOVA as a hierarchical model. 7.3 Hierarchical models for longitudinal data. 7.4 Hierarchical models for non-normal data. 7.5 Multi-level models. 8. Bayesian analysis of choice making. 8.1 Regression models for binary responses. 
8.2 Ordered outcomes. 8.3 Multinomial outcomes. 8.4 Multinomial probit. 9. Bayesian approaches to measurement. 9.1 Bayesian inference for latent states. 9.2 Factor analysis. 9.3 Item-response models. 9.4 Dynamic measurement models. Part IV: Appendices. Appendix A: Working with vectors and matrices. Appendix B: Probability review. B.1 Foundations of probability. B.2 Probability densities and mass functions. B.3 Convergence of sequences of random variables. Appendix C: Proofs of selected propositions. C.1 Products of normal densities. C.2 Conjugate analysis of normal data. C.3 Asymptotic normality of the posterior density. References. Topic index. Author index.

626 citations


Journal ArticleDOI
TL;DR: This note investigates the output feedback stabilization of networked control systems (NCSs) through the design of a two-mode-dependent controller that depends on not only the current S-C delay but also the most recent available C-A delay at the controller node.
Abstract: This note investigates the output feedback stabilization of networked control systems (NCSs). The sensor-to-controller (S-C) and controller-to-actuator (C-A) random network-induced delays are modeled as Markov chains. The focus is on the design of a two-mode-dependent controller that depends on not only the current S-C delay but also the most recent available C-A delay at the controller node. The resulting closed-loop system is transformed to a special discrete-time jump linear system. Then, the sufficient and necessary conditions for the stochastic stability are established. Further, the output feedback controller is designed via the iterative linear matrix inequality (LMI) approach. Simulation examples illustrate the effectiveness of the proposed method.

Journal ArticleDOI
TL;DR: In this article, a powerful and flexible MCMC algorithm for stochastic simulation is introduced, based on a pseudo-marginal method originally introduced in [Genetics 164 (2003) 1139-1160]; it is shown how algorithms that are approximations to an idealized marginal algorithm can share the same marginal stationary distribution as the idealized method.
Abstract: We introduce a powerful and flexible MCMC algorithm for stochastic simulation. The method builds on a pseudo-marginal method originally introduced in [Genetics 164 (2003) 1139-1160], showing how algorithms which are approximations to an idealized marginal algorithm can share the same marginal stationary distribution as the idealized method. Theoretical results are given describing the convergence properties of the proposed method, and simple numerical examples are given to illustrate the promising empirical characteristics of the technique. Interesting comparisons with a more obvious, but inexact, Monte Carlo approximation to the marginal algorithm are also given.
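The mechanism can be sketched in a few lines: run Metropolis-Hastings, but plug in a nonnegative unbiased estimate of the likelihood, carrying the current estimate along with the state; the chain still targets the exact posterior. A toy version — the target, the noise model, and all tuning constants are invented for illustration, with an implicit flat prior:

```python
import math, random

random.seed(2)

def lik_hat(theta, m=10):
    # Nonnegative unbiased estimate of the toy likelihood exp(-theta^2/2):
    # average of m copies multiplied by lognormal noise with mean exactly 1.
    mean = math.exp(-0.5 * theta * theta)
    return sum(mean * random.lognormvariate(-0.02, 0.2) for _ in range(m)) / m

theta, z = 0.0, lik_hat(0.0)          # keep the current estimate with the state
samples = []
for _ in range(2000):
    prop = theta + random.gauss(0.0, 1.0)
    z_prop = lik_hat(prop)
    if random.random() < z_prop / z:  # accept using the *estimated* ratio
        theta, z = prop, z_prop
    samples.append(theta)
print(sum(samples) / len(samples))    # close to the target mean 0
```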

Journal ArticleDOI
TL;DR: Describes a fully automatic three-dimensional (3-D) segmentation technique for brain magnetic resonance (MR) images that captures three features of special importance for MR images, i.e., nonparametric distributions of tissue intensities, neighborhood correlations, and signal inhomogeneities.
Abstract: We describe a fully-automatic 3D-segmentation technique for brain MR images. Using Markov random fields the segmentation algorithm captures three important MR features, i.e. non-parametric distributions of tissue intensities, neighborhood correlations and signal inhomogeneities. Detailed simulations and real MR images demonstrate the performance of the segmentation algorithm. The impact of noise, inhomogeneity, smoothing and structure thickness is analyzed quantitatively. Even single echo MR images are well classified into gray matter, white matter, cerebrospinal fluid, scalp-bone and background. A simulated annealing and an iterated conditional modes implementation are presented. Keywords: Magnetic Resonance Imaging, Segmentation, Markov Random Fields
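The iterated conditional modes step mentioned above is easy to sketch: greedily pick each pixel's label to minimize a local Ising-style energy that trades data fidelity against smoothness. A toy binary version — the 3x3 "image" and the weights are invented; the paper's method works on full MR volumes with nonparametric intensity models:

```python
obs = [[1, 1, -1],
       [1, -1, 1],
       [-1, 1, 1]]                    # noisy binary observation, labels in {-1, 1}
lab = [row[:] for row in obs]         # initialize labels at the data
beta, lam = 1.0, 0.9                  # smoothness vs. data-fidelity weights

def energy(lab):
    e = 0.0
    for i in range(3):
        for j in range(3):
            e -= lam * lab[i][j] * obs[i][j]           # data term
            if i + 1 < 3:
                e -= beta * lab[i][j] * lab[i + 1][j]  # vertical neighbor
            if j + 1 < 3:
                e -= beta * lab[i][j] * lab[i][j + 1]  # horizontal neighbor
    return e

e0 = energy(lab)
for _ in range(5):                    # ICM sweeps: each step never raises energy
    for i in range(3):
        for j in range(3):
            best = None
            for s in (-1, 1):
                lab[i][j] = s
                e = energy(lab)
                if best is None or e < best[0]:
                    best = (e, s)
            lab[i][j] = best[1]
e1 = energy(lab)
print(e0, e1)
```

Unlike simulated annealing, ICM is deterministic and only descends, so it finds a local rather than global energy minimum.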

Book
26 Jul 2009
TL;DR: The textbook looks at the fundamentals of probability theory, from the basic concepts of set-based probability, through probability distributions, to bounds, limit theorems, and the laws of large numbers.
Abstract: Probability, Markov Chains, Queues, and Simulation provides a modern and authoritative treatment of the mathematical processes that underlie performance modeling. The detailed explanations of mathematical derivations and numerous illustrative examples make this textbook readily accessible to graduate and advanced undergraduate students taking courses in which stochastic processes play a fundamental role. The textbook is relevant to a wide variety of fields, including computer science, engineering, operations research, statistics, and mathematics. The textbook looks at the fundamentals of probability theory, from the basic concepts of set-based probability, through probability distributions, to bounds, limit theorems, and the laws of large numbers. Discrete and continuous-time Markov chains are analyzed from a theoretical and computational point of view. Topics include the Chapman-Kolmogorov equations; irreducibility; the potential, fundamental, and reachability matrices; random walk problems; reversibility; renewal processes; and the numerical computation of stationary and transient distributions. The M/M/1 queue and its extensions to more general birth-death processes are analyzed in detail, as are queues with phase-type arrival and service processes. The M/G/1 and G/M/1 queues are solved using embedded Markov chains; the busy period, residual service time, and priority scheduling are treated. Open and closed queueing networks are analyzed. The final part of the book addresses the mathematical basis of simulation. Each chapter of the textbook concludes with an extensive set of exercises. 
An instructor's solution manual, in which all exercises are completely worked out, is also available (to professors only); it is restricted to teachers using the text in courses. For information on how to obtain a copy, refer to: http://press.princeton.edu/class_use/solutions.html
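The M/M/1 results covered in the book reduce to simple closed forms. A sketch checking the geometric stationary distribution pi_n = (1 - rho) rho^n against the mean-number-in-system formula L = rho/(1 - rho), with arbitrary illustrative rates:

```python
# M/M/1 queue: arrival rate lam, service rate mu, utilization rho = lam/mu.
lam, mu = 2.0, 5.0
rho = lam / mu                       # must be < 1 for stability
pi = [(1 - rho) * rho ** n for n in range(200)]   # truncated distribution
L_sum = sum(n * p for n, p in enumerate(pi))      # mean from the distribution
L_formula = rho / (1 - rho)                       # closed-form mean
print(sum(pi), L_sum, L_formula)
```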

Journal ArticleDOI
TL;DR: This article shows that buy and sell orders can cluster away from the bid-ask spread, thus generating a hump-shaped limit-order book, and that, following a market buy order, both the ask and bid prices increase, with the ask increasing more than the bid, so the spread widens.
Abstract: of a Markov equilibrium in which the bid and ask prices depend only on the numbers of buy and sell orders in the book, and which can be characterized in closed-form in several cases of interest. My model generates empirically verified implications for the shape of the limit-order book and the dynamics of prices and trades. In particular, I show that buy and sell orders can cluster away from the bid-ask spread, thus generating a hump-shaped limit-order book. Also, following a market buy order, both the ask and bid prices increase, with the ask increasing more than the bid—hence the spread widens.

Journal ArticleDOI
TL;DR: The framework of TPT for Markov chains is developed in detail, and the relation of the theory to electric resistor network theory and data analysis tools such as Laplacian eigenmaps and diffusion maps is discussed as well.
Abstract: The framework of transition path theory (TPT) is developed in the context of continuous-time Markov chains on discrete state-spaces. Under assumption of ergodicity, TPT singles out any two subsets in the state-space and analyzes the statistical properties of the associated reactive trajectories, i.e., those trajectories by which the random walker transits from one subset to another. TPT gives properties such as the probability distribution of the reactive trajectories, their probability current and flux, and their rate of occurrence and the dominant reaction pathways. In this paper the framework of TPT for Markov chains is developed in detail, and the relation of the theory to electric resistor network theory and data analysis tools such as Laplacian eigenmaps and diffusion maps is discussed as well. Various algorithms for the numerical calculation of the various objects in TPT are also introduced. Finally, the theory and the algorithms are illustrated in several examples.
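A central object in these computations is the committor, the probability of reaching B before A from each state; for a discrete chain it solves a linear boundary-value problem. A sketch for the symmetric random walk on {0,...,4} with A = {0} and B = {4}, solved by Jacobi sweeps (the closed-form answer is q_i = i/4):

```python
# Committor q: q = 0 on A, q = 1 on B, and q_i = sum_j P_ij q_j inside.
# For the symmetric walk, the interior equation is q_i = (q_{i-1} + q_{i+1}) / 2.
q = [0.0, 0.0, 0.0, 0.0, 1.0]       # boundary conditions imposed at the ends
for _ in range(500):                 # Jacobi fixed-point sweeps
    q = [0.0] + [0.5 * (q[i - 1] + q[i + 1]) for i in (1, 2, 3)] + [1.0]
print(q)                             # converges to [0, 0.25, 0.5, 0.75, 1]
```

The same linear system is what connects TPT to electric resistor networks: the committor plays the role of the voltage between the two electrode sets.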

Book
08 May 2009
TL;DR: Most subfields of computer science have an interface layer via which applications communicate with the infrastructure, and this is key to their success, but this interface layer has been missing in AI.
Abstract: Most subfields of computer science have an interface layer via which applications communicate with the infrastructure, and this is key to their success (e.g., the Internet in networking, the relational model in databases, etc.). So far this interface layer has been missing in AI. First-order logic and probabilistic graphical models each have some of the necessary features, but a viable interface layer requires combining both. Markov logic is a powerful new language that accomplishes this by attaching weights to first-order formulas and treating them as templates for features of Markov random fields. Most statistical models in wide use are special cases of Markov logic, and first-order logic is its infinite-weight limit. Inference algorithms for Markov logic combine ideas from satisfiability, Markov chain Monte Carlo, belief propagation, and resolution. Learning algorithms make use of conditional likelihood, convex optimization, and inductive logic programming. Markov logic has been successfully applied to problems in information extraction and integration, natural language processing, robot mapping, social networks, computational biology, and others, and is the basis of the open-source Alchemy system.

Journal ArticleDOI
TL;DR: This work demonstrates the application of a toolkit for automating the construction of Markov state models to the villin headpiece (HP-35 NleNle), one of the smallest and fastest folding proteins, and shows that the resulting MSM captures both the thermodynamics and kinetics of the original molecular dynamics of the system.
Abstract: Markov state models (MSMs) are a powerful tool for modeling both the thermodynamics and kinetics of molecular systems. In addition, they provide a rigorous means to combine information from multiple sources into a single model and to direct future simulations/experiments to minimize uncertainties in the model. However, constructing MSMs is challenging because doing so requires decomposing the extremely high dimensional and rugged free energy landscape of a molecular system into long-lived states, also called metastable states. Thus, their application has generally required significant chemical intuition and hand-tuning. To address this limitation we have developed a toolkit for automating the construction of MSMs called MSMBUILDER (available at https://simtk.org/home/msmbuilder). In this work we demonstrate the application of MSMBUILDER to the villin headpiece (HP-35 NleNle), one of the smallest and fastest folding proteins. We show that the resulting MSM captures both the thermodynamics and kinetics of the original molecular dynamics of the system. As a first step toward experimental validation of our methodology we show that our model provides accurate structure prediction and that the longest timescale events correspond to folding.
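The count-and-normalize step at the core of MSM construction is easy to sketch; the discretized toy trajectory below stands in for clustered MD frames, while MSMBUILDER adds clustering, lag-time selection, and uncertainty handling on top:

```python
# Estimate an MSM transition matrix from a state-labeled trajectory:
# count transitions at lag 1, then row-normalize.
traj = [0, 0, 1, 1, 2, 1, 0, 0, 1, 2, 2, 1]
n = 3
C = [[0] * n for _ in range(n)]
for a, b in zip(traj, traj[1:]):
    C[a][b] += 1                     # transition counts
T = [[c / sum(row) for c in row] for row in C]
print(T)                             # each row is a probability distribution
```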

Journal ArticleDOI
TL;DR: A hybrid of these techniques which models stochastic usage behaviour in a comprehensive and efficient way is proposed, and an algorithm for implementing this model in dynamic building simulation tools is described.

Journal ArticleDOI
TL;DR: An algorithm is presented that generalizes the randomized incremental subgradient method with fixed stepsize due to Nedic and Bertsekas; it is particularly suitable for distributed implementation and execution, with possible applications including distributed optimization, e.g., parameter estimation in networks of tiny wireless sensors.
Abstract: We present an algorithm that generalizes the randomized incremental subgradient method with fixed stepsize due to Nedic and Bertsekas [SIAM J. Optim., 12 (2001), pp. 109-138]. Our novel algorithm is particularly suitable for distributed implementation and execution, and possible applications include distributed optimization, e.g., parameter estimation in networks of tiny wireless sensors. The stochastic component in the algorithm is described by a Markov chain, which can be constructed in a distributed fashion using only local information. We provide a detailed convergence analysis of the proposed algorithm and compare it with existing, both deterministic and randomized, incremental subgradient methods.
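The idea can be sketched on a toy quadratic: take an incremental gradient step using the component held by the current agent, then let a Markov chain (the kind that can be built from local neighbor information) choose the next agent. With a doubly stochastic chain the stationary distribution is uniform, so with a fixed stepsize the iterates hover in a neighborhood of the minimizer of the full sum. All values below are illustrative:

```python
import random

random.seed(3)
a = [0.0, 1.0, 2.0]                  # one component (x - a_i)^2 / 2 per agent
P = [[0.0, 0.5, 0.5],                # random walk on a 3-cycle:
     [0.5, 0.0, 0.5],                # doubly stochastic, so the stationary
     [0.5, 0.5, 0.0]]                # distribution is uniform

def next_state(i):
    u, c = random.random(), 0.0
    for j in range(3):
        c += P[i][j]
        if u < c:
            return j
    return 2

x, i = 5.0, 0
for _ in range(3000):
    x -= 0.05 * (x - a[i])           # gradient step on agent i's component only
    i = next_state(i)
print(x)                             # hovers near the overall minimizer 1.0
```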

Journal ArticleDOI
TL;DR: Simulation results show that MCMCDA outperforms multiple hypothesis tracking (MHT) by a significant margin in terms of accuracy and efficiency under extreme conditions, such as a large number of targets in a dense environment, low detection probabilities, and high false alarm rates.
Abstract: This paper presents Markov chain Monte Carlo data association (MCMCDA) for solving data association problems arising in multitarget tracking in a cluttered environment. When the number of targets is fixed, the single-scan version of MCMCDA approximates joint probabilistic data association (JPDA). Although the exact computation of association probabilities in JPDA is NP-hard, we prove that the single-scan MCMCDA algorithm provides a fully polynomial randomized approximation scheme for JPDA. For general multitarget tracking problems, in which unknown numbers of targets appear and disappear at random times, we present a multi-scan MCMCDA algorithm that approximates the optimal Bayesian filter. We also present extensive simulation studies supporting theoretical results in this paper. Our simulation results also show that MCMCDA outperforms multiple hypothesis tracking (MHT) by a significant margin in terms of accuracy and efficiency under extreme conditions, such as a large number of targets in a dense environment, low detection probabilities, and high false alarm rates.

Journal ArticleDOI
TL;DR: This work considers a Bayesian approach to nonlinear inverse problems in which the unknown quantity is a spatial or temporal field, endowed with a hierarchical Gaussian process prior, and introduces truncated Karhunen-Loeve expansions, based on the prior distribution, to efficiently parameterize the unknown field.

Journal ArticleDOI
TL;DR: The Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment.
Abstract: We describe a new program for the alignment of multiple biological sequences that is both statistically motivated and fast enough for problem sizes that arise in practice. Our Fast Statistical Alignment program is based on pair hidden Markov models which approximate an insertion/deletion process on a tree and uses a sequence annealing algorithm to combine the posterior probabilities estimated from these models into a multiple alignment. FSA uses its explicit statistical model to produce multiple alignments which are accompanied by estimates of the alignment accuracy and uncertainty for every column and character of the alignment—previously available only with alignment programs which use computationally-expensive Markov Chain Monte Carlo approaches—yet can align thousands of long sequences. Moreover, FSA utilizes an unsupervised query-specific learning procedure for parameter estimation which leads to improved accuracy on benchmark reference alignments in comparison to existing programs. The centroid alignment approach taken by FSA, in combination with its learning procedure, drastically reduces the amount of false-positive alignment on biological data in comparison to that given by other methods. The FSA program and a companion visualization tool for exploring uncertainty in alignments can be used via a web interface at http://orangutan.math.berkeley.edu/fsa/, and the source code is available at http://fsa.sourceforge.net/.

Journal ArticleDOI
TL;DR: In this article, the authors apply the reversible jump algorithm to the seismic tomography problem, where the model is parametrized using Voronoi cells with mobile geometry and number, and the size, position and shape of the cells defining the velocity model are directly determined by the data.
Abstract: The reversible jump algorithm is a statistical method for Bayesian inference with a variable number of unknowns. Here, we apply this method to the seismic tomography problem. The approach lets us consider the issue of model parametrization (i.e. the way of discretizing the velocity field) as part of the inversion process. The model is parametrized using Voronoi cells with mobile geometry and number. The size, position and shape of the cells defining the velocity model are directly determined by the data. The inverse problem is tackled within a Bayesian framework and explicit regularization of model parameters is not required. The mobile position and number of cells means that global damping procedures, controlled by an optimal regularization parameter, are avoided. Many velocity models with variable numbers of cells are generated via a transdimensional Markov chain and information is extracted from the ensemble as a whole. As an aid to interpretation we visualize the expected earth model that is obtained via Monte Carlo integration in a straightforward manner. The procedure is particularly adept at imaging rapid changes or discontinuities in wave speed. While each velocity model in the final ensemble consists of many discontinuities at cell boundaries, these are smoothed out in the averaged ensemble solution while those required by the data are reinforced. The ensemble of models can also be used to produce uncertainty estimates and experiments with synthetic data suggest that they represent actual uncertainty surprisingly well. We use the fast marching method in order to iteratively update the ray geometry and account for the non-linearity of the problem. The method is tested here with synthetic data in a 2-D application and compared with a subspace method that is a more standard matrix-based inversion scheme. Preliminary results illustrate the advantages of the reversible jump algorithm. A real data example is also shown where a tomographic image of Rayleigh wave group velocity for the Australian continent is constructed together with uncertainty estimates.

Proceedings ArticleDOI
14 Jun 2009
TL;DR: It is shown that the weight updates force the Markov chain to mix fast, and using this insight, an even faster mixing chain is developed that uses an auxiliary set of "fast weights" to implement a temporary overlay on the energy landscape.
Abstract: The most commonly used learning algorithm for restricted Boltzmann machines is contrastive divergence which starts a Markov chain at a data point and runs the chain for only a few iterations to get a cheap, low variance estimate of the sufficient statistics under the model. Tieleman (2008) showed that better learning can be achieved by estimating the model's statistics using a small set of persistent "fantasy particles" that are not reinitialized to data points after each weight update. With sufficiently small weight updates, the fantasy particles represent the equilibrium distribution accurately but to explain why the method works with much larger weight updates it is necessary to consider the interaction between the weight updates and the Markov chain. We show that the weight updates force the Markov chain to mix fast, and using this insight we develop an even faster mixing chain that uses an auxiliary set of "fast weights" to implement a temporary overlay on the energy landscape. The fast weights learn rapidly but also decay rapidly and do not contribute to the normal energy landscape that defines the model.
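A loose sketch of the fast-weights idea on a tiny binary RBM is shown below. Biases are omitted, the hyperparameters are illustrative, and the exact places where the fast weights enter differ from the paper's recipe; the point is only the mechanism: persistent fantasy particles sample from W + Wf, and Wf learns rapidly but decays rapidly, acting as a temporary overlay that pushes the chains out of recently visited modes.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

nv, nh, n_chains = 6, 4, 8
W = 0.01 * rng.standard_normal((nv, nh))   # slow (model) weights
Wf = np.zeros_like(W)                      # fast weights: temporary overlay

def gibbs_step(v, weights):
    """One block-Gibbs sweep v -> h -> v under the given weights."""
    h = (sigmoid(v @ weights) > rng.random((len(v), nh))).astype(float)
    return (sigmoid(h @ weights.T) > rng.random((len(v), nv))).astype(float)

# Simple bimodal data: all-zeros and all-ones visible patterns.
data = np.vstack([np.zeros((32, nv)), np.ones((32, nv))])
v_fantasy = rng.integers(0, 2, (n_chains, nv)).astype(float)
lr, lr_fast, decay = 0.05, 0.05, 0.95
for step in range(200):
    batch = data[rng.choice(len(data), n_chains)]
    h_data = sigmoid(batch @ W)                    # positive statistics
    # Fantasy particles sample from W + Wf: the fast overlay raises the
    # effective energy of recently visited modes, forcing faster mixing.
    v_fantasy = gibbs_step(v_fantasy, W + Wf)
    h_model = sigmoid(v_fantasy @ (W + Wf))        # negative statistics
    grad = (batch.T @ h_data - v_fantasy.T @ h_model) / n_chains
    W += lr * grad
    Wf = decay * Wf + lr_fast * grad               # learns fast, decays fast
```

Because Wf decays geometrically, it contributes nothing to the equilibrium model defined by W; it only shapes the sampling dynamics of the persistent chains.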

Journal ArticleDOI
TL;DR: A stochastic approximation version is introduced that extends DILOC to random environments, i.e., when communication among nodes is noisy, the communication links among neighbors may fail at random times, and the internode distances are subject to errors.
Abstract: The paper introduces DILOC, a distributed, iterative algorithm to locate M sensors (with unknown locations) in R^m, m ≥ 1, with respect to a minimal number of m + 1 anchors with known locations. The sensors and anchors, nodes in the network, exchange data with their neighbors only; no centralized data processing or communication occurs, nor is there a centralized fusion center to compute the sensors' locations. DILOC uses the barycentric coordinates of a node with respect to its neighbors; these coordinates are computed using the Cayley-Menger determinants, i.e., the determinants of matrices of internode distances. We show convergence of DILOC by associating with it an absorbing Markov chain whose absorbing states are the states of the anchors. We introduce a stochastic approximation version extending DILOC to random environments, i.e., when communication among nodes is noisy, the communication links among neighbors may fail at random times, and the internode distances are subject to errors. We show a.s. convergence of the modified DILOC and characterize the error between the true values of the sensors' locations and their final estimates given by DILOC. Numerical studies illustrate DILOC under a variety of deterministic and random operating conditions.
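The absorbing-chain argument can be seen in a small worked example. Below, each sensor repeatedly replaces its estimate by a fixed convex (barycentric) combination of its neighbours' current estimates while the anchors never move; since the iteration matrix is the transition matrix of a Markov chain whose absorbing states are the anchors, the estimates converge to the true positions. The geometry and weights are illustrative assumptions; in DILOC the weights come from Cayley-Menger determinants of internode distances, which is not reproduced here.

```python
import numpy as np

# Anchors with known positions (m = 2, so m + 1 = 3 anchors).
A0, A1, A2 = np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])
# True (unknown to the algorithm) sensor positions: s0 lies inside the
# triangle (A0, A1, s1) and s1 inside (A0, A2, s0).
# Barycentric weights precomputed for this geometry:
w0 = (0.3, 0.2, 0.5)            # s0 w.r.t. neighbours (A0, A1, s1)
w1 = (1/15, 4/15, 2/3)          # s1 w.r.t. neighbours (A0, A2, s0)

# DILOC iteration: convex combination of neighbour estimates; anchors
# (the absorbing states of the associated Markov chain) stay fixed.
s0 = np.zeros(2)                # arbitrary initial guesses
s1 = np.zeros(2)
for _ in range(100):
    s0, s1 = (w0[0] * A0 + w0[1] * A1 + w0[2] * s1,
              w1[0] * A0 + w1[1] * A2 + w1[2] * s0)
# s0 -> (0.3, 0.2) and s1 -> (0.2, 0.4), the true positions.
```

Only local quantities appear in each update, which is why no fusion center is needed: every sensor's step uses its own weights and its neighbours' latest estimates.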

Proceedings ArticleDOI
07 Sep 2009
TL;DR: A method for detection of steganographic methods that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is least significant bit (LSB) matching.
Abstract: This paper presents a novel method for the detection of steganographic schemes that embed in the spatial domain by adding a low-amplitude independent stego signal, an example of which is LSB matching. First, arguments are provided for modeling differences between adjacent pixels using first-order and second-order Markov chains. Subsets of sample transition probability matrices are then used as features for a steganalyzer implemented by support vector machines. The accuracy of the presented steganalyzer is evaluated on LSB matching and four different databases. The steganalyzer achieves superior accuracy with respect to prior art and provides stable results across various cover sources. Since the feature set based on the second-order Markov chain is high-dimensional, we address the curse of dimensionality using a feature selection algorithm and show that the curse did not occur in our experiments.
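A minimal sketch of first-order Markov feature extraction of this kind is shown below: differences between horizontally adjacent pixels are truncated to [-T, T], and the empirical transition probability matrix of the resulting difference chain is flattened into a feature vector. The threshold T = 3 and the use of horizontal differences only are simplifying assumptions; the paper's feature set is richer.

```python
import numpy as np

def markov_features(img, T=3):
    """First-order Markov features for spatial-domain steganalysis:
    truncate adjacent-pixel differences to [-T, T] and return the
    empirical transition probability matrix of the difference chain."""
    d = np.clip(np.diff(img.astype(int), axis=1), -T, T)
    pairs = np.stack([d[:, :-1].ravel(), d[:, 1:].ravel()]) + T
    M = np.zeros((2 * T + 1, 2 * T + 1))
    np.add.at(M, (pairs[0], pairs[1]), 1)          # count transitions
    row = M.sum(axis=1, keepdims=True)
    return (M / np.where(row == 0, 1, row)).ravel()

rng = np.random.default_rng(0)
cover = rng.integers(0, 256, (64, 64))
f = markov_features(cover)      # 49-dimensional feature vector for T = 3
```

Adding a low-amplitude independent stego signal perturbs these transition probabilities in a systematic way, which is what a classifier such as an SVM can pick up on.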

Book
14 Aug 2009
TL;DR: The main focus of this book is the exploration of the geometric and dynamic properties of a far-reaching generalization of a conformal iterated function system - a Graph Directed Markov System.
Abstract: The main focus of this book is the exploration of the geometric and dynamic properties of a far-reaching generalization of a conformal iterated function system - a Graph Directed Markov System. These systems are very robust in that they apply to many settings that do not fit into the scheme of conformal iterated function systems. The basic theory is laid out here and the authors have touched on many natural questions arising in its context. However, they also emphasise the many issues and current research topics which can be found in original papers. For example the detailed analysis of the structure of harmonic measures of limit sets, the examination of the doubling property of conformal measures, the extensive study of generalized polynomial-like mappings or multifractal analysis of geometrically finite Kleinian groups. This book leads readers onto frontier research in the field, making it ideal for both established researchers and graduate students.

Proceedings ArticleDOI
20 Jun 2009
TL;DR: A functional gradient approach for learning high-dimensional parameters of random fields in order to perform discrete, multi-label classification and successfully demonstrates the generality of the approach on the challenging vision problem of recovering 3-D geometric surfaces from images.
Abstract: We address the problem of label assignment in computer vision: given a novel 3D or 2D scene, we wish to assign a unique label to every site (voxel, pixel, superpixel, etc.). To this end, the Markov Random Field framework has proven to be a model of choice as it uses contextual information to yield improved classification results over locally independent classifiers. In this work we adapt a functional gradient approach for learning high-dimensional parameters of random fields in order to perform discrete, multi-label classification. With this approach we can learn robust models involving high-order interactions better than the previously used learning method. We validate the approach in the context of point cloud classification and improve the state of the art. In addition, we successfully demonstrate the generality of the approach on the challenging vision problem of recovering 3-D geometric surfaces from images.

Journal ArticleDOI
TL;DR: An iterative method is proposed for calculating the largest eigenvalue of an irreducible nonnegative tensor; it extends a method of Collatz (1942) for calculating the spectral radius of an irreducible nonnegative matrix and is applied to studying higher-order Markov chains.
Abstract: In this paper we propose an iterative method for calculating the largest eigenvalue of an irreducible nonnegative tensor. This method is an extension of a method of Collatz (1942) for calculating the spectral radius of an irreducible nonnegative matrix. Numerical results show that our proposed method is promising. We also apply the method to studying higher-order Markov chains.
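A sketch of this Collatz-style power iteration for an order-3 tensor is shown below, where the eigenpair satisfies (A x^2)_i = lambda * x_i^2 componentwise. The test tensors and tolerances are illustrative; the paper treats general order-m irreducible nonnegative tensors.

```python
import numpy as np

def largest_eigenvalue(A, tol=1e-10, max_iter=1000):
    """Power iteration for the largest eigenvalue of an irreducible
    nonnegative order-3 tensor A: iterate x <- (A x^2)^{1/2} and track
    the Collatz bounds min_i (A x^2)_i / x_i^2 <= lambda <= max_i."""
    x = np.ones(A.shape[0])
    lo, hi = 0.0, np.inf
    for _ in range(max_iter):
        Ax = np.einsum('ijk,j,k->i', A, x, x)   # (A x^{m-1})_i for m = 3
        ratio = Ax / x**2
        lo, hi = ratio.min(), ratio.max()       # Collatz bounds on lambda
        if hi - lo < tol:
            break
        x = np.sqrt(Ax)                         # next iterate (m - 1 = 2)
        x /= x.sum()                            # normalize
    return 0.5 * (lo + hi)

# A strictly positive tensor is irreducible, so the bounds close up.
rng = np.random.default_rng(0)
A = rng.random((4, 4, 4)) + 0.1
lam = largest_eigenvalue(A)
```

For the all-ones order-3 tensor of dimension n, every ratio equals n^2 at the first step, so the method returns n^2 immediately; this mirrors how the matrix case of Collatz's method recovers the spectral radius.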