
Showing papers on "Markov chain published in 2006"


01 Mar 2006
TL;DR: The coda package for R contains a set of functions designed to help users of Bayesian inference with Markov Chain Monte Carlo answer two questions: how long the burn-in period should be, and how many samples are required to accurately estimate posterior quantities of interest.
Abstract: At first sight, Bayesian inference with Markov Chain Monte Carlo (MCMC) appears to be straightforward. The user defines a full probability model, perhaps using one of the programs discussed in this issue; an underlying sampling engine takes the model definition and returns a sequence of dependent samples from the posterior distribution of the model parameters, given the supplied data. The user can derive any summary of the posterior distribution from this sample. For example, to calculate a 95% credible interval for a parameter α, it suffices to take 1000 MCMC iterations of α and sort them so that α(1) < α(2) < ... < α(1000). The credible interval estimate is then (α(25), α(975)). However, there is a price to be paid for this simplicity. Unlike most numerical methods used in statistical inference, MCMC does not give a clear indication of whether it has converged. The underlying Markov chain theory only guarantees that the distribution of the output will converge to the posterior in the limit as the number of iterations increases to infinity. The user is generally ignorant about how quickly convergence occurs, and therefore has to fall back on post hoc testing of the sampled output. By convention, the sample is divided into two parts: a “burn-in” period during which all samples are discarded, and the remainder of the run in which the chain is considered to have converged sufficiently close to the limiting distribution to be used. Two questions then arise: 1. How long should the burn-in period be? 2. How many samples are required to accurately estimate posterior quantities of interest? The coda package for R contains a set of functions designed to help the user answer these questions. Some of these convergence diagnostics are simple graphical ways of summarizing the data. Others are formal statistical tests.
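
To make the recipe above concrete, here is a minimal Python sketch: simulated normal draws stand in for real sampler output, and the 95% credible interval is read off the sorted sample exactly as the abstract describes.

```python
import numpy as np

# Stand-in for MCMC output: 1000 (here independent) draws of a parameter alpha.
rng = np.random.default_rng(0)
alpha = rng.normal(loc=2.0, scale=0.5, size=1000)

# Sort the draws and read off the 25th and 975th order statistics,
# giving the (alpha_(25), alpha_(975)) interval from the abstract.
alpha_sorted = np.sort(alpha)
lower, upper = alpha_sorted[24], alpha_sorted[974]  # 0-based indexing
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```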

3,098 citations


Journal ArticleDOI
TL;DR: Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach to combining first-order logic and probabilistic graphical models in a single representation.
Abstract: We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.

2,916 citations


Book
08 Aug 2006
TL;DR: This book should help newcomers to the field to understand how finite mixture and Markov switching models are formulated, what structures they imply on the data, what they could be used for, and how they are estimated.
Abstract: WINNER OF THE 2007 DEGROOT PRIZE! The prominence of finite mixture modelling is greater than ever. Many important statistical topics like clustering data, outlier treatment, or dealing with unobserved heterogeneity involve finite mixture models in some way or other. The area of potential applications goes beyond simple data analysis and extends to regression analysis and to non-linear time series analysis using Markov switching models. In the more than one hundred years since Karl Pearson showed in 1894 how to estimate the five parameters of a mixture of two normal distributions using the method of moments, statistical inference for finite mixture models has been a challenge to everybody who deals with them. In the past ten years, very powerful computational tools emerged for dealing with these models which combine a Bayesian approach with recent Monte Carlo simulation techniques based on Markov chains. This book reviews these techniques and covers the most recent advances in the field, among them bridge sampling techniques and reversible jump Markov chain Monte Carlo methods. It is the first time that the Bayesian perspective of finite mixture modelling is systematically presented in book form. It is argued that the Bayesian approach provides much insight in this context and is easily implemented in practice. Although the main focus is on Bayesian inference, the author reviews several frequentist techniques, especially for selecting the number of components of a finite mixture model, and discusses some of their shortcomings compared to the Bayesian approach. The aim of this book is to impart the finite mixture and Markov switching approach to statistical modelling to a wide-ranging community. This includes not only statisticians, but also biologists, economists, engineers, financial agents, market researchers, medical researchers, and any other frequent users of statistical models. This book should help newcomers to the field to understand how finite mixture and Markov switching models are formulated, what structures they imply on the data, what they could be used for, and how they are estimated. Researchers familiar with the subject will also profit from reading this book. The presentation is rather informal without abandoning mathematical correctness. Previous notions of Bayesian inference and Monte Carlo simulation are useful but not needed.

1,642 citations


Proceedings ArticleDOI
20 Aug 2006
TL;DR: An LDA-style topic model is presented that captures not only the low-dimensional structure of data, but also how the structure changes over time, showing improved topics, better timestamp prediction, and interpretable trends.
Abstract: This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.

1,327 citations


Journal ArticleDOI
TL;DR: It is demonstrated that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.
Abstract: In recent years, model based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner. We start by highlighting the potential dangers of arbitrarily choosing protein models by demonstrating an empirical example where a single alignment can produce two topologically different and strongly supported phylogenies using two different arbitrarily-chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three Domains of life and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins. This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.

1,067 citations


Yann LeCun, Sumit Chopra, Raia Hadsell, Aurelio Ranzato, Fu Jie Huang
01 Jan 2006
TL;DR: The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods.
Abstract: Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
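
A toy illustration of the inference rule just described (clamp the observed variables, minimize the energy over the rest); the three-variable energy function is an arbitrary stand-in, not one of the models surveyed in the paper.

```python
import itertools

def energy(x, y, z):
    # Hypothetical scalar energy over three binary variables;
    # lower energy means a more compatible configuration.
    return -2.0 * x * y + 1.5 * y * z - 0.5 * x + 0.3 * z

# Inference in the EBM sense: clamp the observed variable x = 1, then
# search the remaining configurations for the energy minimum.
x_obs = 1
best_yz = min(itertools.product([0, 1], repeat=2),
              key=lambda yz: energy(x_obs, *yz))
print("argmin (y, z) given x = 1:", best_yz)
```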

1,022 citations


Journal ArticleDOI
TL;DR: A Bayesian method for investigating correlated evolution of discrete binary traits on phylogenetic trees is described and illustrated by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes.
Abstract: We describe a Bayesian method for investigating correlated evolution of discrete binary traits on phylogenetic trees. The method fits a continuous-time Markov model to a pair of traits, seeking the best fitting models that describe their joint evolution on a phylogeny. We employ the methodology of reversible-jump (RJ) Markov chain Monte Carlo to search among the large number of possible models, some of which conform to independent evolution of the two traits, others to correlated evolution. The RJ Markov chain visits these models in proportion to their posterior probabilities, thereby directly estimating the support for the hypothesis of correlated evolution. In addition, the RJ Markov chain simultaneously estimates the posterior distributions of the rate parameters of the model of trait evolution. These posterior distributions can be used to test among alternative evolutionary scenarios to explain the observed data. All results are integrated over a sample of phylogenetic trees to account for phylogenetic uncertainty. We implement the method in a program called RJ Discrete and illustrate it by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes.

868 citations


Journal ArticleDOI
TL;DR: The essential ideas of DE and MCMC are integrated, resulting in Differential Evolution Markov Chain (DE-MC), a population MCMC algorithm, in which multiple chains are run in parallel, showing simplicity, speed of calculation and convergence, even for nearly collinear parameters and multimodal densities.
Abstract: Differential Evolution (DE) is a simple genetic algorithm for numerical optimization in real parameter spaces. In a statistical context one would not just want the optimum but also its uncertainty. The uncertainty distribution can be obtained by a Bayesian analysis (after specifying prior and likelihood) using Markov Chain Monte Carlo (MCMC) simulation. This paper integrates the essential ideas of DE and MCMC, resulting in Differential Evolution Markov Chain (DE-MC). DE-MC is a population MCMC algorithm, in which multiple chains are run in parallel. DE-MC solves an important problem in MCMC, namely that of choosing an appropriate scale and orientation for the jumping distribution. In DE-MC the jumps are simply a fixed multiple of the differences of two random parameter vectors that are currently in the population. The selection process of DE-MC works via the usual Metropolis ratio which defines the probability with which a proposal is accepted. In tests with known uncertainty distributions, the efficiency of DE-MC with respect to random walk Metropolis with optimal multivariate Normal jumps ranged from 68% for small population sizes to 100% for large population sizes and even to 500% for the 97.5% point of a variable from a 50-dimensional Student distribution. Two Bayesian examples illustrate the potential of DE-MC in practice. DE-MC is shown to facilitate multidimensional updates in a multi-chain "Metropolis-within-Gibbs" sampling approach. The advantages of DE-MC over conventional MCMC are simplicity, speed of calculation, and convergence, even for nearly collinear parameters and multimodal densities.
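
A minimal sketch of the DE-MC update, assuming a standard bivariate normal log-posterior as the target; the population size, the scale gamma = 2.38/sqrt(2d), and the small jitter are illustrative choices consistent with the description above.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(theta):
    # Stand-in target: standard bivariate normal log-density, up to a constant.
    return -0.5 * np.sum(theta ** 2)

d, n_chains = 2, 10
gamma = 2.38 / np.sqrt(2 * d)          # common scaling for DE-MC jumps
pop = rng.normal(size=(n_chains, d))   # population of parallel chains

for _ in range(5000):
    for i in range(n_chains):
        # Jump = fixed multiple of the difference of two other population
        # members, plus a little noise so the whole space stays reachable.
        r1, r2 = rng.choice([j for j in range(n_chains) if j != i],
                            size=2, replace=False)
        prop = pop[i] + gamma * (pop[r1] - pop[r2]) + rng.normal(scale=1e-4, size=d)
        # Usual Metropolis acceptance ratio.
        if np.log(rng.uniform()) < log_post(prop) - log_post(pop[i]):
            pop[i] = prop
```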

839 citations


Journal ArticleDOI
TL;DR: An analysis scheme is developed that casts single-molecule time-binned FRET trajectories as hidden Markov processes, allowing one to determine, based on probability alone, the most likely FRET-value distributions of states and their interconversion rates while simultaneously determining the most likely time sequence of underlying states for each trajectory.

742 citations


Proceedings ArticleDOI
09 Jul 2006
TL;DR: It is shown that a combined strategy of block Markov superposition coding and Wyner-Ziv coding achieves the cut-set upper bound on the sum-rate of the two-way relay channel when the relay is in the proximity of one of the terminals.
Abstract: We study the two-way communication problem for the relay channel. Hereby, two terminals communicate simultaneously in both directions with the help of one relay. We consider the restricted two-way problem, i.e., the encoders at both terminals do not cooperate. We provide achievable rate regions for different cooperation strategies, such as decode-and-forward based on block Markov superposition coding and compress-and-forward based on Wyner-Ziv source coding. We also evaluate the regions for the special case of additive white Gaussian noise channels. We show that a combined strategy of block Markov superposition coding and Wyner-Ziv coding achieves the cut-set upper bound on the sum-rate of the two-way relay channel when the relay is in the proximity of one of the terminals.

558 citations


Journal ArticleDOI
TL;DR: A quantum dynamic model of decision-making is presented and compared with a previously established Markov model; both are formulated as random walk decision processes, but the two approaches rest on different probabilistic principles.

Journal ArticleDOI
TL;DR: A new taxonomy of model structures is developed, based on key requirements, including output requirements, the population size, and system complexity, for modelling infectious diseases and systems with constrained resources.
Abstract: Models for the economic evaluation of health technologies provide valuable information to decision makers. The choice of model structure is rarely discussed in published studies and can affect the results produced. Many papers describe good modelling practice, but few describe how to choose from the many types of available models. This paper develops a new taxonomy of model structures. The horizontal axis of the taxonomy describes assumptions about the role of expected values, randomness, the heterogeneity of entities, and the degree of non-Markovian structure. Commonly used aggregate models, including decision trees and Markov models, require large population numbers, homogeneous sub-groups and linear interactions. Individual models are more flexible, but may require replications with different random numbers to estimate expected values. The vertical axis of the taxonomy describes potential interactions between the individual actors, as well as how the interactions occur through time. Models using interactions, such as system dynamics, some Markov models, and discrete event simulation, are fairly uncommon in health economics but are necessary for modelling infectious diseases and systems with constrained resources. The paper provides guidance for choosing a model, based on key requirements, including output requirements, the population size, and system complexity.
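
For orientation, the aggregate (cohort) Markov model in the taxonomy reduces to a few lines of linear algebra; the three states and transition probabilities below are invented for illustration.

```python
import numpy as np

# Hypothetical three-state cohort model: Well, Sick, Dead.
# Rows are the current state, columns the next state; each row sums to 1.
P = np.array([[0.90, 0.08, 0.02],
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])

cohort = np.array([1.0, 0.0, 0.0])  # the whole cohort starts Well
for _ in range(10):                 # ten model cycles (e.g. years)
    cohort = cohort @ P
print("state occupancy after 10 cycles:", cohort.round(3))
```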

Book ChapterDOI
07 May 2006
TL;DR: A set of energy minimization benchmarks, which are used to compare the solution quality and running time of several common energy minimization algorithms, as well as a general-purpose software interface that allows vision researchers to easily switch between optimization methods with minimal overhead.
Abstract: One of the most exciting advances in early vision has been the development of efficient energy minimization algorithms. Many early vision tasks require labeling each pixel with some quantity such as depth or texture. While many such problems can be elegantly expressed in the language of Markov Random Fields (MRF's), the resulting energy minimization problems were widely viewed as intractable. Recently, algorithms such as graph cuts and loopy belief propagation (LBP) have proven to be very powerful: for example, such methods form the basis for almost all the top-performing stereo methods. Unfortunately, most papers define their own energy function, which is minimized with a specific algorithm of their choice. As a result, the tradeoffs among different energy minimization algorithms are not well understood. In this paper we describe a set of energy minimization benchmarks, which we use to compare the solution quality and running time of several common energy minimization algorithms. We investigate three promising recent methods—graph cuts, LBP, and tree-reweighted message passing—as well as the well-known older iterated conditional modes (ICM) algorithm. Our benchmark problems are drawn from published energy functions used for stereo, image stitching and interactive segmentation. We also provide a general-purpose software interface that allows vision researchers to easily switch between optimization methods with minimal overhead. We expect that the availability of our benchmarks and interface will make it significantly easier for vision researchers to adopt the best method for their specific problems. Benchmarks, code, results and images are available at http://vision.middlebury.edu/MRF.
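
To give a flavor of the simplest benchmarked method, here is a sketch of iterated conditional modes (ICM) on a toy binary denoising problem; the energy (a unit data term plus a Potts smoothness term) is a generic textbook choice, not one of the paper's benchmark energies.

```python
import numpy as np

rng = np.random.default_rng(7)

# Ground truth: a white square on a black background, corrupted by noise.
truth = np.zeros((20, 20), dtype=int)
truth[5:15, 5:15] = 1
noisy = np.where(rng.random(truth.shape) < 0.2, 1 - truth, truth)

lam, x = 1.0, noisy.copy()
for _ in range(10):  # ICM sweeps: greedily minimize each pixel's local energy
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            costs = []
            for label in (0, 1):
                e = float(label != noisy[i, j])  # data term
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < x.shape[0] and 0 <= nj < x.shape[1]:
                        e += lam * (label != x[ni, nj])  # smoothness term
                costs.append(e)
            x[i, j] = int(np.argmin(costs))

print("fraction of pixels recovered:", (x == truth).mean())
```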

Journal ArticleDOI
TL;DR: In this paper, a simple method for generating alternative CTPDFs that can speed the convergence of MCMC by 1-3 orders of magnitude is presented, along with refinements to the MCMC algorithm that make it practical to apply to multiple-planet systems.
Abstract: Precise radial velocity measurements have led to the discovery of ~170 extrasolar planetary systems. Understanding the uncertainties in the orbital solutions will become increasingly important as the discovery space for extrasolar planets shifts to planets with smaller masses and longer orbital periods. The method of Markov chain Monte Carlo (MCMC) provides a rigorous method for quantifying the uncertainties in orbital parameters in a Bayesian framework (Paper I). The main practical challenge for the general application of MCMC is the need to construct Markov chains that quickly converge. The rate of convergence is very sensitive to the choice of the candidate transition probability distribution function (CTPDF). Here we explain one simple method for generating alternative CTPDFs that can significantly speed convergence by 1-3 orders of magnitude. We have numerically tested dozens of CTPDFs with simulated radial velocity data sets to identify those that perform well for different types of orbits and suggest a set of CTPDFs for general application. In addition, we introduce other refinements to the MCMC algorithm for radial velocity planets, including an improved treatment of the uncertainties in the radial velocity observations, an algorithm for automatically choosing step sizes, an algorithm for automatically determining reasonable stopping times, and the use of importance sampling for including the dynamical evolution of multiple-planet systems. Together, these improvements make it practical to apply MCMC to multiple-planet systems. We demonstrate the improvements in efficiency by analyzing a variety of extrasolar planetary systems.
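
For orientation, a generic random-walk Metropolis sampler with a crude step-size adaptation is sketched below; the one-dimensional normal target and the tuning rule are illustrative stand-ins, not the paper's CTPDFs or its specific step-size algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_post(theta):
    # Stand-in posterior (standard normal) in place of an orbital-parameter posterior.
    return -0.5 * theta ** 2

theta, step, accepts = 0.0, 1.0, 0
for k in range(1, 10001):
    prop = theta + rng.normal(scale=step)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta, accepts = prop, accepts + 1
    if k % 500 == 0:
        # Crude automatic tuning toward a ~25% acceptance rate
        # (a generic heuristic, not the method proposed in the paper).
        step *= 1.1 if accepts / k > 0.25 else 0.9
```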

Proceedings Article
04 Dec 2006
TL;DR: A class of MDPs is introduced which greatly simplifies Reinforcement Learning: they have discrete state spaces and continuous control spaces, and they enable efficient approximations to traditional MDPs.
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning.
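
The central identity, that the exponentiated value function z = exp(-v) satisfies a linear equation, can be checked on a toy chain; the transition matrix and state costs below are invented, and fixed-point iteration is just one simple way to solve the resulting linear system.

```python
import numpy as np

# Toy 4-state uncontrolled chain with an absorbing goal state (state 3).
P = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.25, 0.25, 0.50, 0.00],
              [0.00, 0.25, 0.25, 0.50],
              [0.00, 0.00, 0.00, 1.00]])
q = np.array([1.0, 1.0, 1.0, 0.0])  # state costs; zero at the goal

# z = exp(-v) satisfies the LINEAR equation z = diag(exp(-q)) P z.
z = np.ones(4)
for _ in range(1000):
    z = np.exp(-q) * (P @ z)
    z[3] = 1.0  # boundary condition: v = 0 at the absorbing goal

v = -np.log(z)
u_star = P * z / (P @ z)[:, None]  # optimal controlled transitions: u*(j|i) ∝ p_ij z_j
print("optimal value function:", v.round(3))
```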

Journal ArticleDOI
TL;DR: The authors decompose the covariances into correlations and standard deviations; the correlation matrix follows a regime-switching model in which it is constant within a regime but differs across regimes, with transitions between regimes governed by a Markov chain.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: A well-founded, integrated solution to the entity resolution problem based on Markov logic, which combines first-order logic and probabilistic graphical models by attaching weights to first- order formulas, and viewing them as templates for features of Markov networks.
Abstract: Entity resolution is the problem of determining which records in a database refer to the same entities, and is a crucial and expensive step in the data mining process. Interest in it has grown rapidly in recent years, and many approaches have been proposed. However, they tend to address only isolated aspects of the problem, and are often ad hoc. This paper proposes a well-founded, integrated solution to the entity resolution problem based on Markov logic. Markov logic combines first-order logic and probabilistic graphical models by attaching weights to first-order formulas, and viewing them as templates for features of Markov networks. We show how a number of previous approaches can be formulated and seamlessly combined in Markov logic, and how the resulting learning and inference problems can be solved efficiently. Experiments on two citation databases show the utility of this approach, and evaluate the contribution of the different components.

Journal Article
TL;DR: A Bayesian framework for parsing images into their constituent visual patterns that optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language is presented.
Abstract: In this chapter we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a parsing graph, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches - generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this chapter, we focus on two types of visual patterns - generic visual patterns, such as texture and shading, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only then image parsing will correspond to image segmentation [48].). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.

Book
25 May 2006
TL;DR: The strengths of the main techniques are illustrated by way of simple examples, a recent result on the Pollard Rho random walk to compute the discrete logarithm, and an improved analysis of the Thorp shuffle.
Abstract: In the past few years we have seen a surge in the theory of finite Markov chains, by way of new techniques for bounding the convergence to stationarity. This includes functional techniques such as logarithmic Sobolev and Nash inequalities, refined spectral and entropy techniques, and isoperimetric techniques such as the average and blocking conductance and the evolving set methodology. We attempt to give a more or less self-contained treatment of some of these modern techniques, after reviewing several preliminaries. We also review classical and modern lower bounds on mixing times. There have been other important contributions to this theory such as variants on coupling techniques and decomposition methods, which are not included here; our choice was to keep the analytical methods as the theme of this presentation. We illustrate the strength of the main techniques by way of simple examples, a recent result on the Pollard Rho random walk to compute the discrete logarithm, and an improved analysis of the Thorp shuffle.

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate historic international emissions distributions and forecast future distributions to assess whether per capita emissions have been converging or will converge, and find evidence of convergence among 23 member countries of the Organisation for Economic Co-operation and Development (OECD), whereas emissions appear to be diverging for an 88-country global sample over 1960-2000.
Abstract: Understanding and considering the distribution of per capita carbon dioxide (CO2) emissions is important in designing international climate change proposals and incentives for participation. I evaluate historic international emissions distributions and forecast future distributions to assess whether per capita emissions have been converging or will converge. I find evidence of convergence among 23 member countries of the Organisation for Economic Co-operation and Development (OECD), whereas emissions appear to be diverging for an 88-country global sample over 1960–2000. Forecasts based on a Markov chain transition matrix provide little evidence of future emissions convergence and indicate that emissions may diverge in the near term. I also review the shortcomings of environmental Kuznets curve regressions and structural models in characterizing future emissions distributions.
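
A sketch of the forecasting device the abstract describes: bin countries by per capita emissions, estimate a transition matrix between bins, and iterate the cross-country distribution forward. The bins and the matrix here are invented placeholders, not the paper's estimates.

```python
import numpy as np

# Invented transition matrix between three per capita emissions bins
# (low / middle / high), e.g. estimated from decadal movements.
P = np.array([[0.85, 0.15, 0.00],
              [0.10, 0.80, 0.10],
              [0.00, 0.10, 0.90]])

dist = np.array([0.40, 0.35, 0.25])  # current share of countries in each bin
for _ in range(5):                   # forecast five transition periods ahead
    dist = dist @ P
print("forecast distribution of countries:", dist.round(3))
```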

Journal ArticleDOI
TL;DR: This note characterizes the impact of adding rare stochastic mutations to an “imitation dynamic,” meaning a process with the properties that absent strategies remain absent, and non-homogeneous states are transient.

Journal ArticleDOI
01 Oct 2006-Genetics
TL;DR: A new Bayesian clustering algorithm for studying population structure using individually geo-referenced multilocus data sets based on the concept of hidden Markov random field, which models the spatial dependencies at the cluster membership level is introduced.
Abstract: We introduce a new Bayesian clustering algorithm for studying population structure using individually geo-referenced multilocus data sets. The algorithm is based on the concept of hidden Markov random field, which models the spatial dependencies at the cluster membership level. We argue that (i) a Markov chain Monte Carlo procedure can implement the algorithm efficiently, (ii) it can detect significant geographical discontinuities in allele frequencies and regulate the number of clusters, (iii) it can check whether the clusters obtained without the use of spatial priors are robust to the hypothesis of discontinuous geographical variation in allele frequencies, and (iv) it can reduce the number of loci required to obtain accurate assignments. We illustrate and discuss the implementation issues with the Scandinavian brown bear and the human CEPH diversity panel data set.

Book
01 Jan 2006
TL;DR: This new edition of Markov Chains: Models, Algorithms and Applications has been completely reformatted as a text, complete with end-of-chapter exercises, a new focus on management science, new applications of the models, and new examples with applications in financial risk management and modeling of financial data.
Abstract: This new edition of Markov Chains: Models, Algorithms and Applications has been completely reformatted as a text, complete with end-of-chapter exercises, a new focus on management science, new applications of the models, and new examples with applications in financial risk management and modeling of financial data. This book consists of eight chapters. Chapter 1 gives a brief introduction to the classical theory on both discrete and continuous time Markov chains. The relationship between Markov chains of finite states and matrix theory will also be highlighted. Some classical iterative methods for solving linear systems will be introduced for finding the stationary distribution of a Markov chain. The chapter then covers the basic theories and algorithms for hidden Markov models (HMMs) and Markov decision processes (MDPs). Chapter 2 discusses the applications of continuous time Markov chains to model queueing systems and discrete time Markov chain for computing the PageRank, the ranking of websites on the Internet. Chapter 3 studies Markovian models for manufacturing and re-manufacturing systems and presents closed form solutions and fast numerical algorithms for solving the captured systems. In Chapter 4, the authors present a simple hidden Markov model (HMM) with fast numerical algorithms for estimating the model parameters. An application of the HMM for customer classification is also presented. Chapter 5 discusses Markov decision processes for customer lifetime values. Customer Lifetime Values (CLV) is an important concept and quantity in marketing management. The authors present an approach based on Markov decision processes for the calculation of CLV using real data. Chapter 6 considers higher-order Markov chain models, particularly a class of parsimonious higher-order Markov chain models. Efficient estimation methods for model parameters based on linear programming are presented. Contemporary research results on applications to demand predictions, inventory control and financial risk measurement are also presented. In Chapter 7, a class of parsimonious multivariate Markov models is introduced. Again, efficient estimation methods based on linear programming are presented. Applications to demand predictions, inventory control policy and modeling credit ratings data are discussed. Finally, Chapter 8 re-visits hidden Markov models, and the authors present a new class of hidden Markov models with efficient algorithms for estimating the model parameters. Applications to modeling interest rates, credit ratings and default data are discussed. This book is aimed at senior undergraduate students, postgraduate students, professionals, practitioners, and researchers in applied mathematics, computational science, operational research, management science and finance, who are interested in the formulation and computation of queueing networks, Markov chain models and related topics. Readers are expected to have some basic knowledge of probability theory, Markov processes and matrix theory.
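
As a reminder of the classical computation that recurs throughout the book (and underlies the PageRank application in Chapter 2), here is the standard power-iteration approach to the stationary distribution of a toy chain.

```python
import numpy as np

# Toy row-stochastic transition matrix for a 3-state discrete-time chain.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# Iterate pi <- pi P from a uniform start until it stops changing;
# the fixed point satisfies pi = pi P, the stationary distribution.
pi = np.full(3, 1 / 3)
for _ in range(1000):
    new = pi @ P
    done = np.max(np.abs(new - pi)) < 1e-12
    pi = new
    if done:
        break
print("stationary distribution:", pi.round(6))
```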

Proceedings Article
16 Jul 2006
TL;DR: MC-SAT is an inference algorithm that combines ideas from MCMC and satisfiability. It is based on Markov logic, which defines Markov networks using weighted clauses in first-order logic, and it greatly outperforms Gibbs sampling and simulated tempering over a broad range of problem sizes and degrees of determinism.
Abstract: Reasoning with both probabilistic and deterministic dependencies is important for many real-world problems, and in particular for the emerging field of statistical relational learning. However, probabilistic inference methods like MCMC or belief propagation tend to give poor results when deterministic or near-deterministic dependencies are present, and logical ones like satisfiability testing are inapplicable to probabilistic ones. In this paper we propose MC-SAT, an inference algorithm that combines ideas from MCMC and satisfiability. MC-SAT is based on Markov logic, which defines Markov networks using weighted clauses in first-order logic. From the point of view of MCMC, MC-SAT is a slice sampler with an auxiliary variable per clause, and with a satisfiability-based method for sampling the original variables given the auxiliary ones. From the point of view of satisfiability, MC-SAT wraps a procedure around the SampleSAT uniform sampler that enables it to sample from highly non-uniform distributions over satisfying assignments. Experiments on entity resolution and collective classification problems show that MC-SAT greatly outperforms Gibbs sampling and simulated tempering over a broad range of problem sizes and degrees of determinism.

Journal Article
TL;DR: A kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time and its predictive accuracy was found to be competitive with other recently introduced hierarchical multi-category or multilabel classification learning algorithms.
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm's predictive accuracy was found to be competitive with other recently introduced hierarchical multi-category or multilabel classification learning algorithms.

Journal ArticleDOI
TL;DR: In this paper, a particular type of Markov chain Monte Carlo (MCMC) sampling algorithm is highlighted which allows probabilistic sampling in variable dimension spaces, and it is shown that once evidence calculations are performed, the results of complex variable dimension sampling algorithms can be replicated with simple and more familiar fixed dimensional MCMC sampling techniques.
Abstract: In most geophysical inverse problems the properties of interest are parametrized using a fixed number of unknowns. In some cases arguments can be used to bound the maximum number of parameters that need to be considered. In others the number of unknowns is set at some arbitrary value and regularization is used to encourage simple, non-extravagant models. In recent times variable or self-adaptive parametrizations have gained in popularity. Rarely, however, is the number of unknowns itself directly treated as an unknown. This situation leads to a transdimensional inverse problem, that is, one where the dimension of the parameter space is a variable to be solved for. This paper discusses transdimensional inverse problems from the Bayesian viewpoint. A particular type of Markov chain Monte Carlo (MCMC) sampling algorithm is highlighted which allows probabilistic sampling in variable dimension spaces. A quantity termed the evidence or marginal likelihood plays a key role in this type of problem. It is shown that once evidence calculations are performed, the results of complex variable dimension sampling algorithms can be replicated with simple and more familiar fixed dimensional MCMC sampling techniques. Numerical examples are used to illustrate the main points. The evidence can be difficult to calculate, especially in high-dimensional non-linear inverse problems. Nevertheless some general strategies are discussed and analytical expressions given for certain linear problems.

Proceedings Article
04 Dec 2006
TL;DR: This paper provides a computationally efficient method for learning Markov network structure from data based on the use of L1 regularization on the weights of the log-linear model, which achieves considerably higher generalization performance than the more standard L2-based method (a Gaussian parameter prior or pure maximum-likelihood learning).
Abstract: Markov networks are commonly used in a wide variety of applications, ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally efficient method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the log-linear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real world data sets: pixel values in the MNIST data, and genetic sequence variations in the human HapMap data. We show that our L1-based method achieves considerably higher generalization performance than the more standard L2-based method (a Gaussian parameter prior) or pure maximum-likelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than learning parameters alone, demonstrating the existence of a feasible method for this important problem.
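
A simplified sketch of the core idea: an L1 penalty drives pairwise weights to exactly zero, and the surviving weights define the network structure. The sketch fits an Ising-style model by pseudo-likelihood with proximal (soft-thresholding) updates on synthetic spins; the data, hyperparameters, and pseudo-likelihood surrogate are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic spins in {-1, +1}; variables 0 and 1 are made strongly dependent,
# so the learner should recover the single edge (0, 1).
n, d = 500, 5
X = rng.integers(0, 2, size=(n, d)) * 2 - 1
X[:, 1] = np.where(rng.random(n) < 0.9, X[:, 0], -X[:, 0])

lam, lr = 0.05, 0.01
W = np.zeros((d, d))  # symmetric pairwise weights, zero diagonal

def soft_threshold(A, t):
    # Proximal operator of the L1 penalty: pushes small weights to exactly 0.
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

for _ in range(200):
    F = X @ W                              # local field for each variable
    G = -(X.T @ (X - np.tanh(F))) / n      # negative pseudo-likelihood gradient
    G = (G + G.T) / 2                      # respect weight symmetry
    np.fill_diagonal(G, 0.0)
    W = soft_threshold(W - lr * G, lr * lam)
    np.fill_diagonal(W, 0.0)

print("learned edges:", np.transpose(np.nonzero(np.triu(W))))
```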

Journal ArticleDOI
TL;DR: First hitting times arise naturally in many types of stochastic processes, ranging from Wiener processes to Markov chains, and have been investigated as models for survival data.
Abstract: Many researchers have investigated first hitting times as models for survival data. First hitting times arise naturally in many types of stochastic processes, ranging from Wiener processes to Markov chains. In a survival context, the state of the underlying process represents the strength of an item or the health of an individual. The item fails or the individual experiences a clinical endpoint when the process reaches an adverse threshold state for the first time. The time scale can be calendar time or some other operational measure of degradation or disease progression. In many applications, the process is latent (i.e., unobservable). Threshold regression refers to first-hitting-time models with regression structures that accommodate covariate data. The parameters of the process, threshold state and time scale may depend on the covariates. This paper reviews aspects of this topic and discusses fruitful avenues for future research.
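
A quick simulation of the basic first-hitting-time construction, assuming a latent Wiener process with negative drift; the parameters are illustrative (with drift -0.5 starting from level 10, the hitting time of zero is inverse-Gaussian distributed with mean 20).

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, x0, dt = -0.5, 1.0, 10.0, 0.01  # drift, volatility, start, time step

def first_hitting_time(max_t=100.0):
    # Latent health/strength process; failure = first crossing of threshold 0.
    x, t = x0, 0.0
    while t < max_t:
        x += mu * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
        if x <= 0.0:
            return t
    return np.inf  # censored: threshold not reached within the window

times = np.array([first_hitting_time() for _ in range(500)])
print("mean simulated survival time:",
      round(times[np.isfinite(times)].mean(), 2), "(theory: 20.0)")
```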

Journal ArticleDOI
Chai Wah Wu
TL;DR: Rather than using Lyapunov-type methods, the authors use results from the theory of inhomogeneous Markov chains in their analysis, and show that these results are useful for deterministic consensus problems as well as for more general random graph processes.
Abstract: Recently, methods in stochastic control have been used to study the synchronization properties of a nonautonomous discrete-time linear system x(k+1)=G(k)x(k) where the matrices G(k) are derived from a random graph process. The purpose of this note is to extend this analysis to directed graphs and more general random graph processes. Rather than using Lyapunov-type methods, we use results from the theory of inhomogeneous Markov chains in our analysis. These results have been used successfully in deterministic consensus problems and we show that they are useful for these problems as well. Sufficient conditions are derived that depend on the types of graphs that have nonvanishing probabilities. For instance, if a scrambling graph occurs with nonzero probability, then the system synchronizes.
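
A small simulation of the system x(k+1) = G(k)x(k) with the G(k) drawn from a random graph process; the 30% edge probability and the self-loops are arbitrary choices that make scrambling graphs (and hence synchronization) very likely.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
x = rng.normal(size=n)  # initial states of the n agents

for _ in range(200):
    # Random directed graph with self-loops; G is the row-stochastic
    # averaging matrix derived from it.
    A = (rng.random((n, n)) < 0.3).astype(float)
    np.fill_diagonal(A, 1.0)
    G = A / A.sum(axis=1, keepdims=True)
    x = G @ x

# If the states synchronize, the spread max(x) - min(x) shrinks toward 0.
print("spread after 200 steps:", float(x.max() - x.min()))
```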

Journal ArticleDOI
TL;DR: The 1/3 law is obtained: if A and B are strict Nash equilibria, then selection favors replacement of B by A if the unstable equilibrium occurs at a frequency of A which is less than 1/3.
Abstract: Evolutionary game dynamics in finite populations can be described by a frequency dependent, stochastic Wright-Fisher process. We consider a symmetric game between two strategies, A and B. There are discrete generations. In each generation, individuals produce offspring proportional to their payoff. The next generation is sampled randomly from this pool of offspring. The total population size is constant. The resulting Markov process has two absorbing states corresponding to homogeneous populations of all A or all B. We quantify frequency dependent selection by comparing the absorption probabilities to the corresponding probabilities under random drift. We derive conditions for selection to favor one strategy or the other by using the concept of total positivity. In the limit of weak selection, we obtain the 1/3 law: if A and B are strict Nash equilibria then selection favors replacement of B by A, if the unstable equilibrium occurs at a frequency of A which is less than 1/3.
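
The setup can be simulated directly. The sketch below uses an invented coordination game whose unstable equilibrium sits at about 0.23 < 1/3, so by the 1/3 law the fixation probability of a single A mutant under weak selection should exceed the neutral benchmark 1/N.

```python
import numpy as np

rng = np.random.default_rng(6)

# Payoffs: A vs A, A vs B, B vs A, B vs B. Both A and B are strict Nash;
# the unstable equilibrium is x* = (d - b)/(a - b - c + d) ≈ 0.23.
a, b, c, d = 3.0, 1.0, 1.0, 1.6
N, w, runs = 50, 0.05, 10000  # population size, selection intensity, trials

fixed = 0
for _ in range(runs):
    i = 1  # start with a single A mutant
    while 0 < i < N:
        fA = 1 - w + w * (a * (i - 1) + b * (N - i)) / (N - 1)
        fB = 1 - w + w * (c * i + d * (N - i - 1)) / (N - 1)
        # Wright-Fisher resampling proportional to payoff-derived fitness.
        i = rng.binomial(N, i * fA / (i * fA + (N - i) * fB))
    fixed += int(i == N)

print("fixation probability of A:", fixed / runs, "| neutral 1/N:", 1 / N)
```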