
Showing papers on "Markov chain published in 2006"


01 Mar 2006
TL;DR: The coda package for R contains a set of functions designed to help users of Bayesian inference with Markov Chain Monte Carlo answer two questions: how long the burn-in period should be, and how many samples are required to accurately estimate posterior quantities of interest.
Abstract: At first sight, Bayesian inference with Markov Chain Monte Carlo (MCMC) appears to be straightforward. The user defines a full probability model, perhaps using one of the programs discussed in this issue; an underlying sampling engine takes the model definition and returns a sequence of dependent samples from the posterior distribution of the model parameters, given the supplied data. The user can derive any summary of the posterior distribution from this sample. For example, to calculate a 95% credible interval for a parameter α, it suffices to take 1000 MCMC iterations of α and sort them so that α(1) < α(2) < ... < α(1000). The credible interval estimate is then (α(25), α(975)). However, there is a price to be paid for this simplicity. Unlike most numerical methods used in statistical inference, MCMC does not give a clear indication of whether it has converged. The underlying Markov chain theory only guarantees that the distribution of the output will converge to the posterior in the limit as the number of iterations increases to infinity. The user is generally ignorant about how quickly convergence occurs, and therefore has to fall back on post hoc testing of the sampled output. By convention, the sample is divided into two parts: a “burn-in” period during which all samples are discarded, and the remainder of the run in which the chain is considered to have converged sufficiently close to the limiting distribution to be used. Two questions then arise: 1. How long should the burn-in period be? 2. How many samples are required to accurately estimate posterior quantities of interest? The coda package for R contains a set of functions designed to help the user answer these questions. Some of these convergence diagnostics are simple graphical ways of summarizing the data. Others are formal statistical tests.
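
To make the recipe above concrete, here is a minimal Python sketch: simulated normal draws stand in for real sampler output, and the 95% credible interval is read off the sorted sample exactly as the abstract describes.

```python
import numpy as np

# Stand-in for MCMC output: 1000 (here independent) draws of a parameter alpha.
rng = np.random.default_rng(0)
alpha = rng.normal(loc=2.0, scale=0.5, size=1000)

# Sort the draws and read off the 25th and 975th order statistics,
# giving the (alpha_(25), alpha_(975)) interval from the abstract.
alpha_sorted = np.sort(alpha)
lower, upper = alpha_sorted[24], alpha_sorted[974]  # 0-based indexing
print(f"95% credible interval: ({lower:.3f}, {upper:.3f})")
```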

3,098 citations


Journal ArticleDOI
TL;DR: Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach to combining first-order logic and probabilistic graphical models in a single representation.
Abstract: We propose a simple approach to combining first-order logic and probabilistic graphical models in a single representation. A Markov logic network (MLN) is a first-order knowledge base with a weight attached to each formula (or clause). Together with a set of constants representing objects in the domain, it specifies a ground Markov network containing one feature for each possible grounding of a first-order formula in the KB, with the corresponding weight. Inference in MLNs is performed by MCMC over the minimal subset of the ground network required for answering the query. Weights are efficiently learned from relational databases by iteratively optimizing a pseudo-likelihood measure. Optionally, additional clauses are learned using inductive logic programming techniques. Experiments with a real-world database and knowledge base in a university domain illustrate the promise of this approach.

2,916 citations


Book
08 Aug 2006
TL;DR: This book should help newcomers to the field to understand how finite mixture and Markov switching models are formulated, what structures they imply on the data, what they could be used for, and how they are estimated.
Abstract: WINNER OF THE 2007 DEGROOT PRIZE! The prominence of finite mixture modelling is greater than ever. Many important statistical topics like clustering data, outlier treatment, or dealing with unobserved heterogeneity involve finite mixture models in some way or other. The area of potential applications goes beyond simple data analysis and extends to regression analysis and to non-linear time series analysis using Markov switching models. In the more than one hundred years since Karl Pearson showed in 1894 how to estimate the five parameters of a mixture of two normal distributions using the method of moments, statistical inference for finite mixture models has been a challenge to everybody who deals with them. In the past ten years, very powerful computational tools emerged for dealing with these models which combine a Bayesian approach with recent Monte Carlo simulation techniques based on Markov chains. This book reviews these techniques and covers the most recent advances in the field, among them bridge sampling techniques and reversible jump Markov chain Monte Carlo methods. It is the first time that the Bayesian perspective of finite mixture modelling is systematically presented in book form. It is argued that the Bayesian approach provides much insight in this context and is easily implemented in practice. Although the main focus is on Bayesian inference, the author reviews several frequentist techniques, especially for selecting the number of components of a finite mixture model, and discusses some of their shortcomings compared to the Bayesian approach. The aim of this book is to impart the finite mixture and Markov switching approach to statistical modelling to a wide-ranging community. This includes not only statisticians, but also biologists, economists, engineers, financial agents, market researchers, medical researchers, and any other frequent users of statistical models. This book should help newcomers to the field to understand how finite mixture and Markov switching models are formulated, what structures they imply on the data, what they could be used for, and how they are estimated. Researchers familiar with the subject will also profit from reading this book. The presentation is rather informal without abandoning mathematical correctness. Previous notions of Bayesian inference and Monte Carlo simulation are useful but not needed.

1,642 citations


Proceedings ArticleDOI
20 Aug 2006
TL;DR: An LDA-style topic model is presented that captures not only the low-dimensional structure of data, but also how the structure changes over time, showing improved topics, better timestamp prediction, and interpretable trends.
Abstract: This paper presents an LDA-style topic model that captures not only the low-dimensional structure of data, but also how the structure changes over time. Unlike other recent work that relies on Markov assumptions or discretization of time, here each topic is associated with a continuous distribution over timestamps, and for each generated document, the mixture distribution over topics is influenced by both word co-occurrences and the document's timestamp. Thus, the meaning of a particular topic can be relied upon as constant, but the topics' occurrence and correlations change significantly over time. We present results on nine months of personal email, 17 years of NIPS research papers and over 200 years of presidential state-of-the-union addresses, showing improved topics, better timestamp prediction, and interpretable trends.

1,327 citations


Journal ArticleDOI
TL;DR: It is demonstrated that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins.
Abstract: In recent years, model based approaches such as maximum likelihood have become the methods of choice for constructing phylogenies. A number of authors have shown the importance of using adequate substitution models in order to produce accurate phylogenies. In the past, many empirical models of amino acid substitution have been derived using a variety of different methods and protein datasets. These matrices are normally used as surrogates, rather than deriving the maximum likelihood model from the dataset being examined. With few exceptions, selection between alternative matrices has been carried out in an ad hoc manner. We start by highlighting the potential dangers of arbitrarily choosing protein models by demonstrating an empirical example where a single alignment can produce two topologically different and strongly supported phylogenies using two different arbitrarily-chosen amino acid substitution models. We demonstrate that in simple simulations, statistical methods of model selection are indeed robust and likely to be useful for protein model selection. We have investigated patterns of amino acid substitution among homologous sequences from the three Domains of life and our results show that no single amino acid matrix is optimal for any of the datasets. Perhaps most interestingly, we demonstrate that for two large datasets derived from the proteobacteria and archaea, one of the most favored models in both datasets is a model that was originally derived from retroviral Pol proteins. This demonstrates that choosing protein models based on their source or method of construction may not be appropriate.

1,067 citations


Yann LeCun, Sumit Chopra, Raia Hadsell, Aurelio Ranzato, Fu Jie Huang
01 Jan 2006
TL;DR: The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods.
Abstract: Energy-Based Models (EBMs) capture dependencies between variables by associating a scalar energy to each configuration of the variables. Inference consists in clamping the value of observed variables and finding configurations of the remaining variables that minimize the energy. Learning consists in finding an energy function in which observed configurations of the variables are given lower energies than unobserved ones. The EBM approach provides a common theoretical framework for many learning models, including traditional discriminative and generative approaches, as well as graph-transformer networks, conditional random fields, maximum margin Markov networks, and several manifold learning methods. Probabilistic models must be properly normalized, which sometimes requires evaluating intractable integrals over the space of all possible variable configurations. Since EBMs have no requirement for proper normalization, this problem is naturally circumvented. EBMs can be viewed as a form of non-probabilistic factor graphs, and they provide considerably more flexibility in the design of architectures and training criteria than probabilistic approaches.
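
A toy illustration of the inference rule just described (clamp the observed variables, minimize the energy over the rest); the three-variable energy function is an arbitrary stand-in, not one of the models surveyed in the paper.

```python
import itertools

def energy(x, y, z):
    # Hypothetical scalar energy over three binary variables;
    # lower energy means a more compatible configuration.
    return -2.0 * x * y + 1.5 * y * z - 0.5 * x + 0.3 * z

# Inference in the EBM sense: clamp the observed variable x = 1, then
# search the remaining configurations for the energy minimum.
x_obs = 1
best_yz = min(itertools.product([0, 1], repeat=2),
              key=lambda yz: energy(x_obs, *yz))
print("argmin (y, z) given x = 1:", best_yz)
```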

1,022 citations


Journal ArticleDOI
TL;DR: A Bayesian method for investigating correlated evolution of discrete binary traits on phylogenetic trees is described and illustrated by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes.
Abstract: We describe a Bayesian method for investigating correlated evolution of discrete binary traits on phylogenetic trees. The method fits a continuous-time Markov model to a pair of traits, seeking the best fitting models that describe their joint evolution on a phylogeny. We employ the methodology of reversible-jump (RJ) Markov chain Monte Carlo to search among the large number of possible models, some of which conform to independent evolution of the two traits, others to correlated evolution. The RJ Markov chain visits these models in proportion to their posterior probabilities, thereby directly estimating the support for the hypothesis of correlated evolution. In addition, the RJ Markov chain simultaneously estimates the posterior distributions of the rate parameters of the model of trait evolution. These posterior distributions can be used to test among alternative evolutionary scenarios to explain the observed data. All results are integrated over a sample of phylogenetic trees to account for phylogenetic uncertainty. We implement the method in a program called RJ Discrete and illustrate it by analyzing the question of whether mating system and advertisement of estrus by females have coevolved in the Old World monkeys and great apes.

868 citations


Journal ArticleDOI
TL;DR: The essential ideas of DE and MCMC are integrated, resulting in Differential Evolution Markov Chain (DE-MC), a population MCMC algorithm, in which multiple chains are run in parallel, showing simplicity, speed of calculation and convergence, even for nearly collinear parameters and multimodal densities.
Abstract: Differential Evolution (DE) is a simple genetic algorithm for numerical optimization in real parameter spaces. In a statistical context one would not just want the optimum but also its uncertainty. The uncertainty distribution can be obtained by a Bayesian analysis (after specifying prior and likelihood) using Markov Chain Monte Carlo (MCMC) simulation. This paper integrates the essential ideas of DE and MCMC, resulting in Differential Evolution Markov Chain (DE-MC). DE-MC is a population MCMC algorithm, in which multiple chains are run in parallel. DE-MC solves an important problem in MCMC, namely that of choosing an appropriate scale and orientation for the jumping distribution. In DE-MC the jumps are simply a fixed multiple of the differences of two random parameter vectors that are currently in the population. The selection process of DE-MC works via the usual Metropolis ratio which defines the probability with which a proposal is accepted. In tests with known uncertainty distributions, the efficiency of DE-MC with respect to random walk Metropolis with optimal multivariate Normal jumps ranged from 68% for small population sizes to 100% for large population sizes and even to 500% for the 97.5% point of a variable from a 50-dimensional Student distribution. Two Bayesian examples illustrate the potential of DE-MC in practice. DE-MC is shown to facilitate multidimensional updates in a multi-chain "Metropolis-within-Gibbs" sampling approach. The advantages of DE-MC over conventional MCMC are simplicity, speed of calculation, and convergence, even for nearly collinear parameters and multimodal densities.
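
A minimal sketch of the DE-MC update, assuming a standard bivariate normal log-posterior as the target; the population size, the scale gamma = 2.38/sqrt(2d), and the small jitter are illustrative choices consistent with the description above.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_post(theta):
    # Stand-in target: standard bivariate normal log-density, up to a constant.
    return -0.5 * np.sum(theta ** 2)

d, n_chains = 2, 10
gamma = 2.38 / np.sqrt(2 * d)          # common scaling for DE-MC jumps
pop = rng.normal(size=(n_chains, d))   # population of parallel chains

for _ in range(5000):
    for i in range(n_chains):
        # Jump = fixed multiple of the difference of two other population
        # members, plus a little noise so the whole space stays reachable.
        r1, r2 = rng.choice([j for j in range(n_chains) if j != i],
                            size=2, replace=False)
        prop = pop[i] + gamma * (pop[r1] - pop[r2]) + rng.normal(scale=1e-4, size=d)
        # Usual Metropolis acceptance ratio.
        if np.log(rng.uniform()) < log_post(prop) - log_post(pop[i]):
            pop[i] = prop
```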

839 citations


Journal ArticleDOI
TL;DR: An analysis scheme is developed that casts single-molecule time-binned FRET trajectories as hidden Markov processes, allowing one to determine, based on probability alone, the most likely FRET-value distributions of states and their interconversion rates while simultaneously determining the most likely time sequence of underlying states for each trajectory.

742 citations


Proceedings ArticleDOI
09 Jul 2006
TL;DR: It is shown that a combined strategy of block Markov superposition coding and Wyner-Ziv coding achieves the cut-set upper bound on the sum-rate of the two-way relay channel when the relay is in the proximity of one of the terminals.
Abstract: We study the two-way communication problem for the relay channel. Hereby, two terminals communicate simultaneously in both directions with the help of one relay. We consider the restricted two-way problem, i.e., the encoders at both terminals do not cooperate. We provide achievable rate regions for different cooperation strategies, such as decode-and-forward based on block Markov superposition coding and compress-and-forward based on Wyner-Ziv source coding. We also evaluate the regions for the special case of additive white Gaussian noise channels. We show that a combined strategy of block Markov superposition coding and Wyner-Ziv coding achieves the cut-set upper bound on the sum-rate of the two-way relay channel when the relay is in the proximity of one of the terminals.

558 citations


Journal ArticleDOI
TL;DR: A quantum dynamic model of decision-making is presented and compared with a previously established Markov model; both are formulated as random walk decision processes, but the two approaches rest on different probabilistic principles.

Journal ArticleDOI
TL;DR: A new taxonomy of model structures is developed, based on key requirements, including output requirements, the population size, and system complexity, for modelling infectious diseases and systems with constrained resources.
Abstract: Models for the economic evaluation of health technologies provide valuable information to decision makers. The choice of model structure is rarely discussed in published studies and can affect the results produced. Many papers describe good modelling practice, but few describe how to choose from the many types of available models. This paper develops a new taxonomy of model structures. The horizontal axis of the taxonomy describes assumptions about the role of expected values, randomness, the heterogeneity of entities, and the degree of non-Markovian structure. Commonly used aggregate models, including decision trees and Markov models, require large population numbers, homogeneous sub-groups and linear interactions. Individual models are more flexible, but may require replications with different random numbers to estimate expected values. The vertical axis of the taxonomy describes potential interactions between the individual actors, as well as how the interactions occur through time. Models using interactions, such as system dynamics, some Markov models, and discrete event simulation, are fairly uncommon in health economics but are necessary for modelling infectious diseases and systems with constrained resources. The paper provides guidance for choosing a model, based on key requirements, including output requirements, the population size, and system complexity.
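
For orientation, the aggregate (cohort) Markov model in the taxonomy reduces to a few lines of linear algebra; the three states and transition probabilities below are invented for illustration.

```python
import numpy as np

# Hypothetical three-state cohort model: Well, Sick, Dead.
# Rows are the current state, columns the next state; each row sums to 1.
P = np.array([[0.90, 0.08, 0.02],
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])

cohort = np.array([1.0, 0.0, 0.0])  # the whole cohort starts Well
for _ in range(10):                 # ten model cycles (e.g. years)
    cohort = cohort @ P
print("state occupancy after 10 cycles:", cohort.round(3))
```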

Book ChapterDOI
07 May 2006
TL;DR: A set of energy minimization benchmarks, which are used to compare the solution quality and running time of several common energy minimization algorithms, as well as a general-purpose software interface that allows vision researchers to easily switch between optimization methods with minimal overhead.
Abstract: One of the most exciting advances in early vision has been the development of efficient energy minimization algorithms. Many early vision tasks require labeling each pixel with some quantity such as depth or texture. While many such problems can be elegantly expressed in the language of Markov Random Fields (MRF's), the resulting energy minimization problems were widely viewed as intractable. Recently, algorithms such as graph cuts and loopy belief propagation (LBP) have proven to be very powerful: for example, such methods form the basis for almost all the top-performing stereo methods. Unfortunately, most papers define their own energy function, which is minimized with a specific algorithm of their choice. As a result, the tradeoffs among different energy minimization algorithms are not well understood. In this paper we describe a set of energy minimization benchmarks, which we use to compare the solution quality and running time of several common energy minimization algorithms. We investigate three promising recent methods—graph cuts, LBP, and tree-reweighted message passing—as well as the well-known older iterated conditional modes (ICM) algorithm. Our benchmark problems are drawn from published energy functions used for stereo, image stitching and interactive segmentation. We also provide a general-purpose software interface that allows vision researchers to easily switch between optimization methods with minimal overhead. We expect that the availability of our benchmarks and interface will make it significantly easier for vision researchers to adopt the best method for their specific problems. Benchmarks, code, results and images are available at http://vision.middlebury.edu/MRF.
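
To give a flavor of the simplest benchmarked method, here is a sketch of iterated conditional modes (ICM) on a toy binary denoising problem; the energy (a unit data term plus a Potts smoothness term) is a generic textbook choice, not one of the paper's benchmark energies.

```python
import numpy as np

rng = np.random.default_rng(7)

# Ground truth: a white square on a black background, corrupted by noise.
truth = np.zeros((20, 20), dtype=int)
truth[5:15, 5:15] = 1
noisy = np.where(rng.random(truth.shape) < 0.2, 1 - truth, truth)

lam, x = 1.0, noisy.copy()
for _ in range(10):  # ICM sweeps: greedily minimize each pixel's local energy
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            costs = []
            for label in (0, 1):
                e = float(label != noisy[i, j])  # data term
                for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < x.shape[0] and 0 <= nj < x.shape[1]:
                        e += lam * (label != x[ni, nj])  # smoothness term
                costs.append(e)
            x[i, j] = int(np.argmin(costs))

print("fraction of pixels recovered:", (x == truth).mean())
```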

Journal ArticleDOI
TL;DR: In this paper, a simple method for generating alternative CTPDFs that can speed the convergence of MCMC by 1-3 orders of magnitude is presented, along with refinements to the MCMC algorithm that make it practical to apply to multiple-planet systems.
Abstract: Precise radial velocity measurements have led to the discovery of ~170 extrasolar planetary systems. Understanding the uncertainties in the orbital solutions will become increasingly important as the discovery space for extrasolar planets shifts to planets with smaller masses and longer orbital periods. The method of Markov chain Monte Carlo (MCMC) provides a rigorous method for quantifying the uncertainties in orbital parameters in a Bayesian framework (Paper I). The main practical challenge for the general application of MCMC is the need to construct Markov chains that quickly converge. The rate of convergence is very sensitive to the choice of the candidate transition probability distribution function (CTPDF). Here we explain one simple method for generating alternative CTPDFs that can significantly speed convergence by 1-3 orders of magnitude. We have numerically tested dozens of CTPDFs with simulated radial velocity data sets to identify those that perform well for different types of orbits and suggest a set of CTPDFs for general application. In addition, we introduce other refinements to the MCMC algorithm for radial velocity planets, including an improved treatment of the uncertainties in the radial velocity observations, an algorithm for automatically choosing step sizes, an algorithm for automatically determining reasonable stopping times, and the use of importance sampling for including the dynamical evolution of multiple-planet systems. Together, these improvements make it practical to apply MCMC to multiple-planet systems. We demonstrate the improvements in efficiency by analyzing a variety of extrasolar planetary systems.
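
For orientation, a generic random-walk Metropolis sampler with a crude step-size adaptation is sketched below; the one-dimensional normal target and the tuning rule are illustrative stand-ins, not the paper's CTPDFs or its specific step-size algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def log_post(theta):
    # Stand-in posterior (standard normal) in place of an orbital-parameter posterior.
    return -0.5 * theta ** 2

theta, step, accepts = 0.0, 1.0, 0
for k in range(1, 10001):
    prop = theta + rng.normal(scale=step)
    if np.log(rng.uniform()) < log_post(prop) - log_post(theta):
        theta, accepts = prop, accepts + 1
    if k % 500 == 0:
        # Crude automatic tuning toward a ~25% acceptance rate
        # (a generic heuristic, not the method proposed in the paper).
        step *= 1.1 if accepts / k > 0.25 else 0.9
```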

Proceedings Article
04 Dec 2006
TL;DR: A class of MDPs is introduced which greatly simplifies Reinforcement Learning: they have discrete state spaces and continuous control spaces, and they enable efficient approximations to traditional MDPs.
Abstract: We introduce a class of MDPs which greatly simplify Reinforcement Learning. They have discrete state spaces and continuous control spaces. The controls have the effect of rescaling the transition probabilities of an underlying Markov chain. A control cost penalizing KL divergence between controlled and uncontrolled transition probabilities makes the minimization problem convex, and allows analytical computation of the optimal controls given the optimal value function. An exponential transformation of the optimal value function makes the minimized Bellman equation linear. Apart from their theoretical significance, the new MDPs enable efficient approximations to traditional MDPs. Shortest path problems are approximated to arbitrary precision with largest eigenvalue problems, yielding an O(n) algorithm. Accurate approximations to generic MDPs are obtained via continuous embedding reminiscent of LP relaxation in integer programming. Off-policy learning of the optimal value function is possible without need for state-action values; the new algorithm (Z-learning) outperforms Q-learning.
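
The central identity, that the exponentiated value function z = exp(-v) satisfies a linear equation, can be checked on a toy chain; the transition matrix and state costs below are invented, and fixed-point iteration is just one simple way to solve the resulting linear system.

```python
import numpy as np

# Toy 4-state uncontrolled chain with an absorbing goal state (state 3).
P = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.25, 0.25, 0.50, 0.00],
              [0.00, 0.25, 0.25, 0.50],
              [0.00, 0.00, 0.00, 1.00]])
q = np.array([1.0, 1.0, 1.0, 0.0])  # state costs; zero at the goal

# z = exp(-v) satisfies the LINEAR equation z = diag(exp(-q)) P z.
z = np.ones(4)
for _ in range(1000):
    z = np.exp(-q) * (P @ z)
    z[3] = 1.0  # boundary condition: v = 0 at the absorbing goal

v = -np.log(z)
u_star = P * z / (P @ z)[:, None]  # optimal controlled transitions: u*(j|i) ∝ p_ij z_j
print("optimal value function:", v.round(3))
```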

Journal ArticleDOI
TL;DR: The authors decompose the covariances into correlations and standard deviations; the correlation matrix follows a regime-switching model in which it is constant within a regime but differs across regimes, with transitions between regimes governed by a Markov chain.

Proceedings ArticleDOI
18 Dec 2006
TL;DR: A well-founded, integrated solution to the entity resolution problem based on Markov logic, which combines first-order logic and probabilistic graphical models by attaching weights to first- order formulas, and viewing them as templates for features of Markov networks.
Abstract: Entity resolution is the problem of determining which records in a database refer to the same entities, and is a crucial and expensive step in the data mining process. Interest in it has grown rapidly in recent years, and many approaches have been proposed. However, they tend to address only isolated aspects of the problem, and are often ad hoc. This paper proposes a well-founded, integrated solution to the entity resolution problem based on Markov logic. Markov logic combines first-order logic and probabilistic graphical models by attaching weights to first-order formulas, and viewing them as templates for features of Markov networks. We show how a number of previous approaches can be formulated and seamlessly combined in Markov logic, and how the resulting learning and inference problems can be solved efficiently. Experiments on two citation databases show the utility of this approach, and evaluate the contribution of the different components.

Journal Article
TL;DR: A Bayesian framework for parsing images into their constituent visual patterns that optimizes the posterior probability and outputs a scene representation as a “parsing graph”, in a spirit similar to parsing sentences in speech and natural language is presented.
Abstract: In this chapter we present a Bayesian framework for parsing images into their constituent visual patterns. The parsing algorithm optimizes the posterior probability and outputs a scene representation as a parsing graph, in a spirit similar to parsing sentences in speech and natural language. The algorithm constructs the parsing graph and re-configures it dynamically using a set of moves, which are mostly reversible Markov chain jumps. This computational framework integrates two popular inference approaches - generative (top-down) methods and discriminative (bottom-up) methods. The former formulates the posterior probability in terms of generative models for images defined by likelihood functions and priors. The latter computes discriminative probabilities based on a sequence (cascade) of bottom-up tests/filters. In our Markov chain algorithm design, the posterior probability, defined by the generative models, is the invariant (target) probability for the Markov chain, and the discriminative probabilities are used to construct proposal probabilities to drive the Markov chain. Intuitively, the bottom-up discriminative probabilities activate top-down generative models. In this chapter, we focus on two types of visual patterns - generic visual patterns, such as texture and shading, and object patterns including human faces and text. These types of patterns compete and cooperate to explain the image and so image parsing unifies image segmentation, object detection, and recognition (if we use generic visual patterns only then image parsing will correspond to image segmentation [48].). We illustrate our algorithm on natural images of complex city scenes and show examples where image segmentation can be improved by allowing object specific knowledge to disambiguate low-level segmentation cues, and conversely where object detection can be improved by using generic visual patterns to explain away shadows and occlusions.

Book
25 May 2006
TL;DR: The strengths of the main techniques are illustrated by way of simple examples, a recent result on the Pollard Rho random walk to compute the discrete logarithm, and an improved analysis of the Thorp shuffle.
Abstract: In the past few years we have seen a surge in the theory of finite Markov chains, by way of new techniques for bounding the convergence to stationarity. This includes functional techniques such as logarithmic Sobolev and Nash inequalities, refined spectral and entropy techniques, and isoperimetric techniques such as the average and blocking conductance and the evolving set methodology. We attempt to give a more or less self-contained treatment of some of these modern techniques, after reviewing several preliminaries. We also review classical and modern lower bounds on mixing times. There have been other important contributions to this theory such as variants on coupling techniques and decomposition methods, which are not included here; our choice was to keep the analytical methods as the theme of this presentation. We illustrate the strength of the main techniques by way of simple examples, a recent result on the Pollard Rho random walk to compute the discrete logarithm, and an improved analysis of the Thorp shuffle.

Journal ArticleDOI
TL;DR: In this paper, the authors evaluate historic international emissions distributions and forecast future distributions to assess whether per capita emissions have been converging or will converge, and find evidence of convergence among 23 member countries of the Organisation for Economic Co-operation and Development (OECD), whereas emissions appear to be diverging for an 88-country global sample over 1960-2000.
Abstract: Understanding and considering the distribution of per capita carbon dioxide (CO2) emissions is important in designing international climate change proposals and incentives for participation. I evaluate historic international emissions distributions and forecast future distributions to assess whether per capita emissions have been converging or will converge. I find evidence of convergence among 23 member countries of the Organisation for Economic Co-operation and Development (OECD), whereas emissions appear to be diverging for an 88-country global sample over 1960–2000. Forecasts based on a Markov chain transition matrix provide little evidence of future emissions convergence and indicate that emissions may diverge in the near term. I also review the shortcomings of environmental Kuznets curve regressions and structural models in characterizing future emissions distributions.
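
A sketch of the forecasting device the abstract describes: bin countries by per capita emissions, estimate a transition matrix between bins, and iterate the cross-country distribution forward. The bins and the matrix here are invented placeholders, not the paper's estimates.

```python
import numpy as np

# Invented transition matrix between three per capita emissions bins
# (low / middle / high), e.g. estimated from decadal movements.
P = np.array([[0.85, 0.15, 0.00],
              [0.10, 0.80, 0.10],
              [0.00, 0.10, 0.90]])

dist = np.array([0.40, 0.35, 0.25])  # current share of countries in each bin
for _ in range(5):                   # forecast five transition periods ahead
    dist = dist @ P
print("forecast distribution of countries:", dist.round(3))
```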

Journal ArticleDOI
TL;DR: This note characterizes the impact of adding rare stochastic mutations to an “imitation dynamic,” meaning a process with the properties that absent strategies remain absent, and non-homogeneous states are transient.

Journal ArticleDOI
01 Oct 2006-Genetics
TL;DR: A new Bayesian clustering algorithm for studying population structure using individually geo-referenced multilocus data sets based on the concept of hidden Markov random field, which models the spatial dependencies at the cluster membership level is introduced.
Abstract: We introduce a new Bayesian clustering algorithm for studying population structure using individually geo-referenced multilocus data sets. The algorithm is based on the concept of hidden Markov random field, which models the spatial dependencies at the cluster membership level. We argue that (i) a Markov chain Monte Carlo procedure can implement the algorithm efficiently, (ii) it can detect significant geographical discontinuities in allele frequencies and regulate the number of clusters, (iii) it can check whether the clusters obtained without the use of spatial priors are robust to the hypothesis of discontinuous geographical variation in allele frequencies, and (iv) it can reduce the number of loci required to obtain accurate assignments. We illustrate and discuss the implementation issues with the Scandinavian brown bear and the human CEPH diversity panel data set.

Book
01 Jan 2006
TL;DR: This new edition of Markov Chains: Models, Algorithms and Applications has been completely reformatted as a text, complete with end-of-chapter exercises, a new focus on management science, new applications of the models, and new examples with applications in financial risk management and modeling of financial data.
Abstract: This new edition of Markov Chains: Models, Algorithms and Applications has been completely reformatted as a text, complete with end-of-chapter exercises, a new focus on management science, new applications of the models, and new examples with applications in financial risk management and modeling of financial data. This book consists of eight chapters. Chapter 1 gives a brief introduction to the classical theory on both discrete and continuous time Markov chains. The relationship between Markov chains of finite states and matrix theory will also be highlighted. Some classical iterative methods for solving linear systems will be introduced for finding the stationary distribution of a Markov chain. The chapter then covers the basic theories and algorithms for hidden Markov models (HMMs) and Markov decision processes (MDPs). Chapter 2 discusses the applications of continuous time Markov chains to model queueing systems and discrete time Markov chain for computing the PageRank, the ranking of websites on the Internet. Chapter 3 studies Markovian models for manufacturing and re-manufacturing systems and presents closed form solutions and fast numerical algorithms for solving the captured systems. In Chapter 4, the authors present a simple hidden Markov model (HMM) with fast numerical algorithms for estimating the model parameters. An application of the HMM for customer classification is also presented. Chapter 5 discusses Markov decision processes for customer lifetime values. Customer Lifetime Values (CLV) is an important concept and quantity in marketing management. The authors present an approach based on Markov decision processes for the calculation of CLV using real data. Chapter 6 considers higher-order Markov chain models, particularly a class of parsimonious higher-order Markov chain models. Efficient estimation methods for model parameters based on linear programming are presented. Contemporary research results on applications to demand predictions, inventory control and financial risk measurement are also presented. In Chapter 7, a class of parsimonious multivariate Markov models is introduced. Again, efficient estimation methods based on linear programming are presented. Applications to demand predictions, inventory control policy and modeling credit ratings data are discussed. Finally, Chapter 8 re-visits hidden Markov models, and the authors present a new class of hidden Markov models with efficient algorithms for estimating the model parameters. Applications to modeling interest rates, credit ratings and default data are discussed. This book is aimed at senior undergraduate students, postgraduate students, professionals, practitioners, and researchers in applied mathematics, computational science, operational research, management science and finance, who are interested in the formulation and computation of queueing networks, Markov chain models and related topics. Readers are expected to have some basic knowledge of probability theory, Markov processes and matrix theory.
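
As a reminder of the classical computation that recurs throughout the book (and underlies the PageRank application in Chapter 2), here is the standard power-iteration approach to the stationary distribution of a toy chain.

```python
import numpy as np

# Toy row-stochastic transition matrix for a 3-state discrete-time chain.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.4, 0.5]])

# Iterate pi <- pi P from a uniform start until it stops changing;
# the fixed point satisfies pi = pi P, the stationary distribution.
pi = np.full(3, 1 / 3)
for _ in range(1000):
    new = pi @ P
    done = np.max(np.abs(new - pi)) < 1e-12
    pi = new
    if done:
        break
print("stationary distribution:", pi.round(6))
```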

Proceedings Article
16 Jul 2006
TL;DR: MC-SAT is an inference algorithm that combines ideas from MCMC and satisfiability. It is based on Markov logic, which defines Markov networks using weighted clauses in first-order logic, and it greatly outperforms Gibbs sampling and simulated tempering over a broad range of problem sizes and degrees of determinism.
Abstract: Reasoning with both probabilistic and deterministic dependencies is important for many real-world problems, and in particular for the emerging field of statistical relational learning. However, probabilistic inference methods like MCMC or belief propagation tend to give poor results when deterministic or near-deterministic dependencies are present, and logical ones like satisfiability testing are inapplicable to probabilistic ones. In this paper we propose MC-SAT, an inference algorithm that combines ideas from MCMC and satisfiability. MC-SAT is based on Markov logic, which defines Markov networks using weighted clauses in first-order logic. From the point of view of MCMC, MC-SAT is a slice sampler with an auxiliary variable per clause, and with a satisfiability-based method for sampling the original variables given the auxiliary ones. From the point of view of satisfiability, MC-SAT wraps a procedure around the SampleSAT uniform sampler that enables it to sample from highly non-uniform distributions over satisfying assignments. Experiments on entity resolution and collective classification problems show that MC-SAT greatly outperforms Gibbs sampling and simulated tempering over a broad range of problem sizes and degrees of determinism.

Journal Article
TL;DR: A kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time and its predictive accuracy was found to be competitive with other recently introduced hierarchical multi-category or multilabel classification learning algorithms.
Abstract: We present a kernel-based algorithm for hierarchical text classification where the documents are allowed to belong to more than one category at a time. The classification model is a variant of the Maximum Margin Markov Network framework, where the classification hierarchy is represented as a Markov tree equipped with an exponential family defined on the edges. We present an efficient optimization algorithm based on incremental conditional gradient ascent in single-example subspaces spanned by the marginal dual variables. The optimization is facilitated with a dynamic programming based algorithm that computes best update directions in the feasible set. Experiments show that the algorithm can feasibly optimize training sets of thousands of examples and classification hierarchies consisting of hundreds of nodes. Training of the full hierarchical model is as efficient as training independent SVM-light classifiers for each node. The algorithm's predictive accuracy was found to be competitive with other recently introduced hierarchical multi-category or multilabel classification learning algorithms.

Journal ArticleDOI
TL;DR: In this paper, a particular type of Markov chain Monte Carlo (MCMC) sampling algorithm is highlighted which allows probabilistic sampling in variable dimension spaces, and it is shown that once evidence calculations are performed, the results of complex variable dimension sampling algorithms can be replicated with simple and more familiar fixed dimensional MCMC sampling techniques.
Abstract: In most geophysical inverse problems the properties of interest are parametrized using a fixed number of unknowns. In some cases arguments can be used to bound the maximum number of parameters that need to be considered. In others the number of unknowns is set at some arbitrary value and regularization is used to encourage simple, non-extravagant models. In recent times variable or self-adaptive parametrizations have gained in popularity. Rarely, however, is the number of unknowns itself directly treated as an unknown. This situation leads to a transdimensional inverse problem, that is, one where the dimension of the parameter space is a variable to be solved for. This paper discusses transdimensional inverse problems from the Bayesian viewpoint. A particular type of Markov chain Monte Carlo (MCMC) sampling algorithm is highlighted which allows probabilistic sampling in variable dimension spaces. A quantity termed the evidence or marginal likelihood plays a key role in this type of problem. It is shown that once evidence calculations are performed, the results of complex variable dimension sampling algorithms can be replicated with simple and more familiar fixed dimensional MCMC sampling techniques. Numerical examples are used to illustrate the main points. The evidence can be difficult to calculate, especially in high-dimensional non-linear inverse problems. Nevertheless some general strategies are discussed and analytical expressions given for certain linear problems.

Proceedings Article
04 Dec 2006
TL;DR: This paper provides a computationally efficient method for learning Markov network structure from data based on the use of L1 regularization on the weights of the log-linear model, which achieves considerably higher generalization performance than the more standard L2-based method (a Gaussian parameter prior or pure maximum-likelihood learning).
Abstract: Markov networks are commonly used in a wide variety of applications, ranging from computer vision, to natural language, to computational biology. In most current applications, even those that rely heavily on learned models, the structure of the Markov network is constructed by hand, due to the lack of effective algorithms for learning Markov network structure from data. In this paper, we provide a computationally efficient method for learning Markov network structure from data. Our method is based on the use of L1 regularization on the weights of the log-linear model, which has the effect of biasing the model towards solutions where many of the parameters are zero. This formulation converts the Markov network learning problem into a convex optimization problem in a continuous space, which can be solved using efficient gradient methods. A key issue in this setting is the (unavoidable) use of approximate inference, which can lead to errors in the gradient computation when the network structure is dense. Thus, we explore the use of different feature introduction schemes and compare their performance. We provide results for our method on synthetic data, and on two real world data sets: pixel values in the MNIST data, and genetic sequence variations in the human HapMap data. We show that our L1-based method achieves considerably higher generalization performance than the more standard L2-based method (a Gaussian parameter prior) or pure maximum-likelihood learning. We also show that we can learn MRF network structure at a computational cost that is not much greater than learning parameters alone, demonstrating the existence of a feasible method for this important problem.
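
A simplified sketch of the core idea: an L1 penalty drives pairwise weights to exactly zero, and the surviving weights define the network structure. The sketch fits an Ising-style model by pseudo-likelihood with proximal (soft-thresholding) updates on synthetic spins; the data, hyperparameters, and pseudo-likelihood surrogate are illustrative choices, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic spins in {-1, +1}; variables 0 and 1 are made strongly dependent,
# so the learner should recover the single edge (0, 1).
n, d = 500, 5
X = rng.integers(0, 2, size=(n, d)) * 2 - 1
X[:, 1] = np.where(rng.random(n) < 0.9, X[:, 0], -X[:, 0])

lam, lr = 0.05, 0.01
W = np.zeros((d, d))  # symmetric pairwise weights, zero diagonal

def soft_threshold(A, t):
    # Proximal operator of the L1 penalty: pushes small weights to exactly 0.
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

for _ in range(200):
    F = X @ W                              # local field for each variable
    G = -(X.T @ (X - np.tanh(F))) / n      # negative pseudo-likelihood gradient
    G = (G + G.T) / 2                      # respect weight symmetry
    np.fill_diagonal(G, 0.0)
    W = soft_threshold(W - lr * G, lr * lam)
    np.fill_diagonal(W, 0.0)

print("learned edges:", np.transpose(np.nonzero(np.triu(W))))
```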

Journal ArticleDOI
TL;DR: First hitting times arise naturally in many types of stochastic processes, ranging from Wiener processes to Markov chains, and have been investigated as models for survival data.
Abstract: Many researchers have investigated first hitting times as models for survival data. First hitting times arise naturally in many types of stochastic processes, ranging from Wiener processes to Markov chains. In a survival context, the state of the underlying process represents the strength of an item or the health of an individual. The item fails or the individual experiences a clinical endpoint when the process reaches an adverse threshold state for the first time. The time scale can be calendar time or some other operational measure of degradation or disease progression. In many applications, the process is latent (i.e., unobservable). Threshold regression refers to first-hitting-time models with regression structures that accommodate covariate data. The parameters of the process, threshold state and time scale may depend on the covariates. This paper reviews aspects of this topic and discusses fruitful avenues for future research.
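
A quick simulation of the basic first-hitting-time construction, assuming a latent Wiener process with negative drift; the parameters are illustrative (with drift -0.5 starting from level 10, the hitting time of zero is inverse-Gaussian distributed with mean 20).

```python
import numpy as np

rng = np.random.default_rng(4)
mu, sigma, x0, dt = -0.5, 1.0, 10.0, 0.01  # drift, volatility, start, time step

def first_hitting_time(max_t=100.0):
    # Latent health/strength process; failure = first crossing of threshold 0.
    x, t = x0, 0.0
    while t < max_t:
        x += mu * dt + sigma * np.sqrt(dt) * rng.normal()
        t += dt
        if x <= 0.0:
            return t
    return np.inf  # censored: threshold not reached within the window

times = np.array([first_hitting_time() for _ in range(500)])
print("mean simulated survival time:",
      round(times[np.isfinite(times)].mean(), 2), "(theory: 20.0)")
```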

Journal ArticleDOI
Chai Wah Wu
TL;DR: Rather than using Lyapunov-type methods, the authors use results from the theory of inhomogeneous Markov chains in their analysis, and show that these results are useful for deterministic consensus problems as well as for more general random graph processes.
Abstract: Recently, methods in stochastic control have been used to study the synchronization properties of a nonautonomous discrete-time linear system x(k+1)=G(k)x(k) where the matrices G(k) are derived from a random graph process. The purpose of this note is to extend this analysis to directed graphs and more general random graph processes. Rather than using Lyapunov-type methods, we use results from the theory of inhomogeneous Markov chains in our analysis. These results have been used successfully in deterministic consensus problems and we show that they are useful for these problems as well. Sufficient conditions are derived that depend on the types of graphs that have nonvanishing probabilities. For instance, if a scrambling graph occurs with nonzero probability, then the system synchronizes.
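
A small simulation of the system x(k+1) = G(k)x(k) with the G(k) drawn from a random graph process; the 30% edge probability and the self-loops are arbitrary choices that make scrambling graphs (and hence synchronization) very likely.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
x = rng.normal(size=n)  # initial states of the n agents

for _ in range(200):
    # Random directed graph with self-loops; G is the row-stochastic
    # averaging matrix derived from it.
    A = (rng.random((n, n)) < 0.3).astype(float)
    np.fill_diagonal(A, 1.0)
    G = A / A.sum(axis=1, keepdims=True)
    x = G @ x

# If the states synchronize, the spread max(x) - min(x) shrinks toward 0.
print("spread after 200 steps:", float(x.max() - x.min()))
```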

Journal ArticleDOI
TL;DR: The 1/3 law is obtained: if A and B are strict Nash equilibria, then selection favors replacement of B by A if the unstable equilibrium occurs at a frequency of A which is less than 1/3.
Abstract: Evolutionary game dynamics in finite populations can be described by a frequency dependent, stochastic Wright-Fisher process. We consider a symmetric game between two strategies, A and B. There are discrete generations. In each generation, individuals produce offspring proportional to their payoff. The next generation is sampled randomly from this pool of offspring. The total population size is constant. The resulting Markov process has two absorbing states corresponding to homogeneous populations of all A or all B. We quantify frequency dependent selection by comparing the absorption probabilities to the corresponding probabilities under random drift. We derive conditions for selection to favor one strategy or the other by using the concept of total positivity. In the limit of weak selection, we obtain the 1/3 law: if A and B are strict Nash equilibria then selection favors replacement of B by A, if the unstable equilibrium occurs at a frequency of A which is less than 1/3.
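
The setup can be simulated directly. The sketch below uses an invented coordination game whose unstable equilibrium sits at about 0.23 < 1/3, so by the 1/3 law the fixation probability of a single A mutant under weak selection should exceed the neutral benchmark 1/N.

```python
import numpy as np

rng = np.random.default_rng(6)

# Payoffs: A vs A, A vs B, B vs A, B vs B. Both A and B are strict Nash;
# the unstable equilibrium is x* = (d - b)/(a - b - c + d) ≈ 0.23.
a, b, c, d = 3.0, 1.0, 1.0, 1.6
N, w, runs = 50, 0.05, 10000  # population size, selection intensity, trials

fixed = 0
for _ in range(runs):
    i = 1  # start with a single A mutant
    while 0 < i < N:
        fA = 1 - w + w * (a * (i - 1) + b * (N - i)) / (N - 1)
        fB = 1 - w + w * (c * i + d * (N - i - 1)) / (N - 1)
        # Wright-Fisher resampling proportional to payoff-derived fitness.
        i = rng.binomial(N, i * fA / (i * fA + (N - i) * fB))
    fixed += int(i == N)

print("fixation probability of A:", fixed / runs, "| neutral 1/N:", 1 / N)
```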