Author

Persi Diaconis

Bio: Persi Diaconis is an academic researcher from Stanford University. The author has contributed to research in topics: Markov chain & Random walk. The author has an h-index of 80 and has co-authored 341 publications receiving 28,431 citations. Previous affiliations of Persi Diaconis include University of Nice Sophia Antipolis & Cornell University.


Papers
Book
01 Jan 1988

1,522 citations

Journal ArticleDOI
TL;DR: In this article, the empirical histogram is studied as an estimator of a probability density f on an interval I; the estimator is defined by choosing a reference point x0 in I and a cell width h.
Abstract: Let f be a probability density on an interval I, finite or infinite: I includes its finite endpoints, if any, and f vanishes outside of I. Let X1, ..., Xk be independent random variables with common density f. The empirical histogram for the X's is often used to estimate f. To define this object, choose a reference point x0 in I and a cell width h, and let Nj be the number of X's falling in the jth class interval [x0 + (j - 1)h, x0 + jh).
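To make the definition concrete, here is a minimal sketch of the estimator in Python (function names and the example are illustrative, not from the paper): the histogram estimate at a point x is Nj/(kh), where Nj counts the samples landing in the class interval that contains x.

```python
import numpy as np

def histogram_estimate(xs, x0, h):
    # Empirical histogram density estimate: a minimal sketch.
    # xs : samples X_1, ..., X_k with common density f
    # x0 : reference point, h : cell width
    xs = np.asarray(xs)
    k = len(xs)
    bins = np.floor((xs - x0) / h).astype(int)  # class-interval index per sample
    counts = {}
    for j in bins:
        counts[j] = counts.get(j, 0) + 1

    def fhat(x):
        # N_j / (k * h) for the class interval containing x
        j = int(np.floor((x - x0) / h))
        return counts.get(j, 0) / (k * h)

    return fhat

# Example: estimate a standard normal density from 10,000 draws.
rng = np.random.default_rng(0)
fhat = histogram_estimate(rng.standard_normal(10_000), x0=0.0, h=0.25)
print(fhat(0.0))  # close to 1/sqrt(2*pi), about 0.40
```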

1,336 citations

Journal ArticleDOI
TL;DR: The bootstrap method is examined and evaluated as an example of this new generation of statistical tools that take advantage of the high speed digital computer and free the statistician to attack more complicated problems.
Abstract: In the past few years there has been a surge in the development of new statistical theories and methods that take advantage of the high speed digital computer. The payoff for such intensive computation methods is freedom from two limiting factors that have dominated statistical theory since its beginning: the assumption that the data conform to a bell-shaped curve and the need to focus on statistical measures whose theoretical properties can be analyzed mathematically. The new methods free the statistician to attack more complicated problems, exploiting a wider array of statistical tools. The bootstrap method is examined and evaluated as an example of this new generation of statistical tools.
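As a concrete illustration of the method the article examines, here is a minimal sketch (names are illustrative): resample the data with replacement, recompute the statistic on each resample, and take the spread of the replicates as an estimate of the statistic's standard error; no bell-shaped-curve assumption is needed.

```python
import numpy as np

def bootstrap_se(data, statistic, n_boot=2000, seed=0):
    # Bootstrap standard error: resample with replacement, recompute
    # the statistic, and report the standard deviation of the replicates.
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    reps = np.array([
        statistic(rng.choice(data, size=len(data), replace=True))
        for _ in range(n_boot)
    ])
    return reps.std(ddof=1)

# Example: standard error of the median of skewed (non-normal) data,
# a statistic with no simple closed-form theory.
sample = np.random.default_rng(1).exponential(scale=2.0, size=100)
print(bootstrap_se(sample, np.median))
```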

1,040 citations

Journal ArticleDOI
TL;DR: In this article, bounds on the second largest eigenvalue and spectral gap of a reversible Markov chain are developed in terms of geometric quantities such as the maximum degree, diameter, and covering number of associated graphs; the bounds give improved rates of convergence for the random walk associated with approximate computation of the permanent.
Abstract: We develop bounds for the second largest eigenvalue and spectral gap of a reversible Markov chain. The bounds depend on geometric quantities such as the maximum degree, diameter and covering number of associated graphs. The bounds compare well with exact answers for a variety of simple chains and seem better than bounds derived through Cheeger-like inequalities. They offer improved rates of convergence for the random walk associated to approximate computation of the permanent.
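The paper's bounds are geometric and require no eigenvalue computation; for comparison, the following minimal sketch computes the second largest eigenvalue and spectral gap directly for a small reversible chain. The lazy random walk on a cycle is an illustrative example, not one taken from the paper.

```python
import numpy as np

def spectral_gap(P):
    # Second largest eigenvalue and spectral gap of a reversible
    # transition matrix P (eigenvalues of a reversible chain are real).
    eigs = np.sort(np.linalg.eigvals(P).real)[::-1]
    return eigs[1], 1.0 - eigs[1]

# Lazy simple random walk on a cycle of n vertices.
n = 10
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i - 1) % n] += 0.25
    P[i, (i + 1) % n] += 0.25

lam2, gap = spectral_gap(P)
print(lam2, gap)  # the gap shrinks on the order of 1/n^2 as the cycle grows
```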

947 citations

Book
01 Jan 2003
TL;DR: The Lagrange dual of the fastest mixing Markov chain problem is derived, which gives a sophisticated method for obtaining (arbitrarily good) bounds on the optimal mixing rate, as well as the optimality conditions.
Abstract: We consider a symmetric random walk on a connected graph, where each edge is labeled with the probability of transition between the two adjacent vertices. The associated Markov chain has a uniform equilibrium distribution; the rate of convergence to this distribution, i.e., the mixing rate of the Markov chain, is determined by the second largest eigenvalue modulus (SLEM) of the transition probability matrix. In this paper we address the problem of assigning probabilities to the edges of the graph in such a way as to minimize the SLEM, i.e., the problem of finding the fastest mixing Markov chain on the graph. We show that this problem can be formulated as a convex optimization problem, which can in turn be expressed as a semidefinite program (SDP). This allows us to easily compute the (globally) fastest mixing Markov chain for any graph with a modest number of edges (say, 1,000) using standard numerical methods for SDPs. Larger problems can be solved by exploiting various types of symmetry and structure in the problem, and far larger problems (say, 100,000 edges) can be solved using a subgradient method we describe. We compare the fastest mixing Markov chain to those obtained using two commonly used heuristics: the maximum-degree method, and the Metropolis-Hastings algorithm. For many of the examples considered, the fastest mixing Markov chain is substantially faster than those obtained using these heuristic methods. We derive the Lagrange dual of the fastest mixing Markov chain problem, which gives a sophisticated method for obtaining (arbitrarily good) bounds on the optimal mixing rate, as well as the optimality conditions. Finally, we describe various extensions of the method, including a solution of the problem of finding the fastest mixing reversible Markov chain, on a fixed graph, with a given equilibrium distribution.
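As a concrete sketch of the convex formulation, assuming the cvxpy package (an added dependency, not something the paper uses): for a symmetric chain with uniform equilibrium, the SLEM is the spectral norm of P - 11^T/n, minimized over symmetric stochastic P supported on the graph's edges.

```python
import cvxpy as cp
import numpy as np

# Fastest mixing Markov chain on a small path graph: minimize the SLEM
# sigma_max(P - 11^T/n) over symmetric stochastic P that is zero off
# the graph's edges (self-loops allowed). A minimal sketch.
n = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
allowed = np.eye(n, dtype=bool)
for i, j in edges:
    allowed[i, j] = allowed[j, i] = True
forbidden = 1.0 - allowed.astype(float)  # 1 where no edge exists

P = cp.Variable((n, n), symmetric=True)
ones = np.ones(n)
J = np.outer(ones, ones) / n

constraints = [
    P >= 0,                          # nonnegative transition probabilities
    P @ ones == ones,                # rows sum to one
    cp.multiply(forbidden, P) == 0,  # transitions only along edges
]
problem = cp.Problem(cp.Minimize(cp.sigma_max(P - J)), constraints)
problem.solve()
print("optimal SLEM:", problem.value)
```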

750 citations


Cited by
Journal ArticleDOI
TL;DR: The recently developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies; when all characters are perfectly compatible, it shows significant evidence for a group if that group is defined by three or more characters.
Abstract: The recently-developed statistical method known as the "bootstrap" can be used to place confidence intervals on phylogenies. It involves resampling points from one's own data, with replacement, to create a series of bootstrap samples of the same size as the original data. Each of these is analyzed, and the variation among the resulting estimates taken to indicate the size of the error involved in making estimates from the original data. In the case of phylogenies, it is argued that the proper method of resampling is to keep all of the original species while sampling characters with replacement, under the assumption that the characters have been independently drawn by the systematist and have evolved independently. Majority-rule consensus trees can be used to construct a phylogeny showing all of the inferred monophyletic groups that occurred in a majority of the bootstrap samples. If a group shows up 95% of the time or more, the evidence for it is taken to be statistically significant. Existing computer programs can be used to analyze different bootstrap samples by using weights on the characters, the weight of a character being how many times it was drawn in bootstrap sampling. When all characters are perfectly compatible, as envisioned by Hennig, bootstrap sampling becomes unnecessary; the bootstrap method would show significant evidence for a group if it is defined by three or more characters.
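The character-weighting trick described above is short in code. Below is a minimal sketch (the function name and setup are illustrative): one bootstrap resample of characters, returned as the per-character weights that an existing phylogeny program could consume.

```python
import numpy as np

def bootstrap_character_weights(n_characters, rng):
    # One bootstrap resample of characters, expressed as weights:
    # sampling character columns with replacement is equivalent to
    # weighting each character by how many times it was drawn.
    draws = rng.integers(0, n_characters, size=n_characters)
    return np.bincount(draws, minlength=n_characters)

rng = np.random.default_rng(42)
weights = bootstrap_character_weights(10, rng)
print(weights, weights.sum())  # weights always sum to the number of characters
```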

40,349 citations

Book
01 Jan 1995
TL;DR: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition, and is designed as a text, with over 100 exercises, to benefit anyone involved in the fields of neural computation and pattern recognition.
Abstract: From the Publisher: This is the first comprehensive treatment of feed-forward neural networks from the perspective of statistical pattern recognition. After introducing the basic concepts, the book examines techniques for modelling probability density functions and the properties and merits of the multi-layer perceptron and radial basis function network models. Also covered are various forms of error functions, principal algorithms for error function minimization, learning and generalization in neural networks, and Bayesian techniques and their applications. Designed as a text, with over 100 exercises, this fully up-to-date work will benefit anyone involved in the fields of neural computation and pattern recognition.

19,056 citations

Journal ArticleDOI
TL;DR: It is demonstrated that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube.
Abstract: In this paper we demonstrate that finite linear combinations of compositions of a fixed, univariate function and a set of affine functionals can uniformly approximate any continuous function of n real variables with support in the unit hypercube; only mild conditions are imposed on the univariate function. Our results settle an open question about representability in the class of single hidden layer neural networks. In particular, we show that arbitrary decision regions can be arbitrarily well approximated by continuous feedforward neural networks with only a single internal, hidden layer and any continuous sigmoidal nonlinearity. The paper discusses approximation properties of other possible types of nonlinearities that might be implemented by artificial neural networks.
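To see the result in action, the following sketch fits a network with a single hidden layer and a logistic (sigmoidal) activation to a continuous target on the unit interval. scikit-learn is an assumed dependency here, not something used in the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(2000, 1))  # support in the unit interval
y = np.sin(2 * np.pi * X).ravel()          # a continuous target function

# One internal hidden layer, sigmoidal nonlinearity, as in the theorem.
net = MLPRegressor(hidden_layer_sizes=(50,), activation="logistic",
                   solver="lbfgs", max_iter=5000, random_state=0)
net.fit(X, y)

X_test = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
print(np.c_[np.sin(2 * np.pi * X_test),           # true values
            net.predict(X_test).reshape(-1, 1)])  # network approximation
```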

12,286 citations

Book
23 Nov 2005
TL;DR: The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics, and deals with the supervised learning problem for both regression and classification.
Abstract: A comprehensive and self-contained introduction to Gaussian processes, which provide a principled, practical, probabilistic approach to learning in kernel machines. Gaussian processes (GPs) provide a principled, practical, probabilistic approach to learning in kernel machines. GPs have received increased attention in the machine-learning community over the past decade, and this book provides a long-needed systematic and unified treatment of theoretical and practical aspects of GPs in machine learning. The treatment is comprehensive and self-contained, targeted at researchers and students in machine learning and applied statistics. The book deals with the supervised-learning problem for both regression and classification, and includes detailed algorithms. A wide variety of covariance (kernel) functions are presented and their properties discussed. Model selection is discussed both from a Bayesian and a classical perspective. Many connections to other well-known techniques from machine learning and statistics are discussed, including support-vector machines, neural networks, splines, regularization networks, relevance vector machines and others. Theoretical issues including learning curves and the PAC-Bayesian framework are treated, and several approximation methods for learning with large datasets are discussed. The book contains illustrative examples and exercises, and code and datasets are available on the Web. Appendixes provide mathematical background and a discussion of Gaussian Markov processes.
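As a small illustration of the supervised regression setting the book treats, here is a minimal sketch using scikit-learn's GaussianProcessRegressor with an RBF covariance. The library is an assumed dependency; the book supplies its own code and datasets on the Web.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(30, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(30)  # noisy observations

# GP prior with an RBF covariance plus a noise term; fitting selects
# the kernel hyperparameters by maximizing the marginal likelihood.
kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

X_test = np.linspace(0.0, 10.0, 5).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)
print(np.c_[mean, std])  # posterior mean and pointwise uncertainty
```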

11,357 citations

Christopher M. Bishop
01 Jan 2006
TL;DR: This book covers probability distributions, linear models for regression and classification, neural networks, kernel methods, graphical models, approximate inference, sampling methods, and combining models in the context of machine learning.
Abstract: Probability Distributions; Linear Models for Regression; Linear Models for Classification; Neural Networks; Kernel Methods; Sparse Kernel Machines; Graphical Models; Mixture Models and EM; Approximate Inference; Sampling Methods; Continuous Latent Variables; Sequential Data; Combining Models.

10,141 citations