
Showing papers on "Probability distribution" published in 2008


Book
16 Dec 2008
TL;DR: The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
Abstract: The formalism of probabilistic graphical models provides a unifying framework for capturing complex dependencies among random variables, and building large-scale multivariate statistical models. Graphical models have become a focus of research in many statistical, computational and mathematical fields, including bioinformatics, communication theory, statistical physics, combinatorial optimization, signal and image processing, information retrieval and statistical machine learning. Many problems that arise in specific instances — including the key problems of computing marginals and modes of probability distributions — are best studied in the general setting. Working with exponential family representations, and exploiting the conjugate duality between the cumulant function and the entropy for exponential families, we develop general variational representations of the problems of computing likelihoods, marginal probabilities and most probable configurations. We describe how a wide variety of algorithms — among them sum-product, cluster variational methods, expectation-propagation, mean field methods, max-product and linear programming relaxation, as well as conic programming relaxations — can all be understood in terms of exact or approximate forms of these variational representations. The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.
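As a pointer to the central result exploited throughout, the conjugate duality mentioned above gives the variational representation of the cumulant (log-partition) function of an exponential family, $$A(\theta) = \sup_{\mu \in \mathcal{M}} \{\langle \theta, \mu\rangle - A^{*}(\mu)\}$$, where $$\mathcal{M}$$ is the set of realizable mean parameters and $$A^{*}$$ is the conjugate dual of $$A$$ (the negative entropy on the interior of $$\mathcal{M}$$). Exact algorithms such as sum-product on trees correspond to solving this problem exactly, while mean field methods restrict $$\mathcal{M}$$ to a tractable inner approximation and the relaxation methods replace $$\mathcal{M}$$ and $$A^{*}$$ by tractable outer bounds and surrogates.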

4,335 citations


Proceedings ArticleDOI
17 May 2008
TL;DR: In this article, the authors show how to construct a variety of "trapdoor" cryptographic tools assuming the worst-case hardness of standard lattice problems (such as approximating the length of the shortest nonzero vector to within certain polynomial factors).
Abstract: We show how to construct a variety of "trapdoor" cryptographic tools assuming the worst-case hardness of standard lattice problems (such as approximating the length of the shortest nonzero vector to within certain polynomial factors). Our contributions include a new notion of trapdoor function with preimage sampling, simple and efficient "hash-and-sign" digital signature schemes, and identity-based encryption. A core technical component of our constructions is an efficient algorithm that, given a basis of an arbitrary lattice, samples lattice points from a discrete Gaussian probability distribution whose standard deviation is essentially the length of the longest Gram-Schmidt vector of the basis. A crucial security property is that the output distribution of the algorithm is oblivious to the particular geometry of the given basis.
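For intuition about the discrete Gaussian distribution at the heart of the sampling algorithm, here is a minimal one-dimensional rejection sampler over the integers. It is only a toy illustration (the paper's algorithm samples over an arbitrary lattice using the Gram-Schmidt basis); the function name, parameter convention, and truncation bound are choices made for this sketch.

```python
import math
import random

def sample_discrete_gaussian_1d(center: float, sigma: float) -> int:
    """Sample z with P(z) proportional to exp(-(z - center)^2 / (2 sigma^2)),
    z an integer, by rejection from a uniform proposal on a truncated range.
    Toy illustration only; the tail cut at ~12 sigma is a practical choice."""
    tau = 12  # truncation parameter (assumption for this sketch)
    lo = int(math.floor(center - tau * sigma))
    hi = int(math.ceil(center + tau * sigma))
    while True:
        z = random.randint(lo, hi)
        accept_prob = math.exp(-((z - center) ** 2) / (2.0 * sigma ** 2))
        if random.random() < accept_prob:
            return z

# Example: a few draws centered at 0.3 with sigma = 4
print([sample_discrete_gaussian_1d(0.3, 4.0) for _ in range(10)])
```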

1,834 citations


Journal ArticleDOI
TL;DR: A Bayesian-mixing model is developed that estimates probability distributions of source contributions to a mixture while explicitly accounting for uncertainty associated with multiple sources, fractionation and isotope signatures.
Abstract: Stable isotopes are a powerful tool for ecologists, often used to assess contributions of different sources to a mixture (e.g. prey to a consumer). Mixing models use stable isotope data to estimate the contribution of sources to a mixture. Uncertainty associated with mixing models is often substantial, but has not yet been fully incorporated in models. We developed a Bayesian-mixing model that estimates probability distributions of source contributions to a mixture while explicitly accounting for uncertainty associated with multiple sources, fractionation and isotope signatures. This model also allows for optional incorporation of informative prior information in analyses. We demonstrate our model using a predator–prey case study. Accounting for uncertainty in mixing model inputs can change the variability, magnitude and rank order of estimates of prey (source) contributions to the predator (mixture). Isotope mixing models need to fully account for uncertainty in order to accurately estimate source contributions.
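To make the idea concrete, here is a minimal sketch of a two-source, one-isotope mixing calculation with a uniform prior on the source proportion. The Gaussian likelihood, the grid-based posterior, and all numerical values are illustrative assumptions for this sketch, not the paper's model (which handles multiple sources, multiple isotopes, and full uncertainty propagation).

```python
import numpy as np

# Illustrative inputs (assumed values): source signatures (mean, sd),
# fractionation offset, and the observed mixture (consumer) signature.
mu_src = np.array([-20.0, -12.0])   # source means (e.g., delta13C)
sd_src = np.array([1.0, 1.5])       # source signature uncertainty
frac_mean, frac_sd = 1.0, 0.5       # trophic fractionation (assumed)
mix_obs, mix_sd = -16.0, 0.8        # observed mixture value and its sd

# Grid over the proportion p of source 1 (uniform prior).
p = np.linspace(0.0, 1.0, 1001)
mix_mean = p * mu_src[0] + (1 - p) * mu_src[1] + frac_mean
# Predicted-mixture variance combines source, fractionation and observation
# uncertainty (a simplifying assumption made only for this sketch).
mix_var = (p * sd_src[0])**2 + ((1 - p) * sd_src[1])**2 + frac_sd**2 + mix_sd**2

log_lik = -0.5 * ((mix_obs - mix_mean)**2 / mix_var + np.log(2 * np.pi * mix_var))
post = np.exp(log_lik - log_lik.max())
post /= post.sum()

print("posterior mean contribution of source 1:", float((p * post).sum()))
```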

1,085 citations


Book
02 Sep 2008
TL;DR: The authors review probability theory, random processes, random fields, simulation, and reliability-based design, and apply these tools to geotechnical problems including groundwater flow, foundation settlement, bearing capacity, slope stability, earth pressure, mine pillar capacity, and liquefaction.
Abstract: Preface. Acknowledgements.
PART 1: THEORY.
Chapter 1: Review of Probability Theory. 1.1 Introduction. 1.2 Basic Set Theory. 1.3 Probability. 1.4 Conditional Probability. 1.5 Random Variables and Probability Distributions. 1.6 Measures of Central Tendency, Variability, and Association. 1.7 Linear Combinations of Random Variables. 1.8 Functions of Random Variables. 1.9 Common Discrete Probability Distributions. 1.10 Common Continuous Probability Distributions. 1.11 Extreme-Value Distributions.
Chapter 2: Discrete Random Processes. 2.1 Introduction. 2.2 Discrete-Time, Discrete-State Markov Chains. 2.3 Continuous-Time Markov Chains. 2.4 Queueing Models.
Chapter 3: Random Fields. 3.1 Introduction. 3.2 Covariance Function. 3.3 Spectral Density Function. 3.4 Variance Function. 3.5 Correlation Length. 3.6 Some Common Models. 3.7 Random Fields in Higher Dimensions.
Chapter 4: Best Estimates, Excursions, and Averages. 4.1 Best Linear Unbiased Estimation. 4.2 Threshold Excursions in One Dimension. 4.3 Threshold Excursions in Two Dimensions. 4.4 Averages.
Chapter 5: Estimation. 5.1 Introduction. 5.2 Choosing a Distribution. 5.3 Estimation in Presence of Correlation. 5.4 Advanced Estimation Techniques.
Chapter 6: Simulation. 6.1 Introduction. 6.2 Random-Number Generators. 6.3 Generating Nonuniform Random Variables. 6.4 Generating Random Fields. 6.5 Conditional Simulation of Random Fields. 6.6 Monte Carlo Simulation.
Chapter 7: Reliability-Based Design. 7.1 Acceptable Risk. 7.2 Assessing Risk. 7.3 Background to Design Methodologies. 7.4 Load and Resistance Factor Design. 7.5 Going Beyond Calibration. 7.6 Risk-Based Decision Making.
PART 2: PRACTICE.
Chapter 8: Groundwater Modeling. 8.1 Introduction. 8.2 Finite-Element Model. 8.3 One-Dimensional Flow. 8.4 Simple Two-Dimensional Flow. 8.5 Two-Dimensional Flow Beneath Water-Retaining Structures. 8.6 Three-Dimensional Flow. 8.7 Three-Dimensional Exit Gradient Analysis.
Chapter 9: Flow Through Earth Dams. 9.1 Statistics of Flow Through Earth Dams. 9.2 Extreme Hydraulic Gradient Statistics.
Chapter 10: Settlement of Shallow Foundations. 10.1 Introduction. 10.2 Two-Dimensional Probabilistic Foundation Settlement. 10.3 Three-Dimensional Probabilistic Foundation Settlement. 10.4 Strip Footing Risk Assessment. 10.5 Resistance Factors for Shallow-Foundation Settlement Design.
Chapter 11: Bearing Capacity. 11.1 Strip Footings on c-φ Soils. 11.2 Load and Resistance Factor Design of Shallow Foundations. 11.3 Summary.
Chapter 12: Deep Foundations. 12.1 Introduction. 12.2 Random Finite-Element Method. 12.3 Monte Carlo Estimation of Pile Capacity. 12.4 Summary.
Chapter 13: Slope Stability. 13.1 Introduction. 13.2 Probabilistic Slope Stability Analysis. 13.3 Slope Stability Reliability Model.
Chapter 14: Earth Pressure. 14.1 Introduction. 14.2 Passive Earth Pressures. 14.3 Active Earth Pressures: Retaining Wall Reliability.
Chapter 15: Mine Pillar Capacity. 15.1 Introduction. 15.2 Literature. 15.3 Parametric Studies. 15.4 Probabilistic Interpretation. 15.5 Summary.
Chapter 16: Liquefaction. 16.1 Introduction. 16.2 Model Size: Soil Liquefaction. 16.3 Monte Carlo Analysis and Results. 16.4 Summary.
PART 3: APPENDIXES.
APPENDIX A: PROBABILITY TABLES. A.1 Normal Distribution. A.2 Inverse Student t-Distribution. A.3 Inverse Chi-Square Distribution.
APPENDIX B: NUMERICAL INTEGRATION. B.1 Gaussian Quadrature.
APPENDIX C: COMPUTING VARIANCES AND COVARIANCES OF LOCAL AVERAGES. C.1 One-Dimensional Case. C.2 Two-Dimensional Case. C.3 Three-Dimensional Case.
Index.
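As a taste of the simulation material (Chapter 6), the sketch below generates a one-dimensional stationary Gaussian random field by Cholesky factorization of an exponential covariance; the covariance form, its parameterization, and the numerical values are illustrative assumptions, not taken from the book.

```python
import numpy as np

def gaussian_random_field_1d(x, variance, corr_length, rng):
    """Sample a zero-mean Gaussian field on the points x with an exponential
    covariance C(tau) = variance * exp(-2|tau| / corr_length)."""
    tau = np.abs(x[:, None] - x[None, :])
    cov = variance * np.exp(-2.0 * tau / corr_length)
    # A small jitter keeps the Cholesky factorization numerically stable.
    chol = np.linalg.cholesky(cov + 1e-10 * np.eye(len(x)))
    return chol @ rng.standard_normal(len(x))

rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 201)
field = gaussian_random_field_1d(x, variance=1.0, corr_length=2.0, rng=rng)
print(field[:5])
```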

751 citations


Journal ArticleDOI
TL;DR: It is demonstrated that, compared with GDE3, RM-MEDA is not sensitive to algorithmic parameters, and has good scalability to the number of decision variables in the case of nonlinear variable linkages.
Abstract: Under mild conditions, it can be induced from the Karush-Kuhn-Tucker condition that the Pareto set, in the decision space, of a continuous multiobjective optimization problem is a piecewise continuous (m - 1)-D manifold, where m is the number of objectives. Based on this regularity property, we propose a regularity model-based multiobjective estimation of distribution algorithm (RM-MEDA) for continuous multiobjective optimization problems with variable linkages. At each generation, the proposed algorithm models a promising area in the decision space by a probability distribution whose centroid is a (m - 1)-D piecewise continuous manifold. The local principal component analysis algorithm is used for building such a model. New trial solutions are sampled from the model thus built. A nondominated sorting-based selection is used for choosing solutions for the next generation. Systematic experiments have shown that, overall, RM-MEDA outperforms three other state-of-the-art algorithms, namely, GDE3, PCX-NSGA-II, and MIDEA, on a set of test instances with variable linkages. We have demonstrated that, compared with GDE3, RM-MEDA is not sensitive to algorithmic parameters, and has good scalability to the number of decision variables in the case of nonlinear variable linkages. A few shortcomings of RM-MEDA have also been identified and discussed in this paper.
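A heavily simplified, single-cluster sketch of the modelling-and-sampling step is given below (RM-MEDA itself uses local PCA over several clusters and couples the sampling with nondominated sorting); the uniform spread along the leading directions, the noise model, and all parameters are illustrative assumptions.

```python
import numpy as np

def sample_from_pca_model(population, n_new, n_manifold_dims, noise_scale, rng):
    """Fit PCA to the current population of decision vectors, spread new points
    uniformly along the leading principal directions (the manifold approximation),
    and add Gaussian noise in the remaining directions."""
    mean = population.mean(axis=0)
    centered = population - mean
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    coords = centered @ vt.T                      # coordinates in the PCA basis
    lo = coords[:, :n_manifold_dims].min(axis=0)
    hi = coords[:, :n_manifold_dims].max(axis=0)
    new_coords = np.zeros((n_new, vt.shape[0]))
    new_coords[:, :n_manifold_dims] = rng.uniform(lo, hi, size=(n_new, n_manifold_dims))
    new_coords[:, n_manifold_dims:] = noise_scale * rng.standard_normal(
        (n_new, vt.shape[0] - n_manifold_dims))
    return mean + new_coords @ vt

rng = np.random.default_rng(0)
pop = rng.normal(size=(50, 5))                    # toy population of decision vectors
print(sample_from_pca_model(pop, n_new=3, n_manifold_dims=1, noise_scale=0.1, rng=rng))
```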

660 citations


Journal ArticleDOI
TL;DR: In this paper, a simple econometric procedure is proposed to account for uncertainty in the choice of functional form for regression discontinuity (RD) designs with discrete support, where deviations of the true regression function from a given approximating function are modeled as random.

609 citations


Journal ArticleDOI
TL;DR: In this article, the authors compare the density statistics of compressible turbulence driven by the usually adopted solenoidal forcing (divergence-free) and by compressive forcing (curl-free).
Abstract: The probability density function (PDF) of the gas density in turbulent supersonic flows is investigated with high-resolution numerical simulations. In a systematic study, we compare the density statistics of compressible turbulence driven by the usually adopted solenoidal forcing (divergence-free) and by compressive forcing (curl-free). Our results are in agreement with studies using solenoidal forcing. However, compressive forcing yields a significantly broader density distribution with standard deviation ~3 times larger at the same rms Mach number. The standard deviation-Mach number relation used in analytical models of star formation is reviewed and a modification of the existing expression is proposed, which takes into account the ratio of solenoidal and compressive modes of the turbulence forcing.
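For reference, the standard deviation-Mach number relation discussed here is usually written as $$\sigma_{s}^{2} = \ln\left(1 + b^{2}\mathcal{M}^{2}\right)$$, where $$s = \ln(\rho/\rho_{0})$$ is the logarithmic density contrast, $$\mathcal{M}$$ is the rms Mach number, and the parameter b depends on the forcing; the roughly three times broader distribution found for compressive forcing corresponds to a larger effective b than for solenoidal forcing.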

557 citations


Book ChapterDOI
01 Jan 2008
TL;DR: In this article, a brief introduction to the formulation of various types of stochastic epidemic models is presented based on the well-known deterministic SIS and SIR epidemic models.
Abstract: A brief introduction to the formulation of various types of stochastic epidemic models is presented based on the well-known deterministic SIS and SIR epidemic models. Three different types of stochastic model formulations are discussed: discrete time Markov chain, continuous time Markov chain and stochastic differential equations. Properties unique to the stochastic models are presented: probability of disease extinction, probability of disease outbreak, quasistationary probability distribution, final size distribution, and expected duration of an epidemic. The chapter ends with a discussion of two stochastic formulations that cannot be directly related to the SIS and SIR epidemic models. They are discrete time Markov chain formulations applied in the study of epidemics within households (chain binomial models) and in the prediction of the initial spread of an epidemic (branching processes).
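A minimal continuous-time Markov chain simulation of the SIS model (Gillespie-style, with parameter values assumed purely for illustration) shows the kind of stochastic formulation described; unlike the deterministic model, the infection can go extinct here even when an endemic equilibrium is predicted.

```python
import random

def sis_ctmc(N=100, I0=5, beta=0.3, gamma=0.1, t_max=200.0, seed=1):
    """Continuous-time Markov chain SIS epidemic.
    Events: infection at rate beta*S*I/N, recovery at rate gamma*I."""
    rng = random.Random(seed)
    t, I = 0.0, I0
    path = [(t, I)]
    while t < t_max and I > 0:
        S = N - I
        rate_inf = beta * S * I / N
        rate_rec = gamma * I
        total = rate_inf + rate_rec
        t += rng.expovariate(total)          # waiting time to the next event
        if rng.random() < rate_inf / total:
            I += 1                           # infection
        else:
            I -= 1                           # recovery
        path.append((t, I))
    return path

path = sis_ctmc()
print("final time and number of infectives:", path[-1])
```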

414 citations


Journal ArticleDOI
TL;DR: In this paper, the authors studied the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size.
Abstract: We study the asymptotic properties of bridge estimators in sparse, high-dimensional, linear regression models when the number of covariates may increase to infinity with the sample size. We are particularly interested in the use of bridge estimators to distinguish between covariates whose coefficients are zero and covariates whose coefficients are nonzero. We show that under appropriate conditions, bridge estimators correctly select covariates with nonzero coefficients with probability converging to one and that the estimators of nonzero coefficients have the same asymptotic distribution that they would have if the zero coefficients were known in advance. Thus, bridge estimators have an oracle property in the sense of Fan and Li [J. Amer. Statist. Assoc. 96 (2001) 1348-1360] and Fan and Peng [Ann. Statist. 32 (2004) 928-961]. In general, the oracle property holds only if the number of covariates is smaller than the sample size. However, under a partial orthogonality condition in which the covariates of the zero coefficients are uncorrelated or weakly correlated with the covariates of nonzero coefficients, we show that marginal bridge estimators can correctly distinguish between covariates with nonzero and zero coefficients with probability converging to one even when the number of covariates is greater than the sample size.
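Concretely, the bridge estimator minimizes a penalized least-squares criterion of the form $$\sum_{i=1}^{n}(y_{i} - x_{i}'\beta)^{2} + \lambda_{n}\sum_{j=1}^{p_{n}}|\beta_{j}|^{\gamma}$$ with $$0 < \gamma < 1$$; the concavity of the penalty for $$\gamma < 1$$ is what permits exact zero estimates and the oracle behavior described above.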

411 citations


Journal ArticleDOI
TL;DR: The classes of statistics that are currently available in the ergm package are described, along with means for controlling the Markov chain Monte Carlo (MCMC) algorithm that the package uses for estimation.
Abstract: Exponential-family random graph models (ERGMs) represent the processes that govern the formation of links in networks through the terms selected by the user. The terms specify network statistics that are sufficient to represent the probability distribution over the space of networks of that size. Many classes of statistics can be used. In this article we describe the classes of statistics that are currently available in the ergm package. We also describe means for controlling the Markov chain Monte Carlo (MCMC) algorithm that the package uses for estimation. These controls affect either the proposal distribution on the sample space used by the underlying Metropolis-Hastings algorithm or the constraints on the sample space itself. Finally, we describe various other arguments to core functions of the ergm package.
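The probability distribution referred to has the exponential-family form over the space of graphs, $$P_{\theta}(Y = y) = \exp\{\theta^{\top}g(y)\}/\kappa(\theta)$$, where g(y) is the vector of user-selected network statistics and $$\kappa(\theta)$$ is the normalizing constant summed over all networks of the given size; the MCMC controls described act on the Metropolis-Hastings proposals used precisely because this sum is intractable to compute directly.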

369 citations


Journal ArticleDOI
TL;DR: The analytical model that is presented is able to describe the effects of various system parameters, including road traffic parameters and the transmission range of vehicles, on the connectivity, and provides bounds obtained using stochastic ordering techniques.
Abstract: We investigate connectivity in the ad hoc network formed between vehicles that move on a typical highway. We use a common model in vehicular traffic theory in which a fixed point on the highway sees cars passing it that are separated by times with an exponentially distributed duration. We obtain the distribution of the distances between the cars, which allows us to use techniques from queuing theory to study connectivity. We obtain the Laplace transform of the probability distribution of the connectivity distance, explicit expressions for the expected connectivity distance, and the probability distribution and expectation of the number of cars in a platoon. Then, we conduct extensive simulation studies to evaluate the obtained results. The analytical model that we present is able to describe the effects of various system parameters, including road traffic parameters (i.e., speed distribution and traffic flow) and the transmission range of vehicles, on the connectivity. To more precisely study the effect of speed on connectivity, we provide bounds obtained using stochastic ordering techniques. Our approach is based on the work of Miorandi and Altman, which transformed the problem of connectivity distance distribution into that of the distribution of the busy period of an equivalent infinite server queue. We use our analytical results, along with common road traffic statistical data, to understand connectivity in vehicular ad hoc networks.
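The queueing connection can be illustrated by direct simulation: with exponential time headways and a fixed transmission range, inter-vehicle distances are exponential and platoon sizes can be estimated by Monte Carlo. The traffic intensity and transmission range below are assumptions made for this sketch, not values from the paper.

```python
import math
import random

def mean_platoon_size(rate_per_m=0.01, tx_range=250.0, n_vehicles=100_000, seed=7):
    """Vehicles separated by i.i.d. exponential gaps with mean 1/rate_per_m metres.
    A platoon is a maximal run of consecutive vehicles whose gaps are <= tx_range."""
    rng = random.Random(seed)
    sizes, current = [], 1
    for _ in range(n_vehicles - 1):
        gap = rng.expovariate(rate_per_m)
        if gap <= tx_range:
            current += 1
        else:
            sizes.append(current)
            current = 1
    sizes.append(current)
    return sum(sizes) / len(sizes)

# The platoon size is geometric with success probability exp(-rate*range),
# so its mean should be exp(rate_per_m * tx_range) for these assumptions.
print("simulated mean platoon size:", mean_platoon_size())
print("geometric-distribution mean:", math.exp(0.01 * 250.0))
```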

Journal ArticleDOI
TL;DR: This paper derives the order quantities that minimize the newsvendor's maximum regret of not acting optimally; the approach can be extended to a variety of problems that require a robust but not conservative solution.
Abstract: Traditional stochastic inventory models assume full knowledge of the demand probability distribution. However, in practice, it is often difficult to completely characterize the demand distribution, especially in fast-changing markets. In this paper, we study the newsvendor problem with partial information about the demand distribution (e.g., mean, variance, symmetry, unimodality). In particular, we derive the order quantities that minimize the newsvendor's maximum regret of not acting optimally. Most of our solutions are tractable, which makes them attractive for practical application. Our analysis also generates insights into the choice of the demand distribution as an input to the newsvendor model. In particular, the distributions that maximize the entropy perform well under the regret criterion. Our approach can be extended to a variety of problems that require a robust but not conservative solution.
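The minimax-regret idea can be sketched numerically on a small discrete example: compute the regret of each order quantity against every distribution in an ambiguity set and pick the quantity whose worst-case regret is smallest. The price, cost, and the three candidate demand distributions below are hypothetical; the paper itself derives order quantities analytically from partial information such as the mean and variance.

```python
import numpy as np

price, cost = 10.0, 6.0                      # assumed selling price and unit cost
demand_support = np.arange(0, 21)            # possible demand values 0..20

# A small, hypothetical ambiguity set: three demand distributions, all with mean 10.
candidates = [
    np.full(21, 1.0 / 21),                                               # uniform
    np.exp(-(demand_support - 10.0) ** 2 / 8.0),                         # peaked at 10
    np.where((demand_support == 5) | (demand_support == 15), 0.5, 0.0),  # two-point
]
candidates = [p / p.sum() for p in candidates]

def expected_profit(q, probs):
    sales = np.minimum(q, demand_support)
    return float(np.sum(probs * (price * sales - cost * q)))

def regret(q, probs):
    best = max(expected_profit(qq, probs) for qq in demand_support)
    return best - expected_profit(q, probs)

# Order quantity minimizing the worst-case regret over the candidate set.
q_star = min(demand_support, key=lambda q: max(regret(q, p) for p in candidates))
print("minimax-regret order quantity:", int(q_star))
```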

Journal ArticleDOI
TL;DR: In this paper, the eigenvalue-eigenfunction decomposition of an integral operator associated with specific joint probability densities is used to identify a large class of nonclassical nonlinear errors-in-variables models with continuously distributed variables.
Abstract: While the literature on nonclassical measurement error traditionally relies on the availability of an auxiliary data set containing correctly measured observations, we establish that the availability of instruments enables the identification of a large class of nonclassical nonlinear errors-in-variables models with continuously distributed variables. Our main identifying assumption is that, conditional on the value of the true regressors, some “measure of location” of the distribution of the measurement error (e.g., its mean, mode, or median) is equal to zero. The proposed approach relies on the eigenvalue–eigenfunction decomposition of an integral operator associated with specific joint probability densities. The main identifying assumption is used to “index” the eigenfunctions so that the decomposition is unique. We propose a convenient sieve-based estimator, derive its asymptotic properties, and investigate its finite-sample behavior through Monte Carlo simulations.

Journal ArticleDOI
TL;DR: In this article, the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to be clustered is addressed, and an efficient Markov chain Monte Carlo algorithm is developed for computation.
Abstract: In multicenter studies, subjects in different centers may have different outcome distributions. This article is motivated by the problem of nonparametric modeling of these distributions, borrowing information across centers while also allowing centers to be clustered. Starting with a stick-breaking representation of the Dirichlet process (DP), we replace the random atoms with random probability measures drawn from a DP. This results in a nested DP prior, which can be placed on the collection of distributions for the different centers, with centers drawn from the same DP component automatically clustered together. Theoretical properties are discussed, and an efficient Markov chain Monte Carlo algorithm is developed for computation. The methods are illustrated using a simulation study and an application to quality of care in U.S. hospitals.
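The stick-breaking representation that the construction starts from takes only a few lines to simulate; the truncation level and concentration parameter below are illustrative choices, and the full nested DP additionally replaces each atom with a random probability measure drawn from another DP.

```python
import numpy as np

def stick_breaking_weights(alpha: float, truncation: int, rng) -> np.ndarray:
    """Truncated stick-breaking weights for a Dirichlet process:
    v_k ~ Beta(1, alpha), pi_k = v_k * prod_{j<k} (1 - v_j)."""
    v = rng.beta(1.0, alpha, size=truncation)
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    return v * remaining

rng = np.random.default_rng(0)
weights = stick_breaking_weights(alpha=1.0, truncation=50, rng=rng)
print("sum of truncated weights:", weights.sum())  # close to 1 for a long enough stick
```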

Journal ArticleDOI
TL;DR: The results show that the proposed approach allows decision makers to perform trade-off analysis among expected costs, quality acceptance levels, and on-time delivery distributions and provides alternative tools to evaluate and improve supplier selection decisions in an uncertain supply chain environment.

Journal ArticleDOI
TL;DR: A retrospective Markov chain Monte Carlo algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiological application.
Abstract: We propose a class of kernel stick-breaking processes for uncountable collections of dependent random probability measures. The process is constructed by first introducing an infinite sequence of random locations. Independent random probability measures and beta-distributed random weights are assigned to each location. Predictor-dependent random probability measures are then constructed by mixing over the locations, with stick-breaking probabilities expressed as a kernel multiplied by the beta weights. Some theoretical properties of the process are described, including a covariate-dependent prediction rule. A retrospective Markov chain Monte Carlo algorithm is developed for posterior computation, and the methods are illustrated using a simulated example and an epidemiological application.

Proceedings ArticleDOI
Ravi Jampani, Fei Xu, Mingxi Wu, Luis Perez, Chris Jermaine, Peter J. Haas
09 Jun 2008
TL;DR: MCDB is introduced, a system for managing uncertain data that is based on a Monte Carlo approach, which can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles.
Abstract: To deal with data uncertainty, existing probabilistic database systems augment tuples with attribute-level or tuple-level probability values, which are loaded into the database along with the data itself. This approach can severely limit the system's ability to gracefully handle complex or unforeseen types of uncertainty, and does not permit the uncertainty model to be dynamically parameterized according to the current state of the database. We introduce MCDB, a system for managing uncertain data that is based on a Monte Carlo approach. MCDB represents uncertainty via "VG functions," which are used to pseudorandomly generate realized values for uncertain attributes. VG functions can be parameterized on the results of SQL queries over "parameter tables" that are stored in the database, facilitating what-if analyses. By storing parameters, and not probabilities, and by estimating, rather than exactly computing, the probability distribution over possible query answers, MCDB avoids many of the limitations of prior systems. For example, MCDB can easily handle arbitrary joint probability distributions over discrete or continuous attributes, arbitrarily complex SQL queries, and arbitrary functionals of the query-result distribution such as means, variances, and quantiles. To achieve good performance, MCDB uses novel query processing techniques, executing a query plan exactly once, but over "tuple bundles" instead of ordinary tuples. Experiments indicate that our enhanced functionality can be obtained with acceptable overheads relative to traditional systems.
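The core idea (repeatedly instantiate the uncertain attributes with generator functions and summarize the resulting query answers) can be sketched outside any database engine. The "parameter table", generator, and query below are hypothetical and only illustrate the Monte Carlo approach, not MCDB's tuple-bundle execution strategy.

```python
import random
import statistics

# Hypothetical "parameter table": per-customer mean and sd of an uncertain revenue.
params = [("acme", 100.0, 15.0), ("globex", 250.0, 40.0), ("initech", 80.0, 10.0)]

def vg_revenue(mean, sd, rng):
    """Toy VG-style generator: pseudorandomly realize one uncertain attribute."""
    return rng.gauss(mean, sd)

def total_revenue_query(rng):
    """The 'query' evaluated on one realized database instance: SUM(revenue)."""
    return sum(vg_revenue(mean, sd, rng) for _, mean, sd in params)

rng = random.Random(42)
answers = [total_revenue_query(rng) for _ in range(10_000)]  # Monte Carlo repetitions
quantiles = statistics.quantiles(answers, n=20)
print("estimated mean of the query answer:", statistics.fmean(answers))
print("estimated 5th and 95th percentiles:", quantiles[0], quantiles[-1])
```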

Proceedings ArticleDOI
06 Jul 2008
TL;DR: A method for estimating the KL divergence between continuous densities is presented and proved to converge almost surely; the divergence can be estimated using either the empirical cdf or k-nearest-neighbour density estimation, even though the latter does not converge to the true measure for finite k.
Abstract: We present a method for estimating the KL divergence between continuous densities and we prove that it converges almost surely. Divergence estimation is typically solved by estimating the densities first. Our main result shows that this intermediate step is unnecessary and that the divergence can be estimated using either the empirical cdf or k-nearest-neighbour density estimation, which does not converge to the true measure for finite k. The convergence proof is based on describing the statistics of our estimator using waiting-times distributions, such as the exponential or Erlang. We illustrate the proposed estimators and show how they compare to existing methods based on density estimation, and we also outline how our divergence estimators can be used for solving the two-sample problem.
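One member of the nearest-neighbour estimator family discussed here (the 1-NN variant, written with brute-force distance computations for clarity) can be sketched as follows; the constant term follows the standard k-NN divergence estimator, and the Gaussian test case is purely illustrative.

```python
import numpy as np

def kl_divergence_1nn(x: np.ndarray, y: np.ndarray) -> float:
    """1-nearest-neighbour estimate of KL(p||q) from samples x ~ p and y ~ q:
    (d/n) * sum_i log(nu_i / rho_i) + log(m / (n - 1)), where rho_i is the
    distance from x_i to its nearest other point in x and nu_i is the distance
    from x_i to its nearest point in y."""
    n, d = x.shape
    m = y.shape[0]
    dxx = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    np.fill_diagonal(dxx, np.inf)        # exclude each point from its own neighbourhood
    rho = dxx.min(axis=1)
    nu = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1).min(axis=1)
    return float(d / n * np.sum(np.log(nu / rho)) + np.log(m / (n - 1)))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(1000, 1))   # p = N(0, 1)
y = rng.normal(1.0, 1.0, size=(1000, 1))   # q = N(1, 1)
print("estimated KL:", kl_divergence_1nn(x, y), " true KL:", 0.5)
```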

Journal ArticleDOI
TL;DR: The test for uniformity introduced here is based on the number of observed "coincidences" (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be ε-distant from uniform.
Abstract: How many independent samples N do we need from a distribution p to decide that p is ε-distant from uniform in an L1 sense, $$\sum_{i=1}^{m} |p(i) - 1/m| > \epsilon$$? (Here m is the number of bins on which the distribution is supported, and is assumed known a priori.) Somewhat surprisingly, we only need $$N\epsilon^{2} \gg m^{1/2}$$ to make this decision reliably (this condition is both sufficient and necessary). The test for uniformity introduced here is based on the number of observed "coincidences" (samples that fall into the same bin), the mean and variance of which may be computed explicitly for the uniform distribution and bounded nonparametrically for any distribution that is known to be ε-distant from uniform. Some connections to the classical birthday problem are noted.
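A toy version of the coincidence statistic: draw N samples over m bins, count the pairs landing in the same bin, and compare with the expectation under uniformity. The skewed alternative below (whose L1 distance from uniform is 0.5) is an arbitrary illustration, not the calibrated test of the paper.

```python
import numpy as np

def coincidence_count(samples: np.ndarray, m: int) -> int:
    """Number of 'coincidences': pairs of samples that fall into the same bin."""
    counts = np.bincount(samples, minlength=m)
    return int(np.sum(counts * (counts - 1) // 2))

rng = np.random.default_rng(0)
m, N = 10_000, 500
p_skew = np.r_[np.full(m // 2, 1.5 / m), np.full(m - m // 2, 0.5 / m)]
p_skew /= p_skew.sum()                       # L1 distance from uniform is 0.5

uniform_sample = rng.integers(0, m, size=N)
skewed_sample = rng.choice(m, size=N, p=p_skew)

print("expected coincidences under uniform:", N * (N - 1) / 2 / m)
print("observed, uniform sample:", coincidence_count(uniform_sample, m))
print("observed, skewed sample: ", coincidence_count(skewed_sample, m))
```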

Journal ArticleDOI
TL;DR: In this article, the authors describe and implement three computationally attractive procedures for nonparametric estimation of mixing distributions in discrete choice models, which are specific types of the well known EM (Expectation-Maximization) algorithm based on three different ways of approximating the mixing distribution nonparametrically.
Abstract: This paper describes and implements three computationally attractive procedures for nonparametric estimation of mixing distributions in discrete choice models. The procedures are specific types of the well known EM (Expectation-Maximization) algorithm based on three different ways of approximating the mixing distribution nonparametrically: (1) a discrete distribution with mass points and frequencies treated as parameters, (2) a discrete mixture of continuous distributions, with the moments and weight for each distribution treated as parameters, and (3) a discrete distribution with fixed mass points whose frequencies are treated as parameters. The methods are illustrated with a mixed logit model of households' choices among alternative-fueled vehicles.
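The third approximation (fixed mass points with frequencies treated as parameters) has a particularly simple EM update, sketched here for a generic latent-class likelihood; the normal component likelihood and the grid of support points are placeholders, not the paper's mixed logit specification.

```python
import numpy as np

def em_fixed_support(lik_matrix: np.ndarray, n_iter: int = 200) -> np.ndarray:
    """EM for mixing weights on a fixed grid of support points.
    lik_matrix[i, k] = likelihood of observation i given support point k.
    Returns the estimated weights pi_k."""
    n, K = lik_matrix.shape
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        post = lik_matrix * pi                  # E-step: unnormalized responsibilities
        post /= post.sum(axis=1, keepdims=True)
        pi = post.mean(axis=0)                  # M-step: average responsibility
    return pi

# Toy check: data from a 2-component normal mixture, grid of candidate means.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 300), rng.normal(2, 1, 700)])
grid = np.linspace(-4, 4, 9)
lik = np.exp(-0.5 * (data[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)
weights = em_fixed_support(lik)
print(dict(zip(np.round(grid, 1), np.round(weights, 3))))
```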

Journal ArticleDOI
TL;DR: In this article, a matrix perturbation approach was used to study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n and those of the limiting population PCA as n → oo.
Abstract: Principal component analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n → ∞. As in machine learning, we present a finite sample theorem which holds with high probability for the closeness between the leading eigenvalue and eigenvector of sample PCA and population PCA under a spiked covariance model. In addition, we also consider the relation between finite sample PCA and the asymptotic results in the joint limit p, n → ∞, with p/n = c. We present a matrix perturbation view of the "phase transition phenomenon," and a simple linear-algebra based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit. Moreover, our analysis also applies for finite p, n where we show that although there is no sharp phase transition as in the infinite case, either as a function of noise level or as a function of sample size n, the eigenvector of sample PCA may exhibit a sharp "loss of tracking," suddenly losing its relation to the (true) eigenvector of the population PCA matrix. This occurs due to a crossover between the eigenvalue due to the signal and the largest eigenvalue due to noise, whose eigenvector points in a random direction.
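The "loss of tracking" phenomenon is easy to probe numerically: generate data from a rank-one spiked covariance model and track the overlap between the leading sample eigenvector and the true spike direction as the noise level grows. The signal model, sample sizes, and noise levels below are illustrative assumptions.

```python
import numpy as np

def eigvec_overlap(n: int, p: int, signal: float, noise: float, rng) -> float:
    """Data x = signal * s * u + noise * z under a rank-one spiked covariance model.
    Returns |<leading sample eigenvector, u>| for the true spike direction u."""
    u = np.zeros(p)
    u[0] = 1.0                                        # true spike direction
    s = rng.standard_normal((n, 1))
    z = rng.standard_normal((n, p))
    x = signal * s * u + noise * z
    cov = x.T @ x / n
    _, vecs = np.linalg.eigh(cov)                     # eigenvalues in ascending order
    return abs(vecs[:, -1] @ u)

rng = np.random.default_rng(0)
for noise in [0.5, 1.0, 2.0, 4.0]:
    overlap = eigvec_overlap(n=200, p=400, signal=1.0, noise=noise, rng=rng)
    print("noise level", noise, "-> eigenvector overlap", round(overlap, 3))
```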

Journal ArticleDOI
TL;DR: Finding any benefit of averaging two responses from one person would support the hypothesis that knowledge within an individual is represented as a probability distribution, consistent with models under which the responses of many people are distributed probabilistically.
Abstract: Measuring the crowd within: probabilistic representations within individuals (Edward Vul, Massachusetts Institute of Technology; Harold Pashler, University of California, San Diego; Psychological Science, Short Report, 2008, 19, 645-647). A crowd often possesses better information than do the individuals it comprises. For example, if people are asked to guess the weight of a prize-winning ox (Galton, 1907), the error of the average response is substantially smaller than the average error of individual estimates. This fact, which Galton interpreted as support for democratic governance, is responsible for the success of polling the audience in the television program "Who Wants to be a Millionaire" (Surowiecki, 2004) and for the superiority of combined over individual financial forecasts (Clemen, 1989). Researchers agree that this wisdom-of-crowds effect depends on a statistical fact: The crowd's average will be more accurate as long as some of the error of one individual is statistically independent of the error of other individuals, as seems almost guaranteed to be the case. Whether a similar improvement can be obtained by averaging two estimates from a single individual is not, a priori, obvious. If one estimate represents the best information available to the person, as common intuition suggests, then a second guess will simply add noise, and averaging the two will only decrease accuracy. Researchers have previously assumed this view and focused on improving the best estimate (Hirt & Markman, 1995; Mussweiler, Strack, & Pfeiffer, 2000; Stewart, 2001). Alternatively, single estimates may represent samples drawn from an internal probability distribution, rather than deterministic best guesses. According to this account, if the internal probability distribution is unbiased, the average of two estimates from one person will be more accurate than a single estimate. Ariely et al. (2000) predicted that such a benefit would accrue from averaging probability judgments within one individual, but did not find evidence of such an effect. However, probability judgments are known to be biased toward extreme values (0 or 1), and averaging should not reduce the bias of estimates; if guesses are sampled from an unbiased distribution, however, averaging should reduce error (variance; Laplace, 1812/1878; Wallsten, Budescu, Erev, & Diederich, 1997). Probabilistic representations have been postulated in recent models of memory (Steyvers, Griffiths, & Dennis, 2006), perception (Kersten & Yuille, 2003), and neural coding (Ma, Beck, Latham, & Pouget, 2006). It is consistent with such models that responses of many people are distributed probabilistically, as shown by the wisdom-of-crowds effect. However, despite the theoretical appeal of these models, there has been scant evidence that, within a given person, knowledge is represented as a probability distribution. Finding any benefit of averaging two responses from one person would yield support for this hypothesis. METHOD: We recruited 428 participants from an Internet-based subject pool and asked them eight questions probing their real-world knowledge (derived from The World Factbook, Central Intelligence Agency, 2007; e.g., "What percentage of the world's airports are in the

Journal ArticleDOI
TL;DR: It is shown that for local quenches starting at criticality the probability distribution of the work displays an interesting edge singularity.
Abstract: We study the statistics of the work done on a quantum critical system by quenching a control parameter in the Hamiltonian. We elucidate the relation between the probability distribution of the work and the Loschmidt echo, a quantity emerging usually in the context of dephasing. Using this connection we characterize the statistics of the work done on a quantum Ising chain by quenching locally or globally the transverse field. We show that for local quenches starting at criticality the probability distribution of the work displays an interesting edge singularity.

Journal ArticleDOI
TL;DR: Multi-Entity Bayesian Networks (MEBN), a first-order language for specifying probabilistic knowledge bases as parameterized fragments of Bayesian networks, is presented, and a proof is given that MEBN can represent a probability distribution on interpretations of any finitely axiomatizable first-order theory.

Journal ArticleDOI
TL;DR: In this article, the authors obtained general integral formulas for probabilities in the asymmetric simple exclusion process (ASEP) on the integer lattice with nearest neighbor hopping rates p to the right and q = 1−p to the left.
Abstract: In this paper we obtain general integral formulas for probabilities in the asymmetric simple exclusion process (ASEP) on the integer lattice $${\mathbb{Z}}$$ with nearest neighbor hopping rates p to the right and q = 1−p to the left. For the most part we consider an N-particle system but for certain of these formulas we can take the $$N\to\infty$$ limit. First we obtain, for the N-particle system, a formula for the probability of a configuration at time t, given the initial configuration. For this we use Bethe Ansatz ideas to solve the master equation, extending a result of Schutz for the case N = 2. The main results of the paper, derived from this, are integral formulas for the probability, for given initial configuration, that the mth left-most particle is at x at time t. In one of these formulas we can take the $$N\to\infty$$ limit, and it gives the probability for an infinite system where the initial configuration is bounded on one side. For the special case of the totally asymmetric simple exclusion process (TASEP) our formulas reduce to the known ones.

Journal ArticleDOI
TL;DR: In this article, a stochastic parameterization scheme for deep convection is described, suitable for use in both climate and NWP models, and theoretical arguments and results of cloud-resolving models are discussed in order to motivate the form of the scheme.
Abstract: A stochastic parameterization scheme for deep convection is described, suitable for use in both climate and NWP models. Theoretical arguments and the results of cloud-resolving models are discussed in order to motivate the form of the scheme. In the deterministic limit, it tends to a spectrum of entraining/detraining plumes and is similar to other current parameterizations. The stochastic variability describes the local fluctuations about a large-scale equilibrium state. Plumes are drawn at random from a probability distribution function (PDF) that defines the chance of finding a plume of given cloud-base mass flux within each model grid box. The normalization of the PDF is given by the ensemble-mean mass flux, and this is computed with a CAPE closure method. The characteristics of each plume produced are determined using an adaptation of the plume model from the Kain–Fritsch parameterization. Initial tests in the single-column version of the Unified Model verify that the scheme is effective in producing the desired distributions of convective variability without adversely affecting the mean state.
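A toy realization of the sampling idea: the number of plumes in a grid box and their cloud-base mass fluxes are drawn at random so that the ensemble mean matches a closure-supplied value. The Poisson plume count and the exponential mass-flux PDF are illustrative assumptions for this sketch, not necessarily the paper's exact choices, and the numerical values are arbitrary.

```python
import numpy as np

def draw_plume_ensemble(mean_total_flux, mean_plume_flux, rng):
    """Draw one random plume ensemble for a grid box: the number of plumes is
    Poisson with mean <M>/<m>, and each plume's cloud-base mass flux is drawn
    from an exponential PDF with mean <m>. The expected total flux then equals
    the closure value <M>, while individual draws fluctuate about it."""
    n_plumes = rng.poisson(mean_total_flux / mean_plume_flux)
    return rng.exponential(mean_plume_flux, size=n_plumes)

rng = np.random.default_rng(0)
totals = [draw_plume_ensemble(0.05, 0.005, rng).sum() for _ in range(5000)]
print("mean total mass flux over many draws:", np.mean(totals))  # ~0.05
```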

Journal ArticleDOI
TL;DR: This paper proposes new consensus clustering algorithms with linear computational complexity in n and introduces the idea of cumulative voting as a solution for the problem of cluster label alignment, where unlike the common one-to-one voting scheme, a probabilistic mapping is computed.
Abstract: Over the past few years, there has been a renewed interest in the consensus clustering problem. Several new methods have been proposed for finding a consensus partition for a set of n data objects that optimally summarizes an ensemble. In this paper, we propose new consensus clustering algorithms with linear computational complexity in n. We consider clusterings generated with a random number of clusters, which we describe by categorical random variables. We introduce the idea of cumulative voting as a solution for the problem of cluster label alignment, where unlike the common one-to-one voting scheme, a probabilistic mapping is computed. We seek a first summary of the ensemble that minimizes the average squared distance between the mapped partitions and the optimal representation of the ensemble, where the selection criterion of the reference clustering is defined based on maximizing the information content as measured by the entropy. We describe cumulative vote weighting schemes and corresponding algorithms to compute an empirical probability distribution summarizing the ensemble. Given the arbitrary number of clusters of the input partitions, we formulate the problem of extracting the optimal consensus as that of finding a compressed summary of the estimated distribution that preserves the maximum relevant information. An efficient solution is obtained using an agglomerative algorithm that minimizes the average generalized Jensen-Shannon divergence within the cluster. The empirical study demonstrates significant gains in accuracy and superior performance compared to several recent consensus clustering algorithms.

Book
04 Aug 2008
TL;DR: A textbook treatment of probability concepts, random variables and their properties, probability distributions, regression and multivariate analysis, frequency analysis of extreme events, simulation techniques for design, risk and reliability analysis, and Bayesian decision methods with parameter uncertainty.
Abstract: Preface. Introduction. Preliminary data analysis. Basic probability concepts. Random variables and their properties. Probability distributions. Model estimation and testing. Methods of regression and multivariate analysis. Frequency analysis of extreme events. Simulation techniques for design. Risk and reliability analysis. Bayesian decision methods and parameter uncertainty. Appendixes: Further mathematics; Glossary of symbols; Tables of selected distributions; Brief answers to selected problems; Data lists. Index.

Journal ArticleDOI
TL;DR: An axiomatic model of decision making is presented that incorporates objective but imprecise information, explains how subjective belief varies with information, and identifies an explicit attitude toward imprecision that underlies the usual hedging axioms.

Journal ArticleDOI
TL;DR: A general solution is provided to the problem of identification and estimation of nonlinear models with misclassification error in a general discrete explanatory variable, using instrumental variables; the model can be expressed as an explicit function of directly observed distribution functions.