
Showing papers on "Probability distribution published in 2014"


Journal ArticleDOI
TL;DR: A unifying framework for modeling and solving distributionally robust optimization problems is proposed, introducing standardized ambiguity sets that contain all distributions with prescribed conic representable confidence sets and with mean values residing on an affine manifold.
Abstract: Distributionally robust optimization is a paradigm for decision making under uncertainty where the uncertain problem data are governed by a probability distribution that is itself subject to uncertainty. The distribution is then assumed to belong to an ambiguity set comprising all distributions that are compatible with the decision maker's prior information. In this paper, we propose a unifying framework for modeling and solving distributionally robust optimization problems. We introduce standardized ambiguity sets that contain all distributions with prescribed conic representable confidence sets and with mean values residing on an affine manifold. These ambiguity sets are highly expressive and encompass many ambiguity sets from the recent literature as special cases. They also allow us to characterize distributional families in terms of several classical and/or robust statistical indicators that have not yet been studied in the context of robust optimization. We determine conditions under which distributionally robust optimization problems based on our standardized ambiguity sets are computationally tractable. We also provide tractable conservative approximations for problems that violate these conditions.
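As a toy illustration of the worst-case-expectation idea behind such ambiguity sets (not the paper's conic framework), the sketch below maximizes an expected loss over all distributions on a discretized support that match a prescribed mean; the support grid, the loss function, and the mean value are made-up examples.

```python
# Hypothetical illustration: worst-case expected loss over a moment-based
# ambiguity set, after discretizing the support.  This is only a toy stand-in
# for the conic-representable ambiguity sets described in the abstract.
import numpy as np
from scipy.optimize import linprog

support = np.linspace(-2.0, 2.0, 41)          # discretized outcomes of the uncertain parameter
loss = np.maximum(1.0 - 2.0 * support, 0.0)   # piecewise-linear loss of a fixed decision
mu = 0.3                                      # prescribed mean of the uncertain parameter

# Variables: probabilities p_i on the support points.
# Worst case = max_p E_p[loss]  s.t.  p >= 0, sum p = 1, sum p_i * x_i = mu.
A_eq = np.vstack([np.ones_like(support), support])
b_eq = np.array([1.0, mu])
res = linprog(c=-loss, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * len(support), method="highs")
print("worst-case expected loss:", -res.fun)
```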

789 citations


Proceedings ArticleDOI
31 May 2014
TL;DR: This paper describes the connections that the research area called "Probabilistic Programming" has with programming languages and software engineering, including language design and the static and dynamic analysis of programs.
Abstract: Probabilistic programs are usual functional or imperative programs with two added constructs: (1) the ability to draw values at random from distributions, and (2) the ability to condition values of variables in a program via observations. Models from diverse application areas such as computer vision, coding theory, cryptographic protocols, biology and reliability analysis can be written as probabilistic programs. Probabilistic inference is the problem of computing an explicit representation of the probability distribution implicitly specified by a probabilistic program. Depending on the application, the desired output from inference may vary---we may want to estimate the expected value of some function f with respect to the distribution, or the mode of the distribution, or simply a set of samples drawn from the distribution. In this paper, we describe the connections that this research area, called "Probabilistic Programming", has with programming languages and software engineering, including language design and the static and dynamic analysis of programs. We survey the current state of the art and speculate on promising directions for future research.
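A minimal sketch of the two constructs named in the abstract (random draws and conditioning via observations), with inference done by plain rejection sampling; the coin-flip model and the helper name `model` are illustrative choices, not anything from the paper.

```python
# A toy probabilistic program: draw an unknown coin bias, flip the coin,
# and condition on an observation.  Inference here is simple rejection
# sampling, just to make the two constructs concrete.
import random

def model():
    bias = random.random()                          # draw: unknown bias ~ Uniform(0, 1)
    flips = [random.random() < bias for _ in range(5)]
    observed = (sum(flips) == 4)                    # condition: we saw exactly 4 heads
    return bias, observed

samples = [b for b, ok in (model() for _ in range(200_000)) if ok]
print("posterior mean of bias given 4/5 heads:", sum(samples) / len(samples))
```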

609 citations


Journal ArticleDOI
19 Feb 2014-PLOS ONE
TL;DR: An accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set is presented, which applies when measuring the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time.
Abstract: Mutual information (MI) is a powerful method for detecting relationships between data sets. There are accurate methods for estimating MI that avoid problems with “binning” when both data sets are discrete or when both data sets are continuous. We present an accurate, non-binning MI estimator for the case of one discrete data set and one continuous data set. This case applies when measuring, for example, the relationship between base sequence and gene expression level, or the effect of a cancer drug on patient survival time. We also show how our method can be adapted to calculate the Jensen–Shannon divergence of two or more data sets.
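The sketch below implements a nearest-neighbour estimator of this discrete-continuous kind in the spirit of the abstract; the neighbour-counting conventions, the default k, and the toy data are assumptions made here for illustration and may differ in detail from the paper's estimator.

```python
# Rough nearest-neighbour MI estimate for one discrete and one continuous
# variable (illustrative only; see the paper for the exact estimator).
import numpy as np
from scipy.special import digamma

def mi_discrete_continuous(labels, values, k=3):
    labels = np.asarray(labels)
    values = np.asarray(values, dtype=float)
    n = len(values)
    psi_nx, psi_m = [], []
    for i in range(n):
        same = np.flatnonzero(labels == labels[i])
        same = same[same != i]
        d_same = np.abs(values[same] - values[i])
        d_k = np.sort(d_same)[k - 1]                 # distance to k-th same-label neighbour
        d_all = np.abs(values - values[i])
        m_i = np.count_nonzero(d_all <= d_k) - 1     # neighbours within that radius, excluding i
        psi_nx.append(digamma(len(same) + 1))        # number of points sharing the label
        psi_m.append(digamma(m_i))
    return digamma(n) - np.mean(psi_nx) + digamma(k) - np.mean(psi_m)

rng = np.random.default_rng(0)
lab = rng.integers(0, 2, size=2000)
val = rng.normal(loc=lab * 1.5, scale=1.0)           # continuous values shifted by the label
print("estimated MI (nats):", mi_discrete_continuous(lab, val))
```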

511 citations


Journal ArticleDOI
TL;DR: The decision to calculate a summary estimate in a meta-analysis should be based on clinical judgment, the number of studies, and the degree of variation among studies; when a summary is calculated, it should rest on a random-effects model that incorporates study-to-study variability beyond what would be expected by chance.
Abstract: A primary goal of meta-analysis is to improve the estimation of treatment effects by pooling results of similar studies. This article discusses the problems associated with using the DerSimonian–La...

353 citations


01 Jan 2014
TL;DR: In this article, it was shown that making many quick but locally suboptimal decisions based on very few samples may be the globally optimal strategy over long periods under reasonable assumptions about the time costs of sampling.
Abstract: In many learning or inference tasks human behavior approximates that of a Bayesian ideal observer, suggesting that, at some level, cognition can be described as Bayesian inference. However, a number of findings have highlighted an intriguing mismatch between human behavior and standard assumptions about optimality: People often appear to make decisions based on just one or a few samples from the appropriate posterior probability distribution, rather than using the full distribution. Although sampling-based approximations are a common way to implement Bayesian inference, the very limited numbers of samples often used by humans seem insufficient to approximate the required probability distributions very accurately. Here, we consider this discrepancy in the broader framework of statistical decision theory, and ask: If people are making decisions based on samples—but as samples are costly—how many samples should people use to optimize their total expected or worst-case reward over a large number of decisions? We find that under reasonable assumptions about the time costs of sampling, making many quick but locally suboptimal decisions based on very few samples may be the globally optimal strategy over long periods. These results help to reconcile a large body of work showing sampling-based or probability matching behavior with the hypothesis that human cognition can be understood in Bayesian terms, and they suggest promising future directions for studies of resource-constrained cognition.
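A toy simulation of the trade-off the abstract describes, under assumed time costs for a decision and for each posterior sample; the cost values, the uniform posterior-confidence model, and the majority-vote rule are all illustrative assumptions rather than the paper's setup.

```python
# Expected reward per unit time when each decision is based on k posterior
# samples and each sample has a time cost (toy reproduction of the trade-off).
import numpy as np

rng = np.random.default_rng(1)
t_decision, t_sample = 1.0, 0.2        # assumed time costs (arbitrary units)
n_trials = 200_000

for k in (1, 2, 5, 10, 50):
    p = rng.uniform(0.5, 1.0, n_trials)             # posterior prob. that option A is correct
    votes = rng.random((n_trials, k)) < p[:, None]  # k samples from that posterior
    choose_a = votes.mean(axis=1) > 0.5             # majority vote (ties go to B)
    correct = np.where(choose_a, p, 1 - p)          # prob. the chosen option is rewarded
    rate = correct.mean() / (t_decision + k * t_sample)
    print(f"k={k:3d}  reward per unit time: {rate:.3f}")
```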

258 citations


Journal ArticleDOI
TL;DR: It is shown that the PSD of a CARMA model can be expressed as a sum of Lorentzian functions, which makes CARMA models extremely flexible and able to model a broad range of PSDs.
Abstract: We present the use of continuous-time autoregressive moving average (CARMA) models as a method for estimating the variability features of a light curve, and in particular its power spectral density (PSD). CARMA models fully account for irregular sampling and measurement errors, making them valuable for quantifying variability, forecasting and interpolating light curves, and variability-based classification. We show that the PSD of a CARMA model can be expressed as a sum of Lorentzian functions, which makes them extremely flexible and able to model a broad range of PSDs. We present the likelihood function for light curves sampled from CARMA processes, placing them on a statistically rigorous foundation, and we present a Bayesian method to infer the probability distribution of the PSD given the measured light curve. Because calculation of the likelihood function scales linearly with the number of data points, CARMA modeling scales to current and future massive time-domain data sets. We conclude by applying our CARMA modeling approach to light curves for an X-ray binary, two active galactic nuclei, a long-period variable star, and an RR Lyrae star in order to illustrate their use, applicability, and interpretation.
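As a hedged sketch of the PSD property mentioned above, the snippet below evaluates a CARMA power spectrum from its autoregressive and moving-average polynomials under one common parameterization, PSD(f) = sigma^2 |beta(2*pi*i*f)|^2 / |alpha(2*pi*i*f)|^2; the example coefficients are arbitrary, and sign and ordering conventions may differ from those used in the paper.

```python
# Evaluate a CARMA(2,1) power spectral density on a frequency grid.
import numpy as np

def carma_psd(freqs, ar_coefs, ma_coefs, sigma=1.0):
    """PSD(f) = sigma^2 |beta(2*pi*i*f)|^2 / |alpha(2*pi*i*f)|^2."""
    z = 2j * np.pi * freqs
    num = np.abs(np.polyval(ma_coefs[::-1], z)) ** 2   # beta(z), coefficients given low->high
    den = np.abs(np.polyval(ar_coefs[::-1], z)) ** 2   # alpha(z), leading coefficient 1
    return sigma ** 2 * num / den

freqs = np.logspace(-3, 1, 200)
psd = carma_psd(freqs,
                ar_coefs=np.array([0.04, 0.4, 1.0]),   # alpha(z) = z^2 + 0.4 z + 0.04
                ma_coefs=np.array([1.0, 0.5]))         # beta(z)  = 1 + 0.5 z
print(psd[:3])
```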

218 citations


Proceedings ArticleDOI
04 Jun 2014
TL;DR: The capability of the stochastic model predictive control approach in terms of shaping the probability distribution of system states and fulfilling state constraints in a stochastic setting is demonstrated for optimal control of polymorphic transformation in batch crystallization.
Abstract: Stochastic uncertainties are ubiquitous in complex dynamical systems and can lead to undesired variability of system outputs and, therefore, a notable degradation of closed-loop performance. This paper investigates model predictive control of nonlinear dynamical systems subject to probabilistic parametric uncertainties. A nonlinear model predictive control framework is presented for control of the probability distribution of system states while ensuring the satisfaction of constraints with some desired probability levels. To obtain a computationally tractable formulation for real control applications, polynomial chaos expansions are utilized to propagate the probabilistic parametric uncertainties through the system model. The paper considers individual probabilistic constraints, which are converted explicitly into convex second-order cone constraints for a general class of probability distributions. An algorithm is presented for receding horizon implementation of the finite-horizon stochastic optimal control problem. The capability of the stochastic model predictive control approach in terms of shaping the probability distribution of system states and fulfilling state constraints in a stochastic setting is demonstrated for optimal control of polymorphic transformation in batch crystallization.
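The snippet below sketches the kind of individual chance-constraint reformulation the abstract refers to, replacing Pr(a·x <= b) >= 1 - eps by a second-order cone condition on the mean and covariance of a; the distribution-free Cantelli multiplier and all numerical values are assumptions for illustration, not the paper's exact construction.

```python
# Chance constraint -> second-order cone surrogate, checked by Monte Carlo.
import numpy as np

rng = np.random.default_rng(2)
eps = 0.05
a_mean = np.array([1.0, 2.0])
a_cov = np.array([[0.2, 0.05], [0.05, 0.1]])
x = np.array([0.8, 0.3])
b = 3.5

kappa = np.sqrt((1 - eps) / eps)                      # distribution-free back-off factor
lhs = a_mean @ x + kappa * np.sqrt(x @ a_cov @ x)     # second-order cone expression
print("deterministic surrogate satisfied:", lhs <= b)

a_samples = rng.multivariate_normal(a_mean, a_cov, 100_000)
print("empirical violation rate:", np.mean(a_samples @ x > b))
```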

214 citations


Posted Content
TL;DR: A* sampling, as presented in this paper, is a generic sampling algorithm that searches for the maximum of a Gumbel process using A* search and makes more efficient use of bound and likelihood evaluations than the most closely related adaptive rejection sampling-based algorithms.
Abstract: The problem of drawing samples from a discrete distribution can be converted into a discrete optimization problem. In this work, we show how sampling from a continuous distribution can be converted into an optimization problem over continuous space. Central to the method is a stochastic process recently described in mathematical statistics that we call the Gumbel process. We present a new construction of the Gumbel process and A* sampling, a practical generic sampling algorithm that searches for the maximum of a Gumbel process using A* search. We analyze the correctness and convergence time of A* sampling and demonstrate empirically that it makes more efficient use of bound and likelihood evaluations than the most closely related adaptive rejection sampling-based algorithms.
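The discrete case mentioned in the first sentence has a well-known one-line solution, the Gumbel-max trick, sketched below; A* sampling generalizes this idea to continuous spaces, which is not reproduced here.

```python
# Gumbel-max trick: sampling a discrete distribution as an argmax problem.
import numpy as np

rng = np.random.default_rng(3)
log_p = np.log(np.array([0.1, 0.2, 0.3, 0.4]))

def gumbel_max_sample(log_probs, rng):
    gumbel_noise = -np.log(-np.log(rng.random(log_probs.shape)))
    return int(np.argmax(log_probs + gumbel_noise))   # argmax of perturbed log-probabilities

draws = [gumbel_max_sample(log_p, rng) for _ in range(100_000)]
print("empirical frequencies:", np.bincount(draws) / len(draws))   # ~ [0.1, 0.2, 0.3, 0.4]
```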

212 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider two models for directed polymers in space-time independent random media (the O'Connell-Yor semidiscrete directed polymer and the continuum directed random polymer) at positive temperature and prove their KPZ universality via asymptotic analysis of exact Fredholm determinant formulas for the Laplace transform of their partition functions.
Abstract: We consider two models for directed polymers in space-time independent random media (the O'Connell-Yor semidiscrete directed polymer and the continuum directed random polymer) at positive temperature and prove their KPZ universality via asymptotic analysis of exact Fredholm determinant formulas for the Laplace transform of their partition functions. In particular, we show that for large time τ, the probability distributions for the free energy fluctuations, when rescaled by τ1/3, converges to the GUE Tracy-Widom distribution. We also consider the effect of boundary perturbations to the quenched random media on the limiting free energy statistics. For the semidiscrete directed polymer, when the drifts of a finite number of the Brownian motions forming the quenched random media are critically tuned, the statistics are instead governed by the limiting Baik–Ben Arous–Peche distributions from spiked random matrix theory. For the continuum polymer, the boundary perturbations correspond to choosing the initial data for the stochastic heat equation from a particular class, and likewise for its logarithm—the Kardar-Parisi-Zhang equation. The Laplace transform formula we prove can be inverted to give the one-point probability distribution of the solution to these stochastic PDEs for the class of initial data. © 2014 Wiley Periodicals, Inc.

196 citations


Journal ArticleDOI
TL;DR: In this article, the root-mean-square error is adapted to quantum measurements, and it is shown that there are no nontrivial unconditional joint-measurement bounds for state-dependent errors in the conceptual framework discussed here, while Heisenberg-type measurement uncertainty relations for state-independent errors have been proven.
Abstract: Recent years have witnessed a controversy over Heisenberg's famous error-disturbance relation. Here the conflict is resolved by way of an analysis of the possible conceptualizations of measurement error and disturbance in quantum mechanics. Two approaches to adapting the classic notion of root-mean-square error to quantum measurements are discussed. One is based on the concept of a noise operator; its natural operational content is that of a mean deviation of the values of two observables measured jointly, and thus its applicability is limited to cases where such joint measurements are available. The second error measure quantifies the differences between two probability distributions obtained in separate runs of measurements and is of unrestricted applicability. We show that there are no nontrivial unconditional joint-measurement bounds for state-dependent errors in the conceptual framework discussed here, while Heisenberg-type measurement uncertainty relations for state-independent errors have been proven.

185 citations


Book ChapterDOI
15 Sep 2014
TL;DR: A new optimal transport algorithm is proposed that incorporates label information in the optimization: this is achieved by combining an efficient matrix scaling technique together with a majoration of a non-convex regularization term.
Abstract: We present a new and original method to solve the domain adaptation problem using optimal transport. By searching for the best transportation plan between the probability distribution functions of a source and a target domain, a non-linear and invertible transformation of the learning samples can be estimated. Any standard machine learning method can then be applied on the transformed set, which makes our method very generic. We propose a new optimal transport algorithm that incorporates label information in the optimization: this is achieved by combining an efficient matrix scaling technique together with a majoration of a non-convex regularization term. By using the proposed optimal transport with label regularization, we obtain significant increase in performance compared to the original transport solution. The proposed algorithm is computationally efficient and effective, as illustrated by its evaluation on a toy example and a challenging real life vision dataset, against which it achieves competitive results with respect to state-of-the-art methods.
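A bare-bones sketch of the transport-then-map idea (entropic optimal transport via Sinkhorn iterations followed by a barycentric mapping of the source samples); the regularization strength, sample sizes, and Gaussian toy domains are illustrative assumptions, and the label-regularized algorithm proposed in the paper is not reproduced.

```python
# Entropic OT between source and target samples, then barycentric mapping.
import numpy as np

rng = np.random.default_rng(4)
Xs = rng.normal([0, 0], 1.0, size=(100, 2))           # source samples
Xt = rng.normal([3, 1], 1.0, size=(120, 2))           # target samples (shifted domain)

C = ((Xs[:, None, :] - Xt[None, :, :]) ** 2).sum(-1)  # squared-distance cost matrix
a = np.full(len(Xs), 1 / len(Xs))                     # uniform source weights
b = np.full(len(Xt), 1 / len(Xt))                     # uniform target weights

K = np.exp(-C / 1.0)                                  # Gibbs kernel, regularization eps = 1.0
u = np.ones_like(a)
for _ in range(200):                                  # Sinkhorn fixed-point iterations
    v = b / (K.T @ u)
    u = a / (K @ v)
plan = u[:, None] * K * v[None, :]                    # entropic transport plan

Xs_mapped = (plan @ Xt) / plan.sum(axis=1, keepdims=True)   # barycentric mapping of the source
print("mean of mapped source points:", Xs_mapped.mean(axis=0))
```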

Proceedings Article
08 Dec 2014
TL;DR: This work shows how sampling from a continuous distribution can be converted into an optimization problem over continuous space and presents a new construction of the Gumbel process and A* Sampling, a practical generic sampling algorithm that searches for the maximum of a Gumbels process using A* search.
Abstract: The problem of drawing samples from a discrete distribution can be converted into a discrete optimization problem [1, 2, 3, 4]. In this work, we show how sampling from a continuous distribution can be converted into an optimization problem over continuous space. Central to the method is a stochastic process recently described in mathematical statistics that we call the Gumbel process. We present a new construction of the Gumbel process and A* Sampling, a practical generic sampling algorithm that searches for the maximum of a Gumbel process using A* search. We analyze the correctness and convergence time of A* Sampling and demonstrate empirically that it makes more efficient use of bound and likelihood evaluations than the most closely related adaptive rejection sampling-based algorithms.

Journal ArticleDOI
TL;DR: In this paper, the authors introduce a method of generating a random sample of simulated 14C determinations, from a specified distribution, with variable data densities and measurement errors, and compare the resulting proxy population curve to the known population distribution from which it was generated, to see whether known population fluctuations are unambiguously visible on a proxy curve derived from 14C data sets.

Journal ArticleDOI
14 Oct 2014-PLOS ONE
TL;DR: Different approaches to evaluating transfer entropy, some already proposed and some novel, are compared, their implementation in a freeware MATLAB toolbox is presented, and applications to simulated and real data are shown.
Abstract: A challenge for physiologists and neuroscientists is to map information transfer between components of the systems that they study at different scales, in order to derive important knowledge on structure and function from the analysis of the recorded dynamics. The components of physiological networks often interact in a nonlinear way and through mechanisms which are in general not completely known. It is then safer that the method of choice for analyzing these interactions does not rely on any model or assumption on the nature of the data and their interactions. Transfer entropy has emerged as a powerful tool to quantify directed dynamical interactions. In this paper we compare different approaches to evaluate transfer entropy, some of them already proposed, some novel, and present their implementation in a freeware MATLAB toolbox. Applications to simulated and real data are presented.
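As a hedged illustration of the quantity being estimated, the sketch below computes a crude binned transfer-entropy estimate with history length one; it is far simpler than the estimators implemented in the toolbox, and the bin count and the simulated coupled signals are arbitrary choices.

```python
# Crude binned transfer entropy TE(x -> y) with history length 1.
import numpy as np

def transfer_entropy(x, y, n_bins=8):
    # Discretize both signals into equal-width bins.
    xd = np.digitize(x, np.histogram_bin_edges(x, n_bins)[1:-1])
    yd = np.digitize(y, np.histogram_bin_edges(y, n_bins)[1:-1])
    trip = np.stack([yd[1:], yd[:-1], xd[:-1]], axis=1)   # (y_future, y_past, x_past)

    joint = {}                                            # counts of symbol triples
    for t in map(tuple, trip):
        joint[t] = joint.get(t, 0) + 1
    n = len(trip)
    te = 0.0
    for (yf, yp, xp), c in joint.items():
        p_xyz = c / n
        p_yz = sum(v for (a, b, d), v in joint.items() if b == yp and d == xp) / n
        p_yy = sum(v for (a, b, d), v in joint.items() if a == yf and b == yp) / n
        p_y = sum(v for (a, b, d), v in joint.items() if b == yp) / n
        te += p_xyz * np.log((p_xyz / p_yz) / (p_yy / p_y))
    return te

rng = np.random.default_rng(5)
x = rng.normal(size=5001)
y = np.zeros_like(x)
for t in range(1, len(x)):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()   # x drives y with lag 1
print("TE x->y:", transfer_entropy(x, y), " TE y->x:", transfer_entropy(y, x))
```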

Proceedings Article
21 Jun 2014
TL;DR: This paper shows that, under a 'time-reversibility' or Bradley-Terry-Luce (BTL) condition on the distribution, the rank centrality (PageRank) and least squares (HodgeRank) algorithms both converge to an optimal ranking.
Abstract: There has been much interest recently in the problem of rank aggregation from pairwise data. A natural question that arises is: under what sorts of statistical assumptions do various rank aggregation algorithms converge to an 'optimal' ranking? In this paper, we consider this question in a natural setting where pairwise comparisons are drawn randomly and independently from some underlying probability distribution. We first show that, under a 'time-reversibility' or Bradley-Terry-Luce (BTL) condition on the distribution, the rank centrality (PageRank) and least squares (HodgeRank) algorithms both converge to an optimal ranking. Next, we show that a matrix version of the Borda count algorithm, and more surprisingly, an algorithm which performs maximum likelihood estimation under a BTL assumption, both converge to an optimal ranking under a 'low-noise' condition that is strictly more general than BTL. Finally, we propose a new SVM-based algorithm for rank aggregation from pairwise data, and show that this converges to an optimal ranking under an even more general condition that we term 'generalized low-noise'. In all cases, we provide explicit sample complexity bounds for exact recovery of an optimal ranking. Our experiments confirm our theoretical findings and help to shed light on the statistical behavior of various rank aggregation algorithms.
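A small sketch of the rank centrality (PageRank-style) idea discussed in the abstract: build a Markov chain from pairwise win fractions and rank items by its stationary distribution. The BTL-style ground truth, the normalization by n, and the lazy self-loops are common conventions assumed here, not necessarily the paper's exact construction.

```python
# Rank aggregation from random pairwise comparisons via a stationary distribution.
import numpy as np

rng = np.random.default_rng(6)
true_scores = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # BTL-style ground-truth item qualities
n = len(true_scores)

wins = np.zeros((n, n))                                  # wins[i, j] = times i beat j
for _ in range(5000):                                    # random pairwise comparisons
    i, j = rng.choice(n, size=2, replace=False)
    if rng.random() < true_scores[i] / (true_scores[i] + true_scores[j]):
        wins[i, j] += 1
    else:
        wins[j, i] += 1

games = wins + wins.T
loss_frac = np.divide(wins.T, games, out=np.zeros((n, n)), where=games > 0)
P = loss_frac / n                                        # walk from i towards items that beat i
P[np.arange(n), np.arange(n)] = 1.0 - P.sum(axis=1)      # lazy self-loops keep rows stochastic

pi = np.full(n, 1.0 / n)
for _ in range(2000):                                    # power iteration for the stationary dist.
    pi = pi @ P
print("estimated ranking (best first):", np.argsort(-pi))
```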

Journal ArticleDOI
TL;DR: In this article, a generalized extreme value distribution is proposed to model precipitation on the original scale without prior transformation of the data; a closed-form expression for its continuous ranked probability score can be derived and permits computationally efficient model fitting.
Abstract: Statistical post-processing of dynamical forecast ensembles is an essential component of weather forecasting. In this article, we present a post-processing method which generates full predictive probability distributions for precipitation accumulations based on ensemble model output statistics (EMOS). We model precipitation amounts by a generalized extreme value distribution which is left-censored at zero. This distribution permits modelling precipitation on the original scale without prior transformation of the data. A closed form expression for its continuous ranked probability score can be derived and permits computationally efficient model fitting. We discuss an extension of our approach which incorporates further statistics characterizing the spatial variability of precipitation amounts in the vicinity of the location of interest. The proposed EMOS method is applied to daily 18 h forecasts of 6 h accumulated precipitation over Germany in 2011 using the COSMO-DE ensemble prediction system operated by the German Meteorological Service. It yields calibrated and sharp predictive distributions and compares favourably with extended logistic regression and Bayesian model averaging which are state-of-the-art approaches for precipitation post-processing. The incorporation of neighbourhood information further improves predictive performance and turns out to be a useful strategy to account for displacement errors of the dynamical forecasts in a probabilistic forecasting framework.
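The paper derives a closed-form CRPS for the left-censored GEV; as a generic, hedged illustration of the score itself, the sketch below uses the standard sample-based approximation CRPS(F, y) ~ E|X - y| - 0.5 E|X - X'| on an ensemble drawn from an assumed censored GEV predictive distribution (all parameter values are made up).

```python
# Sample-based CRPS of a left-censored GEV predictive distribution.
import numpy as np
from scipy import stats

obs = 4.2                                               # verifying precipitation amount
ens = stats.genextreme.rvs(c=-0.1, loc=2.0, scale=1.5, size=2000, random_state=7)
ens = np.maximum(ens, 0.0)                              # left-censor at zero, as for precipitation

term1 = np.mean(np.abs(ens - obs))                              # E|X - y|
term2 = 0.5 * np.mean(np.abs(ens[:, None] - ens[None, :]))      # 0.5 E|X - X'|
print("approximate CRPS:", term1 - term2)
```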

Journal ArticleDOI
TL;DR: This work combines the optimized dispatch from the dynamic program with estimated system loss of load probabilities to compute a probability distribution for the state of charge of storage in each period, which can be used as a forced outage rate for storage in standard reliability-based capacity value estimation methods.
Abstract: We present a method to estimate the capacity value of storage. Our method uses a dynamic program to model the effect of power system outages on the operation and state of charge of storage in subsequent periods. We combine the optimized dispatch from the dynamic program with estimated system loss of load probabilities to compute a probability distribution for the state of charge of storage in each period. This probability distribution can be used as a forced outage rate for storage in standard reliability-based capacity value estimation methods. Our proposed method has the advantage over existing approximations that it explicitly captures the effect of system shortage events on the state of charge of storage in subsequent periods. We also use a numerical case study, based on five utility systems in the U.S., to demonstrate our technique and compare it to existing approximation methods.

Journal ArticleDOI
TL;DR: This work probes the sources of suboptimality in probabilistic inference using a novel estimation task in which subjects are exposed to an explicitly provided distribution, thereby removing the need to remember the prior, and rejects several models of stochastic behavior, including probability matching and sample-averaging strategies.
Abstract: Humans have been shown to combine noisy sensory information with previous experience (priors), in qualitative and sometimes quantitative agreement with the statistically-optimal predictions of Bayesian integration. However, when the prior distribution becomes more complex than a simple Gaussian, such as skewed or bimodal, training takes much longer and performance appears suboptimal. It is unclear whether such suboptimality arises from an imprecise internal representation of the complex prior, or from additional constraints in performing probabilistic computations on complex distributions, even when accurately represented. Here we probe the sources of suboptimality in probabilistic inference using a novel estimation task in which subjects are exposed to an explicitly provided distribution, thereby removing the need to remember the prior. Subjects had to estimate the location of a target given a noisy cue and a visual representation of the prior probability density over locations, which changed on each trial. Different classes of priors were examined (Gaussian, unimodal, bimodal). Subjects' performance was in qualitative agreement with the predictions of Bayesian Decision Theory although generally suboptimal. The degree of suboptimality was modulated by statistical features of the priors but was largely independent of the class of the prior and level of noise in the cue, suggesting that suboptimality in dealing with complex statistical features, such as bimodality, may be due to a problem of acquiring the priors rather than computing with them. We performed a factorial model comparison across a large set of Bayesian observer models to identify additional sources of noise and suboptimality. Our analysis rejects several models of stochastic behavior, including probability matching and sample-averaging strategies. Instead we show that subjects' response variability was mainly driven by a combination of a noisy estimation of the parameters of the priors, and by variability in the decision process, which we represent as a noisy or stochastic posterior.

Journal ArticleDOI
08 Jan 2014-PLOS ONE
TL;DR: It is shown that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation, and that rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches.
Abstract: Gene expression measurements have successfully been used for building prognostic signatures, i.e. for identifying a short list of important genes that can predict patient outcome. Mostly microarray measurements have been considered, and there is little advice available for building multivariable risk prediction models from RNA-Seq data. We specifically consider penalized regression techniques, such as the lasso and componentwise boosting, which can simultaneously consider all measurements and provide both multivariable regression models for prediction and automated variable selection. However, they might be affected by the typical skewness, mean-variance-dependency or extreme values of RNA-Seq covariates and therefore could benefit from transformations of the latter. In an analytical part, we highlight preferential selection of covariates with large variances, which is problematic due to the mean-variance dependency of RNA-Seq data. In a simulation study, we compare different transformations of RNA-Seq data for potentially improving detection of important genes. Specifically, we consider standardization, the log transformation, a variance-stabilizing transformation, the Box-Cox transformation, and rank-based transformations. In addition, the prediction performance for real data from patients with kidney cancer and acute myeloid leukemia is considered. We show that signature size, identification performance, and prediction performance critically depend on the choice of a suitable transformation. Rank-based transformations perform well in all scenarios and can even outperform complex variance-stabilizing approaches. Generally, the results illustrate that the distribution and potential transformations of RNA-Seq data need to be considered as a critical step when building risk prediction models by penalized regression techniques.
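One of the rank-based transformations considered in such comparisons can be sketched as a rank-based inverse normal transform, shown below on simulated skewed counts; the negative-binomial toy data and the 0.5 offset in the rank-to-probability mapping are illustrative assumptions, not necessarily the exact variants compared in the paper.

```python
# Rank-based inverse normal transform of skewed, RNA-Seq-like counts.
import numpy as np
from scipy.stats import norm, rankdata

def rank_inverse_normal(counts):
    """Map a gene's expression values to normal scores via their ranks."""
    ranks = rankdata(counts)                        # average ranks, ties handled
    return norm.ppf((ranks - 0.5) / len(counts))    # uniform scores -> standard normal

rng = np.random.default_rng(8)
raw = rng.negative_binomial(n=2, p=0.05, size=200).astype(float)   # skewed toy counts
transformed = rank_inverse_normal(raw)
print("raw skew -> transformed skew:",
      float(((raw - raw.mean()) ** 3).mean() / raw.std() ** 3),
      float(((transformed - transformed.mean()) ** 3).mean() / transformed.std() ** 3))
```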

Journal ArticleDOI
TL;DR: A key feature of the Tool is that users can log in from different sites and view and interact with the same graphical displays, so that expert elicitation sessions can be conducted remotely (in conjunction with tele- or videoconferencing), making probability elicitation easier in situations where it is difficult to interview experts in person.
Abstract: We present a web-based probability distribution elicitation tool: The MATCH Uncertainty Elicitation Tool. The Tool is designed to help elicit probability distributions about uncertain model parameters from experts, in situations where suitable data is either unavailable or sparse. The Tool is free to use, and offers five different techniques for eliciting univariate probability distributions. A key feature of the Tool is that users can log in from different sites and view and interact with the same graphical displays, so that expert elicitation sessions can be conducted remotely (in conjunction with tele- or videoconferencing). This will make probability elicitation easier in situations where it is difficult to interview experts in person. Even when conducting elicitation remotely, interviewers will be able to follow good elicitation practice, advise the experts, and provide instantaneous feedback and assistance.

Journal ArticleDOI
TL;DR: In this article, an iterative method to calculate a limiting probability distribution vector of a transition probability tensor arising from a higher order Markov chain was proposed and developed, in which all entries of the limiting probability vector are required to be non-negative and to sum to one.
Abstract: In this paper, we propose and develop an iterative method to calculate a limiting probability distribution vector of a transition probability tensor arising from a higher order Markov chain. In the model, the computation of such a limiting probability distribution vector can be formulated as a tensor eigenvalue problem associated with the eigenvalue 1 of the transition probability tensor, where all the entries of the solution vector are required to be non-negative and their sum must be equal to one. This is an analog of the matrix case for a limiting probability vector of a transition probability matrix arising from a first-order Markov chain. We show that if the tensor is a transition probability tensor, then solutions of this eigenvalue problem exist. When the tensor is irreducible, all the entries of the solutions are positive. With some suitable conditions on the tensor, the limiting probability distribution vector is even unique. Under the same uniqueness assumption, the linear convergence of the iterative method can be established. Numerical examples are presented to illustrate the ...
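A minimal sketch of a fixed-point iteration of this kind for a third-order transition probability tensor, where the iterate is repeatedly mapped through x -> P x x on the probability simplex; the random tensor and the iteration count are illustrative, and the paper's exact algorithm and uniqueness conditions are not reproduced.

```python
# Fixed-point iteration for the limiting distribution of a third-order
# transition probability tensor P (entries non-negative, columns summing to
# one along the first index).
import numpy as np

rng = np.random.default_rng(9)
n = 4
P = rng.random((n, n, n))
P /= P.sum(axis=0, keepdims=True)         # make each fibre P[:, j, k] a probability vector

x = np.full(n, 1.0 / n)
for _ in range(200):
    x = np.einsum("ijk,j,k->i", P, x, x)  # apply the tensor to (x, x)
    x /= x.sum()                          # guard against round-off drift off the simplex
print("limiting probability distribution:", x)
print("fixed-point residual:", np.linalg.norm(np.einsum("ijk,j,k->i", P, x, x) - x))
```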

Proceedings ArticleDOI
23 Jun 2014
TL;DR: This framework introduces a sample selection method and a subspace-based method for unsupervised domain adaptation, and shows that both these manifold-based techniques outperform the corresponding approaches based on the MMD.
Abstract: In this paper, we tackle the problem of unsupervised domain adaptation for classification. In the unsupervised scenario where no labeled samples from the target domain are provided, a popular approach consists in transforming the data such that the source and target distributions become similar. To compare the two distributions, existing approaches make use of the Maximum Mean Discrepancy (MMD). However, this does not exploit the fact that probability distributions lie on a Riemannian manifold. Here, we propose to make better use of the structure of this manifold and rely on the distance on the manifold to compare the source and target distributions. In this framework, we introduce a sample selection method and a subspace-based method for unsupervised domain adaptation, and show that both these manifold-based techniques outperform the corresponding approaches based on the MMD. Furthermore, we show that our subspace-based approach yields state-of-the-art results on a standard object recognition benchmark.

Journal ArticleDOI
14 Feb 2014-PLOS ONE
TL;DR: An efficient sampling algorithm is proposed (called optGpSampler), which implements the Artificial Centering Hit-and-Run algorithm in a different manner than the sampling algorithm implemented in the COBRA Toolbox for metabolic network analysis, here called gpSampler.
Abstract: Constraint-based models of metabolic networks are typically underdetermined, because they contain more reactions than metabolites. Therefore the solutions to this system do not consist of unique flux rates for each reaction, but rather a space of possible flux rates. By uniformly sampling this space, an estimated probability distribution for each reaction’s flux in the network can be obtained. However, sampling a high dimensional network is time-consuming. Furthermore, the constraints imposed on the network give rise to an irregularly shaped solution space. Therefore more tailored, efficient sampling methods are needed. We propose an efficient sampling algorithm (called optGpSampler), which implements the Artificial Centering Hit-and-Run algorithm in a different manner than the sampling algorithm implemented in the COBRA Toolbox for metabolic network analysis, here called gpSampler. Results of extensive experiments on different genome-scale metabolic networks show that optGpSampler is up to 40 times faster than gpSampler. Application of existing convergence diagnostics on small network reconstructions indicate that optGpSampler converges roughly ten times faster than gpSampler towards similar sampling distributions. For networks of higher dimension (i.e. containing more than 500 reactions), we observed significantly better convergence of optGpSampler and a large deviation between the samples generated by the two algorithms. Availability: optGpSampler for Matlab and Python is available for non-commercial use at: http://cs.ru.nl/~wmegchel/optGpSampler/.
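For orientation, the sketch below shows a bare-bones hit-and-run step for sampling a polytope {x : Ax <= b}; both samplers discussed above use the artificial-centering variant plus many practical refinements (warm-up, reduced bases, parallelism) that are not shown, and the unit-square example is purely illustrative.

```python
# Bare-bones hit-and-run sampling of a polytope {x : A x <= b}.
import numpy as np

def hit_and_run(A, b, x0, n_samples=1000, rng=None):
    rng = rng or np.random.default_rng()
    x, samples = np.array(x0, dtype=float), []
    for _ in range(n_samples):
        d = rng.normal(size=len(x))
        d /= np.linalg.norm(d)                     # random direction
        # Solve A(x + t d) <= b for the feasible chord [t_lo, t_hi].
        denom = A @ d
        slack = b - A @ x
        t_hi = np.min(np.where(denom > 1e-12, slack / denom, np.inf))
        t_lo = np.max(np.where(denom < -1e-12, slack / denom, -np.inf))
        x = x + rng.uniform(t_lo, t_hi) * d        # uniform point on the chord
        samples.append(x)
    return np.array(samples)

# Unit square {0 <= x_i <= 1} written as A x <= b.
A = np.vstack([np.eye(2), -np.eye(2)])
b = np.array([1.0, 1.0, 0.0, 0.0])
S = hit_and_run(A, b, x0=[0.5, 0.5], n_samples=5000, rng=np.random.default_rng(10))
print("sample mean (should approach [0.5, 0.5]):", S.mean(axis=0))
```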

Proceedings Article
21 Jun 2014
TL;DR: This paper introduces a technique for graph-based semi-supervised learning of histograms, derived from the theory of optimal transportation, which can be used for histograms on non-standard domains like circles and extends to related problems such as smoothing distributions on graph nodes.
Abstract: Probability distributions and histograms are natural representations for product ratings, traffic measurements, and other data considered in many machine learning applications. Thus, this paper introduces a technique for graph-based semi-supervised learning of histograms, derived from the theory of optimal transportation. Our method has several properties making it suitable for this application; in particular, its behavior can be characterized by the moments and shapes of the histograms at the labeled nodes. In addition, it can be used for histograms on non-standard domains like circles, revealing a strategy for manifold-valued semi-supervised learning. We also extend this technique to related problems such as smoothing distributions on graph nodes.

Book
06 Dec 2014
TL;DR: A biostatistics textbook covering data description, probability and probability distributions, study designs, interval estimation, hypothesis tests (including normal-theory and nonparametric tests), and the analysis of categorical, survival, and survey data, along with linear and logistic regression.
Abstract: 1. Introduction, 2. Data and Numbers, 3. Descriptive Tools, 4. Probability and Life Tables, 5. Probability Distributions, 6. Study Designs, 7. Interval Estimation, 8. Test of Hypotheses, 9. Test of Hypotheses Based on the Normal Distribution, 10. Nonparametric Tests, 11. Analysis of Categorical Data, 12. Analysis of Survival Data, 13. Analysis of Variance, 14. Linear Regression, 15. Logistic Regression, 16. Analysis of Survey Data, Appendix A. Statistical Tables, Appendix B. Selected Governmental Biostatistical Data, Appendix C. Solutions to Selected Exercises

Journal ArticleDOI
TL;DR: In this paper, a model for earthquakes induced by subsurface reservoir volume changes is developed for the Groningen gas field, which is based on the work of Kostrov and McGarr.
Abstract: A seismological model is developed for earthquakes induced by subsurface reservoir volume changes. The approach is based on the work of Kostrov (1974) and McGarr (1976) linking total strain to the summed seismic moment in an earthquake catalog. We refer to the fraction of the total strain expressed as seismic moment as the strain partitioning function, α. A probability distribution for total seismic moment as a function of time is derived from an evolving earthquake catalog. The moment distribution is taken to be a Pareto Sum Distribution with confidence bounds estimated using approximations given by Zaliapin et al. (2005). In this way available seismic moment is expressed in terms of reservoir volume change and hence compaction in the case of a depleting reservoir. The Pareto Sum Distribution for moment and the Pareto Distribution underpinning the Gutenberg-Richter Law are sampled using Monte Carlo methods to simulate synthetic earthquake catalogs for subsequent estimation of seismic ground motion hazard. We demonstrate the method by applying it to the Groningen gas field. A compaction model for the field calibrated using various geodetic data allows reservoir strain due to gas extraction to be expressed as a function of both spatial position and time since the start of production. Fitting with a generalized logistic function gives an empirical expression for the dependence of α on reservoir compaction. Probability density maps for earthquake event locations can then be calculated from the compaction maps. Predicted seismic moment is shown to be strongly dependent on planned gas production.

Posted Content
TL;DR: In this article, the authors compare the properties of quantile aggregation with those of the forecast combination schemes normally adopted in the econometric forecasting literature, based on linear or logarithmic averages of the individual densities.
Abstract: Quantile aggregation (or 'Vincentization') is a simple and intuitive way of combining probability distributions, originally proposed by S. B. Vincent in 1912. In certain cases, such as under Gaussianity, the Vincentized distribution belongs to the same family as that of the individual distributions and can be obtained by averaging the individual parameters. This paper compares the properties of quantile aggregation with those of the forecast combination schemes normally adopted in the econometric forecasting literature, based on linear or logarithmic averages of the individual densities. In general we find that: (i) larger differences among the combination schemes occur when there are biases in the individual forecasts, in which case quantile aggregation seems preferable overall; (ii) the choice of the combination weights is important in determining the performance of the various methods. Monte Carlo simulation experiments indicate that the properties of quantile aggregation fall between those of the linear and the logarithmic pool, and that quantile averaging is particularly useful for combining forecast distributions with large differences in location. An empirical illustration is provided with density forecasts from time series and econometric models for Italian GDP.
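The Gaussian special case mentioned in the abstract is easy to check numerically: quantile averaging of two equally weighted normal forecasts stays in the normal family with averaged parameters, whereas the linear pool becomes a wide mixture. The sketch below does this comparison; the specific locations, scales, and equal weights are illustrative assumptions.

```python
# Quantile aggregation (Vincentization) versus a linear opinion pool.
import numpy as np
from scipy.stats import norm

levels = np.linspace(0.001, 0.999, 999)                  # probability levels

f1 = norm(loc=-2.0, scale=1.0)                           # two forecasts far apart in location
f2 = norm(loc=+2.0, scale=1.0)

q_vincent = 0.5 * (f1.ppf(levels) + f2.ppf(levels))      # average of the quantile functions

rng = np.random.default_rng(11)
pick = rng.random(200_000) < 0.5                         # linear pool = equally weighted mixture
pool = np.where(pick, f1.rvs(size=200_000, random_state=12),
                      f2.rvs(size=200_000, random_state=13))

print("std of quantile-aggregated forecast:", q_vincent.std())   # stays near 1: same family
print("std of linear pool:", pool.std())                          # inflated by the location gap
```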

Journal ArticleDOI
TL;DR: This work introduces a copula-embedded BMA (Cop-BMA) method that relaxes any assumption on the shape of conditional PDFs, and shows that the predictive distributions are more accurate and reliable, less biased, and more confident with small uncertainty after Cop-BMA application.
Abstract: Bayesian model averaging (BMA) is a popular approach to combine hydrologic forecasts from individual models and characterize the uncertainty induced by model structure. In the original form of BMA, the conditional probability density function (PDF) of each model is assumed to be a particular probability distribution (e.g., Gaussian, gamma, etc.). If the predictions of any hydrologic model do not follow certain distribution, a data transformation procedure is required prior to model averaging. Moreover, it is strongly recommended to apply BMA on unbiased forecasts, whereas it is sometimes difficult to effectively remove bias from the predictions of complex hydrologic models. To overcome these limitations, we develop an approach to integrate a group of multivariate functions, the so-called copula functions, into BMA. Here we introduce a copula-embedded BMA (Cop-BMA) method that relaxes any assumption on the shape of conditional PDFs. Copula functions have a flexible structure and do not restrict the shape of posterior distributions. Furthermore, copulas are effective tools in removing bias from hydrologic forecasts. To compare the performance of BMA with Cop-BMA, they are applied to hydrologic forecasts from different rainfall-runoff and land-surface models. We consider the streamflow observation and simulations for 10 river basins provided by the Model Parameter Estimation Experiment (MOPEX) project. Results demonstrate that the predictive distributions are more accurate and reliable, less biased, and more confident with small uncertainty after Cop-BMA application. It is also shown that the postprocessed forecasts have better correlation with observation after Cop-BMA application.

Book
04 Mar 2014
TL;DR: A statistics textbook covering probability, random variables and vectors, important probability distributions, estimation and uncertainty (including the bootstrap), significance testing, linear and generalized regression, Bayesian methods, multivariate analysis, time series, and point processes.
Abstract: Introduction.- Exploring Data.- Probability and Random Variables.- Random Vectors.- Important Probability Distributions.- Sequences of Random Variables.- Estimation and Uncertainty.- Estimation in Theory and Practice.- Uncertainty and the Bootstrap.- Statistical Significance.- General Methods for Testing Hypotheses.- Linear Regression.- Analysis of Variance.- Generalized Regression.- Nonparametric Regression.- Bayesian Methods.- Multivariate Analysis.- Time Series.- Point Processes.- Appendix: Mathematical Background.- Example Index.- Index.- Bibliography.

Journal ArticleDOI
TL;DR: In this paper, it was shown that Zipf's law arises naturally for large systems, without fine-tuning parameters to a point, if there is a fluctuating unobserved variable that affects the system, such as a common input stimulus that causes individual neurons to fire at time-varying rates.
Abstract: The joint probability distribution of states of many degrees of freedom in biological systems, such as firing patterns in neural networks or antibody sequence compositions, often follows Zipf's law, where a power law is observed on a rank-frequency plot. This behavior has been shown to imply that these systems reside near a unique critical point where the extensive parts of the entropy and energy are exactly equal. Here, we show analytically, and via numerical simulations, that Zipf-like probability distributions arise naturally if there is a fluctuating unobserved variable (or variables) that affects the system, such as a common input stimulus that causes individual neurons to fire at time-varying rates. In statistics and machine learning, these are called latent-variable or mixture models. We show that Zipf's law arises generically for large systems, without fine-tuning parameters to a point. Our work gives insight into the ubiquity of Zipf's law in a wide range of systems.