
Showing papers on "Probability distribution published in 2016"


Proceedings Article
05 Dec 2016
TL;DR: In this paper, the generative adversarial training (GAN) approach is shown to be a special case of an existing more general variational divergence estimation approach, and any f-divergence can be used for training GANs.
Abstract: Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method makes it possible to train such models through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing, more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.
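
For context, the variational divergence estimation bound underlying this approach is commonly written as follows (a sketch in notation not taken from the abstract: T_ω denotes the variational/discriminator function and f* the Fenchel conjugate of the convex function f defining the divergence):

    % f-divergence lower bound used to train a generative neural sampler Q_theta against data P;
    % training solves the saddle-point problem min_theta max_omega of the right-hand side.
    D_f\big(P \,\|\, Q_\theta\big) \;\ge\; \sup_{\omega}\;
        \mathbb{E}_{x \sim P}\big[T_\omega(x)\big] \;-\; \mathbb{E}_{x \sim Q_\theta}\big[f^{*}\big(T_\omega(x)\big)\big]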

675 citations


Posted Content
TL;DR: The paper argues that the set of distributions should be chosen to be appropriate for the application at hand, and that some of the choices that have been popular until recently are, for many applications, not good choices.
Abstract: Distributionally robust stochastic optimization (DRSO) is an approach to optimization under uncertainty in which, instead of assuming that there is an underlying probability distribution that is known exactly, one hedges against a chosen set of distributions. In this paper, we consider sets of distributions that are within a chosen Wasserstein distance from a nominal distribution. We argue that such a choice of sets has two advantages: (1) The resulting distributions hedged against are more reasonable than those resulting from other popular choices of sets, such as the $\Phi$-divergence ambiguity set. (2) The problem of determining the worst-case expectation has desirable tractability properties. We derive a dual reformulation of the corresponding DRSO problem and construct approximate worst-case distributions (or an exact worst-case distribution if it exists) explicitly via the first-order optimality conditions of the dual problem. Our contributions are five-fold. (i) We identify necessary and sufficient conditions for the existence of a worst-case distribution, which is naturally related to the growth rate of the objective function. (ii) We show that the worst-case distributions resulting from an appropriate Wasserstein distance have a concise structure and a clear interpretation. (iii) Using this structure, we show that data-driven DRSO problems can be approximated to any accuracy by robust optimization problems, and thereby many DRSO problems become tractable by using tools from robust optimization. (iv) To the best of our knowledge, our proof of strong duality is the first constructive proof for DRSO problems, and we show that the constructive proof technique is also useful in other contexts. (v) Our strong duality result holds in a very general setting, and we show that it can be applied to infinite dimensional process control problems and worst-case value-at-risk analysis.
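
For reference, the dual reformulation referred to above is commonly stated in roughly the following form for the data-driven case (a sketch in our own notation, with the empirical distribution on samples ξ̂_1, …, ξ̂_N, transport cost d, Wasserstein radius ρ, and regularity conditions on Ψ omitted):

    \sup_{P \,:\, W(P,\,\widehat{P}_N) \le \rho} \mathbb{E}_{P}\big[\Psi(\xi)\big]
      \;=\;
    \min_{\lambda \ge 0} \Big\{ \lambda \rho
      \;+\; \frac{1}{N}\sum_{i=1}^{N} \sup_{\xi}\big( \Psi(\xi) - \lambda\, d(\xi, \widehat{\xi}_i) \big) \Big\}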

505 citations


Journal ArticleDOI
TL;DR: This paper derives an equivalent reformulation for DCC and shows that it is equivalent to a classical chance constraint with a perturbed risk level, and analyzes the relationship between the conservatism of DCC and the size of historical data, which can help indicate the value of data.
Abstract: In this paper, we study data-driven chance constrained stochastic programs, or more specifically, stochastic programs with distributionally robust chance constraints (DCCs) in a data-driven setting to provide robust solutions for the classical chance constrained stochastic program facing ambiguous probability distributions of random parameters. We consider a family of density-based confidence sets based on a general $\phi$-divergence measure, and formulate DCC from the perspective of robust feasibility by allowing the ambiguous distribution to run adversely within its confidence set. We derive an equivalent reformulation for DCC and show that it is equivalent to a classical chance constraint with a perturbed risk level. We also show how to evaluate the perturbed risk level by using a bisection line search algorithm for general $\phi$-divergence measures. In several special cases, our results can be strengthened such that we can derive closed-form expressions for the perturbed risk levels. In addition, we show that the conservatism of DCC vanishes as the size of historical data goes to infinity. Furthermore, we analyze the relationship between the conservatism of DCC and the size of historical data, which can help indicate the value of data. Finally, we conduct extensive computational experiments to test the performance of the proposed DCC model and compare various $\phi$-divergence measures based on a capacitated lot-sizing problem with a quality-of-service requirement.
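
The bisection line search mentioned above can be sketched generically as follows; divergence_gap is a hypothetical placeholder for the φ-divergence-specific quantity (assumed monotone in the risk level) that the reformulation drives to the radius of the confidence set, and is not taken from the paper:

    def bisect_perturbed_risk(divergence_gap, radius, lo=0.0, hi=1.0, tol=1e-8):
        """Find alpha' in [lo, hi] with divergence_gap(alpha') ~= radius by bisection."""
        f_lo = divergence_gap(lo) - radius
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            f_mid = divergence_gap(mid) - radius
            if abs(f_mid) < tol or (hi - lo) < tol:
                return mid
            if (f_lo < 0) == (f_mid < 0):   # keep the sub-interval containing the sign change
                lo, f_lo = mid, f_mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

    # Toy usage with a made-up monotone gap function; the root of a^2 = 0.04 is 0.2.
    print(bisect_perturbed_risk(lambda a: a**2, radius=0.04))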

437 citations


Journal ArticleDOI
TL;DR: In this paper, the authors present an algorithm for fitting a manifold to an unknown probability distribution supported in a separable Hilbert space, only using i.i.d samples from that distribution.
Abstract: The hypothesis that high dimensional data tend to lie in the vicinity of a low dimensional manifold is the basis of manifold learning. The goal of this paper is to develop an algorithm (with accompanying complexity guarantees) for fitting a manifold to an unknown probability distribution supported in a separable Hilbert space, only using i.i.d samples from that distribution. More precisely, our setting is the following. Suppose that data are drawn independently at random from a probability distribution $P$ supported on the unit ball of a separable Hilbert space $H$. Let $G(d, V, \tau)$ be the set of submanifolds of the unit ball of $H$ whose volume is at most $V$ and reach (which is the supremum of all $r$ such that any point at a distance less than $r$ has a unique nearest point on the manifold) is at least $\tau$. Let $L(M, P)$ denote mean-squared distance of a random point from the probability distribution $P$ to $M$. We obtain an algorithm that tests the manifold hypothesis in the following sense. The algorithm takes i.i.d random samples from $P$ as input, and determines which of the following two is true (at least one must be): (a) There exists $M \in G(d, CV, \frac{\tau}{C})$ such that $L(M, P) \leq C \epsilon.$ (b) There exists no $M \in G(d, V/C, C\tau)$ such that $L(M, P) \leq \frac{\epsilon}{C}.$ The answer is correct with probability at least $1-\delta$.

346 citations


Journal ArticleDOI
TL;DR: In this paper, the meta distribution of the SIR, defined as the distribution of the conditional success probability given the point process, is studied for Poisson bipolar and cellular networks with Rayleigh fading, and bounds, an exact analytical expression, and a simple approximation for it are provided.
Abstract: The calculation of the SIR distribution at the typical receiver (or, equivalently, the success probability of transmissions over the typical link) in Poisson bipolar and cellular networks with Rayleigh fading is relatively straightforward, but it only provides limited information on the success probabilities of the individual links. This paper focuses on the meta distribution of the SIR, which is the distribution of the conditional success probability $P_{\rm s}$ given the point process, and provides bounds, an exact analytical expression, and a simple approximation for it. The meta distribution provides fine-grained information on the SIR and answers questions such as “What fraction of users in a Poisson cellular network achieve 90% link reliability if the required SIR is 5 dB?” Interestingly, in the bipolar model, if the transmit probability $p$ is reduced while increasing the network density $\lambda$ such that the density of concurrent transmitters $\lambda p$ stays constant as $p\rightarrow 0$, $P_{\rm s}$ degenerates to a constant, i.e., all links have exactly the same success probability in the limit, which is the one of the typical link. In contrast, in the cellular case, if the interfering base stations are active independently with probability $p$, the variance of $P_{\rm s}$ approaches a non-zero constant when $p$ is reduced to 0 while keeping the mean success probability constant.
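
To make the notion of a per-link (conditional) success probability concrete, here is a minimal Monte Carlo sketch for a Poisson bipolar network with Rayleigh fading and ALOHA transmit probability p; the parameter values are arbitrary and the snippet is an illustration of the concept rather than the paper's exact model:

    import numpy as np

    rng = np.random.default_rng(0)
    lam, p, r0, alpha = 1.0, 0.25, 0.5, 4.0     # interferer density, ALOHA prob., link distance, path-loss exponent
    theta = 10 ** (5 / 10)                      # SIR threshold of 5 dB
    R, n_real = 20.0, 2000                      # window radius around the typical receiver, number of realizations

    ps = np.empty(n_real)
    for i in range(n_real):
        n = rng.poisson(lam * np.pi * R**2)                 # potential interferers in the window
        d = R * np.sqrt(rng.uniform(size=n))                # their distances (uniform in the disk)
        # Conditional success probability given this point process (Rayleigh fading,
        # independent ALOHA transmissions): product over interferers.
        ps[i] = np.prod(1 - p + p / (1 + theta * (r0 / d) ** alpha))

    # Empirical meta distribution: fraction of links whose conditional reliability exceeds x.
    for x in (0.5, 0.9, 0.95):
        print(f"P(Ps > {x}) ~ {np.mean(ps > x):.3f}")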

295 citations


Proceedings Article
05 Dec 2016
TL;DR: In this article, the dual optimal transport problem is re-cast as the maximization of an expectation, and entropic regularization of the primal problem yields a smooth dual optimization problem that can be addressed with algorithms that have a provably faster convergence.
Abstract: Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods are able to manipulate arbitrary distributions (either discrete or continuous) by simply requiring to be able to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be re-cast as the maximization of an expectation; (b) entropic regularization of the primal OT problem results in a smooth dual optimization problem which can be addressed with algorithms that have a provably faster convergence. We instantiate these ideas in three different computational setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat the current state-of-the-art finite-dimensional OT solver (Sinkhorn's algorithm); (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, and is more efficient than discretizing beforehand the two densities. We back up these claims on a set of discrete, semi-discrete and continuous benchmark problems.
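
As an illustration of idea (a), the expectation form of the dual, here is a minimal averaged-SGD sketch for the unregularized semi-discrete case (a continuous sampler versus a discrete measure on points y_j); the source and target measures are toy choices and the code is a simplified variant, not the authors' implementation:

    import numpy as np

    rng = np.random.default_rng(1)
    m = 50
    y = rng.normal(size=(m, 2))                 # support of the discrete target measure
    nu = np.full(m, 1.0 / m)                    # its weights

    def sample_mu(k):                           # continuous source: a shifted 2-D Gaussian (toy choice)
        return rng.normal(loc=0.5, size=(k, 2))

    v = np.zeros(m)                             # dual potentials on the discrete side
    v_avg = np.zeros(m)
    n_iter, lr = 100_000, 0.5
    for t in range(1, n_iter + 1):
        x = sample_mu(1)[0]
        cost = 0.5 * np.sum((y - x) ** 2, axis=1)   # squared-Euclidean ground cost
        j = int(np.argmin(cost - v))                # index achieving the c-transform for this sample
        grad = nu.copy()
        grad[j] -= 1.0                              # stochastic gradient of the dual objective
        v += (lr / np.sqrt(t)) * grad               # SGD ascent step
        v_avg += (v - v_avg) / t                    # Polyak-Ruppert averaging

    xs = sample_mu(20_000)                          # Monte Carlo estimate of the dual objective
    cmat = 0.5 * ((xs[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    print("approximate OT cost:", (cmat - v_avg).min(axis=1).mean() + nu @ v_avg)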

276 citations


Posted Content
TL;DR: In this paper, a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity with the reproducing kernel Hilbert space theory is derived, and applied to test how well a probabilistic model fits a set of observations.
Abstract: We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity with the reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high dimensional distributions, even for those with computationally intractable normalization constants. Both theoretical and empirical properties of our methods are studied thoroughly.
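
A compact numpy sketch of the resulting statistic with an RBF kernel, estimated as a V-statistic over samples; the closed-form kernel derivatives below are standard for the RBF kernel, and the standard-normal toy model is only an illustration, not the authors' implementation:

    import numpy as np

    def ksd_rbf(x, score, h=1.0):
        """V-statistic estimate of the squared kernelized Stein discrepancy.

        x     : (n, d) samples from the distribution q being tested
        score : function returning grad log p at the samples, shape (n, d), for the model p
        h     : bandwidth of the kernel k(x, x') = exp(-||x - x'||^2 / (2 h^2))
        """
        d = x.shape[1]
        s = score(x)                                     # model score at the samples
        diff = x[:, None, :] - x[None, :, :]             # pairwise differences, (n, n, d)
        sq = np.sum(diff ** 2, axis=-1)
        K = np.exp(-sq / (2 * h ** 2))
        term1 = (s @ s.T) * K                                     # s(x)^T s(x') k
        term2 = np.einsum('id,ijd->ij', s, diff) * K / h ** 2     # s(x)^T grad_{x'} k
        term3 = -np.einsum('jd,ijd->ij', s, diff) * K / h ** 2    # grad_x k^T s(x')
        term4 = K * (d / h ** 2 - sq / h ** 4)                    # trace(grad_x grad_{x'} k)
        return float((term1 + term2 + term3 + term4).mean())

    # Toy check: N(0, I) samples tested against the model p = N(0, I); the value should be near 0.
    rng = np.random.default_rng(0)
    samples = rng.normal(size=(500, 2))
    print(ksd_rbf(samples, score=lambda z: -z))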

251 citations


Journal ArticleDOI
TL;DR: The main contribution is to derive the optimal control in this case, which is in fact given in closed form (Theorem 1) and, in the zero-noise limit, reduces to the solution of a (deterministic) mass transport problem with general quadratic cost.
Abstract: We address the problem of steering the state of a linear stochastic system to a prescribed distribution over a finite horizon with minimum energy, and the problem of maintaining the state at a stationary distribution over an infinite horizon with minimum power. For both problems the control and Gaussian noise channels are allowed to be distinct, thereby placing the results of this paper outside of the scope of previous work both in probability and in control. The special case where the disturbance and control enter through the same channels has been addressed in the first part of this work that was presented as Part I. Herein, we present sufficient conditions for optimality in terms of a system of dynamically coupled Riccati equations in the finite horizon case and in terms of algebraic conditions for the stationary case. We then address the question of feasibility for both problems. For the finite-horizon case, provided the system is controllable, we prove that without any restriction on the directionality of the stochastic disturbance it is always possible to steer the state to any arbitrary Gaussian distribution over any specified finite time-interval. For the stationary infinite horizon case, it is not always possible to maintain the state at an arbitrary Gaussian distribution through constant state-feedback. It is shown that covariances of admissible stationary Gaussian distributions are characterized by a certain Lyapunov-like equation and, in fact, they coincide with the class of stationary state covariances that can be attained by a suitable stationary colored noise as input. We finally address the question of how to compute suitable controls numerically. We present an alternative to solving the system of coupled Riccati equations by expressing the optimal controls in the form of solutions to (convex) semi-definite programs for both cases. We conclude with an example to steer the state covariance of the distribution of inertial particles to an admissible stationary Gaussian distribution over a finite interval, to be maintained at that stationary distribution thereafter by constant-gain state-feedback control.

229 citations


Posted Content
TL;DR: In this paper, the generative adversarial training (GAN) approach is shown to be a special case of an existing more general variational divergence estimation approach, and any f-divergence can be used for training GANs.
Abstract: Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method makes it possible to train such models through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing, more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.

212 citations


Journal ArticleDOI
TL;DR: This paper presents a simple, easily implemented algorithm for dynamically adapting the temperature configuration of a sampler while sampling; the algorithm adjusts the temperature spacing to achieve a uniform rate of exchanges between chains at neighbouring temperatures.
Abstract: Modern problems in astronomical Bayesian inference require efficient methods for sampling from complex, high-dimensional, often multi-modal probability distributions. Most popular methods, such as Markov chain Monte Carlo sampling, perform poorly on strongly multi-modal probability distributions, rarely jumping between modes or settling on just one mode without finding others. Parallel tempering addresses this problem by sampling simultaneously with separate Markov chains from tempered versions of the target distribution with reduced contrast levels. Gaps between modes can be traversed at higher temperatures, while individual modes can be efficiently explored at lower temperatures. In this paper, we investigate how one might choose the ladder of temperatures to achieve more efficient sampling, as measured by the autocorrelation time of the sampler. In particular, we present a simple, easily-implemented algorithm for dynamically adapting the temperature configuration of a sampler while sampling. This algorithm dynamically adjusts the temperature spacing to achieve a uniform rate of exchanges between chains at neighbouring temperatures. We compare the algorithm to conventional geometric temperature configurations on a number of test distributions and on an astrophysical inference problem, reporting efficiency gains by a factor of 1.2-2.5 over a well-chosen geometric temperature configuration and by a factor of 1.5-5 over a poorly chosen configuration. On all of these problems a sampler using the dynamical adaptations to achieve uniform acceptance ratios between neighbouring chains outperforms one that does not.
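
A highly simplified sketch of the adaptation idea: after a batch of iterations, nudge the spacing of the ladder according to the difference in swap-acceptance rates between neighbouring pairs of chains, with a gain that decays over time; the specific update and parameter choices here are illustrative, not the paper's exact prescription:

    import numpy as np

    def adapt_ladder(temps, acc_rates, t, kappa0=1.0, t0=1000):
        """One adaptation step for a parallel-tempering temperature ladder.

        temps     : increasing temperatures, temps[0] being the target (T = 1)
        acc_rates : acc_rates[i] = recent swap acceptance rate between chains i and i+1
        t         : iteration counter, used for a decaying adaptation gain
        """
        kappa = kappa0 * t0 / (t + t0)              # vanishing adaptation strength
        S = np.log(np.diff(np.log(temps)))          # log of the log-temperature gaps
        # Equalize neighbouring swap rates; the topmost gap is left untouched in this simple variant.
        S[:-1] += kappa * (acc_rates[:-1] - acc_rates[1:])
        new = np.empty_like(temps)
        new[0] = temps[0]
        new[1:] = np.exp(np.log(temps[0]) + np.cumsum(np.exp(S)))
        return new

    # Toy usage: 6 chains; the middle swaps are too rare, so the gaps there shrink.
    temps = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
    acc = np.array([0.4, 0.1, 0.1, 0.4, 0.5])       # acceptance between pairs (0,1), (1,2), ...
    print(adapt_ladder(temps, acc, t=500))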

204 citations


Proceedings Article
19 Jun 2016
TL;DR: A variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices is introduced and "pseudo-data" (Snelson & Ghahramani, 2005) is incorporated in this model, which allows for more efficient posterior sampling while maintaining the properties of the original model.
Abstract: We introduce a variational Bayesian neural network where the parameters are governed via a probability distribution on random matrices. Specifically, we employ a matrix variate Gaussian (Gupta & Nagar, 1999) parameter posterior distribution where we explicitly model the covariance among the input and output dimensions of each layer. Furthermore, with approximate covariance matrices we can achieve a more efficient way to represent those correlations that is also cheaper than fully factorized parameter posteriors. We further show that with the "local reparametrization trick" (Kingma et al., 2015) on this posterior distribution we arrive at a Gaussian Process (Rasmussen, 2006) interpretation of the hidden units in each layer and we, similarly with (Gal & Ghahramani, 2015), provide connections with deep Gaussian processes. We continue in taking advantage of this duality and incorporate "pseudo-data" (Snelson & Ghahramani, 2005) in our model, which in turn allows for more efficient posterior sampling while maintaining the properties of the original model. The validity of the proposed approach is verified through extensive experiments.
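
A small numpy sketch of the central modelling ingredient, drawing a layer's weight matrix from a matrix variate Gaussian MN(M, U, V) with explicit row (input) and column (output) covariances; this is the generic construction, not the authors' variational code:

    import numpy as np

    def sample_matrix_normal(M, U, V, rng):
        """Draw W ~ MN(M, U, V): mean M (n_in x n_out), row covariance U, column covariance V."""
        A = np.linalg.cholesky(U)          # U = A A^T, covariance among input dimensions
        B = np.linalg.cholesky(V)          # V = B B^T, covariance among output dimensions
        E = rng.normal(size=M.shape)       # i.i.d. standard normal matrix
        return M + A @ E @ B.T             # cov(vec(W)) = V kron U

    rng = np.random.default_rng(0)
    n_in, n_out = 4, 3
    M = np.zeros((n_in, n_out))
    U = 0.5 * np.eye(n_in) + 0.5           # toy full row covariance
    V = np.eye(n_out)
    W = sample_matrix_normal(M, U, V, rng)         # one weight sample
    h = np.tanh(np.ones((1, n_in)) @ W)            # forward pass through the layer with sampled weights
    print(W.shape, h.shape)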

Journal ArticleDOI
TL;DR: This paper uses a powerful nonparametric approach called the lower upper bound estimation (LUBE) method to construct the PIs and uses a new framework based on a combination of PIs to overcome the performance instability of the neural networks (NNs) used in the LUBE method.
Abstract: This paper makes use of the idea of prediction intervals (PIs) to capture the uncertainty associated with wind power generation in power systems. Since the forecasting errors cannot be appropriately modeled using probability distribution functions, here we employ a powerful nonparametric approach called the lower upper bound estimation (LUBE) method to construct the PIs. The proposed approach uses a new framework based on a combination of PIs to overcome the performance instability of the neural networks (NNs) used in the LUBE method. Also, a new fuzzy-based cost function is proposed with the purpose of having more freedom and flexibility in adjusting NN parameters used for construction of PIs. In comparison with the other cost functions in the literature, this new formulation allows the decision-makers to apply their preferences for satisfying the PI coverage probability and PI normalized average width individually. As the optimization tool, a bat algorithm with a new modification is introduced to solve the problem. The feasibility and satisfying performance of the proposed method are examined using datasets taken from different wind farms in Australia.
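
The two interval-quality measures named at the end of the abstract (PI coverage probability and PI normalized average width) follow the standard definitions sketched below; the data are made up and this is not the paper's fuzzy cost function:

    import numpy as np

    def pi_metrics(y, lower, upper):
        """PI coverage probability (PICP) and PI normalized average width (PINAW)."""
        y, lower, upper = map(np.asarray, (y, lower, upper))
        picp = np.mean((y >= lower) & (y <= upper))            # fraction of targets inside their interval
        pinaw = np.mean(upper - lower) / (y.max() - y.min())   # average width, normalized by the target range
        return picp, pinaw

    # Toy usage with made-up wind-power targets and intervals:
    y = np.array([0.30, 0.55, 0.80, 0.20, 0.65])
    lo = y - 0.10
    hi = y + np.array([0.05, 0.15, 0.20, 0.02, 0.12])
    print(pi_metrics(y, lo, hi))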

Journal ArticleDOI
TL;DR: Based on Lyapunov functions, the Halanay inequality, and linear matrix inequalities, sufficient conditions that depend on the probability distribution of the delay coupling and the impulsive delay are obtained, and numerical simulations are used to show the effectiveness of the theoretical results.
Abstract: This paper deals with the exponential synchronization of coupled stochastic memristor-based neural networks with probabilistic time-varying delay coupling and time-varying impulsive delay. There is one probabilistic transmittal delay in the delayed coupling, modeled by a Bernoulli stochastic variable satisfying a conditional probability distribution. The disturbance is described by a Wiener process. Based on Lyapunov functions, the Halanay inequality, and linear matrix inequalities, sufficient conditions that depend on the probability distribution of the delay coupling and the impulsive delay are obtained. Numerical simulations are used to show the effectiveness of the theoretical results.

Journal ArticleDOI
TL;DR: In this article, a Brownian particle diffusing under a time-modulated stochastic resetting mechanism to a fixed position is studied and the rate of resetting r(t) is a function of the time t since the last reset event.
Abstract: We study a Brownian particle diffusing under a time-modulated stochastic resetting mechanism to a fixed position. The rate of resetting r(t) is a function of the time t since the last reset event. We derive a sufficient condition on r(t) for a steady-state probability distribution of the position of the particle to exist. We derive the form of the steady-state distributions under some particular choices of r(t) and also consider the late time relaxation behavior of the probability distribution. We consider first passage time properties for the Brownian particle to reach the origin and derive a formula for the mean first passage time (MFPT). Finally, we study optimal properties of the MFPT and show that a threshold function is at least locally optimal for the problem of minimizing the MFPT.
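
A short Euler-Maruyama sketch of such a process, with a resetting rate that grows with the time elapsed since the last reset; the particular r(t) and all parameters are arbitrary illustrations used only to histogram the late-time position distribution:

    import numpy as np

    rng = np.random.default_rng(0)
    D, dt, T, x0 = 0.5, 1e-3, 200.0, 0.0        # diffusion constant, time step, total time, reset position

    def r(tau):                                 # resetting rate as a function of the time since the last reset
        return 2.0 * tau                        # illustrative choice: rate grows linearly with the excursion age

    x, tau, samples = x0, 0.0, []
    for step in range(int(T / dt)):
        if rng.random() < r(tau) * dt:                  # reset event in [t, t + dt)
            x, tau = x0, 0.0
        else:
            x += np.sqrt(2 * D * dt) * rng.normal()     # free diffusion step
            tau += dt
        if step * dt > 20.0:                            # discard the transient, keep late-time positions
            samples.append(x)

    hist, edges = np.histogram(samples, bins=60, density=True)
    print("peak of the estimated steady-state density:", hist.max())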

Journal ArticleDOI
TL;DR: In this paper, a data-driven risk-averse stochastic unit commitment model is proposed, where risk aversion stems from the worst-case probability distribution of the renewable energy generation amount, and the corresponding solution methods to solve the problem are developed.
Abstract: Considering recent development of deregulated energy markets and the intermittent nature of renewable energy generation, it is important for power system operators to ensure cost effectiveness while maintaining the system reliability. To achieve this goal, significant research progress has recently been made to develop stochastic optimization models and solution methods to improve reliability unit commitment run practice, which is used in the day-ahead market for ISOs/RTOs to ensure sufficient generation capacity available in real time to accommodate uncertainties. Most stochastic optimization approaches assume the renewable energy generation amounts follow certain distributions. However, in practice, the distributions are unknown and instead, a certain amount of historical data are available. In this research, we propose a data-driven risk-averse stochastic unit commitment model, where risk aversion stems from the worst-case probability distribution of the renewable energy generation amount, and develop the corresponding solution methods to solve the problem. Given a set of historical data, our proposed approach first constructs a confidence set for the distributions of the uncertain parameters using statistical inference and solves the corresponding risk-averse stochastic unit commitment problem. Then, we show that the conservativeness of the proposed stochastic program vanishes as the number of historical data increases to infinity. Finally, the computational results numerically show how the risk-averse stochastic unit commitment problem converges to the risk-neutral one, which indicates the value of data.

01 Jan 2016
TL;DR: In this paper, the authors consider two sets of functions $L_1, \ldots, L_q$ and $M_1, \ldots, M_p$ of independent random variables $X_1, \ldots, X_n$ with the condition that the unknown functions in these two types of equations are polynomials of an assigned degree.
Abstract: involving an unknown function of a single variable $t$. Conditions under which the unknown functions in these two types of equations are polynomials of an assigned degree are given. The third, on the characterization of normal and gamma distributions, extends the earlier work of the authors (Rao, 1967 and Khatri and Rao, 1968). We consider two sets of functions $L_1, \ldots, L_q$ and $M_1, \ldots, M_p$ of independent random variables $X_1, \ldots, X_n$ with the condition

Proceedings ArticleDOI
19 Jun 2016
TL;DR: This work shows that O(d/ε) copies suffice to obtain an estimate ρ̂ that satisfies ||ρ̂ − ρ||_F² ≤ ε (with high probability), and is the first to show that nontrivial tomography can be obtained using a number of copies that is just linear in the dimension.
Abstract: We continue our analysis of: (i) "Quantum tomography", i.e., learning a quantum state, i.e., the quantum generalization of learning a discrete probability distribution; (ii) The distribution of Young diagrams output by the RSK algorithm on random words. Regarding (ii), we introduce two powerful new tools: first, a precise upper bound on the expected length of the longest union of k disjoint increasing subsequences in a random length-n word with letter distribution α1 ≥ α2 ≥ … ≥ αd. Our bound has the correct main term and second-order term, and holds for all n, not just in the large-n limit. Second, a new majorization property of the RSK algorithm that allows one to analyze the Young diagram formed by the lower rows λk, λk+1, … of its output. These tools allow us to prove several new theorems concerning the distribution of random Young diagrams in the nonasymptotic regime, giving concrete error bounds that are optimal, or nearly so, in all parameters. As one example, we give a fundamentally new proof of the celebrated fact that the expected length of the longest increasing sequence in a random length-n permutation is bounded by 2√n. This is the k = 1, αi ≡ 1/d, d → ∞ special case of a much more general result we prove: the expected length of the kth Young diagram row produced by an α-random word is αk·n ± 2√(αk·d·n). From our new analyses of random Young diagrams we derive several new results in quantum tomography, including: (i) learning the eigenvalues of an unknown state to ε-accuracy in Hellinger-squared, chi-squared, or KL distance, using n = O(d²/ε) copies; (ii) learning the top-k eigenvalues of an unknown state to ε-accuracy in Hellinger-squared or chi-squared distance using n = O(kd/ε) copies or in ℓ₂² distance using n = O(k/ε) copies; (iii) learning the optimal rank-k approximation of an unknown state to ε-fidelity (Hellinger-squared distance) using n = O(kd/ε) copies. We believe our new techniques will lead to further advances in quantum learning; indeed, they have already subsequently been used for efficient von Neumann entropy estimation.

Journal ArticleDOI
TL;DR: ANNz2, a new implementation of the public software for photometric redshift (photo-z) estimation of Collister & Lahav, which now includes generation of full probability distribution functions (PDFs), is presented.
Abstract: We present ANNz2, a new implementation of the public software for photometric redshift (photo-z) estimation of Collister & Lahav, which now includes generation of full probability distribution functions (PDFs). ANNz2 utilizes multiple machine learning methods, such as artificial neural networks and boosted decision/regression trees. The objective of the algorithm is to optimize the performance of the photo-z estimation, to properly derive the associated uncertainties, and to produce both single-value solutions and PDFs. In addition, estimators are made available, which mitigate possible problems of non-representative or incomplete spectroscopic training samples. ANNz2 has already been used as part of the first weak lensing analysis of the Dark Energy Survey, and is included in the experiment's first public data release. Here we illustrate the functionality of the code using data from the tenth data release of the Sloan Digital Sky Survey and the Baryon Oscillation Spectroscopic Survey. The code is available for download at http://github.com/IftachSadeh/ANNZ.

Journal ArticleDOI
TL;DR: CFSFDP-HD proposes a nonparametric method for estimating the probability distribution of a given dataset based on heat diffusion in an infinite domain, which accounts for both selection of the cutoff distance and boundary correction of the kernel density estimation.

Journal ArticleDOI
TL;DR: In this paper, the authors compare the performance of parametric and non-parametric models for the wind speed probability distribution, together with the estimation methods for these models' parameters (the widely used methods and a stochastic heuristic optimization algorithm).
Abstract: The statistical characteristics of wind and the selection of suitable wind turbines are essential to effectively evaluate wind energy potential and design wind farms. Using four sites in central China as examples, this research reviews and compares the popular parametric and non-parametric models for wind speed probability distribution and the estimation methods for these models' parameters (the widely used methods and a stochastic heuristic optimization algorithm). The simulations reveal that the non-parametric model outperforms all of the selected parametric models in terms of the fitting accuracy and the operational simplicity, and the stochastic heuristic optimization algorithm is superior to the widely used estimation methods. This study also reviews and discusses six power curves proposed in the literature and the power loss caused by the mutual wake effect between turbines in the wind energy potential assessment process. The evaluation results demonstrate that the choice of power curve influences the selection of wind turbines and that consideration of the mutual wake effect may help to optimize wind farm design in wind energy assessment.
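
For illustration, a minimal scipy sketch that fits a parametric (Weibull) model and a non-parametric kernel density estimate to synthetic wind-speed data and compares their in-sample log-likelihoods; the data and the comparison criterion are toy stand-ins for the site measurements and goodness-of-fit measures used in the paper:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    wind = stats.weibull_min.rvs(c=2.0, scale=7.0, size=2000, random_state=rng)   # synthetic wind speeds (m/s)

    # Parametric model: two-parameter Weibull (location fixed at 0), fitted by maximum likelihood.
    c, loc, scale = stats.weibull_min.fit(wind, floc=0)
    ll_weibull = stats.weibull_min.logpdf(wind, c, loc, scale).sum()

    # Non-parametric model: Gaussian kernel density estimate.
    kde = stats.gaussian_kde(wind)
    ll_kde = np.log(kde(wind)).sum()

    print(f"Weibull: shape={c:.2f}, scale={scale:.2f}, logL={ll_weibull:.1f}")
    print(f"KDE: logL={ll_kde:.1f}  (in-sample; a held-out split would be fairer)")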

Journal ArticleDOI
01 Jan 2016
TL;DR: An estimation of distribution algorithm (EDA)-based memetic algorithm (MA) is proposed for solving the distributed assembly permutation flow-shop scheduling problem (DAPFSP) with the objective to minimize the maximum completion time.
Abstract: In this paper, an estimation of distribution algorithm (EDA)-based memetic algorithm (MA) is proposed for solving the distributed assembly permutation flow-shop scheduling problem (DAPFSP) with the objective to minimize the maximum completion time. A novel bi-vector-based method is proposed to represent a solution for the DAPFSP. In the searching phase of the EDA-based MA (EDAMA), the EDA-based exploration and the local-search-based exploitation are incorporated within the MA framework. For the EDA-based exploration phase, a probability model is built to describe the probability distribution of superior solutions. Besides, a novel selective-enhancing sampling mechanism is proposed for generating new solutions by sampling the probability model. For the local-search-based exploitation phase, the critical path of the DAPFSP is analyzed to avoid invalid searching operators. Based on the analysis, a critical-path-based local search strategy is proposed to further improve the potential solutions obtained in the EDA-based searching phase. Moreover, the effect of parameter setting is investigated based on the Taguchi method of design-of-experiment. Suitable parameter values are suggested for instances with different scales. Finally, numerical simulations based on 1710 benchmark instances are carried out. The experimental results and comparisons with existing algorithms show the effectiveness of the EDAMA in solving the DAPFSP. In addition, the best-known solutions of 181 instances are updated by the EDAMA.
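
As a generic illustration of the estimation-of-distribution idea (estimate a probability model from superior solutions, then sample new solutions from it), here is a tiny PBIL-style EDA on a binary toy problem; it does not use the paper's bi-vector representation, selective-enhancing sampling, or any DAPFSP-specific operators:

    import numpy as np

    rng = np.random.default_rng(0)
    n_bits, pop_size, n_elite, n_gen, lr = 30, 60, 15, 100, 0.2

    def fitness(pop):                               # toy objective: OneMax (maximize the number of ones)
        return pop.sum(axis=1)

    prob = np.full(n_bits, 0.5)                     # probability model: independent Bernoulli per bit
    for _ in range(n_gen):
        pop = (rng.random((pop_size, n_bits)) < prob).astype(int)   # sample new solutions from the model
        elite = pop[np.argsort(fitness(pop))[-n_elite:]]            # select superior solutions
        prob = (1 - lr) * prob + lr * elite.mean(axis=0)            # re-estimate the probability model
        prob = prob.clip(0.05, 0.95)                                # keep some exploration

    print("best found:", fitness(pop).max(), "out of", n_bits)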

Proceedings Article
19 Jun 2016
TL;DR: A new discrepancy statistic for measuring differences between two probability distributions is derived based on combining Stein's identity with the reproducing kernel Hilbert space theory and a new class of powerful goodness-of-fit tests are derived that are widely applicable for complex and high dimensional distributions.
Abstract: We derive a new discrepancy statistic for measuring differences between two probability distributions based on combining Stein's identity with the reproducing kernel Hilbert space theory. We apply our result to test how well a probabilistic model fits a set of observations, and derive a new class of powerful goodness-of-fit tests that are widely applicable for complex and high dimensional distributions, even for those with computationally intractable normalization constants. Both theoretical and empirical properties of our methods are studied thoroughly.

Journal ArticleDOI
TL;DR: In this paper, a new Metropolis-adjusted Langevin algorithm (MALA) is proposed to simulate efficiently from high-dimensional densities that are log-concave, a class of probability distributions that is widely used in modern high-dimensional statistics and data analysis.
Abstract: This paper presents a new Metropolis-adjusted Langevin algorithm (MALA) that uses convex analysis to simulate efficiently from high-dimensional densities that are log-concave, a class of probability distributions that is widely used in modern high-dimensional statistics and data analysis. The method is based on a new first-order approximation for Langevin diffusions that exploits log-concavity to construct Markov chains with favourable convergence properties. This approximation is closely related to Moreau-Yoshida regularisations for convex functions and uses proximity mappings instead of gradient mappings to approximate the continuous-time process. The proposed method complements existing MALA methods in two ways. First, the method is shown to have very robust stability properties and to converge geometrically for many target densities for which other MALA are not geometric, or only if the step size is sufficiently small. Second, the method can be applied to high-dimensional target densities that are not continuously differentiable, a class of distributions that is increasingly used in image processing and machine learning and that is beyond the scope of existing MALA and HMC algorithms. To use this method it is necessary to compute or to approximate efficiently the proximity mappings of the logarithm of the target density. For several popular models, including many Bayesian models used in modern signal and image processing and machine learning, this can be achieved with convex optimisation algorithms and with approximations based on proximal splitting techniques, which can be implemented in parallel. The proposed method is demonstrated on two challenging high-dimensional and non-differentiable models related to image resolution enhancement and low-rank matrix estimation that are not well addressed by existing MCMC methodology.
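
A compact sketch of a proximal MALA-style step for a non-differentiable target, here π(x) ∝ exp(−||x||₁), whose proximity mapping is soft-thresholding: the gradient step of standard MALA is replaced by a proximal step and the move is corrected by a Metropolis-Hastings accept/reject, which keeps the target exact; this is a simplified illustration, not the paper's full algorithm:

    import numpy as np

    rng = np.random.default_rng(0)
    d, delta, n_iter = 50, 0.1, 20_000

    def g(x):                                   # g = -log pi, here the l1 norm (Laplace-type target)
        return np.abs(x).sum()

    def prox_g(x, lam):                         # prox of lam * ||.||_1: soft-thresholding
        return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

    def log_q(y, x):                            # log density (up to a constant) of y ~ N(prox(x), delta I)
        return -np.sum((y - prox_g(x, delta / 2)) ** 2) / (2 * delta)

    x, acc = rng.normal(size=d), 0
    for _ in range(n_iter):
        y = prox_g(x, delta / 2) + np.sqrt(delta) * rng.normal(size=d)
        log_alpha = (g(x) - g(y)) + (log_q(x, y) - log_q(y, x))    # MH ratio keeps the exact target
        if np.log(rng.random()) < log_alpha:
            x, acc = y, acc + 1

    print("acceptance rate:", acc / n_iter, " mean |x_i|:", np.abs(x).mean())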

Journal ArticleDOI
TL;DR: In this article, the authors investigated load frequency control of power systems with probabilistic interval delays and derived sufficient delay-distribution-dependent stability and stabilization criteria to obtain the gain of PI-based LFC and the allowable upper bound of the communication delay simultaneously.
Abstract: This paper investigates load frequency control (LFC) of power systems with probabilistic interval delays. The LFC design specifically takes the probability distribution characteristic of the communication delays into account. Firstly, by considering the probability distribution characteristic of the communication delays in the modeling, the original power systems with a PI-based LFC are transformed into a stochastic closed-loop time-delay system. Secondly, by using the Lyapunov-Krasovskii functional method, sufficient delay-distribution-dependent stability and stabilization criteria are derived for the power systems with a PI-based LFC. Following this, an algorithm is provided to obtain the gain of PI-based LFC and the allowable upper bound of the communication delay simultaneously while preserving the desired $H_{\infty}$ performance. Finally, a case study is applied to illustrate the effectiveness of the proposed delay-distribution-dependent PI-based LFC design method.

Journal ArticleDOI
TL;DR: In this article, a probabilistic power flow analysis technique based on the stochastic response surface method is proposed to estimate the probability distributions and statistics of power flow responses without using series expansions such as the Gram-Charlier, Cornish-Fisher or Edgeworth series.
Abstract: This paper proposes a probabilistic power flow analysis technique based on the stochastic response surface method. The probability distributions and statistics of power flow responses can be accurately and efficiently estimated by the proposed method without using series expansions such as the Gram-Charlier, Cornish-Fisher, or Edgeworth series. The stochastic continuous input variables following normal distributions such as loads or non-normal distributions such as photovoltaic generation and wind power and their multiple correlations can be easily modeled. The correctness, effectiveness and adaptability of the proposed method are demonstrated by comparing the probabilistic power flow analysis results of the IEEE 14-bus and 57-bus standard test systems obtained from the proposed method, the point estimate method, and the Monte Carlo simulation method.

Proceedings ArticleDOI
26 Apr 2016
TL;DR: This work introduces a new saliency map model which formulates a map as a generalized Bernoulli distribution and trains a deep architecture to predict such maps using novel loss functions which pair the softmax activation function with measures designed to compute distances between probability distributions.
Abstract: Most saliency estimation methods aim to explicitly model low-level conspicuity cues such as edges or blobs and may additionally incorporate top-down cues using face or text detection. Data-driven methods for training saliency models using eye-fixation data are increasingly popular, particularly with the introduction of large-scale datasets and deep architectures. However, current methods in this latter paradigm use loss functions designed for classification or regression tasks whereas saliency estimation is evaluated on topographical maps. In this work, we introduce a new saliency map model which formulates a map as a generalized Bernoulli distribution. We then train a deep architecture to predict such maps using novel loss functions which pair the softmax activation function with measures designed to compute distances between probability distributions. We show in extensive experiments the effectiveness of such loss functions over standard ones on four public benchmark datasets, and demonstrate improved performance over state-of-the-art saliency methods.
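
A minimal numpy sketch of the loss idea: treat the predicted map as a probability distribution over pixel locations via a softmax and compare it with the normalized fixation map using a distance between distributions (KL divergence here); this is a generic illustration of one such loss, not the paper's exact set of losses:

    import numpy as np

    def softmax(logits):
        z = logits.ravel() - logits.max()
        e = np.exp(z)
        return e / e.sum()

    def saliency_kl_loss(pred_logits, fixation_map, eps=1e-8):
        """KL(target || prediction) between the normalized fixation map and the softmax-ed predicted map."""
        p = softmax(pred_logits)                       # predicted distribution over pixels
        q = fixation_map.ravel().astype(float)
        q = q / (q.sum() + eps)                        # ground-truth fixation distribution
        return float(np.sum(q * (np.log(q + eps) - np.log(p + eps))))

    # Toy usage on an 8x8 "image" with two recorded fixations:
    rng = np.random.default_rng(0)
    logits = rng.normal(size=(8, 8))
    fix = np.zeros((8, 8)); fix[2, 3] = fix[2, 4] = 1.0
    print(saliency_kl_loss(logits, fix))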

Posted Content
TL;DR: Inspired by generative adversarial networks, this work proposes to train a deep directed generative model (not a Markov chain) so that its sampling distribution approximately matches the energy function that is being trained.
Abstract: Training energy-based probabilistic models is confronted with apparently intractable sums, whose Monte Carlo estimation requires sampling from the estimated probability distribution in the inner loop of training. This can be approximately achieved by Markov chain Monte Carlo methods, but may still face a formidable obstacle that is the difficulty of mixing between modes with sharp concentrations of probability. Whereas an MCMC process is usually derived from a given energy function based on mathematical considerations and requires an arbitrarily long time to obtain good and varied samples, we propose to train a deep directed generative model (not a Markov chain) so that its sampling distribution approximately matches the energy function that is being trained. Inspired by generative adversarial networks, the proposed framework involves training of two models that represent dual views of the estimated probability distribution: the energy function (mapping an input configuration to a scalar energy value) and the generator (mapping a noise vector to a generated configuration), both represented by deep neural networks.

Journal ArticleDOI
Sanping Zhou1, Jinjun Wang1, Shun Zhang1, Yudong Liang1, Yihong Gong1 
TL;DR: A weighting function between the local energy term and the global energy term is proposed by using the local and global variance information, which enables the model to select the weights adaptively when segmenting images with intensity inhomogeneity.

Journal ArticleDOI
TL;DR: This paper presents two case studies in which probability distributions, instead of individual numbers, are inferred from data to describe quantities such as maximal current densities, and shows how changes in these probability distributions across data sets offer insight into which currents cause beat-to-beat variability in canine APs.

Posted Content
TL;DR: A new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications, based on entropic regularization of the primal OT problem, which results in a smooth dual optimization problem which can be addressed with algorithms that have a provably faster convergence.
Abstract: Optimal transport (OT) defines a powerful framework to compare probability distributions in a geometrically faithful way. However, the practical impact of OT is still limited because of its computational burden. We propose a new class of stochastic optimization algorithms to cope with large-scale problems routinely encountered in machine learning applications. These methods are able to manipulate arbitrary distributions (either discrete or continuous) by simply requiring to be able to draw samples from them, which is the typical setup in high-dimensional learning problems. This alleviates the need to discretize these densities, while giving access to provably convergent methods that output the correct distance without discretization error. These algorithms rely on two main ideas: (a) the dual OT problem can be re-cast as the maximization of an expectation; (b) entropic regularization of the primal OT problem results in a smooth dual optimization problem which can be addressed with algorithms that have a provably faster convergence. We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite-dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS). This is currently the only known method to solve this problem, apart from computing OT on finite samples. We back up these claims on a set of discrete, semi-discrete and continuous benchmark problems.