
Showing papers on "Probability distribution published in 2013"


Posted Content
TL;DR: This paper abandons the normality assumption and instead uses statistical methods for nonparametric kernel density estimation; the experiments suggest that kernel estimation is a useful tool for learning Bayesian models.
Abstract: When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.
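As a rough illustration of the comparison made in this paper (a sketch with synthetic data, not the authors' code or datasets), the snippet below fits one bimodal class-conditional feature two ways: with a single Gaussian and with a Gaussian kernel density estimate (scipy's default Scott's-rule bandwidth), and compares held-out log-likelihoods, the quantity a naive Bayes classifier multiplies across features.

```python
# Sketch (not the paper's code): compare a single-Gaussian fit with a kernel
# density estimate on a bimodal class-conditional, as used inside naive Bayes.
import numpy as np
from scipy.stats import norm, gaussian_kde

rng = np.random.default_rng(0)
# Bimodal "class-conditional" feature: a single Gaussian is a poor fit.
train = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(3, 0.5, 200)])
test  = np.concatenate([rng.normal(-3, 0.5, 200), rng.normal(3, 0.5, 200)])

# (a) parametric: single Gaussian with fitted mean and standard deviation
mu, sigma = train.mean(), train.std(ddof=1)
loglik_gauss = norm.logpdf(test, mu, sigma).sum()

# (b) nonparametric: Gaussian kernel density estimate (Scott's rule bandwidth)
kde = gaussian_kde(train)
loglik_kde = np.log(kde(test)).sum()

print(f"held-out log-likelihood  Gaussian: {loglik_gauss:.1f}  KDE: {loglik_kde:.1f}")
# The KDE assigns a much higher likelihood to the held-out data, which is the
# effect the paper exploits inside a naive Bayesian classifier.
```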

3,071 citations


Book
09 Jun 2013
TL;DR: This book discusses the application of the binomial distribution, network modelling and evaluation of simple systems, and system reliability evaluation using probability distributions.
Abstract: Introduction. Basic Probability Theory. Application of the Binomial Distribution. Network Modelling and Evaluation of Simple Systems. Network Modelling and Evaluation of Complex Systems. Probability Distributions in Reliability Evaluation. System Reliability Evaluation Using Probability Distributions. Monte Carlo Simulation. Epilogue.

1,062 citations


Journal ArticleDOI
TL;DR: The authors' Wigner-like surmises are shown to be very accurate when compared to numerics and exact calculations in the large matrix size limit, and quantitative improvements are found through a polynomial expansion.
Abstract: We derive expressions for the probability distribution of the ratio of two consecutive level spacings for the classical ensembles of random matrices. This ratio distribution was recently introduced to study spectral properties of many-body problems, as, contrary to the standard level spacing distributions, it does not depend on the local density of states. Our Wigner-like surmises are shown to be very accurate when compared to numerics and exact calculations in the large matrix size limit. Quantitative improvements are found through a polynomial expansion. Examples from a quantum many-body lattice model and from zeros of the Riemann zeta function are presented.
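As a quick numerical illustration of the ratio statistic (a sketch, not the authors' code), the snippet below draws a GOE matrix, forms the ratios r̃_n = min(s_n, s_{n+1}) / max(s_n, s_{n+1}) of consecutive level spacings, and compares their mean to the value ≈ 0.53 characteristic of GOE and to 2 ln 2 − 1 ≈ 0.386 for uncorrelated (Poisson) levels. No unfolding of the spectrum is needed, which is exactly the advantage of this statistic noted in the abstract.

```python
# Numerical sketch (not from the paper): consecutive level-spacing ratios
# for a GOE random matrix vs. uncorrelated (Poisson) levels.
import numpy as np

rng = np.random.default_rng(1)
N = 1000

# GOE matrix: symmetrize a Gaussian matrix, keep the bulk of the spectrum.
A = rng.normal(size=(N, N))
H = (A + A.T) / 2.0
levels = np.sort(np.linalg.eigvalsh(H))[N // 4 : 3 * N // 4]

def mean_ratio(e):
    s = np.diff(e)                                        # consecutive spacings
    r = np.minimum(s[1:], s[:-1]) / np.maximum(s[1:], s[:-1])
    return r.mean()

poisson_levels = np.sort(rng.uniform(0, 1, N))            # uncorrelated levels
print(f"GOE      <r~> = {mean_ratio(levels):.3f}   (approx. 0.53 expected)")
print(f"Poisson  <r~> = {mean_ratio(poisson_levels):.3f}   (2 ln 2 - 1 = 0.386)")
```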

705 citations


Journal ArticleDOI
TL;DR: In this paper, it is shown that, under two plausible conjectures, even an approximate or noisy classical simulation of the linear-optical model would already imply a collapse of the polynomial hierarchy.
Abstract: We give new evidence that quantum computers -- moreover, rudimentary quantum computers built entirely out of linear-optical elements -- cannot be efficiently simulated by classical computers. In particular, we define a model of computation in which identical photons are generated, sent through a linear-optical network, then nonadaptively measured to count the number of photons in each mode. This model is not known or believed to be universal for quantum computation, and indeed, we discuss the prospects for realizing the model using current technology. On the other hand, we prove that the model is able to solve sampling problems and search problems that are classically intractable under plausible assumptions. Our first result says that, if there exists a polynomial-time classical algorithm that samples from the same probability distribution as a linear-optical network, then P^#P=BPP^NP, and hence the polynomial hierarchy collapses to the third level. Unfortunately, this result assumes an extremely accurate simulation. Our main result suggests that even an approximate or noisy classical simulation would already imply a collapse of the polynomial hierarchy. For this, we need two unproven conjectures: the "Permanent-of-Gaussians Conjecture", which says that it is #P-hard to approximate the permanent of a matrix A of independent N(0,1) Gaussian entries, with high probability over A; and the "Permanent Anti-Concentration Conjecture", which says that |Per(A)|>=sqrt(n!)/poly(n) with high probability over A. We present evidence for these conjectures, both of which seem interesting even apart from our application. This paper does not assume knowledge of quantum optics. Indeed, part of its goal is to develop the beautiful theory of noninteracting bosons underlying our model, and its connection to the permanent function, in a self-contained way accessible to theoretical computer scientists.
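The connection to the permanent can be made concrete: for a collision-free outcome, the probability of seeing one photon in each of n chosen output modes (with one photon fed into each of n chosen input modes) is |Per(A)|², where A is the n×n submatrix of the network's m×m unitary picked out by those rows and columns. The sketch below (an illustration, not code from the paper) evaluates the permanent with Ryser's formula, essentially the best known exact method and part of the reason classical simulation is believed to be hard.

```python
# Sketch (not the paper's code): the permanent via Ryser's formula, and the
# boson-sampling probability |Per(A)|^2 for a collision-free outcome.
import itertools
import numpy as np

def permanent(A):
    """Ryser's formula (naive O(2^n * n^2) implementation) for an n x n matrix."""
    n = A.shape[0]
    total = 0.0
    for r in range(1, n + 1):
        for cols in itertools.combinations(range(n), r):
            total += (-1) ** r * np.prod(A[:, list(cols)].sum(axis=1))
    return (-1) ** n * total

def haar_unitary(m, rng):
    """Random m x m unitary via QR of a complex Gaussian matrix (phases fixed)."""
    Z = (rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

rng = np.random.default_rng(2)
m, n = 8, 3                               # m modes, n photons (toy sizes)
U = haar_unitary(m, rng)
inputs, outputs = [0, 1, 2], [4, 5, 7]    # an arbitrary collision-free pattern
A = U[np.ix_(outputs, inputs)]            # n x n submatrix of the unitary
print("outcome probability |Per(A)|^2 =", abs(permanent(A)) ** 2)
```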

619 citations


Journal ArticleDOI
TL;DR: An approach to modifying a whole range of MCMC methods, applicable whenever the target measure has density with respect to a Gaussian process or Gaussian random field reference measure, which ensures that their speed of convergence is robust under mesh refinement.
Abstract: Many problems arising in applications result in the need to probe a probability distribution for functions. Examples include Bayesian nonparametric statistics and conditioned diffusion processes. Standard MCMC algorithms typically become arbitrarily slow under the mesh refinement dictated by nonparametric description of the unknown function. We describe an approach to modifying a whole range of MCMC methods, applicable whenever the target measure has density with respect to a Gaussian process or Gaussian random field reference measure, which ensures that their speed of convergence is robust under mesh refinement. Gaussian processes or random fields are fields whose marginal distributions, when evaluated at any finite set of N points, are ℝ^N-valued Gaussians. The algorithmic approach that we describe is applicable not only when the desired probability measure has density with respect to a Gaussian process or Gaussian random field reference measure, but also to some useful non-Gaussian reference measures constructed through random truncation. In the applications of interest the data is often sparse and the prior specification is an essential part of the overall modelling strategy. These Gaussian-based reference measures are a very flexible modelling tool, finding wide-ranging application. Examples are shown in density estimation, data assimilation in fluid mechanics, subsurface geophysics and image registration. The key design principle is to formulate the MCMC method so that it is, in principle, applicable for functions; this may be achieved by use of proposals based on carefully chosen time-discretizations of stochastic dynamical systems which exactly preserve the Gaussian reference measure. Taking this approach leads to many new algorithms which can be implemented via minor modification of existing algorithms, yet which show enormous speed-up on a wide range of applied problems.
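The key construction is easiest to see in finite dimensions. Below is a minimal sketch (toy Gaussian reference measure and likelihood of our own choosing, not the paper's examples) of the preconditioned Crank-Nicolson (pCN) proposal, one of the best-known examples of the Gaussian-preserving discretizations described in the abstract: v = sqrt(1 − β²) u + β ξ with ξ ~ N(0, C), accepted with probability min{1, exp(Φ(u) − Φ(v))}. Because the proposal preserves N(0, C) exactly, the acceptance rule involves only the likelihood term Φ, which is what keeps the acceptance rate from collapsing as the discretization dimension d is refined.

```python
# Minimal pCN (preconditioned Crank-Nicolson) MCMC sketch in R^d, assuming a
# Gaussian reference measure N(0, C) and a user-supplied negative log-likelihood Phi.
# Toy choices: C from a squared-exponential covariance, Phi from a few noisy point data.
import numpy as np

rng = np.random.default_rng(3)
d = 50
x = np.linspace(0, 1, d)
C = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2 / 0.1**2) + 1e-8 * np.eye(d)
L = np.linalg.cholesky(C)                     # used to draw xi ~ N(0, C)

obs_idx = np.array([10, 25, 40])              # a few noisy observations (toy data)
obs = np.array([0.5, -0.3, 0.8])
noise = 0.1

def Phi(u):
    """Negative log-likelihood of the observations given the function values u."""
    return 0.5 * np.sum((u[obs_idx] - obs) ** 2) / noise**2

def pcn(n_iter=20000, beta=0.2):
    u, phi_u = np.zeros(d), Phi(np.zeros(d))
    accepted = 0
    for _ in range(n_iter):
        xi = L @ rng.normal(size=d)
        v = np.sqrt(1 - beta**2) * u + beta * xi      # pCN proposal
        phi_v = Phi(v)
        if np.log(rng.uniform()) < phi_u - phi_v:     # accept prob min{1, e^(Phi(u)-Phi(v))}
            u, phi_u, accepted = v, phi_v, accepted + 1
    print(f"acceptance rate: {accepted / n_iter:.2f}")
    return u

sample = pcn()
```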

553 citations


Journal ArticleDOI
TL;DR: In this paper, a unifying framework is presented that links two classes of statistics used in two-sample and independence testing: the energy distances and distance covariances from the statistics literature, and the maximum mean discrepancies (MMD), that is, distances between embeddings of distributions into reproducing kernel Hilbert spaces.
Abstract: We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, maximum mean discrepancies (MMD), that is, distances between embeddings of distributions to reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as energy distance with respect to some negative-type semimetric. This equivalence readily extends to distance covariance using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests: we show in particular that the energy distance most commonly employed in statistics is just one member of a parametric family of kernels, and that other choices from this family can yield more powerful tests.
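A quick numerical check of the stated equivalence (a sketch, not the authors' code): with the Euclidean distance as the negative-type semimetric and the induced "distance kernel" k(x, y) = ½(‖x‖ + ‖y‖ − ‖x − y‖) centered at the origin, the sample energy distance should equal twice the squared MMD computed with that kernel, when both are estimated with biased all-pairs V-statistics.

```python
# Numerical check (sketch, not the authors' code): energy distance equals
# twice the squared MMD computed with the induced "distance kernel"
#   k(x, y) = 0.5 * (||x|| + ||y|| - ||x - y||)   (Euclidean semimetric, centered at 0),
# using biased (all-pairs) V-statistic estimates on both sides.
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.default_rng(4)
X = rng.normal(0.0, 1.0, size=(200, 3))
Y = rng.normal(0.5, 1.5, size=(250, 3))

def energy_distance(X, Y):
    return 2 * cdist(X, Y).mean() - cdist(X, X).mean() - cdist(Y, Y).mean()

def distance_kernel(A, B):
    na = np.linalg.norm(A, axis=1)[:, None]
    nb = np.linalg.norm(B, axis=1)[None, :]
    return 0.5 * (na + nb - cdist(A, B))

def mmd2(X, Y, kernel):
    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()

print("energy distance :", energy_distance(X, Y))
print("2 * MMD^2       :", 2 * mmd2(X, Y, distance_kernel))   # the two should agree
```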

342 citations


Journal ArticleDOI
TL;DR: The Connectivity Modeling System is described, a probabilistic, multi-scale model that provides Lagrangian descriptions of oceanic phenomena and can be used in a broad range of oceanographic applications, from the fate of pollutants to the pathways of water masses in the global ocean.
Abstract: Pelagic organisms' movement and motion of buoyant particles are driven by processes operating across multiple, spatial and temporal scales. We developed a probabilistic, multi-scale model, the Connectivity Modeling System (CMS), to gain a mechanistic understanding of dispersion and migration processes in the ocean. The model couples offline a new nested-grid technique to a stochastic Lagrangian framework where individual variability is introduced by drawing particles' attributes at random from specified probability distributions of traits. This allows 1) to track seamlessly a large number of both actively swimming and inertial particles over multiple, independent ocean model domains and 2) to generate ensemble forecasts or hindcasts of the particles' three dimensional trajectories, dispersal kernels, and transition probability matrices used for connectivity estimates. In addition, CMS provides Lagrangian descriptions of oceanic phenomena (advection, dispersion, retention) and can be used in a broad range of oceanographic applications, from the fate of pollutants to the pathways of water masses in the global ocean. Here we describe the CMS modular system where particle behavior can be augmented with specific features, and a parallel module implementation simplifies data management and CPU intensive computations associated with solving for the tracking of millions of active particles. Some novel features include on-the-fly data access of operational hydrodynamic models, individual particle variability and inertial motion, and multi-nesting capabilities to optimize resolution. We demonstrate the performance of the interpolation algorithm by testing accuracy in tracing the flow stream lines in both time and space and the efficacy of probabilistic modeling in evaluating the bio-physical coupling against empirical data. Finally, following recommended practices for the development of community models, we provide an open source code with a series of coupled standalone, optional modules detailed in a user's guide.
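As a conceptual illustration of the stochastic Lagrangian framework described here (a toy sketch, not CMS code), the snippet below advects an ensemble of particles in a prescribed 2-D velocity field, adds a random-walk term standing in for unresolved turbulent diffusion, and draws an individual behavioural trait for each particle from a specified probability distribution; the velocity field, diffusivity, and trait distribution are all illustrative assumptions.

```python
# Toy Lagrangian particle-tracking sketch (not CMS code): advection in a
# prescribed 2-D velocity field plus a random-walk diffusion term, with an
# individual trait drawn at random for each particle.
import numpy as np

rng = np.random.default_rng(5)
n_particles, n_steps, dt = 1000, 500, 600.0     # dt in seconds
K = 10.0                                        # horizontal diffusivity (m^2/s), assumed

def velocity(x, y, t):
    """Placeholder double-gyre-like velocity field (m/s)."""
    u = 0.1 * np.sin(np.pi * x / 1e5) * np.cos(np.pi * y / 1e5)
    v = -0.1 * np.cos(np.pi * x / 1e5) * np.sin(np.pi * y / 1e5)
    return u, v

# Release all particles near one point; draw an individual (eastward) swimming
# speed in m/s for each from a lognormal distribution of traits (illustrative).
x = np.full(n_particles, 2.0e4) + rng.normal(0, 500, n_particles)
y = np.full(n_particles, 2.0e4) + rng.normal(0, 500, n_particles)
swim = rng.lognormal(mean=-4.0, sigma=0.5, size=n_particles)

for step in range(n_steps):
    u, v = velocity(x, y, step * dt)
    x += (u + swim) * dt + np.sqrt(2 * K * dt) * rng.normal(size=n_particles)
    y += v * dt + np.sqrt(2 * K * dt) * rng.normal(size=n_particles)

print("mean displacement (km):", np.hypot(x - 2.0e4, y - 2.0e4).mean() / 1e3)
```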

281 citations


Book ChapterDOI
01 Jan 2013
TL;DR: The Gamma function, as discussed by the authors, is essentially a generalized factorial; beyond that, it has many further applications, e.g., as part of probability distributions.
Abstract: In what follows, we introduce the classical Gamma function in Sect. 2.1. It is essentially understood to be a generalized factorial. However, there are many further applications, e.g., as part of probability distributions (see, e.g., Evans et al. 2000). The main properties of the Gamma function are explained in this chapter (for a more detailed discussion the reader is referred to, e.g., Artin (1964), Lebedev (1973), Muller (1998), Nielsen (1906), and Whittaker and Watson (1948) and the references therein).
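For reference (standard facts, not new results from the chapter), the "generalized factorial" property and a typical appearance in a probability density are:

```latex
\Gamma(z) = \int_0^{\infty} t^{\,z-1} e^{-t}\, dt \quad (\operatorname{Re} z > 0), \qquad
\Gamma(z+1) = z\,\Gamma(z), \qquad \Gamma(n+1) = n! \quad (n = 0, 1, 2, \dots);
% e.g., it normalizes the Gamma(alpha, beta) probability density:
f(x) = \frac{\beta^{\alpha}}{\Gamma(\alpha)}\, x^{\alpha-1} e^{-\beta x}, \qquad x > 0.
```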

267 citations


Journal ArticleDOI
TL;DR: In this paper, the performance analysis of a dual-hop relay transmission system composed of asymmetric radio-frequency (RF)/free-space optical (FSO) links with pointing errors is presented.
Abstract: In this work, the performance analysis of a dual-hop relay transmission system composed of asymmetric radio-frequency (RF)/free-space optical (FSO) links with pointing errors is presented. More specifically, we build on the system model presented in earlier work to derive new exact closed-form expressions for the cumulative distribution function, probability density function, moment generating function, and moments of the end-to-end signal-to-noise ratio in terms of the Meijer G-function. We then capitalize on these results to offer new exact closed-form expressions for the higher-order amount of fading, average error rate for binary and M-ary modulation schemes, and the ergodic capacity, all in terms of Meijer G-functions. Our new analytical results are also verified via computer-based Monte-Carlo simulations.

253 citations


Journal ArticleDOI
TL;DR: The main advantage of the proposed probabilistic load flow method is that highly accurate solutions can be obtained with little computation, and it places almost no constraints on the probability distributions of the input random variables.
Abstract: This paper proposes a probabilistic load flow method that can handle correlated power sources and loads. The method is based on the Nataf transformation and Latin Hypercube Sampling. Its main advantage is that highly accurate solutions can be obtained with little computation, and it places almost no constraints on the probability distributions of the input random variables. Considering the uncertainties of correlated wind power, solar energy and loads, the effectiveness and accuracy of the proposed method are verified by comparative tests on a modified IEEE 14-bus system and a modified IEEE 118-bus system.
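To make the two building blocks concrete, the sketch below (an illustration under assumed marginals and correlations, not the paper's code or test systems) combines Latin Hypercube Sampling with a Nataf-style, Gaussian-copula transformation: stratified uniforms are mapped to correlated standard normals and then through the inverse CDFs of the target marginals. A full Nataf transformation would additionally adjust the Gaussian correlation matrix to reproduce the target correlations of the non-Gaussian marginals; that correction is omitted here for brevity.

```python
# Sketch (not the paper's code): correlated input sampling for probabilistic
# load flow via Latin Hypercube Sampling plus a Nataf-style (Gaussian copula)
# transformation. The marginals and correlation matrix below are assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n = 1000                                   # number of LHS samples

# Target marginals: wind (Weibull), solar (Beta), load (Normal) -- illustrative.
marginals = [
    stats.weibull_min(c=2.0, scale=8.0),   # wind speed (m/s)
    stats.beta(a=2.0, b=2.0),              # normalized solar output
    stats.norm(loc=100.0, scale=10.0),     # load (MW)
]
R = np.array([[1.0, 0.3, 0.2],             # correlation in standard normal space
              [0.3, 1.0, 0.4],             # (a full Nataf step would correct R
              [0.2, 0.4, 1.0]])            #  for the non-Gaussian marginals)

# Latin Hypercube Sampling: one stratified uniform sample per dimension.
d = len(marginals)
u = np.empty((d, n))
for i in range(d):
    u[i] = (rng.permutation(n) + rng.uniform(size=n)) / n

# Impose correlation in standard normal space, then map to the target marginals.
z = stats.norm.ppf(u)                      # independent, stratified standard normals
z_corr = np.linalg.cholesky(R) @ z         # correlated standard normals
x = np.array([m.ppf(stats.norm.cdf(z_corr[i])) for i, m in enumerate(marginals)])

print("sample correlation:\n", np.corrcoef(x).round(2))
```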

Journal ArticleDOI
TL;DR: This work proposes a variant of the EM algorithm that iteratively maximizes a generalized likelihood criterion, which can be interpreted as a degree of agreement between the statistical model and the uncertain observations.
Abstract: We consider the problem of parameter estimation in statistical models in the case where data are uncertain and represented as belief functions. The proposed method is based on the maximization of a generalized likelihood criterion, which can be interpreted as a degree of agreement between the statistical model and the uncertain observations. We propose a variant of the EM algorithm that iteratively maximizes this criterion. As an illustration, the method is applied to uncertain data clustering using finite mixture models, in the cases of categorical and continuous attributes.

Book
01 Jan 2013
TL;DR: This revision of QUANTITATIVE METHODS for Business provides students with a conceptual understanding of the role that quantitative methods play in the decision-making process and motivates students by using examples that illustrate situations in which quantitative methods are useful in decision making.
Abstract: Preface. 1. Introduction. 2. Introduction to Probability. 3. Probability Distributions. 4. Decision Analysis. 5. Utility and Game Theory. 6. Time Series Analysis and Forecasting. 7. Introduction to Linear Programming. 8. Linear Programming: Sensitivity Analysis and Interpretation of Solution. 9. Linear Programming Applications in Marketing, Finance, and Operations Management. 10. Distribution and Network Models. 11. Integer Linear Programming. 12. Advanced Optimization Applications. 13. Project Scheduling: PERT/CPM. 14. Inventory Models. 15. Waiting Line Models. 16. Simulation. 17. Markov Processes. Appendix A: Building Spreadsheet Models. Appendix B: Binomial Probabilities. Appendix C: Poisson Probabilities. Appendix D: Areas for the Standard Normal Distribution. Appendix E: Values of e^(-λ). Appendix F: References and Bibliography. Appendix G: Self-Test Solutions and Answers to Even-Numbered Problems.

Journal ArticleDOI
TL;DR: An approximate message passing (AMP) algorithm is used and a rigorous proof is given that this approach is successful as soon as the undersampling rate δ exceeds the (upper) Rényi information dimension of the signal, d̅(pX).
Abstract: We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala [30], message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with undersampling rates close to the fraction of nonzero coordinates. We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach is successful as soon as the undersampling rate δ exceeds the (upper) Rényi information dimension of the signal, d̄(p_X). More precisely, for a sequence of signals of diverging dimension n whose empirical distribution converges to p_X, reconstruction is with high probability successful from d̄(p_X) n + o(n) measurements taken according to a band diagonal matrix. For sparse signals, i.e., sequences of dimension n and k(n) nonzero entries, this implies reconstruction from k(n) + o(n) measurements. For “discrete” signals, i.e., signals whose coordinates take a fixed finite set of values, this implies reconstruction from o(n) measurements. The result is robust with respect to noise, does not apply uniquely to random signals, but requires the knowledge of the empirical distribution of the signal p_X.
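For orientation, the sketch below shows a plain (non-spatially-coupled) AMP iteration with soft thresholding for y = Ax + w; the band-diagonal, spatially coupled construction and the state-evolution analysis in the paper build on this basic loop. The matrix sizes, threshold schedule, and signal model here are illustrative assumptions, not the paper's setup.

```python
# Sketch (illustrative, not the paper's spatially coupled construction): basic
# AMP with soft thresholding for y = A x + w, with A having i.i.d. N(0, 1/m) entries.
import numpy as np

rng = np.random.default_rng(7)
n, m, k = 2000, 800, 100                 # signal length, measurements, sparsity
A = rng.normal(0, 1 / np.sqrt(m), size=(m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.normal(0, 1, k)
y = A @ x_true + 0.01 * rng.normal(size=m)

def soft(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

x, z = np.zeros(n), y.copy()
for _ in range(50):
    theta = 2.0 * np.linalg.norm(z) / np.sqrt(m)       # threshold ~ residual RMS
    x_new = soft(x + A.T @ z, theta)
    # Onsager correction: fraction of active coordinates times the old residual
    z = y - A @ x_new + (np.count_nonzero(x_new) / m) * z
    x = x_new

print("relative reconstruction error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```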

Posted Content
TL;DR: In this article, a new method is developed to represent probabilistic relations on multiple random events, where a probability distribution over the relations is directly represented by a Bayesian network.
Abstract: A new method is developed to represent probabilistic relations on multiple random events. Where previously knowledge bases containing probabilistic rules were used for this purpose, here a probability distribution over the relations is directly represented by a Bayesian network. By using a powerful way of specifying conditional probability distributions in these networks, the resulting formalism is more expressive than the previous ones. Particularly, it provides for constraints on equalities of events, and it allows to define complex, nested combination functions.

Journal ArticleDOI
TL;DR: This paper considers logic-based argumentation with uncertain arguments by considering models of the language, which can be used to induce a probability distribution over arguments constructed using classical logic, and shows how this formalization of uncertainty for logical arguments relates to uncertainty of abstract arguments.

Journal ArticleDOI
TL;DR: In this paper, the authors propose a new method to derive the probability distribution of a function of random variables representing the structural response, based on the maximum entropy principle with constraints specified in terms of fractional moments, in place of the commonly used integer moments.
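In outline (this is the generic form of such a construction, stated here as an assumption about the setup rather than a quotation from the paper), maximizing the entropy of the response density subject to a small number of fractional-moment constraints yields an exponential-family density:

```latex
\max_{f}\; -\!\int f(x)\,\ln f(x)\,dx
\quad \text{s.t.}\quad \int x^{\alpha_k} f(x)\,dx = M_{\alpha_k},\; k = 1,\dots,K, \qquad \int f(x)\,dx = 1,
% which gives the maximum-entropy density
f(x) = \exp\!\Big(-\lambda_0 - \sum_{k=1}^{K} \lambda_k\, x^{\alpha_k}\Big),
```

where both the fractional exponents α_k and the Lagrange multipliers λ_k are fitted to (typically simulated) fractional moments of the structural response.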

Journal ArticleDOI
TL;DR: In this paper, a probability distribution model named "versatile distribution" is formulated and developed along with its properties and applications; it can represent forecast errors well for all forecast timescales and magnitudes.
Abstract: The existence of wind power forecast errors is one of the most challenging issues for wind power system operation. It is difficult to find a reasonable method for the representation of forecast errors and apply it in scheduling. In this paper, a probability distribution model named “versatile distribution” is formulated and developed along with its properties and applications. The model can well represent forecast errors for all forecast timescales and magnitudes. The incorporation of the model in economic dispatch (ED) problems can simplify the wind-induced uncertainties via a few analytical terms in the problem formulation. The ED problem with wind power could hence be solved by the classical optimization methods, such as sequential linear programming which has been widely accepted by industry for solving ED problems. Discussions are also extended on the incorporation of the proposed versatile distribution into unit commitment problems. The results show that the new distribution is more effective than other commonly used distributions (i.e., Gaussian and Beta) with more accurate representation of forecast errors and better formulation and solution of ED problems.

Journal ArticleDOI
TL;DR: To solve the complicated nonlinear, non-smooth, and non-differentiable SDEED problem, an enhanced particle swarm optimization (PSO) algorithm is applied to obtain the best solution for the corresponding scenarios and to improve the quality of the solutions attained by standard PSO.

Journal ArticleDOI
TL;DR: This analysis indicates that the heavy-tailed degree distribution is causally determined by the similarly skewed distribution of human activity; this relation cannot be explained by interactive models, such as preferential attachment, since the observed actions are not likely to be caused by interactions with other people.
Abstract: The probability distribution of number of ties of an individual in a social network follows a scale-free power-law. However, how this distribution arises has not been conclusively demonstrated in direct analyses of people's actions in social networks. Here, we perform a causal inference analysis and find an underlying cause for this phenomenon. Our analysis indicates that heavy-tailed degree distribution is causally determined by similarly skewed distribution of human activity. Specifically, the degree of an individual is entirely random - following a “maximum entropy attachment” model - except for its mean value which depends deterministically on the volume of the users' activity. This relation cannot be explained by interactive models, like preferential attachment, since the observed actions are not likely to be caused by interactions with other people.

Journal ArticleDOI
TL;DR: In this article, the authors highlight the importance of decision-making tools designed for situations where generally agreed-upon probability distributions are not available and stakeholders show different degrees of risk tolerance.
Abstract: Climate change studies rarely yield consensus on the probability distribution of exposure, vulnerability, or possible outcomes, and therefore the evaluation of alternative policy strategies is difficult. This Perspective highlights the importance of decision-making tools designed for situations where generally agreed-upon probability distributions are not available and stakeholders show different degrees of risk tolerance.

Journal ArticleDOI
TL;DR: In this article, an effective estimation of distribution algorithm (EDA) is proposed to solve the distributed permutation flow shop scheduling problem (DPFSP), where the earliest completion factory rule is employed for the permutation based encoding to generate feasible schedules and calculate the schedule objective value.

Journal ArticleDOI
TL;DR: In this article, the authors present an algorithm which uses sublinear in n, specifically O(n^(2/3) ε^(-8/3) log n), independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than max{ε^(4/3) n^(-1/3)/32, ε n^(-1/2)/4}) or large (more than ε) in ℓ1 distance.
Abstract: Given samples from two distributions over an n-element set, we wish to test whether these distributions are statistically close. We present an algorithm which uses sublinear in n, specifically O(n^(2/3) ε^(-8/3) log n), independent samples from each distribution, runs in time linear in the sample size, makes no assumptions about the structure of the distributions, and distinguishes the cases when the distance between the distributions is small (less than max{ε^(4/3) n^(-1/3)/32, ε n^(-1/2)/4}) or large (more than ε) in ℓ1 distance. This result can be compared to the lower bound of Ω(n^(2/3) ε^(-2/3)) for this problem given by Valiant [2008]. Our algorithm has applications to the problem of testing whether a given Markov process is rapidly mixing. We present sublinear algorithms for several variants of this problem as well.
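The sketch below is not the paper's full tester (which combines an ℓ2 test on the "light" elements with direct estimation of the "heavy" ones), but it shows the collision-based estimator of the squared ℓ2 distance that sits at its core: self-collision counts estimate ‖p‖₂² and ‖q‖₂², and cross-collision counts estimate the inner product p·q.

```python
# Sketch (not the paper's full tester): the collision-based estimate of the
# squared L2 distance ||p - q||_2^2 that underlies closeness testing.
import numpy as np
from collections import Counter

rng = np.random.default_rng(8)
n, s = 1000, 20000                    # domain size, samples per distribution

p = rng.dirichlet(np.ones(n))         # two arbitrary distributions on [n]
q = 0.7 * p + 0.3 * rng.dirichlet(np.ones(n))

xs = rng.choice(n, size=s, p=p)
ys = rng.choice(n, size=s, p=q)

def self_collisions(sample):
    counts = np.array(list(Counter(sample).values()))
    return (counts * (counts - 1) // 2).sum()       # unordered same-sample pairs

cx, cy = Counter(xs), Counter(ys)
cross = sum(cx[k] * cy[k] for k in cx)              # pairs (i, j) with x_i == y_j

l2_sq_est = (self_collisions(xs) / (s * (s - 1) / 2)
             + self_collisions(ys) / (s * (s - 1) / 2)
             - 2 * cross / s**2)
l2_sq_true = np.sum((p - q) ** 2)
print(f"estimated ||p-q||_2^2 = {l2_sq_est:.2e}   true = {l2_sq_true:.2e}")
```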

Journal ArticleDOI
TL;DR: In this article, a non-probabilistic reliability model is given for structures with convex model uncertainty, in which reliability is defined as the ratio of the multidimensional volume falling into the reliability domain to the volume of the whole convex model.

Journal ArticleDOI
TL;DR: The addressed filtering problem aims to design an unbiased recursive filter for systems with random parameter matrices, stochastic nonlinearities, multiple fading measurements, and correlated noises.

Proceedings Article
13 Jun 2013
TL;DR: This work designs differentially private algorithms for statistical model selection and gives sufficient conditions for the LASSO estimator to be robust to small changes in the data set, and shows that these conditions hold with high probability under essentially the same stochastic assumptions that are used in the literature to analyze convergence of the LASSO.
Abstract: We design differentially private algorithms for statistical model selection. Given a data set and a large, discrete collection of “models”, each of which is a family of probability distributions, the goal is to determine the model that best “fits” the data. This is a basic problem in many areas of statistics and machine learning. We consider settings in which there is a well-defined answer, in the following sense: Suppose that there is a nonprivate model selection procedure f which is the reference to which we compare our performance. Our differentially private algorithms output the correct value f(D) whenever f is stable on the input data set D. We work with two notions, perturbation stability and subsampling stability. We give two classes of results: generic ones, that apply to any function with discrete output set; and specific algorithms for the problem of sparse linear regression. The algorithms we describe are efficient and in some cases match the optimal nonprivate asymptotic sample complexity. Our algorithms for sparse linear regression require analyzing the stability properties of the popular LASSO estimator. We give sufficient conditions for the LASSO estimator to be robust to small changes in the data set, and show that these conditions hold with high probability under essentially the same stochastic assumptions that are used in the literature to analyze convergence of the LASSO.

Journal ArticleDOI
TL;DR: This work uses the well-known Kullback-Leibler divergence to measure similarity between uncertain objects in both the continuous and discrete cases, and integrates it into partitioning and density-based clustering methods to cluster uncertain objects.
Abstract: Clustering on uncertain data, one of the essential tasks in mining uncertain data, poses significant challenges on both modeling similarity between uncertain objects and developing efficient computational methods. The previous methods extend traditional partitioning clustering methods like k-means and density-based clustering methods like DBSCAN to uncertain data, thus rely on geometric distances between objects. Such methods cannot handle uncertain objects that are geometrically indistinguishable, such as products with the same mean but very different variances in customer ratings. Surprisingly, probability distributions, which are essential characteristics of uncertain objects, have not been considered in measuring similarity between uncertain objects. In this paper, we systematically model uncertain objects in both continuous and discrete domains, where an uncertain object is modeled as a continuous and discrete random variable, respectively. We use the well-known Kullback-Leibler divergence to measure similarity between uncertain objects in both the continuous and discrete cases, and integrate it into partitioning and density-based clustering methods to cluster uncertain objects. Nevertheless, a naive implementation is very costly. Particularly, computing exact KL divergence in the continuous case is very costly or even infeasible. To tackle the problem, we estimate KL divergence in the continuous case by kernel density estimation and employ the fast Gauss transform technique to further speed up the computation. Our extensive experiment results verify the effectiveness, efficiency, and scalability of our approaches.
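In the continuous case the paper estimates KL divergence from kernel density estimates; the snippet below is a minimal Monte-Carlo version of that idea on toy data, using plain gaussian_kde rather than the fast Gauss transform acceleration described in the abstract.

```python
# Sketch: KL divergence between two "uncertain objects" given as samples,
# via kernel density estimates and Monte-Carlo averaging (toy data only).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(9)
# Two uncertain objects with the same mean but different spread (ratings, say).
obj_a = rng.normal(3.0, 0.3, size=500)
obj_b = rng.normal(3.0, 1.2, size=500)

def kl_divergence(samples_p, samples_q, n_mc=5000):
    """Estimate KL(P || Q) = E_P[log p(X) - log q(X)] with KDE plug-ins."""
    p, q = gaussian_kde(samples_p), gaussian_kde(samples_q)
    x = p.resample(n_mc)[0]                    # Monte-Carlo points drawn from P
    return np.mean(np.log(p(x)) - np.log(q(x)))

print("KL(A || B) ~", kl_divergence(obj_a, obj_b))
print("KL(B || A) ~", kl_divergence(obj_b, obj_a))   # asymmetric, as expected
```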

Journal ArticleDOI
TL;DR: One-dimensional free fermions are studied with emphasis on propagating fronts emerging from a step initial condition and it is found that the full counting statistics coincide with the eigenvalue statistics of the edge spectrum of matrices from the Gaussian unitary ensemble.
Abstract: One-dimensional free fermions are studied with emphasis on propagating fronts emerging from a step initial condition. The probability distribution of the number of particles at the edge of the front is determined exactly. It is found that the full counting statistics coincide with the eigenvalue statistics of the edge spectrum of matrices from the Gaussian unitary ensemble. The correspondence established between the random matrix eigenvalues and the particle positions yields the order statistics of the rightmost particles in the front and, furthermore, it implies their subdiffusive spreading.

Book
03 Jan 2013
TL;DR: In this paper, a structural classification of probability distances and probability metrics is presented, including primary, simple and compound probability distances, and minimal and maximal distances and norms, and a structural class of probability metrics.
Abstract: Main directions in the theory of probability metrics- Probability distances and probability metrics: Definitions- Primary, simple and compound probability distances, and minimal and maximal distances and norms- A structural classification of probability distances- Monge-Kantorovich mass transference problem, minimal distances and minimal norms- Quantitative relationships between minimal distances and minimal norms- K-Minimal metrics- Relations between minimal and maximal distances- Moment problems related to the theory of probability metrics: Relations between compound and primary distances- Moment distances- Uniformity in weak and vague convergence- Glivenko-Cantelli theorem and Bernstein-Kantorovich invariance principle- Stability of queueing systems- Optimal quality usage- Ideal metrics with respect to summation scheme for iid random variables- Ideal metrics and rate of convergence in the CLT for random motions- Applications of ideal metrics for sums of iid random variables to the problems of stability and approximation in risk theory- How close are the individual and collective models in risk theory?- Ideal metric with respect to maxima scheme of iid random elements- Ideal metrics and stability of characterizations of probability distributions- Positive and negative definite kernels and their properties- Negative definite kernels and metrics: Recovering measures from potential- Statistical estimates obtained by the minimal distances method- Some statistical tests based on N-distances- Distances defined by zonoids- N-distance tests of uniformity on the hypersphere-

Journal ArticleDOI
TL;DR: The statistical characteristics of cost overruns experienced from contract award in 276 Australian construction and engineering projects were analyzed in this article, where the skewness and kurtosis values of the cost overrun are computed to determine if the empirical distribution of the data follows a normal distribution.
Abstract: The statistical characteristics of cost overruns experienced from contract award in 276 Australian construction and engineering projects were analyzed. The skewness and kurtosis values of the cost overruns are computed to determine if the empirical distribution of the data follows a normal distribution. The Kolmogorov-Smirnov, Anderson-Darling, and chi-squared nonparametric tests are used to determine the goodness of fit of the selected probability distributions. A three-parameter Fréchet probability function is found to describe the behavior of cost overruns and provide the best overall distribution fit. The Fréchet distribution is then used to calculate the probability of a cost overrun being experienced. The statistical characteristics of contract size and cost overruns were also analyzed. The Cauchy (