
Showing papers on "Probability distribution published in 2007"


Journal Article
TL;DR: In this article, the authors show how to construct a variety of "trapdoor" cryptographic tools assuming the worst-case hardness of standard lattice problems (such as approximating the length of the shortest nonzero vector to within certain polynomial factors).
Abstract: We show how to construct a variety of "trapdoor" cryptographic tools assuming the worst-case hardness of standard lattice problems (such as approximating the length of the shortest nonzero vector to within certain polynomial factors). Our contributions include a new notion of trapdoor function with preimage sampling, simple and efficient "hash-and-sign" digital signature schemes, and identity-based encryption. A core technical component of our constructions is an efficient algorithm that, given a basis of an arbitrary lattice, samples lattice points from a discrete Gaussian probability distribution whose standard deviation is essentially the length of the longest Gram-Schmidt vector of the basis. A crucial security property is that the output distribution of the algorithm is oblivious to the particular geometry of the given basis.
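
A minimal sketch (not the paper's lattice algorithm) of the one-dimensional building block behind such samplers: rejection sampling from a discrete Gaussian over the integers. The function name, the width parameter s, and the tail cut-off are illustrative assumptions.

```python
import math
import random

def sample_discrete_gaussian(s, center=0.0, tail=6):
    """Rejection-sample an integer z with probability proportional to
    exp(-pi * (z - center)^2 / s^2), restricted to a +/- tail*s window (assumed cut-off)."""
    lo = int(math.floor(center - tail * s))
    hi = int(math.ceil(center + tail * s))
    while True:
        z = random.randint(lo, hi)                      # uniform proposal over the window
        rho = math.exp(-math.pi * (z - center) ** 2 / s ** 2)
        if random.random() < rho:                       # accept with probability rho
            return z

# Example: draw a few samples with width parameter s = 4.
print([sample_discrete_gaussian(4.0) for _ in range(10)])
```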

1,312 citations


Journal ArticleDOI
TL;DR: In this paper, a Bayesian method was proposed to account for measurement errors in linear regression of astronomical data. The method is based on deriving a likelihood function for the measured data and focuses on the case in which the intrinsic distribution of the independent variables can be approximated using a mixture of Gaussian functions.
Abstract: I describe a Bayesian method to account for measurement errors in linear regression of astronomical data. The method allows for heteroscedastic and possibly correlated measurement errors and intrinsic scatter in the regression relationship. The method is based on deriving a likelihood function for the measured data, and I focus on the case when the intrinsic distribution of the independent variables can be approximated using a mixture of Gaussian functions. I generalize the method to incorporate multiple independent variables, nondetections, and selection effects (e.g., Malmquist bias). A Gibbs sampler is described for simulating random draws from the probability distribution of the parameters, given the observed data. I use simulation to compare the method with other common estimators. The simulations illustrate that the Gaussian mixture model outperforms other common estimators and can effectively give constraints on the regression parameters, even when the measurement errors dominate the observed scatter, source detection fraction is low, or the intrinsic distribution of the independent variables is not a mixture of Gaussian functions. I conclude by using this method to fit the X-ray spectral slope as a function of Eddington ratio using a sample of 39 radio-quiet quasars at z ≲ 0.8. I confirm the correlation seen by other authors between the radio-quiet quasar X-ray spectral slope and the Eddington ratio, where the X-ray spectral slope softens as the Eddington ratio increases. IDL routines are made available for performing the regression.
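
A small simulation (an assumed setup, not the paper's Gibbs sampler) showing the problem the method addresses: measurement errors in the independent variable bias the ordinary least-squares slope toward zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_slope, true_intercept = 500, 2.0, 1.0

xi = rng.normal(0.0, 1.0, n)                                       # true (unobserved) covariate
eta = true_intercept + true_slope * xi + rng.normal(0.0, 0.5, n)   # true response
x = xi + rng.normal(0.0, 1.0, n)                                   # covariate measured with error
y = eta + rng.normal(0.0, 0.5, n)                                  # response measured with error

# Naive OLS on the measured data underestimates the slope (attenuation bias).
slope_naive = np.polyfit(x, y, 1)[0]
print(f"true slope = {true_slope}, naive OLS slope = {slope_naive:.2f}")
```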

1,264 citations


Journal ArticleDOI
TL;DR: In this article, a closed-form cardinalized probability hypothesis density (CPHD) filter is proposed, which propagates not only the PHD but also the entire probability distribution on target number.
Abstract: The multitarget recursive Bayes nonlinear filter is the theoretically optimal approach to multisensor-multitarget detection, tracking, and identification. For applications in which this filter is appropriate, it is likely to be tractable for only a small number of targets. In earlier papers we derived closed-form equations for an approximation of this filter based on propagation of a first-order multitarget moment called the probability hypothesis density (PHD). In a recent paper, Erdinc, Willett, and Bar-Shalom argued for the need for a PHD-type filter which remains first-order in the states of individual targets, but which is higher-order in target number. In this paper we show that this is indeed possible. We derive a closed-form cardinalized PHD (CPHD) filter, which propagates not only the PHD but also the entire probability distribution on target number.
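
A hedged sketch of one ingredient the abstract mentions, propagating a probability distribution on target number: the prior cardinality pmf is thinned by a survival probability and convolved with an assumed Poisson birth pmf. This illustrates the idea only; it is not the paper's full CPHD recursion.

```python
import numpy as np
from scipy.stats import binom, poisson

def predict_cardinality(p_card, p_survive, birth_rate):
    """Predict the target-number pmf: binomial thinning of survivors
    convolved with a Poisson number of newborn targets (assumed model)."""
    n_max = len(p_card) - 1
    p_surv = np.zeros(n_max + 1)
    for j, pj in enumerate(p_card):                       # thin each hypothesis "j targets"
        p_surv[: j + 1] += pj * binom.pmf(np.arange(j + 1), j, p_survive)
    p_birth = poisson.pmf(np.arange(n_max + 1), birth_rate)
    return np.convolve(p_surv, p_birth)[: n_max + 1]      # truncate to n_max

prior = np.array([0.1, 0.3, 0.4, 0.2, 0.0, 0.0])          # pmf over 0..5 targets
print(predict_cardinality(prior, p_survive=0.9, birth_rate=0.5).round(3))
```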

830 citations


Journal Article
TL;DR: This paper proposes a new method called importance weighted cross validation (IWCV), whose unbiasedness is proved even under covariate shift; the IWCV procedure is the only one that can be applied for unbiased classification under covariate shift.
Abstract: A common assumption in supervised learning is that the input points in the training set follow the same probability distribution as the input points that will be given in the future test phase. However, this assumption is not satisfied, for example, when the outside of the training region is extrapolated. The situation where the training input points and test input points follow different distributions while the conditional distribution of output values given input points is unchanged is called the covariate shift. Under the covariate shift, standard model selection techniques such as cross validation do not work as desired since its unbiasedness is no longer maintained. In this paper, we propose a new method called importance weighted cross validation (IWCV), for which we prove its unbiasedness even under the covariate shift. The IWCV procedure is the only one that can be applied for unbiased classification under covariate shift, whereas alternatives to IWCV exist for regression. The usefulness of our proposed method is illustrated by simulations, and furthermore demonstrated in the brain-computer interface, where strong non-stationarity effects can be seen between training and test sessions.
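
A minimal numpy-only sketch of the importance-weighting idea behind IWCV: each held-out loss is weighted by the density ratio p_test(x)/p_train(x). The Gaussian train/test densities and the 1-nearest-neighbour predictor are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                                    # true regression function
    return np.sin(x)

# Covariate shift: training and test inputs come from different Gaussians.
x_tr = rng.normal(0.0, 1.0, 200)
y_tr = f(x_tr) + rng.normal(0.0, 0.2, 200)

def gauss_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

w = gauss_pdf(x_tr, 1.5, 0.5) / gauss_pdf(x_tr, 0.0, 1.0)   # importance weights p_test/p_train

def predict_1nn(x_train, y_train, x_query):
    idx = np.abs(x_train[:, None] - x_query[None, :]).argmin(axis=0)
    return y_train[idx]

# K-fold cross-validation, ordinary vs. importance-weighted.
K = 5
folds = np.array_split(rng.permutation(len(x_tr)), K)
cv, iwcv = [], []
for k in range(K):
    te = folds[k]
    tr = np.concatenate([folds[j] for j in range(K) if j != k])
    err = (predict_1nn(x_tr[tr], y_tr[tr], x_tr[te]) - y_tr[te]) ** 2
    cv.append(err.mean())                    # ordinary CV: unweighted held-out loss
    iwcv.append((w[te] * err).mean())        # IWCV: density-ratio-weighted held-out loss
print(f"CV estimate = {np.mean(cv):.3f}, IWCV estimate = {np.mean(iwcv):.3f}")
```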

807 citations


Posted Content
TL;DR: The authors present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction.
Abstract: We present the nested Chinese restaurant process (nCRP), a stochastic process which assigns probability distributions to infinitely-deep, infinitely-branching trees. We show how this stochastic process can be used as a prior distribution in a Bayesian nonparametric model of document collections. Specifically, we present an application to information retrieval in which documents are modeled as paths down a random tree, and the preferential attachment dynamics of the nCRP leads to clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a posterior distribution over trees, topics and allocations of words to levels of the tree. We demonstrate this algorithm on collections of scientific abstracts from several journals. This model exemplifies a recent trend in statistical machine learning--the use of Bayesian nonparametric methods to infer distributions on flexible data structures.
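
A short sketch of drawing one document's path through a nested Chinese restaurant process of fixed depth; the concentration parameter gamma and the counter-based tree representation are assumptions made for illustration.

```python
import random
from collections import defaultdict

def crp_choice(counts, gamma):
    """Pick existing branch k with prob n_k/(n+gamma), or open a new one with prob gamma/(n+gamma)."""
    n = sum(counts.values())
    r = random.uniform(0.0, n + gamma)
    for branch, n_k in counts.items():
        if r < n_k:
            return branch
        r -= n_k
    return max(counts, default=-1) + 1            # open a new branch

def ncrp_path(tree, depth, gamma):
    """Draw a root-to-leaf path; `tree` maps a path prefix to its child visit counts."""
    path = []
    for _ in range(depth):
        counts = tree[tuple(path)]
        child = crp_choice(counts, gamma)
        counts[child] += 1
        path.append(child)
    return tuple(path)

tree = defaultdict(lambda: defaultdict(int))
paths = [ncrp_path(tree, depth=3, gamma=1.0) for _ in range(10)]
print(paths)   # documents sharing prefixes share topics at the higher levels
```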

580 citations


Journal ArticleDOI
TL;DR: In this article, a nine-member ensemble of hydrologic predictions was used to test and evaluate the Bayesian model averaging (BMA) scheme, and the test results showed that the BMA scheme generates more skillful and equally reliable probabilistic predictions than the original ensemble.

578 citations


01 Dec 2007
TL;DR: It is shown that the breadth of the distribution and, in particular, the probability of large temperature increases are relatively insensitive to decreases in uncertainties associated with the underlying climate processes.
Abstract: Uncertainties in projections of future climate change have not lessened substantially in past decades. Both models and observations yield broad probability distributions for long-term increases in global mean temperature expected from the doubling of atmospheric carbon dioxide, with small but finite probabilities of very large increases. We show that the shape of these probability distributions is an inevitable and general consequence of the nature of the climate system, and we derive a simple analytic form for the shape that fits recent published distributions very well. We show that the breadth of the distribution and, in particular, the probability of large temperature increases are relatively insensitive to decreases in uncertainties associated with the underlying climate processes.
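
A hedged numerical illustration of the kind of argument summarized above: if the total feedback factor f is roughly Gaussian, then warming of the form ΔT ≈ ΔT0/(1 − f) has a skewed distribution with a fat upper tail. The parameter values below are arbitrary assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(42)
dT0 = 1.2                                   # assumed no-feedback warming (K)
f = rng.normal(0.65, 0.13, 1_000_000)       # assumed Gaussian feedback factor
f = f[f < 1.0]                              # keep the stable regime only
dT = dT0 / (1.0 - f)                        # amplified warming

# A symmetric uncertainty in f maps into a skewed, fat-tailed warming distribution.
q = np.percentile(dT, [5, 50, 95])
print(f"5/50/95th percentiles of warming: {q.round(2)} K")
print(f"P(warming > 6 K) = {(dT > 6).mean():.3f}")
```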

540 citations


Journal ArticleDOI
TL;DR: A representation space, called the complexity-entropy causality plane, is introduced; its axes are suitable functionals of the pertinent probability distribution, namely the entropy of the system and an appropriate statistical complexity measure.
Abstract: Chaotic systems share with stochastic processes several properties that make them almost indistinguishable. In this communication we introduce a representation space, to be called the complexity-entropy causality plane. Its horizontal and vertical axes are suitable functionals of the pertinent probability distribution, namely, the entropy of the system and an appropriate statistical complexity measure, respectively. These two functionals are evaluated using the Bandt-Pompe recipe to assign a probability distribution function to the time series generated by the system. Several well-known model-generated time series, usually regarded as being of either stochastic or chaotic nature, are analyzed so as to illustrate the approach. The main achievement of this communication is the possibility of clearly distinguishing between them in our representation space, something that is rather difficult otherwise.
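
A compact sketch of the Bandt-Pompe step mentioned above: estimate a probability distribution over ordinal patterns of a time series and compute its normalized permutation entropy (the horizontal coordinate of the plane). The logistic-map test signal and the embedding dimension are assumptions.

```python
import math
from itertools import permutations
import numpy as np

def ordinal_distribution(x, d=4):
    """Empirical probabilities of the d! ordinal patterns of consecutive d-tuples."""
    patterns = {p: 0 for p in permutations(range(d))}
    for i in range(len(x) - d + 1):
        patterns[tuple(np.argsort(x[i:i + d]))] += 1
    counts = np.array(list(patterns.values()), dtype=float)
    return counts / counts.sum()

def permutation_entropy(p):
    """Shannon entropy of the ordinal distribution, normalized to [0, 1]."""
    nz = p[p > 0]
    return float(-(nz * np.log(nz)).sum() / math.log(len(p)))

# Example: fully chaotic logistic-map series vs. white noise.
x = np.empty(5000); x[0] = 0.4
for i in range(1, len(x)):
    x[i] = 4.0 * x[i - 1] * (1.0 - x[i - 1])
noise = np.random.default_rng(0).random(5000)
print("chaos:", round(permutation_entropy(ordinal_distribution(x)), 3),
      "noise:", round(permutation_entropy(ordinal_distribution(noise)), 3))
```

The chaotic series has forbidden ordinal patterns and so a visibly lower normalized entropy than the noise, which is the kind of separation the plane exploits.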

516 citations


Journal ArticleDOI
26 Oct 2007-Science
TL;DR: The authors show that the shape of these probability distributions is an inevitable and general consequence of the nature of the climate system, and derive a simple analytic form for the shape that fits recent published distributions very well.
Abstract: Uncertainties in projections of future climate change have not lessened substantially in past decades. Both models and observations yield broad probability distributions for long-term increases in global mean temperature expected from the doubling of atmospheric carbon dioxide, with small but finite probabilities of very large increases. We show that the shape of these probability distributions is an inevitable and general consequence of the nature of the climate system, and we derive a simple analytic form for the shape that fits recent published distributions very well. We show that the breadth of the distribution and, in particular, the probability of large temperature increases are relatively insensitive to decreases in uncertainties associated with the underlying climate processes.

507 citations


DOI
01 Jan 2007
TL;DR: This paper formulates minimal requirements that should be imposed on a scenario generation method before it can be used for solving the stochastic programming model and shows how the requirements can be tested.
Abstract: Stochastic programs can only be solved with discrete distributions of limited cardinality. Input, however, normally comes in the form of continuous distributions or large data sets. Creating a limited discrete distribution from input is called scenario generation. In this paper, we discuss how to evaluate the quality or suitability of scenario generation methods for a given stochastic programming model. We formulate minimal requirements that should be imposed on a scenario generation method before it can be used for solving the stochastic programming model. We also show how the requirements can be tested. The procedure for testing a scenario generation method is illustrated on a case from portfolio management.
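
A toy sketch of an in-sample stability check of the kind such testing involves: generate several independent scenario sets for a simple newsvendor-style stochastic program, solve it on each set, and inspect the spread of the optimal objective values. The demand distribution, costs, and sampling-based scenario generator are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
price, cost = 10.0, 6.0                   # assumed sell price and unit cost

def solve_newsvendor(scenarios):
    """Optimal order quantity for equiprobable demand scenarios is the
    critical-fractile quantile; return (quantity, expected profit)."""
    q = np.quantile(scenarios, (price - cost) / price)
    profit = price * np.minimum(scenarios, q) - cost * q
    return q, profit.mean()

# In-sample stability: re-generate the scenario set several times and
# check that the optimal objective barely changes across sets.
objectives = []
for _ in range(10):
    scenarios = rng.lognormal(mean=4.0, sigma=0.4, size=200)   # one scenario set
    objectives.append(solve_newsvendor(scenarios)[1])
print(f"objective over scenario sets: mean={np.mean(objectives):.1f}, std={np.std(objectives):.2f}")
```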

500 citations


Journal ArticleDOI
TL;DR: The authors show how the possibility of ties that results from atoms in the probability distribution invalidates various familiar relations that lie at the root of copula theory in the continuous case.
Abstract: The authors review various facts about copulas linking discrete distributions. They show how the possibility of ties that results from atoms in the probability distribution invalidates various familiar relations that lie at the root of copula theory in the continuous case. They highlight some of the dangers and limitations of an undiscriminating transposition of modeling and inference practices from the continuous setting into the discrete one.
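
A tiny simulation of the core obstacle described above: for a discrete margin, the probability integral transform U = F(X) is not uniform, so the usual copula identities from the continuous case break down. The Poisson margin is an arbitrary assumption.

```python
import numpy as np
from scipy.stats import norm, poisson, kstest

rng = np.random.default_rng(0)

# Continuous margin: the probability integral transform U = F(X) is uniform.
x_cont = rng.normal(size=100_000)
u_cont = norm.cdf(x_cont)

# Discrete margin: the same transform is a step function, so U has atoms (ties).
x_disc = rng.poisson(3.0, size=100_000)
u_disc = poisson.cdf(x_disc, 3.0)

print("KS distance from uniform, continuous margin:", round(kstest(u_cont, "uniform").statistic, 3))
print("KS distance from uniform, discrete margin:  ", round(kstest(u_disc, "uniform").statistic, 3))
print("number of distinct U values (discrete case):", np.unique(u_disc).size)
```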

Journal ArticleDOI
TL;DR: In this article, the predictive probability density functions (PDFs) for weather quantities are represented as a weighted average of PDFs centered on the individual bias-corrected forecasts, where the weights are posterior probabilities of the models generating the forecasts and reflect the forecasts' relative contributions to predictive skill over a training period.
Abstract: Bayesian model averaging (BMA) is a statistical way of postprocessing forecast ensembles to create predictive probability density functions (PDFs) for weather quantities. It represents the predictive PDF as a weighted average of PDFs centered on the individual bias-corrected forecasts, where the weights are posterior probabilities of the models generating the forecasts and reflect the forecasts’ relative contributions to predictive skill over a training period. It was developed initially for quantities whose PDFs can be approximated by normal distributions, such as temperature and sea level pressure. BMA does not apply in its original form to precipitation, because the predictive PDF of precipitation is nonnormal in two major ways: it has a positive probability of being equal to zero, and it is skewed. In this study BMA is extended to probabilistic quantitative precipitation forecasting. The predictive PDF corresponding to one ensemble member is a mixture of a discrete component at zero and a gam...

Proceedings ArticleDOI
26 Aug 2007
TL;DR: It is shown that Toeplitz-structured matrices with entries drawn independently from the same distributions are also sufficient to recover x from y with high probability, and the performance of such matrices is compared with that of fully independent and identically distributed ones.
Abstract: The problem of recovering a sparse signal x ∈ R^n from a relatively small number of its observations of the form y = Ax ∈ R^k, where A is a known matrix and k ≪ n, has recently received a lot of attention under the rubric of compressed sensing (CS) and has applications in many areas of signal processing such as data compression, image processing, dimensionality reduction, etc. Recent work has established that if A is a random matrix with entries drawn independently from certain probability distributions then exact recovery of x from these observations can be guaranteed with high probability. In this paper, we show that Toeplitz-structured matrices with entries drawn independently from the same distributions are also sufficient to recover x from y with high probability, and we compare the performance of such matrices with that of fully independent and identically distributed ones. The use of Toeplitz matrices in CS applications has several potential advantages: (i) they require the generation of only O(n) independent random variables; (ii) multiplication with Toeplitz matrices can be efficiently implemented using fast Fourier transform, resulting in faster acquisition and reconstruction algorithms; and (iii) Toeplitz-structured matrices arise naturally in certain application areas such as system identification.
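
A self-contained sketch of the construction discussed above: a Toeplitz sensing matrix built from only O(n) i.i.d. Gaussian entries, used to measure a sparse vector which is then recovered with a small orthogonal matching pursuit routine written inline (an illustrative solver, not the paper's).

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
n, k, s = 256, 80, 5                                    # signal length, measurements, sparsity

# Toeplitz sensing matrix from only n + k - 1 i.i.d. Gaussian entries.
g = rng.standard_normal(n + k - 1)
A = toeplitz(g[n - 1:], g[n - 1::-1]) / np.sqrt(k)      # k x n

x = np.zeros(n)
x[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
y = A @ x

def omp(A, y, s):
    """Greedy orthogonal matching pursuit for s-sparse recovery."""
    support, r = [], y.copy()
    for _ in range(s):
        support.append(int(np.argmax(np.abs(A.T @ r))))        # most correlated column
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        r = y - A[:, support] @ coef                            # update residual
    x_hat = np.zeros(A.shape[1])
    x_hat[support] = coef
    return x_hat

x_hat = omp(A, y, s)
print("relative recovery error:", round(np.linalg.norm(x_hat - x) / np.linalg.norm(x), 4))
```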

Journal ArticleDOI
TL;DR: The proposed distribution turns out to be a very convenient tool for modelling cascaded Nakagami-m fading channels and analyzing the performance of digital communications systems operating over such channels.
Abstract: A generic and novel distribution, referred to as N*Nakagami, constructed as the product of N statistically independent, but not necessarily identically distributed, Nakagami-m random variables (RVs), is introduced and analyzed. The proposed distribution turns out to be a very convenient tool for modelling cascaded Nakagami-m fading channels and analyzing the performance of digital communications systems operating over such channels. The moment-generating, probability density, cumulative distribution, and moments functions of the N*Nakagami distribution are developed in closed form using Meijer's G-function. Using these formulas, generic closed-form expressions for the outage probability, amount of fading, and average error probabilities for several binary and multilevel modulation signals of digital communication systems operating over the N*Nakagami fading and the additive white Gaussian noise channel are presented. Complementary numerical and computer simulation performance evaluation results verify the correctness of the proposed formulation. The suitability of the N*Nakagami fading distribution to approximate the lognormal distribution is also investigated. Using Kolmogorov-Smirnov tests, the rate of convergence of the central limit theorem as pertaining to the multiplication of Nakagami-m RVs is quantified.
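
A short Monte Carlo sketch of the cascaded model described above: the product of N independent Nakagami-m envelopes (each simulated as the square root of a gamma variate) and an empirical outage probability. The fading parameters and threshold are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def nakagami(m, omega, size):
    """Nakagami-m samples via R = sqrt(G), with G ~ Gamma(shape=m, scale=omega/m)."""
    return np.sqrt(rng.gamma(m, omega / m, size))

n_samples = 1_000_000
m_params = [1.5, 2.0, 2.5]                    # per-stage fading parameters (assumed), N = 3 stages
product = np.prod([nakagami(m, 1.0, n_samples) for m in m_params], axis=0)

threshold = 0.2                               # assumed outage threshold on the cascaded envelope
print(f"empirical outage probability P(R < {threshold}) = {(product < threshold).mean():.4f}")
```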

Journal Article
TL;DR: In this paper, the authors discuss how to generate random unitary matrices from the classical compact groups U(N), O(N) and USp(N) with probability distributions given by the respective invariant measures.
Abstract: We discuss how to generate random unitary matrices from the classical compact groups U(N), O(N) and USp(N) with probability distributions given by the respective invariant measures. The algorithm is straightforward to implement using standard linear algebra packages. This approach extends to the Dyson circular ensembles too. This article is based on a lecture given by the author at the summer school on Number Theory and Random Matrix Theory held at the University of Rochester in June 2006. The exposition is addressed to a general mathematical audience.
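
A brief sketch in the spirit of the algorithm described (QR decomposition of a complex Gaussian matrix with a phase correction so that the result is distributed according to the invariant measure on U(N)); treat it as a reading of the standard recipe rather than a verbatim transcription of the paper.

```python
import numpy as np

def haar_unitary(n, rng=np.random.default_rng()):
    """Random U(n) matrix distributed according to the Haar (invariant) measure."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    # Fix the phases of R's diagonal so the factorization is unique and Q is Haar-distributed.
    phases = np.diag(r) / np.abs(np.diag(r))
    return q * phases                     # multiplies column j of Q by the j-th phase

U = haar_unitary(4)
print(np.allclose(U.conj().T @ U, np.eye(4)))   # unitarity check
```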

Journal ArticleDOI
TL;DR: This paper proposes a new method that utilizes the projection method twice: once to constrain the entire distribution and once to constrain the statistics of the distribution, and illustrates these algorithms in a tracking system that uses unit quaternions to encode orientation.
Abstract: The state space description of some physical systems possesses nonlinear equality constraints between some state variables. In this paper, we consider the problem of applying a Kalman filter-type estimator in the presence of such constraints. We categorize previous approaches into pseudo-observation and projection methods and identify two types of constraints: those that act on the entire distribution and those that act on the mean of the distribution. We argue that the pseudo-observation approach enforces neither type of constraint and that the projection method enforces the first type of constraint only. We propose a new method that utilizes the projection method twice: once to constrain the entire distribution and once to constrain the statistics of the distribution. We illustrate these algorithms in a tracking system that uses unit quaternions to encode orientation.

Journal ArticleDOI
TL;DR: In this paper, the authors demonstrate the feasibility of fitting cell-by-cell probability distributions to grids of monthly interpolated, continent-wide data and provide a foundation for use of the gamma distribution to generate drivers for various rain-related models.
Abstract: Evaluating a range of scenarios that accurately reflect precipitation variability is critical for water resource applications. Inputs to these applications can be provided using location- and interval-specific probability distributions. These distributions make it possible to estimate the likelihood of rainfall being within a specified range. In this paper, we demonstrate the feasibility of fitting cell-by-cell probability distributions to grids of monthly interpolated, continent-wide data. Future work will then detail applications of these grids to improved satellite-remote sensing of drought and interpretations of probabilistic climate outlook forum forecasts. The gamma distribution is well suited to these applications because it is fairly familiar to African scientists, and capable of representing a variety of distribution shapes. This study tests the goodness-of-fit using the Kolmogorov–Smirnov (KS) test, and compares these results against another distribution commonly used in rainfall events, the Weibull. The gamma distribution is suitable for roughly 98% of the locations over all months. The techniques and results presented in this study provide a foundation for use of the gamma distribution to generate drivers for various rain-related models. These models are used as decision support tools for the management of water and agricultural resources as well as food reserves by providing decision makers with ways to evaluate the likelihood of various rainfall accumulations and assess different scenarios in Africa. Copyright © 2006 Royal Meteorological Society
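
A minimal SciPy sketch of the per-cell procedure described above: fit a gamma distribution to a sample of monthly rainfall totals and check goodness of fit with the Kolmogorov-Smirnov test. The synthetic rainfall sample is an assumption standing in for one grid cell's data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
rainfall = rng.gamma(shape=2.0, scale=45.0, size=40)   # stand-in for one cell/month (mm)

# Fit a two-parameter gamma (location fixed at zero, as is common for rainfall).
shape, loc, scale = stats.gamma.fit(rainfall, floc=0)

# Kolmogorov-Smirnov goodness-of-fit test against the fitted distribution.
ks = stats.kstest(rainfall, "gamma", args=(shape, loc, scale))
print(f"shape={shape:.2f}, scale={scale:.1f}, KS statistic={ks.statistic:.3f}, p={ks.pvalue:.2f}")
```

Since the parameters are estimated from the same data, the plain KS p-value is somewhat optimistic; a parametric bootstrap would be the stricter variant of this check.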

Journal ArticleDOI
TL;DR: Multipeaked probability distributions, similar to the distributions found in (driven) stochastically resonant systems, are found in autonomous chaotic systems.
Abstract: We determine probabilities of recurrence time into finite-sized, physically meaningful subsets of phase space. We consider three different autonomous chaotic systems: (i) scattering in a three-peaked potential, (ii) connected billiards, and (iii) Lorenz equations. We find multipeaked probability distributions, similar to the distributions found in (driven) stochastically resonant systems. In nondriven systems, such as ours, only monotonically decaying distributions (exponentials, stretched exponentials, power laws, and slight variations or combinations of these) have hitherto been reported. Discrete peaks have as yet escaped attention in autonomous systems and correspond to specific trajectory subsets involving an integer number of loops.

Posted Content
TL;DR: In this article, the authors show how to construct a variety of "trapdoor" cryptographic tools assuming the worst-case hardness of standard lattice problems (such as approximating the length of the shortest nonzero vector to within certain polynomial factors).
Abstract: We show how to construct a variety of "trapdoor" cryptographic tools assuming the worst-case hardness of standard lattice problems (such as approximating the length of the shortest nonzero vector to within certain polynomial factors). Our contributions include a new notion of trapdoor function with preimage sampling, simple and efficient "hash-and-sign" digital signature schemes, and identity-based encryption. A core technical component of our constructions is an efficient algorithm that, given a basis of an arbitrary lattice, samples lattice points from a discrete Gaussian probability distribution whose standard deviation is essentially the length of the longest Gram-Schmidt vector of the basis. A crucial security property is that the output distribution of the algorithm is oblivious to the particular geometry of the given basis.

Journal ArticleDOI
TL;DR: An integrated platform for multi-sensor equipment diagnosis and prognosis based on the hidden semi-Markov model (HSMM) is proposed; the results show that the increase in the correct diagnostic rate is very promising and that equipment prognosis can be implemented in the same integrated framework.

Journal ArticleDOI
TL;DR: In this article, the authors consider the problem of finding optimal portfolios in cases when the underlying probability model is not perfectly known and apply a maximin approach which uses a "confidence set" for the probability distribution.
Abstract: In this paper, we consider the problem of finding optimal portfolios in cases when the underlying probability model is not perfectly known. For the sake of robustness, a maximin approach is applied which uses a ‘confidence set’ for the probability distribution. The approach shows the tradeoff between return, risk and robustness in view of the model ambiguity. As a consequence, a monetary value of information in the model can be determined.
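
A hedged sketch of the maximin idea over a finite "confidence set": when the ambiguity set is a handful of candidate expected-return vectors, maximizing the worst-case expected return is a linear program. The candidate vectors are invented for illustration; the paper's confidence sets are more general.

```python
import numpy as np
from scipy.optimize import linprog

# Candidate expected-return vectors (the discrete "confidence set"), 4 assets.
mus = np.array([[0.08, 0.05, 0.12, 0.03],
                [0.06, 0.07, 0.04, 0.05],
                [0.09, 0.04, 0.02, 0.06]])
m, n = mus.shape

# Variables: portfolio weights w (n of them) and worst-case return t.
# maximize t  <=>  minimize -t,  subject to  t <= mu_k . w  for every k,  sum w = 1, w >= 0.
c = np.concatenate([np.zeros(n), [-1.0]])
A_ub = np.hstack([-mus, np.ones((m, 1))])          # rows encode t - mu_k . w <= 0
b_ub = np.zeros(m)
A_eq = np.concatenate([np.ones(n), [0.0]]).reshape(1, -1)
b_eq = [1.0]
bounds = [(0, None)] * n + [(None, None)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds, method="highs")
w, t = res.x[:n], res.x[-1]
print("robust weights:", w.round(3), " worst-case expected return:", round(t, 4))
```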

Journal ArticleDOI
TL;DR: A novel statistical model for diffusion-weighted MR signal attenuation is proposed, which postulates that water molecule diffusion can be characterized by a continuous mixture of diffusion tensors, and an efficient scheme for estimating the water molecule displacement probability functions on a voxel-by-voxel basis is presented.

Journal ArticleDOI
TL;DR: This work considers the problem of scheduling under uncertainty where the uncertain problem parameters can be described by a known probability distribution function and introduces a small number of auxiliary variables and additional constraints into the original MILP problem, generating a deterministic robust counterpart problem which provides the optimal/feasible solution.

Journal ArticleDOI
TL;DR: This paper proposes Scalable Bloom Filters, a variant of Bloom filters that can adapt dynamically to the number of elements stored, while assuring a maximum false positive probability.

Journal ArticleDOI
TL;DR: In this paper, the authors studied the convergence rate of the posterior distribution for Bayesian density estimation with Dirichlet mixtures of normal distributions as the prior and derived a new general rate theorem by considering a countable covering of the parameter space whose prior probabilities satisfy a summability condition.
Abstract: We study the rates of convergence of the posterior distribution for Bayesian density estimation with Dirichlet mixtures of normal distributions as the prior. The true density is assumed to be twice continuously differentiable. The bandwidth is given a sequence of priors which is obtained by scaling a single prior by an appropriate order. In order to handle this problem, we derive a new general rate theorem by considering a countable covering of the parameter space whose prior probabilities satisfy a summability condition together with certain individual bounds on the Hellinger metric entropy. We apply this new general theorem on posterior convergence rates by computing bounds for Hellinger (bracketing) entropy numbers for the involved class of densities, the error in the approximation of a smooth density by normal mixtures and the concentration rate of the prior. The best obtainable rate of convergence of the posterior turns out to be equivalent to the well-known frequentist rate for integrated mean squared error, n^{-2/5}, up to a logarithmic factor.

Journal ArticleDOI
TL;DR: In this article, a dimensionless reduced time delay was proposed to examine the dependence of time delays on the complexity of lens potentials, such as higher order perturbations, nonisothermality, and substructures.
Abstract: Time delays between lensed multiple images have been known to provide an interesting probe of the Hubble constant, but such an application is often limited by degeneracies with the shape of lens potentials. We propose a new statistical approach to examine the dependence of time delays on the complexity of lens potentials, such as higher order perturbations, nonisothermality, and substructures. Specifically, we introduce a dimensionless reduced time delay and explore its behavior analytically and numerically as a function of the image configuration, which is characterized by the asymmetry and opening angle of the image pair. In particular, we derive a realistic conditional probability distribution for a given image configuration from Monte Carlo simulations. We find that the probability distribution is sensitive to the image configuration such that more symmetric and/or smaller opening-angle image pairs are more easily affected by perturbations on the primary lens potential. On average time delays of double lenses are less scattered than those of quadruple lenses. Furthermore, the realistic conditional distribution allows a new statistical method to constrain the Hubble constant from observed time delays. We find that 16 published time delay quasars constrain H_0 to be 70 ± 6 km s^{-1} Mpc^{-1}, where the value and its error are estimated using jackknife resampling. Systematic errors coming from the heterogeneous nature of the quasar sample and the uncertainty of the input distribution of lens potentials can be larger than the statistical error. After including rough estimates of important systematic errors, we find H_0 = 68 ± 6(stat.) ± 8(syst.) km s^{-1} Mpc^{-1}. The reasonable agreement of the value of the Hubble constant with other estimates indicates the usefulness of our new approach as a cosmological and astrophysical probe, particularly in the era of large-scale synoptic surveys.

Journal ArticleDOI
TL;DR: This paper explores the use of three methods of parameter and predictive uncertainty analysis, and compares their performance when used in conjunction with a lumped parameter model for surface water flow (HSPF) in a large watershed.
Abstract: Where numerical models are employed as an aid to environmental management, the uncertainty associated with predictions made by such models must be assessed. A number of different methods are available to make such an assessment. This paper explores the use of three such methods, and compares their performance when used in conjunction with a lumped parameter model for surface water flow (HSPF) in a large watershed. Linear (or first-order) uncertainty analysis has the advantage that it can be implemented with virtually no computational burden. While the results of such an analysis can be extremely useful for assessing parameter uncertainty in a relative sense, and ascertaining the degree of correlation between model parameters, its use in analyzing predictive uncertainty is often limited. Markov Chain Monte Carlo (MCMC) methods are far more robust, and can produce reliable estimates of parameter and predictive uncertainty. As well as this, they can provide the modeler with valuable qualitative information on the shape of parameter and predictive probability distributions; these shapes can be quite complex, especially where local objective function optima lie within those parts of parameter space that are considered probable after calibration has been undertaken. Nonlinear calibration-constrained optimization can also provide good estimates of parameter and predictive uncertainty, even in situations where the objective function surface is complex. Furthermore, they can achieve these estimates using far fewer model runs than MCMC methods. However, they do not provide the same amount of qualitative information on the probability structure of parameter space as do MCMC methods, a situation that can be partially rectified by combining their use with an efficient gradient-based search method that is specifically designed to locate different local optima. All methods of parameter and predictive uncertainty analysis discussed herein are implemented using freely-available software. Hence similar studies, or extensions of the present study, can be easily undertaken in other modeling contexts by other modelers.

Proceedings ArticleDOI
27 Mar 2007
TL;DR: A novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences is introduced; it outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.
Abstract: This paper introduces a novel algorithm for biological sequence compression that makes use of both statistical properties and repetition within sequences. A panel of experts is maintained to estimate the probability distribution of the next symbol in the sequence to be encoded. Expert probabilities are combined to obtain the final distribution. The resulting information sequence provides insight for further study of the biological sequence. Each symbol is then encoded by arithmetic coding. Experiments show that our algorithm outperforms existing compressors on typical DNA and protein sequence datasets while maintaining a practical running time.
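
A toy sketch of the "panel of experts" idea described above: two simple experts (an order-0 base-frequency model and an order-1 previous-symbol context model) are mixed, and the cost of encoding a DNA string is measured in bits as the sum of -log2 of the mixed probabilities. Arithmetic coding itself and the fixed mixing weights are abstracted away as assumptions.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"

def encode_cost_bits(seq, w0=0.5, w1=0.5):
    """Total code length (bits) when each symbol is coded with the mixture of an
    order-0 and an order-1 expert, both with add-one (Laplace) smoothing."""
    counts0 = defaultdict(int)                        # order-0 symbol counts
    counts1 = defaultdict(lambda: defaultdict(int))   # counts per previous symbol
    bits, prev = 0.0, None
    for s in seq:
        p0 = (counts0[s] + 1) / (sum(counts0.values()) + len(ALPHABET))
        if prev is None:
            p1 = 1.0 / len(ALPHABET)
        else:
            ctx = counts1[prev]
            p1 = (ctx[s] + 1) / (sum(ctx.values()) + len(ALPHABET))
        p = w0 * p0 + w1 * p1                         # combine the experts
        bits += -math.log2(p)
        counts0[s] += 1
        if prev is not None:
            counts1[prev][s] += 1
        prev = s
    return bits

seq = "ACGT" * 50 + "AAAAACCCCC" * 10
print(f"{encode_cost_bits(seq):.1f} bits for {len(seq)} symbols "
      f"({encode_cost_bits(seq) / len(seq):.2f} bits/symbol vs. 2.00 naive)")
```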

Journal ArticleDOI
TL;DR: In this article, a Bayesian nonparametric approach is used to evaluate the probability of discovering a certain number of new species in a new sample of population units, conditional on the number of species recorded in a basic sample.
Abstract: We consider the problem of evaluating the probability of discovering a certain number of new species in a new sample of population units, conditional on the number of species recorded in a basic sample. We use a Bayesian nonparametric approach. The different species proportions are assumed to be random and the observations from the population exchangeable. We provide a Bayesian estimator, under quadratic loss, for the probability of discovering new species which can be compared with well-known frequentist estimators. The results we obtain are illustrated through a numerical example and an application to a genomic dataset concerning the discovery of new genes by sequencing additional single-read sequences of cDNA fragments.
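
A hedged sketch of a simpler special case of the problem above: under a Dirichlet process prior with concentration alpha, the prior probability that the next observation is a new species is alpha/(alpha + n), so the expected number of new species in m further draws has a closed form. The paper works with a richer class of nonparametric priors; this is only the textbook DP case.

```python
import numpy as np

def expected_new_species_dp(alpha, n, m):
    """E[# new species among the next m draws | n draws seen], under a DP(alpha) prior:
    draw i (for i = n, ..., n + m - 1) is new with probability alpha / (alpha + i)."""
    i = np.arange(n, n + m)
    return float(np.sum(alpha / (alpha + i)))

# Example: after a basic sample of n = 1000 units, how many new species
# (e.g., new genes in additional cDNA reads) do we expect in m = 500 more?
print(round(expected_new_species_dp(alpha=50.0, n=1000, m=500), 1))
```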

Journal ArticleDOI
TL;DR: In this paper, the authors provide an overview of computationally efficient approaches for quantifying the influence of parameter uncertainties on the states and outputs of nonlinear dynamical systems with finite-time control trajectories, focusing primarily on computing probability distributions.