
Showing papers on "Probability distribution published in 2006"


Journal ArticleDOI
10 Jul 2006
TL;DR: A novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by the experiments.
Abstract: Motivation: Many problems in data integration in bioinformatics can be posed as one common question: Are two sets of observations generated by the same distribution? We propose a kernel-based statistical test for this problem, based on the fact that two distributions are different if and only if there exists at least one function having different expectation on the two distributions. Consequently we use the maximum discrepancy between function means as the basis of a test statistic. The Maximum Mean Discrepancy (MMD) can take advantage of the kernel trick, which allows us to apply it not only to vectors, but strings, sequences, graphs, and other common structured data types arising in molecular biology. Results: We study the practical feasibility of an MMD-based test on three central data integration tasks: Testing cross-platform comparability of microarray data, cancer diagnosis, and data-content based schema matching for two different protein function classification schemas. In all of these experiments, including high-dimensional ones, MMD is very accurate in finding samples that were generated from the same distribution, and outperforms its best competitors. Conclusions: We have defined a novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by our experiments. Availability: Contact: [email protected]
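
The core statistic can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: a biased empirical estimate of the squared MMD with a Gaussian RBF kernel, using an arbitrary bandwidth and synthetic data.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and the rows of Y."""
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-sq_dists / (2.0 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased empirical estimate of the squared Maximum Mean Discrepancy."""
    return (rbf_kernel(X, X, sigma).mean()
            + rbf_kernel(Y, Y, sigma).mean()
            - 2.0 * rbf_kernel(X, Y, sigma).mean())

# Samples from the same distribution give a small MMD^2;
# samples from a shifted distribution give a larger one.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
Y = rng.normal(size=(200, 5))
Z = rng.normal(loc=0.5, size=(200, 5))
print(mmd2_biased(X, Y), mmd2_biased(X, Z))
```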

1,315 citations


Journal ArticleDOI
TL;DR: In this paper, the authors consider a family of diffusion maps, defined as the embedding of complex data onto a low dimensional Euclidean space via the eigenvectors of suitably defined random walks on the given datasets.

1,108 citations


Book
03 Mar 2006
TL;DR: This book discusses the foundations of Probability Models, Computer-Based Numerical and Simulation Methods in Probability, and Elements of Quality Assurance and Acceptance Sampling.
Abstract: Chapter 1 - Role of Probability and Statistics in Engineering Chapter 2 - Fundamentals of Probability Models Chapter 3 - Analytical Models of Random Phenomena Chapter 4 - Functions of Random Variables Chapter 5 - Computer-Based Numerical and Simulation Methods in Probability Chapter 6 - Statistical Inferences from Observational Data Chapter 7 - Determination of Probability Distribution Models Chapter 8 - Regression and Correlation Analyses Chapter 9 - The Bayesian Approach Chapter 10 - Elements of Quality Assurance and Acceptance Sampling (Available only online at the Wiley web site) Appendices: Table A.1 - Standard Normal Probabilities Table A.2 - CDF of the Binomial Distribution Table A.3 - Critical Values of the t Distribution at Confidence Level (1 - α) = p Table A.4 - Critical Values of the χ² Distribution at Confidence Level (1 - α) = p Table A.5 - Critical Values of D_n^α at Significance Level α in the K-S Test Table A.6 - Critical Values of the Anderson-Darling Goodness-of-Fit Test (for 4 specific distributions)

909 citations


Proceedings ArticleDOI
05 May 2006
TL;DR: In this paper, the authors derived a closed-form cardinalized probability hypothesis density (CPHD) filter, which propagates not only the PHD but also the entire probability distribution on target number.
Abstract: The multitarget recursive Bayes nonlinear filter is the theoretically optimal approach to multisensor-multitarget detection, tracking, and identification. For applications in which this filter is appropriate, it is likely to be tractable for only a small number of targets. In earlier papers we derived closed-form equations for an approximation of this filter based on propagation of a first-order multitarget moment called the probability hypothesis density (PHD). In a recent paper, Erdinc, Willett, and Bar-Shalom argued for the need for a PHD-type filter which remains first-order in the states of individual targets, but which is higher-order in target number. In this paper we show that this and much more is possible. We derive a closed-form cardinalized PHD (CPHD) filter, which propagates not only the PHD but also the entire probability distribution on target number.

642 citations


Journal ArticleDOI
TL;DR: The addressed controller design problem is transformed to an auxiliary convex optimization problem, which can be solved by a linear matrix inequality (LMI) approach, and an illustrative example is provided to show the applicability of the proposed method.
Abstract: This note is concerned with a new controller design problem for networked systems with random communication delays. Two kinds of random delays are simultaneously considered: i) from the controller to the plant, and ii) from the sensor to the controller, via a limited bandwidth communication channel. The random delays are modeled as a linear function of a stochastic variable satisfying a Bernoulli random binary distribution. The observer-based controller is designed to exponentially stabilize the networked system in the sense of mean square, and also achieve the prescribed H∞ disturbance attenuation level. The addressed controller design problem is transformed to an auxiliary convex optimization problem, which can be solved by a linear matrix inequality (LMI) approach. An illustrative example is provided to show the applicability of the proposed method.

613 citations


Journal ArticleDOI
TL;DR: ME-gPC provides an efficient and flexible approach to solving differential equations with random inputs, especially for problems related to long-term integration, large perturbation, and stochastic discontinuities.
Abstract: We develop a multi-element generalized polynomial chaos (ME-gPC) method for arbitrary probability measures and apply it to solve ordinary and partial differential equations with stochastic inputs. Given a stochastic input with an arbitrary probability measure, its random space is decomposed into smaller elements. Subsequently, in each element a new random variable with respect to a conditional probability density function (PDF) is defined, and a set of orthogonal polynomials in terms of this random variable is constructed numerically. Then, the generalized polynomial chaos (gPC) method is implemented element-by-element. Numerical experiments show that the cost for the construction of orthogonal polynomials is negligible compared to the total time cost. Efficiency and convergence of ME-gPC are studied numerically by considering some commonly used random variables. ME-gPC provides an efficient and flexible approach to solving differential equations with random inputs, especially for problems related to long-term integration, large perturbation, and stochastic discontinuities.
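
In outline (generic notation, not necessarily the paper's): if the random space of an input $\xi$ with density $f$ is partitioned into elements $B_k$ with $\Pr(\xi \in B_k) > 0$, each element carries the conditional density and contributes to global statistics as

$$ \hat f_k(\xi) = \frac{f(\xi)\,\mathbf{1}_{B_k}(\xi)}{\Pr(\xi \in B_k)}, \qquad \mathbb{E}[u(\xi)] = \sum_k \Pr(\xi \in B_k)\, \mathbb{E}_{\hat f_k}[u(\xi)], $$

so that orthogonal polynomials are constructed numerically with respect to each $\hat f_k$ and the local gPC expansions are reassembled into global moments.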

497 citations


Journal ArticleDOI
TL;DR: A novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods and introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees.
Abstract: Multigene sequence data have great potential for elucidating important and interesting evolutionary processes, but statistical methods for extracting information from such data remain limited. Although various biological processes may cause different genes to have different genealogical histories (and hence different tree topologies), we also may expect that the number of distinct topologies among a set of genes is relatively small compared with the number of possible topologies. Therefore evidence about the tree topology for one gene should influence our inferences of the tree topology on a different gene, but to what extent? In this paper, we present a new approach for modeling and estimating concordance among a set of gene trees given aligned molecular sequence data. Our approach introduces a one-parameter probability distribution to describe the prior distribution of concordance among gene trees. We describe a novel 2-stage Markov chain Monte Carlo (MCMC) method that first obtains independent Bayesian posterior probability distributions for individual genes using standard methods. These posterior distributions are then used as input for a second MCMC procedure that estimates a posterior distribution of gene-to-tree maps (GTMs). The posterior distribution of GTMs can then be summarized to provide revised posterior probability distributions for each gene (taking account of concordance) and to allow estimation of the proportion of the sampled genes for which any given clade is true (the sample-wide concordance factor). Further, under the assumption that the sampled genes are drawn randomly from a genome of known size, we show how one can obtain an estimate, with credibility intervals, on the proportion of the entire genome for which a clade is true (the genome-wide concordance factor). We demonstrate the method on a set of 106 genes from 8 yeast species.

485 citations


Journal ArticleDOI
11 Aug 2006
TL;DR: This work presents a new, systematic approach for analyzing network topologies, introducing the dK-series of probability distributions specifying all degree correlations within d-sized subgraphs of a given graph G, and demonstrates that these graphs reproduce, with increasing accuracy, important properties of measured and modeled Internet topologies.
Abstract: Researchers have proposed a variety of metrics to measure important graph properties, for instance, in social, biological, and computer networks. Values for a particular graph metric may capture a graph's resilience to failure or its routing efficiency. Knowledge of appropriate metric values may influence the engineering of future topologies, repair strategies in the face of failure, and understanding of fundamental properties of existing networks. Unfortunately, there are typically no algorithms to generate graphs matching one or more proposed metrics and there is little understanding of the relationships among individual metrics or their applicability to different settings. We present a new, systematic approach for analyzing network topologies. We first introduce the dK-series of probability distributions specifying all degree correlations within d-sized subgraphs of a given graph G. Increasing values of d capture progressively more properties of G at the cost of more complex representation of the probability distribution. Using this series, we can quantitatively measure the distance between two graphs and construct random graphs that accurately reproduce virtually all metrics proposed in the literature. The nature of the dK-series implies that it will also capture any future metrics that may be proposed. Using our approach, we construct graphs for d=0, 1, 2, 3 and demonstrate that these graphs reproduce, with increasing accuracy, important properties of measured and modeled Internet topologies. We find that the d=2 case is sufficient for most practical purposes, while d=3 essentially reconstructs the Internet AS- and router-level topologies exactly. We hope that a systematic method to analyze and synthesize topologies offers a significant improvement to the set of tools available to network topology and protocol researchers.
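
As a rough illustration of the lowest orders of the series (a hypothetical helper, not the authors' code), the d = 0, 1, 2 statistics of a graph can be tabulated directly from its degrees and edges:

```python
from collections import Counter
import networkx as nx

def dk_statistics(G):
    """Empirical dK statistics for d = 0, 1, 2: average degree, the degree
    distribution, and the joint degree distribution over edge endpoints."""
    deg = dict(G.degree())
    dk0 = 2.0 * G.number_of_edges() / G.number_of_nodes()
    dk1 = Counter(deg.values())
    dk2 = Counter(tuple(sorted((deg[u], deg[v]))) for u, v in G.edges())
    return dk0, dk1, dk2

# Example on a synthetic scale-free graph (illustrative only).
G = nx.barabasi_albert_graph(1000, 3, seed=1)
avg_degree, degree_dist, joint_degree_dist = dk_statistics(G)
```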

481 citations


Journal ArticleDOI
TL;DR: The robust H∞ filtering problem is studied for stochastic uncertain discrete time-delay systems with missing measurements, and filters are designed such that, for all possible missing observations and all admissible parameter uncertainties, the filtering error system is exponentially mean-square stable.
Abstract: In this paper, the robust H∞ filtering problem is studied for stochastic uncertain discrete time-delay systems with missing measurements. The missing measurements are described by a binary switching sequence satisfying a conditional probability distribution. We aim to design filters such that, for all possible missing observations and all admissible parameter uncertainties, the filtering error system is exponentially mean-square stable, and the prescribed H∞ performance constraint is met. In terms of certain linear matrix inequalities (LMIs), sufficient conditions for the solvability of the addressed problem are obtained. When these LMIs are feasible, an explicit expression of a desired robust H∞ filter is also given. An optimization problem is subsequently formulated by optimizing the H∞ filtering performances. Finally, a numerical example is provided to demonstrate the effectiveness and applicability of the proposed design approach.

474 citations


Journal ArticleDOI
TL;DR: In this paper, the authors construct a statistical theory of reactive trajectories between two pre-specified sets A and B, i.e. the portions of the path of a Markov process during which the path makes a transition from A to B. This problem is relevant e.g. in the context of metastability.
Abstract: We construct a statistical theory of reactive trajectories between two pre-specified sets A and B, i.e. the portions of the path of a Markov process during which the path makes a transition from A to B. This problem is relevant e.g. in the context of metastability, in which case the two sets A and B are metastable sets, though the formalism we propose is independent of any such assumptions on A and B. We show that various probability distributions on the reactive trajectories can be expressed in terms of the equilibrium distribution of the process and the so-called committor functions which give the probability that the process reaches first B before reaching A, either backward or forward in time. Using these objects, we obtain (i) the distribution of reactive trajectories, which gives the proportion of time reactive trajectories spend in sets outside of A and B; (ii) the hitting point distribution of the reactive trajectories on a surface, which measures where the reactive trajectories hit the surface when they cross it; (iii) the last hitting point distribution of the reactive trajectories on the surface; (iv) the probability current of reactive trajectories, the integral of which on a surface gives the net average flux of reactive trajectories across this surface; (v) the average frequency of reactive trajectories, which gives the average number of transitions between A and B per unit of time; and (vi) the traffic distribution of reactive trajectories, which gives some information about the regions the reactive trajectories visit regardless of the time they spend in these regions.
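
For a discrete-state Markov chain, the forward committor, a central object of this theory, solves a small linear system. A minimal sketch with an illustrative toy chain (variable names are ours, not the paper's):

```python
import numpy as np

def forward_committor(P, A, B):
    """Forward committor q(i): probability that a Markov chain with transition
    matrix P, started in state i, reaches the set B before the set A."""
    n = P.shape[0]
    A, B = set(A), set(B)
    interior = [i for i in range(n) if i not in A and i not in B]
    q = np.zeros(n)
    q[list(B)] = 1.0
    # On the interior, q solves (I - P_II) q_I = P_IB 1, with q = 0 on A and q = 1 on B.
    M = np.eye(len(interior)) - P[np.ix_(interior, interior)]
    rhs = P[np.ix_(interior, list(B))].sum(axis=1)
    q[interior] = np.linalg.solve(M, rhs)
    return q

# Toy example: symmetric random walk on {0,...,5} with A = {0}, B = {5}.
P = np.zeros((6, 6))
for i in range(1, 5):
    P[i, i - 1] = P[i, i + 1] = 0.5
P[0, 0] = P[5, 5] = 1.0
print(forward_committor(P, A=[0], B=[5]))   # approx [0, 0.2, 0.4, 0.6, 0.8, 1]
```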

454 citations


Proceedings Article
01 Jan 2006
TL;DR: The BLOG model as discussed by the authors is a formal language for defining probability models with unknown objects and identity uncertainty, and it can be used to describe a generative process in which some steps add objects to the world, and others determine attributes and relations on these objects.
Abstract: We introduce BLOG, a formal language for defining probability models with unknown objects and identity uncertainty. A BLOG model describes a generative process in which some steps add objects to the world, and others determine attributes and relations on these objects. Subject to certain acyclicity constraints, a BLOG model specifies a unique probability distribution over first-order model structures that can contain varying and unbounded numbers of objects. Furthermore, inference algorithms exist for a large class of BLOG models.

Journal ArticleDOI
TL;DR: In this article, an auxiliary variable method is presented which requires only that independent samples can be drawn from the unnormalised density at any particular parameter value, and is illustrated by producing posterior samples for parameters of the Ising model given a particular lattice realisation.
Abstract: Maximum likelihood parameter estimation and sampling from Bayesian posterior distributions are problematic when the probability density for the parameter of interest involves an intractable normalising constant which is also a function of that parameter. In this paper, an auxiliary variable method is presented which requires only that independent samples can be drawn from the unnormalised density at any particular parameter value. The proposal distribution is constructed so that the normalising constant cancels from the Metropolis-Hastings ratio. The method is illustrated by producing posterior samples for parameters of the Ising model given a particular lattice realisation.
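
In schematic form (generic notation; details of the construction in the paper may differ): write the target as $\pi(\theta \mid y) \propto p(\theta)\, q_{\theta}(y)/Z(\theta)$ with $Z(\theta)$ intractable. Augment with an auxiliary variable $x$ having density $f(x \mid \theta, y)$, propose $\theta'$ from a symmetric kernel, and draw $x' \sim q_{\theta'}(\cdot)/Z(\theta')$. The Metropolis-Hastings acceptance ratio is then

$$ \frac{p(\theta')\, q_{\theta'}(y)\, f(x' \mid \theta', y)\, q_{\theta}(x)}{p(\theta)\, q_{\theta}(y)\, f(x \mid \theta, y)\, q_{\theta'}(x')}, $$

in which $Z(\theta)$ and $Z(\theta')$ cancel, so the method only requires the ability to draw exact samples from the unnormalised density at the proposed parameter value.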

Journal ArticleDOI
TL;DR: An overview of numerical possibility theory is proposed, showing that some notions in statistics are naturally interpreted in the language of this theory and providing a natural definition of a subjective possibility distribution that sticks to the Bayesian framework of exchangeable bets.

Journal ArticleDOI
TL;DR: It is shown that, for a wide class of probability distributions on the data, the probability constraints can be converted explicitly into convex second-order cone constraints; hence the probability-constrained linear program can be solved exactly with great efficiency.
Abstract: In this paper, we discuss linear programs in which the data that specify the constraints are subject to random uncertainty. A usual approach in this setting is to enforce the constraints up to a given level of probability. We show that, for a wide class of probability distributions (namely, radial distributions) on the data, the probability constraints can be converted explicitly into convex second-order cone constraints; hence, the probability-constrained linear program can be solved exactly with great efficiency. Next, we analyze the situation where the probability distribution of the data is not completely specified, but is only known to belong to a given class of distributions. In this case, we provide explicit convex conditions that guarantee the satisfaction of the probability constraints for any possible distribution belonging to the given class.
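
The Gaussian case illustrates the conversion (a standard special case within the radial family the paper treats): if the constraint row $a$ satisfies $a \sim \mathcal{N}(\bar a, \Sigma)$, then for $\epsilon \le 1/2$

$$ \Pr\{a^{\top} x \le b\} \ge 1 - \epsilon \quad \Longleftrightarrow \quad \bar a^{\top} x + \Phi^{-1}(1-\epsilon)\,\|\Sigma^{1/2} x\|_2 \le b, $$

which is a convex second-order cone constraint in $x$.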

Journal ArticleDOI
TL;DR: It is concluded that, to be most useful, non-stationarity considerations should be incorporated into new risk assessment frameworks.

Journal ArticleDOI
TL;DR: In this article, the authors studied an empirical risk minimization problem, where the goal is to obtain very general upper bounds on the excess risk of a class of measurable functions, expressed in terms of relevant geometric parameters of the class.
Abstract: Let $\mathcal{F}$ be a class of measurable functions $f: S \mapsto [0, 1]$ defined on a probability space $(S, \mathcal{A}, P)$. Given a sample $(X_1, \dots, X_n)$ of i.i.d. random variables taking values in $S$ with common distribution $P$, let $P_n$ denote the empirical measure based on $(X_1, \dots, X_n)$. We study the empirical risk minimization problem $P_n f \to \min$, $f \in \mathcal{F}$. Given a solution $\hat{f}_n$ of this problem, the goal is to obtain very general upper bounds on its excess risk $$\mathcal{E}_{P}(\hat{f}_{n}) := P\hat{f}_{n} - \inf_{f \in \mathcal{F}} Pf,$$ expressed in terms of relevant geometric parameters of the class $\mathcal{F}$. Using concentration inequalities and other empirical processes tools, we obtain both distribution-dependent and data-dependent upper bounds on the excess risk that are of asymptotically correct order in many examples. The bounds involve localized sup-norms of empirical and Rademacher processes indexed by functions from the class. We use these bounds to develop model selection techniques in abstract risk minimization problems that can be applied to more specialized frameworks of regression and classification.

Book
19 Jun 2006
TL;DR: Presents StatCalc along with reliability functions, model fitting, methods of estimation, inference, random number generation, some special functions, and examples.
Abstract: INTRODUCTION TO STATCALC Introduction of StatCalc PRELIMINARIES Random Variables and Expectations Moments and Other Functions Some Functions Relevant to Reliability Model Fitting Methods of Estimation Inference Random Number Generation Some Special Functions DISCRETE UNIFORM DISTRIBUTION Description Moments BINOMIAL DISTRIBUTION Description Moments Computing Table Values Test for the Proportion Confidence Intervals for the Proportion A Test for the Difference between Two Proportions Fisher's Exact Test Properties and Results Random Number Generation Computation of Probabilities HYPERGEOMETRIC DISTRIBUTION Description Moments Computing Table Values Point Estimation Test for the Proportion Confidence Intervals and Sample Size Calculation A Test for the Difference between Two Proportions Properties and Results Random Number Generation Computation of Probabilities POISSON DISTRIBUTION Description Moments Computing Table Values Point Estimation Test for the Mean Confidence Intervals for the Mean Test for the Ratio of Two Means Confidence Intervals for the Ratio of Two Means A Test for the Difference between Two Means Model Fitting with Examples Properties and Results Random Number Generation Computation of Probabilities GEOMETRIC DISTRIBUTION Description Moments Computing Table Values Properties and Results Random Number Generation NEGATIVE BINOMIAL DISTRIBUTION Description Moments Computing Table Values Point Estimation A Test for the Proportion Confidence Intervals for the Proportion Properties and Results Random Number Generation A Computational Method for Probabilities LOGARITHMIC SERIES DISTRIBUTION Description Moments Computing Table Values Inferences Properties and Results Random Number Generation A Computational Algorithm for Probabilities UNIFORM DISTRIBUTION Description Moments Inferences Properties and Results Random Number Generation NORMAL DISTRIBUTION Description Moments Computing Table Values One-Sample Inference Two-Sample Inference Tolerance Intervals Properties and Results Relation to Other Distributions Random Number Generation Computing the Distribution Function CHI-SQUARE DISTRIBUTION Description Moments Computing Table Values Applications Properties and Results Random Number Generation Computing the Distribution Function F DISTRIBUTION Description Moments Computing Table Values Properties and Results Random Number Generation A Computational Method for Probabilities STUDENT'S t DISTRIBUTION Description Moments Computing Table Values Distribution of the Maximum of Several |t| Variables Properties and Results Random Number Generation A Computational Method for Probabilities EXPONENTIAL DISTRIBUTION Description Moments Computing Table Values Inferences Properties and Results Random Number Generation GAMMA DISTRIBUTION Description Moments Computing Table Values Applications with Some Examples Inferences Properties and Results Random Number Generation A Computational Method for Probabilities BETA DISTRIBUTION Description Moments Computing Table Values Inferences Applications with an Example Properties and Results Random Number Generation Evaluating the Distribution Function NONCENTRAL CHI-SQUARE DISTRIBUTION Description Moments Computing Table Values Applications Properties and Results Random Number Generation Evaluating the Distribution Function NONCENTRAL F DISTRIBUTION Description Moments Computing Table Values Applications Properties and Results Random Number Generation Evaluating the Distribution Function NONCENTRAL t DISTRIBUTION Description Moments Computing 
Table Values Applications Properties and Results Random Number Generation Evaluating the Distribution Function LAPLACE DISTRIBUTION Description Moments Computing Table Values Inferences Applications Relation to Other Distributions Random Number Generation LOGISTIC DISTRIBUTION Description Moments Computing Table Values Maximum Likelihood Estimators Applications Properties and Results Random Number Generation LOGNORMAL DISTRIBUTION Description Moments Computing Table Values Maximum Likelihood Estimators Confidence Interval and Test for the Mean Inferences for the Difference between Two Means Inferences for the Ratio of Two Means Applications Properties and Results Random Number Generation Computation of Probabilities and Percentiles PARETO DISTRIBUTION Description Moments Computing Table Value Inferences Applications Properties and Results Random Number Generation Computation of Probabilities and Percentiles WEIBULL DISTRIBUTION Description Moments Computing Table Values Applications Point Estimation Properties and Results Random Number Generation Computation of Probabilities and Percentiles EXTREME VALUE DISTRIBUTION Description Moments Computing Table Values Maximum Likelihood Estimators Applications Properties and Results Random Number Generation Computation of Probabilities and Percentiles CAUCHY DISTRIBUTION Description Moments Computing Table Values Inference Applications Properties and Results Random Number Generation Computation of Probabilities and Percentiles INVERSE GAUSSIAN DISTRIBUTION Description Moments Computing Table Values One-Sample Inference Two-Sample Inference Random Number Generation Computational Methods for Probabilities and Percentiles RAYLEIGH DISTRIBUTION Description Moments Computing Table Values Maximum Likelihood Estimator Relation to Other Distributions Random Number Generation BIVARIATE NORMAL DISTRIBUTION Description Computing Table Values An Example Inferences on Correlation Coefficients Inferences on the Difference between Two Correlation Coefficients Some Properties Random Number Generation A Computational Algorithm for Probabilities DISTRIBUTION OF RUNS Description Computing Table Values Examples SIGN TEST AND CONFIDENCE INTERVAL FOR THE MEDIAN Hypothesis Test for the Median Confidence Interval for the Median Computing Table Values An Example WILCOXON SIGNED-RANK TEST Description Moments and an Approximation Computing Table Values An Example WILCOXON RANK-SUM TEST Description Moments and an Approximation Mann-Whitney U Statistic Computing Table Values An Example NONPARAMETRIC TOLERANCE INTERVAL Description Computing Table Values An Example TOLERANCE FACTORS FOR A MULTIVARIATE NORMAL POPULATION Description Computing Tolerance Factors Examples DISTRIBUTION OF THE SAMPLE MULTIPLE CORRELATION COEFFICIENT Description Moments Inferences Some Results Random Number Generation A Computational Method for Probabilities Computing Table Values REFERENCES INDEX

Journal ArticleDOI
TL;DR: This work provides alternative characterizations of the IGFR property that simplify verifying whether the IGFR condition holds, and relates the limit of the generalized failure rate to the moments of a distribution.
Abstract: Distributions with an increasing generalized failure rate (IGFR) have useful applications in pricing and supply chain contracting problems. We provide alternative characterizations of the IGFR property that simplify verifying whether the IGFR condition holds. We also relate the limit of the generalized failure rate to the moments of a distribution.
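
For reference, in standard notation (not necessarily the paper's), the generalized failure rate of a distribution with density $f$ and CDF $F$ is

$$ g(x) = \frac{x\, f(x)}{1 - F(x)}, $$

and the distribution is IGFR when $g$ is weakly increasing on its support; for instance, the exponential distribution with rate $\lambda$ has constant failure rate $\lambda$, so $g(x) = \lambda x$ is increasing and the exponential distribution is IGFR.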

Journal ArticleDOI
TL;DR: In this paper, a two-point estimate method (2PEM) is proposed to account for uncertainties in the optimal power flow (OPF) problem in the context of competitive electricity markets, where uncertainties can be seen as a by-product of the economic pressure that forces market participants to behave in an "unpredictable" manner; hence, probability distributions of locational marginal prices are calculated as a result.
Abstract: This paper presents an application of a two-point estimate method (2PEM) to account for uncertainties in the optimal power flow (OPF) problem in the context of competitive electricity markets. These uncertainties can be seen as a by-product of the economic pressure that forces market participants to behave in an "unpredictable" manner; hence, probability distributions of locational marginal prices are calculated as a result. Instead of using computationally demanding methods, the proposed approach needs 2n runs of the deterministic OPF for n uncertain variables to get the result in terms of the first three moments of the corresponding probability density functions. Another advantage of the 2PEM is that it does not require derivatives of the nonlinear function used in the computation of the probability distributions. The proposed method is tested on a simple three-bus test system and on a more realistic 129-bus test system. Results are compared against more accurate results obtained from Monte Carlo simulations (MCS). The proposed method demonstrates a high level of accuracy for mean values when compared to the MCS; for standard deviations, the results are better in those cases when the number of uncertain variables is relatively low or when their dispersion is not large.
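
A generic sketch of a 2n-point scheme in the style of Hong's two-point estimate, which is the flavor of 2PEM typically used in probabilistic OPF studies (the model function and numbers below are illustrative, not taken from the paper):

```python
import numpy as np

def two_point_estimate(f, mu, sigma, skew=None):
    """Approximate the mean and standard deviation of Y = f(X) for independent
    inputs X_k with given means, standard deviations and (optional) skewness,
    using 2n evaluations of the deterministic model f (Hong-style 2n scheme)."""
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    n = mu.size
    skew = np.zeros(n) if skew is None else np.asarray(skew, dtype=float)
    m1 = m2 = 0.0
    for k in range(n):
        xi1 = skew[k] / 2.0 + np.sqrt(n + (skew[k] / 2.0) ** 2)
        xi2 = skew[k] / 2.0 - np.sqrt(n + (skew[k] / 2.0) ** 2)
        w1 = -xi2 / (n * (xi1 - xi2))
        w2 = xi1 / (n * (xi1 - xi2))
        for xi, w in ((xi1, w1), (xi2, w2)):
            x = mu.copy()
            x[k] = mu[k] + xi * sigma[k]   # concentration point for variable k
            y = f(x)
            m1 += w * y
            m2 += w * y ** 2
    return m1, np.sqrt(max(m2 - m1 ** 2, 0.0))

# Sanity check on a linear model, where the exact answer is known:
# mean = sum(mu), std = sqrt(sum(sigma^2)) for independent inputs.
mean, std = two_point_estimate(lambda x: x.sum(),
                               mu=[1.0, 2.0, 3.0], sigma=[0.1, 0.2, 0.3])
```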

Journal ArticleDOI
TL;DR: As discussed in this paper, the Pareto distribution is a simple model for nonnegative data with a power law probability tail; in many practical applications, there is a natural upper bound that truncates the probability tail.
Abstract: The Pareto distribution is a simple model for nonnegative data with a power law probability tail. In many practical applications, there is a natural upper bound that truncates the probability tail. This article derives estimators for the truncated Pareto distribution, investigates their properties, and illustrates a way to check for fit. These methods are illustrated with applications from finance, hydrology, and atmospheric science.
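
In one common parameterization (lower bound $\gamma$, upper truncation point $\nu$, tail index $\alpha$; the article's notation may differ), the truncated Pareto CDF is

$$ F(x) = \frac{1 - (\gamma/x)^{\alpha}}{1 - (\gamma/\nu)^{\alpha}}, \qquad \gamma \le x \le \nu, $$

which reduces to the ordinary Pareto CDF $1 - (\gamma/x)^{\alpha}$ as $\nu \to \infty$.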

Journal ArticleDOI
TL;DR: In this article, the authors studied the dynamical evolution of young groups/clusters, with N = 100-1000 members, from their embedded stage out to ages of ~10 Myr.
Abstract: This paper studies the dynamical evolution of young groups/clusters, with N = 100-1000 members, from their embedded stage out to ages of ~10 Myr. We use N-body simulations to explore how their evolution depends on the system size N and the initial conditions. Motivated by recent observations suggesting that stellar groups begin their evolution with subvirial speeds, this study compares subvirial starting states with virial starting states. Multiple realizations of equivalent cases (100 simulations per initial condition) are used to build up a robust statistical description of these systems, e.g., the probability distribution of closest approaches, the mass profiles, and the probability distribution for the radial location of cluster members. These results provide a framework from which to assess the effects of groups/clusters on the processes of star and planet formation and to study cluster evolution. The distributions of radial positions are used in conjunction with the probability distributions of the expected far-ultraviolet (FUV) luminosities (calculated here as a function of cluster size N) to determine the radiation exposure of circumstellar disks. The distributions of closest approaches are used in conjunction with scattering cross sections (calculated here as a function of stellar mass using ~10^5 Monte Carlo scattering experiments) to determine the probability of disruption for newly formed solar systems. We use the nearby cluster NGC 1333 as a test case in this investigation. The main conclusion of this study is that clusters in this size range have only a modest effect on forming planetary systems. The interaction rates are low, so that the typical solar system experiences a single encounter with closest approach distance b ~ 1000 AU. The radiation exposure is also low, with median FUV flux G0 ~ 900 (1.4 ergs s^-1 cm^-2), so that photoevaporation of circumstellar disks is only important beyond 30 AU. Given the low interaction rates and modest radiation levels, we suggest that solar system disruption is a rare event in these clusters.

Journal ArticleDOI
TL;DR: An improved version of the FastICA algorithm is proposed which is asymptotically efficient, i.e., its accuracy given by the residual error variance attains the Cramer-Rao lower bound (CRB).
Abstract: FastICA is one of the most popular algorithms for independent component analysis (ICA), demixing a set of statistically independent sources that have been mixed linearly. A key question is how accurate the method is for finite data samples. We propose an improved version of the FastICA algorithm which is asymptotically efficient, i.e., its accuracy given by the residual error variance attains the Cramer-Rao lower bound (CRB). The error is thus as small as possible. This result is rigorously proven under the assumption that the probability distribution of the independent signal components belongs to the class of generalized Gaussian (GG) distributions with parameter alpha, denoted GG(alpha) for alpha>2. We name the algorithm efficient FastICA (EFICA). Computational complexity of a Matlab implementation of the algorithm is shown to be only slightly (about three times) higher than that of the standard symmetric FastICA. Simulations corroborate these claims and show superior performance of the algorithm compared with algorithm JADE of Cardoso and Souloumiac and nonparametric ICA of Boscolo on separating sources with distribution GG(alpha) with arbitrary alpha, as well as on sources with bimodal distribution, and a good performance in separating linearly mixed speech signals
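
For reference, one common parameterization of the generalized Gaussian family GG(alpha) (up to scale conventions) has density

$$ f(x) = \frac{\alpha}{2\sigma\,\Gamma(1/\alpha)} \exp\!\left(-\left|\frac{x}{\sigma}\right|^{\alpha}\right), $$

which gives the Laplace distribution for $\alpha = 1$, a Gaussian for $\alpha = 2$, and increasingly light-tailed (sub-Gaussian) members as $\alpha$ grows.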

Book
01 Jan 2006
TL;DR: In this paper, the authors consider a class of models defined by a population objective function Q(θ, P) for θ ∈ Θ, and seek random sets that contain the set of minimizers of this objective function with at least some prespecified probability asymptotically.
Abstract: distribution of the observed data. The class of models we consider is defined by a population objective function Q(θ, P) for θ ∈ Θ. The point of departure from the classical extremum estimation framework is that it is not assumed that Q(θ, P) has a unique minimizer in the parameter space Θ. The goal may be either to draw inferences about some unknown point in the set of minimizers of the population objective function or to draw inferences about the set of minimizers itself. In this paper, the object of interest is Θ₀(P) = argmin_{θ ∈ Θ} Q(θ, P), and so we seek random sets that contain this set with at least some prespecified probability asymptotically. We also consider situations where the object of interest is the image of Θ₀(P) under a known function. Random sets that satisfy the desired coverage property are constructed under weak assumptions. Conditions are provided under which the confidence regions are asymptotically valid not only pointwise in P, but also uniformly in P. We illustrate the use of our methods with an empirical study of the impact of top-coding outcomes on inferences about the parameters of a linear regression. Finally, a modest simulation study sheds some light on the finite-sample behavior of our procedure.

Book
01 Jan 2006
TL;DR: In this article, the authors present an overview of the state-of-the-art in the area of survivability, risk, and robustness analysis in non-parametric Bayes models.
Abstract: Preface. Acknowledgements. 1 Introduction and Overview. 1.1 Preamble: What do 'Reliability', 'Risk' and 'Robustness' Mean? 1.2 Objectives and Prospective Readership. 1.3 Reliability, Risk and Survival: State-of-the-Art. 1.4 Risk Management: A Motivation for Risk Analysis. 1.5 Books on Reliability, Risk and Survival Analysis. 1.6 Overview of the Book. 2 The Quantification of Uncertainty. 2.1 Uncertain Quantities and Uncertain Events: Their Definition and Codification. 2.2 Probability: A Satisfactory Way to Quantify Uncertainty. 2.3 Overview of the Different Interpretations of Probability. 2.4 Extending the Rules of Probability: Law of Total Probability and Bayes' Law. 2.5 The Bayesian Paradigm: A Prescription for Reliability, Risk and Survival. Analysis. 2.6 Probability Models, Parameters, Inference and Prediction. 2.7 Testing Hypotheses: Posterior Odds and Bayes Factors. 2.8 Utility as Probability and Maximization of Expected Utility. 2.9 Decision Trees and Influence Diagrams for Risk Analysis. 3 Exchangeability and Indifference. 3.1 Introduction to Exchangeability: de Finetti's Theorem. 3.2 de Finetti-style Theorems for Infinite Sequences of Non-binary Random. 3.3 Error Bounds on de Finetti-style Results for Finite Sequences of Random. 4 Stochastic Models of Failure. 4.1 Introduction. 4.2 Preliminaries: Univariate, Multivariate and Multi-indexed Distribution Functions. 4.3 The Predictive Failure Rate Function of a Univariate Probability Distribution. 4.4 Interpretation and Uses of the Failure Rate Function - the Model Failure Rate. 4.5 Multivariate Analogues of the Failure Rate Function. 4.6 The Hazard Potential of Items and Individuals. 4.7 Probability Models for Interdependent Lifelengths. 4.8 Causality and Models for Cascading Failures. 4.9 Failure Distributions with Multiple Scales. 5 Parametric Failure Data Analysis. 5.1 Introduction and Perspective. 5.2 Assessing Predictive Distributions in the Absence of Data. 5.3 Prior Distributions in Chance Distributions. 5.4 Predictive Distributions Incorporating Failure Data. 5.5 Information from Life-tests: Learning from Data. 5.6 Optimal Testing: Design of Life-testing Experiments. 5.7 Adversarial Life-testing and Acceptance Sampling. 5.8 Accelerated Life-testing and Dose-response Experiments. 6 Composite Reliability: Signatures. 6.1 Introduction: Hierarchical Models. 6.2 'Composite Reliability': Partial Exchangeability. 6.3 Signature Analysis and Signatures as Covariates. 7 Survival in Dynamic Environments. 7.1 Introduction: Why Stochastic Hazard Functions? 7.2 Hazard Rate Processes. 7.3 Cumulative Hazard Processes. 7.4 Competing Risks and Competing Risk Processes. 7.5 Degradation and Aging Processes. 8 Point Processes for Event Histories. 8.1 Introduction: What is Event History? 8.2 Other Point Processes in Reliability and Life-testing. 8.3 Multiplicative Intensity and Multivariate Point Processes. 8.4 Dynamic Processes and Statistical Models: Martingales. 8.5 Point Processes with Multiplicative Intensities. 9 Non-parametric Bayes Methods in Reliability. 9.1 The What and Why of Non-parametric Bayes. 9.2 The Dirichlet Distribution and its Variants. 9.3 A Non-parametric Bayes Approach to Bioassay. 9.4 Prior Distributions on the Hazard Function. 9.5 Prior Distributions for the Cumulative Hazard Function. 9.6 Priors for the Cumulative Distribution Function. 10 Survivability of Co-operative, Competing and Vague Systems. 10.1 Introduction: Notion of Systems and their Components. 10.2 Coherent Systems and their Qualitative Properties. 
10.3 The Survivability of Coherent Systems. 10.4 Machine Learning Methods in Survivability Assessment. 10.5 Reliability Allocation: Optimal System Design. Systems. 10.6 The Utility of Reliability: Optimum System Selection. 10.7 Multi-state and Vague Stochastic Systems. 11 Reliability and Survival in Econometrics and Finance. 11.1 Introduction and Overview. 11.2 Relating Metrics of Reliability to those of Income Inequality. 11.3 Invoking Reliability Theory in Financial Risk Assessment. 11.4 Inferential Issues in Asset Pricing. 11.5 Concluding Comments. Appendix A Markov Chain Monte Carlo Simulation. A.1 The Gibbs Sampling Algorithm. Appendix B Fourier Series Models and the Power Spectrum. Appendix C Network Survivability and Borel's Paradox. Bibliography. Index.

Journal ArticleDOI
TL;DR: In this paper, the authors present an intuitive review of these developments and contrast these estimators with multiple imputation from both a theoretical and a practical viewpoint, leading to the development of doubly robust or doubly protected estimators.
Abstract: Multiple imputation is now a well-established technique for analysing data sets where some units have incomplete observations. Provided that the imputation model is correct, the resulting estimates are consistent. An alternative, weighting by the inverse probability of observing complete data on a unit, is conceptually simple and involves fewer modelling assumptions, but it is known to be both inefficient (relative to a fully parametric approach) and sensitive to the choice of weighting model. Over the last decade, there has been a considerable body of theoretical work to improve the performance of inverse probability weighting, leading to the development of 'doubly robust' or 'doubly protected' estimators. We present an intuitive review of these developments and contrast these estimators with multiple imputation from both a theoretical and a practical viewpoint.
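
One standard form of a doubly robust estimator, here for the mean of an outcome $Y$ observed only when $R = 1$ given covariates $X$ (generic notation rather than the review's), augments the inverse probability weighted term with a regression prediction:

$$ \hat\mu_{DR} = \frac{1}{n}\sum_{i=1}^{n}\left[\frac{R_i Y_i}{\hat\pi(X_i)} - \frac{R_i - \hat\pi(X_i)}{\hat\pi(X_i)}\,\hat m(X_i)\right], $$

where $\hat\pi(X)$ models $\Pr(R = 1 \mid X)$ and $\hat m(X)$ models $E[Y \mid X, R = 1]$; the estimator remains consistent if either of the two working models is correctly specified.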

Journal ArticleDOI
TL;DR: A hybrid method is presented here, which jointly propagates probabilistic and possibilistic uncertainty and produces results in the form of a random fuzzy interval.
Abstract: Random variability and imprecision are two distinct facets of the uncertainty affecting parameters that influence the assessment of risk. While random variability can be represented by probability distribution functions, imprecision (or partial ignorance) is better accounted for by possibility distributions (or families of probability distributions). Because practical situations of risk computation often involve both types of uncertainty, methods are needed to combine these two modes of uncertainty representation in the propagation step. A hybrid method is presented here, which jointly propagates probabilistic and possibilistic uncertainty. It produces results in the form of a random fuzzy interval. This paper focuses on how to properly summarize this kind of information; and how to address questions pertaining to the potential violation of some tolerance threshold. While exploitation procedures proposed previously entertain a confusion between variability and imprecision, thus yielding overly conservative results, a new approach is proposed, based on the theory of evidence, and is illustrated using synthetic examples

Journal ArticleDOI
TL;DR: In this paper, necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model or other more general exponential models are formulated.
Abstract: We formulate necessary and sufficient conditions for an arbitrary discrete probability distribution to factor according to an undirected graphical model, or a log-linear model, or other more general exponential models. For decomposable graphical models these conditions are equivalent to a set of conditional independence statements similar to the Hammersley-Clifford theorem; however, we show that for nondecomposable graphical models they are not. We also show that nondecomposable models can have nonrational maximum likelihood estimates. These results are used to give several novel characterizations of decomposable graphical models.
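
The factorization in question has the familiar Gibbs/Markov random field form: a distribution $p$ factors according to an undirected graph $G$ when

$$ p(x) = \frac{1}{Z}\prod_{C \in \mathcal{C}(G)} \psi_C(x_C), $$

the product running over the cliques $\mathcal{C}(G)$ of $G$; for strictly positive $p$ this is equivalent to the Markov properties by the Hammersley-Clifford theorem, and the paper characterizes when arbitrary (possibly non-positive) discrete distributions admit such a factorization.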

Journal ArticleDOI
TL;DR: In this paper, the authors show that the sample analog estimator of the population identification region is given by a transformation of a Minkowski average of set valued random variables (SVRVs), which is a mapping that associates a set (rather than a real number) with each element of the sample space.
Abstract: We propose inference procedures for partially identified population features for which the population identification region can be written as a transformation of the Aumann expectation of a properly defined set valued random variable (SVRV). An SVRV is a mapping that associates a set (rather than a real number) with each element of the sample space. Examples of population features in this class include interval-identified scalar parameters, best linear predictors with interval outcome data, and parameters of semiparametric binary models with interval regressor data. We extend the analogy principle to SVRVs and show that the sample analog estimator of the population identification region is given by a transformation of a Minkowski average of SVRVs. Using the results of the mathematics literature on SVRVs, we show that this estimator converges in probability to the population identification region with respect to the Hausdorff distance. We then show that the Hausdorff distance and the directed Hausdorff distance between the population identification region and the estimator, when properly normalized by √n, converge in distribution to functions of a Gaussian process whose covariance kernel depends on parameters of the population identification region. We provide consistent bootstrap procedures to approximate these limiting distributions. Using similar arguments as those applied for vector valued random variables, we develop a methodology to test assumptions about the true identification region and its subsets. We show that these results can be used to construct a confidence collection and a directed confidence collection. Those are (respectively) collection of sets that, when specified as a null hypothesis for the true value (a subset of values) of the population identification region, cannot be rejected by our tests.

Journal ArticleDOI
TL;DR: The compact representation of incomplete probabilistic knowledge which can be encountered in risk evaluation problems, for instance in environmental studies, is considered, and the respective appropriateness of pairs of cumulative distributions, continuous possibility distributions, or discrete random sets for representing information about the mean value, the mode, the median, and other fractiles of ill-known probability distributions is discussed.

Proceedings Article
16 Jul 2006
TL;DR: In this paper, a necessary and sufficient graphical condition for the cases when the causal effect of an arbitrary set of variables on another arbitrary set can be determined uniquely from the available information, as well as an algorithm which computes the effect whenever this condition holds, is provided.
Abstract: This paper is concerned with estimating the effects of actions from causal assumptions, represented concisely as a directed graph, and statistical knowledge, given as a probability distribution. We provide a necessary and sufficient graphical condition for the cases when the causal effect of an arbitrary set of variables on another arbitrary set can be determined uniquely from the available information, as well as an algorithm which computes the effect whenever this condition holds. Furthermore, we use our results to prove completeness of do-calculus [Pearl, 1995], and a version of an identification algorithm in [Tian, 2002] for the same identification problem. Finally, we derive a complete characterization of semi-Markovian models in which all causal effects are identifiable.
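
A familiar special case of such identification (not the paper's general algorithm) is the back-door adjustment: if a set $Z$ of observed variables satisfies the back-door criterion relative to $(X, Y)$ in the causal graph, then

$$ P(y \mid do(x)) = \sum_{z} P(y \mid x, z)\, P(z). $$

The paper characterizes exactly when effects of the form $P(y \mid do(x))$ are identifiable in general, including cases not covered by such adjustments.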