Journal ArticleDOI

Energy statistics: A class of statistics based on distances

TL;DR: Energy distance is a statistical distance between the distributions of random vectors that characterizes equality of distributions; it has an elegant relation to the notion of potential energy between statistical observations.
About: This article was published in the Journal of Statistical Planning and Inference on 2013-08-01 and has received 561 citations to date. The article focuses on the topics: Energy distance & Distance correlation.
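
To make the summary above concrete: the sketch below (plain NumPy/SciPy; the function name energy_distance is mine, not from the paper) computes the usual two-sample energy statistic, 2·E‖X−Y‖ − E‖X−X′‖ − E‖Y−Y′‖, with the expectations replaced by sample means of pairwise Euclidean distances.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def energy_distance(x, y):
    """Two-sample energy distance estimate for samples x (n, d) and y (m, d):
    2*mean||x_i - y_j|| - mean||x_i - x_k|| - mean||y_j - y_l||."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    between = cdist(x, y).mean()               # average cross-sample distance
    within_x = 2.0 * pdist(x).sum() / (n * n)  # average within-x distance (zero diagonal included)
    within_y = 2.0 * pdist(y).sum() / (m * m)  # average within-y distance
    return 2.0 * between - within_x - within_y

# Values near zero suggest equal distributions; a location shift inflates the statistic.
rng = np.random.default_rng(0)
print(energy_distance(rng.normal(size=(200, 3)), rng.normal(size=(200, 3))))
print(energy_distance(rng.normal(size=(200, 3)), rng.normal(loc=1.0, size=(200, 3))))
```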
Citations
Posted Content
TL;DR: It is shown that statistical properties of adversarial examples are essential to their detection: they are not drawn from the same distribution as the original data and can thus be detected using statistical tests.
Abstract: Machine Learning (ML) models are applied in a variety of tasks such as network intrusion detection or malware classification. Yet, these models are vulnerable to a class of malicious inputs known as adversarial examples. These are slightly perturbed inputs that are classified incorrectly by the ML model. The mitigation of these adversarial inputs remains an open problem. As a step towards understanding adversarial examples, we show that they are not drawn from the same distribution as the original data, and can thus be detected using statistical tests. Using this knowledge, we introduce a complementary approach to identify specific inputs that are adversarial. Specifically, we augment our ML model with an additional output, in which the model is trained to classify all adversarial inputs. We evaluate our approach on multiple adversarial example crafting methods (including the fast gradient sign and saliency map methods) with several datasets. The statistical test flags sample sets containing adversarial inputs confidently at sample sizes between 10 and 100 data points. Furthermore, our augmented model either detects adversarial examples as outliers with high accuracy (> 80%) or increases the adversary's cost - the perturbation added - by more than 150%. In this way, we show that statistical properties of adversarial examples are essential to their detection.

613 citations
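
A minimal sketch of the second idea in the abstract above (the classifier augmented with an extra output that absorbs adversarial inputs), written in PyTorch with invented module and variable names; it only illustrates the labelling scheme, not the authors' training setup or attack-crafting code.

```python
import torch
import torch.nn as nn

K = 10                                    # number of legitimate classes
model = nn.Sequential(                    # toy classifier with one extra logit:
    nn.Flatten(),
    nn.Linear(28 * 28, 128), nn.ReLU(),
    nn.Linear(128, K + 1),                # index K is reserved for "adversarial"
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def training_step(x_clean, y_clean, x_adv):
    """One step on a mixed batch: clean inputs keep their labels, adversarial
    inputs (crafted elsewhere, e.g. by FGSM) are all assigned the extra class K."""
    x = torch.cat([x_clean, x_adv])
    y = torch.cat([y_clean, torch.full((len(x_adv),), K, dtype=torch.long)])
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    return loss.item()

# toy MNIST-sized batch; at test time, predictions equal to K flag the input as adversarial
loss = training_step(torch.randn(32, 1, 28, 28), torch.randint(0, K, (32,)),
                     torch.randn(32, 1, 28, 28))
```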

Posted Content
TL;DR: A generative segmentation model, based on a combination of a U-Net with a conditional variational autoencoder, efficiently produces an unlimited number of plausible hypotheses and reproduces the possible segmentation variants, and the frequencies with which they occur, significantly better than published approaches.
Abstract: Many real-world vision problems suffer from inherent ambiguities. In clinical applications for example, it might not be clear from a CT scan alone which particular region is cancer tissue. Therefore a group of graders typically produces a set of diverse but plausible segmentations. We consider the task of learning a distribution over segmentations given an input. To this end we propose a generative segmentation model based on a combination of a U-Net with a conditional variational autoencoder that is capable of efficiently producing an unlimited number of plausible hypotheses. We show on a lung abnormalities segmentation task and on a Cityscapes segmentation task that our model reproduces the possible segmentation variants as well as the frequencies with which they occur, doing so significantly better than published approaches. These models could have a high impact in real-world applications, such as being used as clinical decision-making algorithms accounting for multiple plausible semantic segmentation hypotheses to provide possible diagnoses and recommend further actions to resolve the present ambiguities.

295 citations
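
The abstract above combines a segmentation network with a conditional variational autoencoder so that sampling a latent code yields different plausible segmentations of the same image. The sketch below is a drastically simplified PyTorch illustration of that idea (a tiny convolutional stand-in for the U-Net, invented names, and the posterior network and KL training objective omitted); it is not the published Probabilistic U-Net.

```python
import torch
import torch.nn as nn

class TinyProbSegNet(nn.Module):
    """Toy encoder-decoder whose output is modulated by a latent code z, so that
    repeated sampling of z yields diverse segmentation hypotheses for one input."""
    def __init__(self, in_ch=1, n_classes=2, latent_dim=6):
        super().__init__()
        self.latent_dim = latent_dim
        self.enc = nn.Sequential(
            nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        # "prior net": predicts mean and log-variance of z from the image features
        self.prior = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(32, 2 * latent_dim))
        # decoder sees the image features concatenated with the broadcast latent code
        self.dec = nn.Sequential(
            nn.Conv2d(32 + latent_dim, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, n_classes, 1),
        )

    def forward(self, x, z=None):
        h = self.enc(x)                                   # (B, 32, H, W)
        mu, logvar = self.prior(h).chunk(2, dim=1)
        if z is None:                                     # draw one hypothesis
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
        zmap = z[:, :, None, None].expand(-1, -1, h.size(2), h.size(3))
        return self.dec(torch.cat([h, zmap], dim=1))      # per-pixel class logits

# drawing several samples gives a set of plausible segmentations for one input
net = TinyProbSegNet()
img = torch.randn(1, 1, 64, 64)
hypotheses = [net(img).argmax(dim=1) for _ in range(4)]
```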

Journal ArticleDOI
TL;DR: In this paper, an image processing technique designed to transfer colour information from one image to another is adapted for use as a multivariate bias correction algorithm (MBCn) for climate model projections/predictions of multiple climate variables.
Abstract: Most bias correction algorithms used in climatology, for example quantile mapping, are applied to univariate time series. They neglect the dependence between different variables. Those that are multivariate often correct only limited measures of joint dependence, such as Pearson or Spearman rank correlation. Here, an image processing technique designed to transfer colour information from one image to another—the N-dimensional probability density function transform—is adapted for use as a multivariate bias correction algorithm (MBCn) for climate model projections/predictions of multiple climate variables. MBCn is a multivariate generalization of quantile mapping that transfers all aspects of an observed continuous multivariate distribution to the corresponding multivariate distribution of variables from a climate model. When applied to climate model projections, changes in quantiles of each variable between the historical and projection period are also preserved. The MBCn algorithm is demonstrated on three case studies. First, the method is applied to an image processing example with characteristics that mimic a climate projection problem. Second, MBCn is used to correct a suite of 3-hourly surface meteorological variables from the Canadian Centre for Climate Modelling and Analysis Regional Climate Model (CanRCM4) across a North American domain. Components of the Canadian Forest Fire Weather Index (FWI) System, a complicated set of multivariate indices that characterizes the risk of wildfire, are then calculated and verified against observed values. Third, MBCn is used to correct biases in the spatial dependence structure of CanRCM4 precipitation fields. Results are compared against a univariate quantile mapping algorithm, which neglects the dependence between variables, and two multivariate bias correction algorithms, each of which corrects a different form of inter-variable correlation structure. MBCn outperforms these alternatives, often by a large margin, particularly for annual maxima of the FWI distribution and spatiotemporal autocorrelation of precipitation fields.

282 citations


Cites background from "Energy statistics: A class of stati..."

  • ...(2011), Vrac and Friederichs (2015), Mehrotra and Sharma (2016), and Cannon (2016), MBCn is not restricted to correcting a specified measure of joint dependence, such as Pearson or Spearman rank correlation, nor does it make strong stationarity assumptions about climate model temporal sequencing....

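The abstract above builds MBCn on an image-processing colour-transfer technique, the N-dimensional probability density function transform. The sketch below (NumPy/SciPy, function names mine) shows the basic rotate / quantile-map / rotate-back iteration that I take to be at its core; the full MBCn algorithm adds further steps, such as preserving the climate model's projected changes in quantiles, which are omitted here.

```python
import numpy as np
from scipy.stats import special_ortho_group

def quantile_map(x, ref):
    """Crude empirical quantile mapping: the value of rank k in x is replaced by the
    value at the corresponding quantile of ref (ties and interpolation handled crudely)."""
    ranks = np.argsort(np.argsort(x))                       # rank 0..len(x)-1 of each entry
    idx = np.round(ranks * (len(ref) - 1) / (len(x) - 1)).astype(int)
    return np.sort(ref)[idx]

def npdft(model, obs, n_iter=30, seed=0):
    """Sketch of the N-dimensional pdf transform: rotate both (n, d) data sets by a random
    orthogonal matrix, quantile-map each rotated coordinate of the model data onto the
    observations, rotate back, and repeat until the joint distributions roughly match."""
    rng = np.random.default_rng(seed)
    x = np.asarray(model, float).copy()
    obs = np.asarray(obs, float)
    d = x.shape[1]
    for _ in range(n_iter):
        R = special_ortho_group.rvs(d, random_state=rng)    # random rotation
        xr, yr = x @ R, obs @ R
        xr = np.column_stack([quantile_map(xr[:, j], yr[:, j]) for j in range(d)])
        x = xr @ R.T
    return x

# toy example: pull a biased, uncorrelated "model" toward correlated "observations"
rng = np.random.default_rng(1)
obs = rng.multivariate_normal([0, 0, 0], [[1, .8, 0], [.8, 1, 0], [0, 0, 1]], 500)
mod = rng.normal(loc=1.0, scale=2.0, size=(500, 3))
corrected = npdft(mod, obs)
print(np.corrcoef(corrected, rowvar=False).round(2))        # correlation structure now close to obs
```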

Journal ArticleDOI
TL;DR: In this paper, the authors define the partial distance correlation statistics with the help of a new Hilbert space, develop and implement a test for zero partial distance correlation, and provide an unbiased estimator of squared distance covariance together with a neat solution to the problem of distance correlation for dissimilarities rather than distances.
Abstract: Distance covariance and distance correlation are scalar coefficients that characterize independence of random vectors in arbitrary dimension. Properties, extensions and applications of distance correlation have been discussed in the recent literature, but the problem of defining the partial distance correlation has remained an open question of considerable interest. The problem of partial distance correlation is more complex than partial correlation partly because the squared distance covariance is not an inner product in the usual linear space. For the definition of partial distance correlation, we introduce a new Hilbert space where the squared distance covariance is the inner product. We define the partial distance correlation statistics with the help of this Hilbert space, and develop and implement a test for zero partial distance correlation. Our intermediate results provide an unbiased estimator of squared distance covariance, and a neat solution to the problem of distance correlation for dissimilarities rather than distances.

195 citations
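
The TL;DR above mentions an unbiased estimator of squared distance covariance obtained through U-centering. The sketch below (NumPy/SciPy, function names mine) implements that estimator and a bias-corrected distance correlation built from it; the final partial-distance-correlation formula is my reading of the paper's definition and should be checked against the source.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def u_center(D):
    """U-center a pairwise distance matrix; the diagonal is set to zero."""
    n = D.shape[0]
    A = (D - D.sum(axis=1, keepdims=True) / (n - 2)
           - D.sum(axis=0, keepdims=True) / (n - 2)
           + D.sum() / ((n - 1) * (n - 2)))
    np.fill_diagonal(A, 0.0)
    return A

def dcov2_u(x, y):
    """Unbiased estimator of squared distance covariance; x is (n, p), y is (n, q), n > 3."""
    n = len(x)
    A, B = u_center(squareform(pdist(x))), u_center(squareform(pdist(y)))
    return (A * B).sum() / (n * (n - 3))

def dcor_u(x, y):
    """Bias-corrected distance correlation (can be slightly negative by construction)."""
    den = np.sqrt(dcov2_u(x, x) * dcov2_u(y, y))
    return dcov2_u(x, y) / den if den > 0 else 0.0

def pdcor(x, y, z):
    """Sample partial distance correlation of x and y removing z (assumed formula)."""
    rxy, rxz, ryz = dcor_u(x, y), dcor_u(x, z), dcor_u(y, z)
    den = np.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
    return (rxy - rxz * ryz) / den if den > 0 else 0.0

# x and y are related only through z, so pdcor is typically much closer to zero than dcor
rng = np.random.default_rng(0)
z = rng.normal(size=(100, 1))
x, y = z + rng.normal(size=(100, 1)), z + rng.normal(size=(100, 1))
print(dcor_u(x, y), pdcor(x, y, z))
```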

References
Journal Article
TL;DR: Copyright (©) 1999–2012 R Foundation for Statistical Computing; permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and permission notice are preserved on all copies.
Abstract: Copyright (©) 1999–2012 R Foundation for Statistical Computing. Permission is granted to make and distribute verbatim copies of this manual provided the copyright notice and this permission notice are preserved on all copies. Permission is granted to copy and distribute modified versions of this manual under the conditions for verbatim copying, provided that the entire resulting derived work is distributed under the terms of a permission notice identical to this one. Permission is granted to copy and distribute translations of this manual into another language, under the above conditions for modified versions, except that this permission notice may be stated in a translation approved by the R Core Team.

272,030 citations

Book
08 Dec 1980
TL;DR: This book develops the asymptotic theory of the basic sample statistics and their transformations, covering asymptotic theory in parametric inference, U-statistics, von Mises differentiable statistical functions, M-, L-, and R-estimates, and asymptotic relative efficiency.
Abstract: Preliminary Tools and Foundations. The Basic Sample Statistics. Transformations of Given Statistics. Asymptotic Theory in Parametric Inference. U-Statistics. Von Mises Differentiable Statistical Functions. M-Estimates. L-Estimates. R-Estimates. Asymptotic Relative Efficiency. Appendix. References. Author Index. Subject Index.

4,827 citations

Journal ArticleDOI
TL;DR: The theory of proper scoring rules on general probability spaces is reviewed and developed, and the intuitively appealing interval score is proposed as a utility function in interval estimation that addresses width as well as coverage.
Abstract: Scoring rules assess the quality of probabilistic forecasts, by assigning a numerical score based on the predictive distribution and on the event or value that materializes. A scoring rule is proper if the forecaster maximizes the expected score for an observation drawn from the distribution F if he or she issues the probabilistic forecast F, rather than G ≠ F. It is strictly proper if the maximum is unique. In prediction problems, proper scoring rules encourage the forecaster to make careful assessments and to be honest. In estimation problems, strictly proper scoring rules provide attractive loss and utility functions that can be tailored to the problem at hand. This article reviews and develops the theory of proper scoring rules on general probability spaces, and proposes and discusses examples thereof. Proper scoring rules derive from convex functions and relate to information measures, entropy functions, and Bregman divergences. In the case of categorical variables, we prove a rigorous version of the ...

4,644 citations


"Energy statistics: A class of stati..." refers background in this paper

  • ...See also historical comments on “Hoeffding-type inequalities” and their generalizations in Gneiting and Raftery (2007, Section 5.2)....


  • ...The energy score (see Gneiting and Raftery [15])....


  • ...(iv) The energy score (see Gneiting and Raftery, 2007)....

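Since the excerpts above single out the energy score, here is how that score is usually estimated from a finite ensemble (NumPy/SciPy, function name mine): ES ≈ (1/m) Σ_i ‖x_i − y‖ − 1/(2m²) Σ_{i,j} ‖x_i − x_j‖, where x_1, …, x_m are ensemble members and y is the verifying observation.

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

def energy_score(ensemble, obs):
    """Ensemble estimate of the energy score (negatively oriented: smaller is better).
    ensemble is (m, d); obs is a length-d observation vector."""
    ensemble = np.asarray(ensemble, float)
    obs = np.asarray(obs, float).reshape(1, -1)
    m = len(ensemble)
    term1 = cdist(ensemble, obs).mean()        # (1/m) sum_i ||x_i - y||
    term2 = pdist(ensemble).sum() / (m * m)    # 1/(2 m^2) sum_{i,j} ||x_i - x_j||
    return term1 - term2

# a well-centred ensemble scores better (lower) than a biased one
rng = np.random.default_rng(0)
y = np.zeros(3)
print(energy_score(rng.normal(size=(50, 3)), y))
print(energy_score(rng.normal(loc=2.0, size=(50, 3)), y))
```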

Journal ArticleDOI
TL;DR: The pages of this expensive but invaluable reference work are dense with formulae of stupefying complexity; its chapters treat definite and indefinite integrals of a great variety of special functions.
Abstract: The pages of this expensive but invaluable reference work are dense with formulae of stupefying complexity. Chapters 1 and 2 treat definite/indefinite integral properties of a great variety of special functions, Chapters 3 and 4 (which are relatively brief) treat definite integrals of some piece-wi

3,784 citations


Additional excerpts

  • ...Applying formulas 3.3.2.1 (p. 585), 2.2.4.24 (p. 298) and 2.5.3.13 (p. 387) of Prudnikov et al. (1986), we obtain
    $A := \int_{\mathbb{R}^{d-1}} \frac{dz_2\,dz_3\cdots dz_d}{\left(1+z_2^2+z_3^2+\cdots+z_d^2\right)^{(d+\alpha)/2}} = \frac{2\pi^{(d-1)/2}}{\Gamma\!\left(\frac{d-1}{2}\right)} \int_0^{\infty} \frac{x^{d-2}\,dx}{(1+x^2)^{(d+\alpha)/2}} = \frac{\pi^{(d-1)/2}\,\Gamma\!\left(\frac{\alpha+1}{2}\right)}{\Gamma\!\left(\frac{d+\alpha}{2}\right)},$
    $\frac{d}{da}\int_0^{\infty} \frac{1-\cos(au)}{u^{1+\alpha}}\,du = a^{\alpha-1}\int_0^{\infty} \frac{\sin v}{v^{\alpha}}\,dv = a^{\alpha-1}\ldots$...

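The excerpt above reduces a (d−1)-dimensional integral to a radial one and evaluates it in closed form. As a quick sanity check (my own code, not from the paper), the sketch below compares the closed form π^{(d−1)/2} Γ((α+1)/2) / Γ((d+α)/2) with direct numerical evaluation of the radial integral for a few values of d and α.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def A_closed(d, alpha):
    """Closed form of the constant A from the excerpt."""
    return np.pi ** ((d - 1) / 2) * gamma((alpha + 1) / 2) / gamma((d + alpha) / 2)

def A_radial(d, alpha):
    """Same constant via polar coordinates in R^{d-1}: sphere surface area times radial integral."""
    surface = 2 * np.pi ** ((d - 1) / 2) / gamma((d - 1) / 2)
    radial, _ = quad(lambda x: x ** (d - 2) / (1 + x ** 2) ** ((d + alpha) / 2), 0, np.inf)
    return surface * radial

for d, alpha in [(2, 1.0), (3, 1.0), (5, 0.5)]:
    print(d, alpha, A_closed(d, alpha), A_radial(d, alpha))   # the two values should agree
```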