SciSpace - formally typeset
Open Access Posted Content

Testing for Outliers with Conformal p-values

TLDR
In this paper, the authors propose a conformal inference framework for nonparametric outlier detection, which yields p-values that are marginally valid but mutually dependent for different test points.
Abstract
This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
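As a concrete illustration of the construction the abstract describes, a split-conformal p-value for a test point is essentially the rank of its nonconformity score among the scores of a held-out calibration set of inliers. The sketch below is a minimal implementation of that standard construction, not the authors' code; the function name and the convention that higher scores mean "more atypical" are our assumptions.

```python
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """Split-conformal p-values for outlier detection.

    cal_scores: nonconformity scores of a calibration set of known inliers.
    test_scores: nonconformity scores of the new test points.
    Convention: higher score = more atypical, so small p-values flag outliers.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = len(cal_scores)
    pvals = np.empty(len(test_scores))
    for j, s in enumerate(test_scores):
        # Standard conformal p-value: (1 + #{i : score_i >= s}) / (n + 1).
        # Marginally valid under exchangeability of calibration and test inliers.
        pvals[j] = (1 + np.sum(cal_scores >= s)) / (n + 1)
    return pvals
```

With calibration scores [1, 2, 3, 4], a test score of 10 (larger than all of them) gets p = 1/5 = 0.2, while a test score of 0 gets p = 1. As the abstract notes, p-values built from a shared calibration set are marginally valid but mutually dependent across test points.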

Citations
Journal Article (DOI)

Conformal prediction beyond exchangeability

TL;DR: The proposed algorithms are provably robust, losing substantially less coverage when exchangeability is violated by distribution drift or other challenging features of real data, while matching the coverage guarantees of existing conformal prediction methods when the data points are in fact exchangeable.
Posted Content

Task-Driven Out-of-Distribution Detection with Statistical Guarantees for Robot Learning.

TL;DR: In this paper, the authors leverage PAC-Bayes theory to train a policy with a guaranteed bound on performance on the training distribution, and then rely on the following intuition: violation of the performance bound on test environments provides evidence that the robot is operating OOD.
Posted Content

Root-finding Approaches for Computing Conformal Prediction Set.

TL;DR: Conformal prediction constructs a confidence set for an unobserved response of a feature vector based on previously observed, identically distributed and exchangeable pairs of responses and features; the set has a coverage guarantee at any nominal level without additional distributional assumptions.
Posted Content

Semi-supervised multiple testing

TL;DR: This article proposes a null-distribution-free approach for multiple testing in a semi-supervised setting where the user does not know the null distribution but has at hand a single sample drawn from it.
Posted Content

Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach

TL;DR: This article proposes a model-free framework for sensitivity analysis of individual treatment effects (ITEs), building on ideas from conformal inference; it relies on reliable predictive inference of counterfactuals and ITEs in situations where the training data are confounded.
References
Journal Article (DOI)

Controlling the false discovery rate: a practical and powerful approach to multiple testing

TL;DR: This paper presents a different approach to problems of multiple significance testing, which calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate (FDR); this quantity equals the familywise error rate when all hypotheses are true and is smaller otherwise.
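The Benjamini-Hochberg step-up procedure summarized above can be sketched in a few lines. This is a generic textbook implementation, not code from the paper; it is the procedure the abstract's FDR-control results apply to.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask marking which hypotheses are rejected
    at target false discovery rate level alpha.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                          # indices sorting p ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * k / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest k passing its threshold
        reject[order[: k + 1]] = True              # reject the k smallest p-values
    return reject
```

For example, with p-values [0.01, 0.04, 0.03, 0.5] and alpha = 0.1, the sorted thresholds are 0.025, 0.05, 0.075, 0.1, so the three smallest p-values are rejected and 0.5 is not.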
Journal Article

Scikit-learn: Machine Learning in Python

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
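An off-the-shelf scikit-learn model can supply the raw outlier statistics that procedures like the one in this paper threshold. The snippet below is a hypothetical illustration using `IsolationForest`, not an example taken from the paper; the sign flip on `score_samples` is our convention so that higher values mean more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                # reference (inlier) data
X_test = np.vstack([rng.normal(size=(5, 2)),       # five more inliers
                    [[6.0, 6.0]]])                 # one clear outlier

model = IsolationForest(random_state=0).fit(X_train)
# score_samples returns higher values for more "normal" points,
# so negate it to obtain a nonconformity-style anomaly score.
scores = -model.score_samples(X_test)
```

Scores like these can then be converted to conformal p-values by ranking each test score against scores computed on a held-out calibration split.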
Book Chapter (DOI)

Individual Comparisons by Ranking Methods

TL;DR: The comparison of two treatments generally falls into one of two categories: (a) a number of unpaired replications for each of the two treatments, or (b) a series of paired comparisons, some of which may be positive and some negative.
Book

Statistical Methods for Research Workers

R. A. Fisher
TL;DR: The prime object of this book is to put into the hands of research workers, and especially of biologists, the means of applying statistical tests accurately to numerical data accumulated in their own laboratories or available in the literature.
Journal Article (DOI)

The control of the false discovery rate in multiple testing under dependency

TL;DR: This paper shows that a simple FDR-controlling procedure for independent test statistics also controls the false discovery rate when the test statistics have positive regression dependency on each of the statistics corresponding to the true null hypotheses.