SciSpace - formally typeset
Open Access Posted Content

Testing for Outliers with Conformal p-values

TLDR
In this paper, the authors propose a conformal inference framework for nonparametric outlier detection, which yields p-values that are marginally valid but mutually dependent for different test points.
Abstract
This paper studies the construction of p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We propose a solution based on conformal inference, a broadly applicable framework which yields p-values that are marginally valid but mutually dependent for different test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way to stronger type-I error guarantees. Our results depart from classical conformal inference as we leverage concentration inequalities rather than combinatorial arguments to establish our finite-sample guarantees. Furthermore, our techniques also yield a uniform confidence bound for the false positive rate of any outlier detection algorithm, as a function of the threshold applied to its raw statistics. Finally, the relevance of our results is demonstrated by numerical experiments on real and simulated data.
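As a concrete illustration of the construction the abstract describes, a split-conformal p-value for a test point is essentially the rank of its nonconformity score among the scores of a held-out calibration set of inliers. The sketch below is a minimal implementation of that standard construction, not the authors' code; the function name and the convention that higher scores mean "more atypical" are our assumptions.

```python
import numpy as np

def conformal_pvalues(cal_scores, test_scores):
    """Split-conformal p-values for outlier detection.

    cal_scores: nonconformity scores of a calibration set of known inliers.
    test_scores: nonconformity scores of the new test points.
    Convention: higher score = more atypical, so small p-values flag outliers.
    """
    cal_scores = np.asarray(cal_scores, dtype=float)
    n = len(cal_scores)
    pvals = np.empty(len(test_scores))
    for j, s in enumerate(test_scores):
        # Standard conformal p-value: (1 + #{i : score_i >= s}) / (n + 1).
        # Marginally valid under exchangeability of calibration and test inliers.
        pvals[j] = (1 + np.sum(cal_scores >= s)) / (n + 1)
    return pvals
```

With calibration scores [1, 2, 3, 4], a test score of 10 (larger than all of them) gets p = 1/5 = 0.2, while a test score of 0 gets p = 1. As the abstract notes, p-values built from a shared calibration set are marginally valid but mutually dependent across test points.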

Citations
Journal Article (DOI)

Conformal prediction beyond exchangeability

TL;DR: The proposed algorithms are provably robust, losing substantially less coverage when exchangeability is violated by distribution drift or other challenging features of real data, while matching the coverage guarantees of existing conformal prediction methods when the data points are in fact exchangeable.
Posted Content

Task-Driven Out-of-Distribution Detection with Statistical Guarantees for Robot Learning.

TL;DR: In this paper, the authors leverage PAC-Bayes theory to train a policy with a guaranteed bound on performance on the training distribution, and then rely on the following intuition: violation of the performance bound on test environments provides evidence that the robot is operating OOD.
Posted Content

Root-finding Approaches for Computing Conformal Prediction Set.

TL;DR: Conformal prediction constructs a confidence set for an unobserved response of a feature vector based on previously observed, identically distributed and exchangeable pairs of responses and features; the set has a coverage guarantee at any nominal level without additional distributional assumptions.
Posted Content

Semi-supervised multiple testing

TL;DR: This article proposes a null-distribution-free approach for multiple testing in a semi-supervised setting where the user does not know the null distribution but has at hand a single sample drawn from it.
Posted Content

Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach

TL;DR: This article proposes a model-free framework for sensitivity analysis of individual treatment effects (ITEs), building on ideas from conformal inference; it relies on reliable predictive inference of counterfactuals and ITEs in situations where the training data are confounded.
References
Journal Article (DOI)

Controlling the false discovery rate: a practical and powerful approach to multiple testing

TL;DR: This paper presents a different approach to problems of multiple significance testing, which calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate (FDR); this quantity equals the familywise error rate when all hypotheses are true and is smaller otherwise.
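The Benjamini-Hochberg step-up procedure summarized above can be sketched in a few lines. This is a generic textbook implementation, not code from the paper; it is the procedure the abstract's FDR-control results apply to.

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.1):
    """Benjamini-Hochberg step-up procedure.

    Returns a boolean mask marking which hypotheses are rejected
    at target false discovery rate level alpha.
    """
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)                          # indices sorting p ascending
    thresholds = alpha * np.arange(1, m + 1) / m   # alpha * k / m for k = 1..m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()             # largest k passing its threshold
        reject[order[: k + 1]] = True              # reject the k smallest p-values
    return reject
```

For example, with p-values [0.01, 0.04, 0.03, 0.5] and alpha = 0.1, the sorted thresholds are 0.025, 0.05, 0.075, 0.1, so the three smallest p-values are rejected and 0.5 is not.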
Journal Article

Scikit-learn: Machine Learning in Python

TL;DR: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems, focusing on bringing machine learning to non-specialists using a general-purpose high-level language.
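An off-the-shelf scikit-learn model can supply the raw outlier statistics that procedures like the one in this paper threshold. The snippet below is a hypothetical illustration using `IsolationForest`, not an example taken from the paper; the sign flip on `score_samples` is our convention so that higher values mean more anomalous.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 2))                # reference (inlier) data
X_test = np.vstack([rng.normal(size=(5, 2)),       # five more inliers
                    [[6.0, 6.0]]])                 # one clear outlier

model = IsolationForest(random_state=0).fit(X_train)
# score_samples returns higher values for more "normal" points,
# so negate it to obtain a nonconformity-style anomaly score.
scores = -model.score_samples(X_test)
```

Scores like these can then be converted to conformal p-values by ranking each test score against scores computed on a held-out calibration split.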
Book Chapter (DOI)

Individual Comparisons by Ranking Methods

TL;DR: The comparison of two treatments generally falls into one of two categories: (a) a number of unpaired replications for each of the two treatments, or (b) a series of paired comparisons, some of which may be positive and some negative.
Book

Statistical Methods for Research Workers

R. A. Fisher
TL;DR: The prime object of this book is to put into the hands of research workers, and especially of biologists, the means of applying statistical tests accurately to numerical data accumulated in their own laboratories or available in the literature.
Journal Article (DOI)

The control of the false discovery rate in multiple testing under dependency

TL;DR: This paper shows that a simple FDR-controlling procedure for independent test statistics also controls the false discovery rate when the test statistics have positive regression dependency on each of the statistics corresponding to the true null hypotheses.