
Showing papers on "Uncertainty quantification published in 2020"


Journal ArticleDOI
TL;DR: In this article, a deep neural network (DNN) is structured to enforce the initial and boundary conditions, and the governing partial differential equations (i.e., the Navier-Stokes equations) are incorporated into the DNN's loss to drive the training.

341 citations
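
As a concrete illustration of the physics-informed training described in this TL;DR, the sketch below folds a PDE residual into a network's loss via automatic differentiation. It is a minimal sketch, not the authors' code: a 1-D viscous Burgers equation stands in for the Navier-Stokes system, and the network size, collocation sampling, and boundary data are arbitrary placeholders.

```python
# Minimal PINN-style loss sketch (PyTorch). Illustrative only: a 1-D viscous
# Burgers residual stands in for the full Navier-Stokes system described above.
import torch

net = torch.nn.Sequential(
    torch.nn.Linear(2, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 50), torch.nn.Tanh(),
    torch.nn.Linear(50, 1),
)

def pde_residual(xt, nu=0.01):
    """Residual of u_t + u*u_x - nu*u_xx at collocation points xt = (x, t)."""
    xt = xt.requires_grad_(True)
    u = net(xt)
    grads = torch.autograd.grad(u, xt, torch.ones_like(u), create_graph=True)[0]
    u_x, u_t = grads[:, 0:1], grads[:, 1:2]
    u_xx = torch.autograd.grad(u_x, xt, torch.ones_like(u_x), create_graph=True)[0][:, 0:1]
    return u_t + u * u_x - nu * u_xx

# Hypothetical training data: boundary/initial points with known values,
# plus interior collocation points where only the PDE is enforced.
xt_bc = torch.rand(64, 2); u_bc = torch.zeros(64, 1)
xt_col = torch.rand(256, 2)

opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for step in range(1000):
    opt.zero_grad()
    loss = ((net(xt_bc) - u_bc) ** 2).mean() + (pde_residual(xt_col) ** 2).mean()
    loss.backward()
    opt.step()
```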


Journal ArticleDOI
TL;DR: A Bayesian neural network is invoked, and a natural way of quantifying uncertainty in classification problems is proposed by decomposing the moment-based predictive uncertainty into two parts: aleatoric and epistemic uncertainty.

261 citations
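
The moment-based decomposition mentioned in this TL;DR fits in a few lines: given Monte Carlo samples of the softmax output, the predictive covariance splits into an aleatoric term (the average within-sample covariance) and an epistemic term (the covariance of the sampled probabilities around their mean). The sketch below assumes Dirichlet-simulated probabilities in place of a trained Bayesian network.

```python
# Moment-based decomposition of classification uncertainty (numpy sketch).
# Given S Monte Carlo softmax samples p[s] from a Bayesian network, the
# predictive covariance splits into aleatoric and epistemic terms; random
# Dirichlet draws stand in for the network's sampled outputs.
import numpy as np

rng = np.random.default_rng(0)
S, K = 100, 3                       # MC samples, classes
p = rng.dirichlet(np.ones(K), S)    # stand-in for sampled softmax outputs

p_bar = p.mean(axis=0)
aleatoric = np.mean([np.diag(ps) - np.outer(ps, ps) for ps in p], axis=0)
epistemic = np.mean([np.outer(ps - p_bar, ps - p_bar) for ps in p], axis=0)

total = aleatoric + epistemic       # covariance of the predicted one-hot label
print(np.diag(aleatoric), np.diag(epistemic))
```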


Journal ArticleDOI
TL;DR: A physics-informed neural network for cardiac activation mapping is proposed that accounts for the underlying wave-propagation dynamics and quantifies the epistemic uncertainty of its predictions, opening the door toward physics-based electro-anatomic mapping.
Abstract: A critical procedure in diagnosing atrial fibrillation is the creation of electro-anatomic activation maps. Current methods generate these mappings from interpolation using a few sparse data points recorded inside the atria; they neither include prior knowledge of the underlying physics nor uncertainty of these recordings. Here we propose a physics-informed neural network for cardiac activation mapping that accounts for the underlying wave propagation dynamics and we quantify the epistemic uncertainty associated with these predictions. These uncertainty estimates not only allow us to quantify the predictive error of the neural network, but also help to reduce it by judiciously selecting new informative measurement locations via active learning. We illustrate the potential of our approach using a synthetic benchmark problem and a personalized electrophysiology model of the left atrium. We show that our new method outperforms linear interpolation and Gaussian process regression for the benchmark problem and linear interpolation at clinical densities for the left atrium. In both cases, the active learning algorithm achieves lower error levels than random allocation. Our findings open the door towards physics-based electro-anatomic mapping with the ultimate goals to reduce procedural time and improve diagnostic predictability for patients affected by atrial fibrillation. Open source code is available at https://github.com/fsahli/EikonalNet.

218 citations


Journal ArticleDOI
TL;DR: In this paper, an auto-regressive dense encoder-decoder convolutional neural network is proposed to solve and model non-linear dynamical systems without training data, at a computational cost potentially orders of magnitude lower than that of standard numerical solvers.

199 citations


Proceedings Article
12 Jul 2020
TL;DR: This work demonstrates through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions than simpler methods, including point estimates obtained from SGD, and argues that it is timely to focus on understanding the origin of the improved performance of cold posteriors.
Abstract: During the past five years the Bayesian deep learning community has developed increasingly accurate and efficient approximate inference procedures that allow for Bayesian inference in deep neural networks. However, despite this algorithmic progress and the promise of improved uncertainty quantification and sample efficiency there are—as of early 2020—no publicized deployments of Bayesian neural networks in industrial practice. In this work we cast doubt on the current understanding of Bayes posteriors in popular deep neural networks: we demonstrate through careful MCMC sampling that the posterior predictive induced by the Bayes posterior yields systematically worse predictions compared to simpler methods including point estimates obtained from SGD. Furthermore, we demonstrate that predictive performance is improved significantly through the use of a “cold posterior” that overcounts evidence. Such cold posteriors sharply deviate from the Bayesian paradigm but are commonly used as heuristic in Bayesian deep learning papers. We put forward several hypotheses that could explain cold posteriors and evaluate the hypotheses through experiments. Our work questions the goal of accurate posterior approximations in Bayesian deep learning: If the true Bayes posterior is poor, what is the use of more accurate approximations? Instead, we argue that it is timely to focus on understanding the origin of the improved performance of cold posteriors. Code available on GitHub.

198 citations
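
The "cold posterior" is a tempered target exp(-U(θ)/T) with temperature T < 1, which overcounts the evidence relative to the Bayes posterior. A minimal sketch, assuming a toy quadratic potential in place of a network's negative log-posterior, shows the effect with stochastic gradient Langevin dynamics: cooling shrinks the posterior spread.

```python
# "Cold posterior" sketch: SGLD targeting exp(-U(theta)/T). Setting T < 1
# sharpens the posterior (overcounting evidence) as discussed above. The
# quadratic U below is a toy stand-in for a network's negative log-posterior.
import numpy as np

def grad_U(theta):
    return theta  # toy: U(theta) = 0.5 * theta**2, a standard-normal posterior

def sgld(T=0.25, eps=1e-2, steps=20000, seed=0):
    rng = np.random.default_rng(seed)
    theta, samples = 0.0, []
    for _ in range(steps):
        theta += -eps * grad_U(theta) / T + np.sqrt(2 * eps) * rng.standard_normal()
        samples.append(theta)
    return np.array(samples)

# The cold chain's variance shrinks toward T (here: ~0.25 vs ~1.0 at T = 1).
print(sgld(T=0.25).var(), sgld(T=1.0).var())
```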


Proceedings ArticleDOI
01 Jun 2020
TL;DR: This work proposes a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods in deep learning and applies this framework to provide the first properly extensive and conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout.
Abstract: While deep neural networks have become the go-to approach in computer vision, the vast majority of these models fail to properly capture the uncertainty inherent in their predictions. Estimating this predictive uncertainty can be crucial, for example in automotive applications. In Bayesian deep learning, predictive uncertainty is commonly decomposed into the distinct types of aleatoric and epistemic uncertainty. The former can be estimated by letting a neural network output the parameters of a certain probability distribution. Epistemic uncertainty estimation is a more challenging problem, and while different scalable methods recently have emerged, no extensive comparison has been performed in a real-world setting. We therefore accept this task and propose a comprehensive evaluation framework for scalable epistemic uncertainty estimation methods in deep learning. Our proposed framework is specifically designed to test the robustness required in real-world computer vision applications. We also apply this framework to provide the first properly extensive and conclusive comparison of the two current state-of-the-art scalable methods: ensembling and MC-dropout. Our comparison demonstrates that ensembling consistently provides more reliable and practically useful uncertainty estimates. Code is available at https://github.com/fregu856/evaluating_bdl.

186 citations
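
The two methods being compared are easy to state in code. The sketch below shows the prediction-time mechanics only, with untrained toy regressors standing in for trained models: an ensemble averages over independently initialized members, while MC-dropout averages stochastic forward passes through a single network with dropout left active.

```python
# Sketch of the two scalable epistemic-uncertainty baselines compared above:
# an ensemble of independently trained nets vs. MC-dropout on one net.
# Untrained toy models are used; in practice each would be fit to data first.
import torch

def make_net():
    return torch.nn.Sequential(
        torch.nn.Linear(1, 32), torch.nn.ReLU(),
        torch.nn.Dropout(p=0.1),
        torch.nn.Linear(32, 1),
    )

x = torch.linspace(-3, 3, 50).unsqueeze(1)

# Ensembling: M independently initialized (and, normally, trained) members.
ensemble = [make_net().eval() for _ in range(5)]
with torch.no_grad():
    preds = torch.stack([m(x) for m in ensemble])
ens_mean, ens_var = preds.mean(0), preds.var(0)      # epistemic spread

# MC-dropout: keep dropout active at test time, average T stochastic passes.
net = make_net().train()  # .train() keeps Dropout stochastic
with torch.no_grad():
    preds = torch.stack([net(x) for _ in range(20)])
mcd_mean, mcd_var = preds.mean(0), preds.var(0)
```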


Proceedings Article
01 Jan 2020
TL;DR: Spectral-normalized Neural Gaussian Process (SNGP) is proposed, a simple method that improves the distance-awareness ability of modern DNNs by adding a weight normalization step during training and replacing the output layer with a Gaussian process; it outperforms the other single-model approaches.
Abstract: Bayesian neural networks (BNN) and deep ensembles are principled approaches to estimate the predictive uncertainty of a deep learning model. However, their practicality in real-time, industrial-scale applications is limited due to their heavy memory and inference cost. This motivates us to study principled approaches to high-quality uncertainty estimation that require only a single deep neural network (DNN). By formalizing the uncertainty quantification as a minimax learning problem, we first identify input distance awareness, i.e., the model's ability to quantify the distance of a testing example from the training data in the input space, as a necessary condition for a DNN to achieve high-quality (i.e., minimax optimal) uncertainty estimation. We then propose Spectral-normalized Neural Gaussian Process (SNGP), a simple method that improves the distance-awareness ability of modern DNNs, by adding a weight normalization step during training and replacing the output layer with a Gaussian process. On a suite of vision and language understanding tasks and on modern architectures (Wide-ResNet and BERT), SNGP is competitive with deep ensembles in prediction, calibration and out-of-domain detection, and outperforms the other single-model approaches.

175 citations
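
Below is a simplified reconstruction of the SNGP recipe, not the authors' implementation: spectral normalization on the hidden layers helps preserve input distances, and a random-Fourier-feature Gaussian process replaces the dense output layer. The layer sizes and the reduced Laplace-style covariance update are illustrative assumptions.

```python
# SNGP-flavored sketch: spectrally normalized hidden layers plus a
# random-feature Gaussian-process output layer. A simplified reconstruction
# of the recipe described above; the Laplace-approximation covariance
# update is reduced to its essence.
import math
import torch

d_in, d_hid, d_rff, n_cls = 16, 64, 128, 10

backbone = torch.nn.Sequential(
    torch.nn.utils.spectral_norm(torch.nn.Linear(d_in, d_hid)), torch.nn.ReLU(),
    torch.nn.utils.spectral_norm(torch.nn.Linear(d_hid, d_hid)), torch.nn.ReLU(),
)

# Fixed random Fourier features approximate an RBF-kernel GP posterior mean.
W = torch.randn(d_rff, d_hid); b = 2 * math.pi * torch.rand(d_rff)
beta = torch.nn.Linear(d_rff, n_cls, bias=False)  # trainable GP output weights

def forward(x):
    h = backbone(x)
    phi = math.sqrt(2.0 / d_rff) * torch.cos(h @ W.T + b)
    return beta(phi), phi

x = torch.randn(4, d_in)
logits, phi = forward(x)
# Distance-aware variance from a Laplace approximation over beta:
# the precision accumulates phi phi^T over training data (identity prior here).
precision = torch.eye(d_rff) + phi.T @ phi
var = (phi @ torch.linalg.solve(precision, phi.T)).diag()
```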


ReportDOI
01 May 2020
TL;DR: This manual is intended to summarize a set of Dakota-related research publications in the areas of surrogate-based optimization, uncertainty quantification, and optimization under uncertainty that provide the foundation for many of Dakota’s iterative analysis capabilities.
Abstract: The Dakota (Design Analysis Kit for Optimization and Terascale Applications) toolkit provides a flexible and extensible interface between simulation codes and iterative analysis methods. Dakota contains algorithms for optimization with gradient and nongradient-based methods; uncertainty quantification with sampling, reliability, and stochastic expansion methods; parameter estimation with nonlinear least squares methods; and sensitivity/variance analysis with design of experiments and parameter study methods. These capabilities may be used on their own or as components within advanced strategies such as surrogate-based optimization, mixed integer nonlinear programming, or optimization under uncertainty. By employing object-oriented design to implement abstractions of the key components required for iterative systems analyses, the Dakota toolkit provides a flexible and extensible problem-solving environment for design and performance analysis of computational models on high performance computers. This report serves as a theoretical manual for selected algorithms implemented within the Dakota software. It is not intended as a comprehensive theoretical treatment, since a number of existing texts cover general optimization theory, statistical analysis, and other introductory topics. Rather, this manual is intended to summarize a set of Dakota-related research publications in the areas of surrogate-based optimization, uncertainty quantification, and optimization under uncertainty that provide the foundation for many of Dakota’s iterative analysis capabilities.

170 citations


Journal ArticleDOI
TL;DR: A novel probabilistic day-ahead net load forecasting method is proposed to capture both epistemic and aleatoric uncertainty using Bayesian deep learning, a new field that combines Bayesian probability theory and deep learning.
Abstract: Decarbonization of electricity systems drives significant and continued investments in distributed energy sources to support the cost-effective transition to low-carbon energy systems. However, the rapid integration of distributed photovoltaic (PV) generation presents great challenges in obtaining reliable and secure grid operations because of its limited visibility and intermittent nature. Under this reality, net load forecasting is facing unprecedented difficulty in answering the following question: How can we accurately predict the net load while capturing the massive uncertainties arising from distributed PV generation and load, especially in the context of high PV penetration? This paper proposes a novel probabilistic day-ahead net load forecasting method to capture both epistemic uncertainty and aleatoric uncertainty using Bayesian deep learning, which is a new field that combines Bayesian probability theory and deep learning. The proposed methodological framework employs clustering in subprofiles and considers residential rooftop PV outputs as input features to enhance the performance of aggregated net load forecasting. Numerical experiments have been carried out based on fine-grained smart meter data from the Australian grid with separately recorded measurements of rooftop PV generation and loads. The results demonstrate the superior performance of the proposed scheme compared with a series of state-of-the-art methods and indicate the importance and effectiveness of subprofile clustering and high PV visibility.

141 citations


Journal ArticleDOI
TL;DR: Inspired by the idea of Bayesian machine learning, a Bayesian deep-learning-based (BDL-based) method is proposed in this paper for health prognostics with uncertainty quantification, and a variational-inference-based method is presented for learning and inference in the BNNs.
Abstract: Deep-learning-based health prognostics is receiving ever-increasing attention. Most existing methods leverage advanced neural networks for prognostics performance improvement, providing mainly point estimates as prognostics results without addressing prognostics uncertainty. However, uncertainty is critical for both health prognostics and subsequent decision making, especially for safety-critical applications. Inspired by the idea of Bayesian machine learning, a Bayesian deep-learning-based (BDL-based) method is proposed in this paper for health prognostics with uncertainty quantification. State-of-the-art deep learning models are extended into Bayesian neural networks (BNNs), and a variational-inference-based method is presented for learning and inference in the BNNs. The proposed method is validated through a ball bearing dataset and a turbofan engine dataset. Beyond point estimates, health prognostics using the BDL-based method is enhanced with uncertainty quantification. The scalability and generalization ability of state-of-the-art deep learning models are well inherited. Stochastic regularization techniques, widely available in mainstream software libraries, can be leveraged to efficiently implement the BDL-based method for practical applications.

139 citations


Journal ArticleDOI
TL;DR: This paper introduces a set of quantitative criteria to capture different uncertainty aspects, and uses these criteria to compare MC-Dropout, Deep Ensembles, and bootstrapping, both theoretically in a unified framework that separates aleatoric/epistemic uncertainty and experimentally on public datasets.
Abstract: Advances in deep neural network (DNN)-based molecular property prediction have recently led to the development of models of remarkable accuracy and generalization ability, with graph convolutional neural networks (GCNNs) reporting state-of-the-art performance for this task. However, some challenges remain, and one of the most important that needs to be fully addressed concerns uncertainty quantification. DNN performance is affected by the volume and the quality of the training samples. Therefore, establishing when and to what extent a prediction can be considered reliable is just as important as outputting accurate predictions, especially when out-of-domain molecules are targeted. Recently, several methods to account for uncertainty in DNNs have been proposed, most of which are based on approximate Bayesian inference. Among these, only a few scale to the large data sets required in applications. Evaluating and comparing these methods has recently attracted great interest, but results are generally fragmented and absent for molecular property prediction. In this paper, we quantitatively compare scalable techniques for uncertainty estimation in GCNNs. We introduce a set of quantitative criteria to capture different uncertainty aspects and then use these criteria to compare MC-dropout, Deep Ensembles, and bootstrapping, both theoretically in a unified framework that separates aleatoric/epistemic uncertainty and experimentally on public data sets. Our experiments quantify the performance of the different uncertainty estimation methods and their impact on uncertainty-related error reduction. Our findings indicate that Deep Ensembles and bootstrapping consistently outperform MC-dropout, with different context-specific pros and cons. Our analysis leads to a better understanding of the role of aleatoric/epistemic uncertainty, also in relation to the target data set features, and highlights the challenge posed by out-of-domain uncertainty.

Journal ArticleDOI
TL;DR: The salient features of the 3D-CNN approach make it a potentially suitable alternative for facilitating material design with fast product design iteration and efficient uncertainty quantification; the benefits over conventional finite-element-based homogenization are discussed in turn.

Journal ArticleDOI
TL;DR: Uncertainty quantification (UQ) is an important component of molecular property prediction, particularly for drug discovery applications where model predictions direct experimental design and where unanticipated imprecision wastes valuable time and resources.
Abstract: Uncertainty quantification (UQ) is an important component of molecular property prediction, particularly for drug discovery applications where model predictions direct experimental design and where unanticipated imprecision wastes valuable time and resources.

Proceedings Article
12 Jul 2020
TL;DR: A rank-1 parameterization of BNNs is proposed, where each weight matrix involves only a distribution on a rank-1 subspace, and the use of mixture approximate posteriors to capture multiple modes is revisited.
Abstract: Bayesian neural networks (BNNs) demonstrate promising success in improving the robustness and uncertainty quantification of modern deep learning. However, they generally struggle with underfitting at scale and parameter efficiency. On the other hand, deep ensembles have emerged as alternatives for uncertainty quantification that, while outperforming BNNs on certain problems, also suffer from efficiency issues. It remains unclear how to combine the strengths of these two approaches and remediate their common issues. To tackle this challenge, we propose a rank-1 parameterization of BNNs, where each weight matrix involves only a distribution on a rank-1 subspace. We also revisit the use of mixture approximate posteriors to capture multiple modes, where unlike typical mixtures, this approach admits a significantly smaller memory increase (e.g., only a 0.4% increase for a ResNet-50 mixture of size 10). We perform a systematic empirical study on the choices of prior, variational posterior, and methods to improve training. For ResNet-50 on ImageNet, Wide ResNet 28-10 on CIFAR-10/100, and an RNN on MIMIC-III, rank-1 BNNs achieve state-of-the-art performance across log-likelihood, accuracy, and calibration on the test sets and out-of-distribution variants.
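
The rank-1 parameterization can be sketched as a linear layer whose shared weight matrix W is deterministic while only the multiplicative factors r and s carry Gaussian posteriors, so the effective weight is W ∘ (r sᵀ). The variational parameters and initialization below are illustrative assumptions, not the paper's settings.

```python
# Rank-1 BNN sketch (PyTorch): the shared weight matrix is deterministic and
# only the rank-1 multiplicative factors r, s carry Gaussian distributions,
# as described above. Variational parameters and priors are illustrative.
import torch

class Rank1Linear(torch.nn.Module):
    def __init__(self, d_in, d_out):
        super().__init__()
        self.W = torch.nn.Parameter(torch.randn(d_out, d_in) * 0.05)  # shared
        # Mean and log-std of the rank-1 factors' Gaussian posteriors.
        self.r_mu = torch.nn.Parameter(torch.ones(d_out))
        self.r_rho = torch.nn.Parameter(torch.full((d_out,), -3.0))
        self.s_mu = torch.nn.Parameter(torch.ones(d_in))
        self.s_rho = torch.nn.Parameter(torch.full((d_in,), -3.0))

    def forward(self, x):
        r = self.r_mu + torch.exp(self.r_rho) * torch.randn_like(self.r_mu)
        s = self.s_mu + torch.exp(self.s_rho) * torch.randn_like(self.s_mu)
        # Elementwise rank-1 perturbation: W_eff = W * (r s^T)
        return torch.nn.functional.linear(x * s, self.W) * r

layer = Rank1Linear(8, 4)
y = layer(torch.randn(2, 8))  # each call draws fresh r, s samples
```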

Posted Content
TL;DR: An algorithm is presented that modifies any classifier to output a predictive set containing the true label with a user-specified probability, such as 90%, which provides a formal finite-sample coverage guarantee for every model and dataset.
Abstract: Convolutional image classifiers can achieve high predictive accuracy, but quantifying their uncertainty remains an unresolved challenge, hindering their deployment in consequential settings. Existing uncertainty quantification techniques, such as Platt scaling, attempt to calibrate the network's probability estimates, but they do not have formal guarantees. We present an algorithm that modifies any classifier to output a predictive set containing the true label with a user-specified probability, such as 90%. The algorithm is simple and fast like Platt scaling, but provides a formal finite-sample coverage guarantee for every model and dataset. Our method modifies an existing conformal prediction algorithm to give more stable predictive sets by regularizing the small scores of unlikely classes after Platt scaling. In experiments on both Imagenet and Imagenet-V2 with ResNet-152 and other classifiers, our scheme outperforms existing approaches, achieving coverage with sets that are often factors of 5 to 10 smaller than a stand-alone Platt scaling baseline.
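
The plain split-conformal construction that this method builds on fits in a few lines. The sketch below omits the paper's regularization of small scores (the RAPS modification) for brevity, and simulates softmax probabilities rather than taking them from a real classifier.

```python
# Split-conformal predictive sets (numpy sketch). This is the plain
# conformal baseline the paper builds on; the RAPS regularization of
# small scores is omitted for brevity. Probabilities are simulated.
import numpy as np

rng = np.random.default_rng(0)
n_cal, K, alpha = 500, 10, 0.1
probs = rng.dirichlet(np.ones(K) * 0.3, n_cal)   # stand-in softmax outputs
labels = np.array([rng.choice(K, p=p) for p in probs])

# Score = cumulative probability mass needed to include the true label.
order = np.argsort(-probs, axis=1)
ranks = np.argmax(order == labels[:, None], axis=1)
cum = np.take_along_axis(probs, order, axis=1).cumsum(axis=1)
scores = cum[np.arange(n_cal), ranks]

# Calibrated threshold: finite-sample-corrected (1 - alpha) quantile.
qhat = np.quantile(scores, np.ceil((n_cal + 1) * (1 - alpha)) / n_cal)

def predictive_set(p):
    o = np.argsort(-p)
    c = p[o].cumsum()
    return o[: np.searchsorted(c, qhat) + 1]  # smallest set reaching qhat

print(predictive_set(rng.dirichlet(np.ones(K) * 0.3)))
```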

Journal ArticleDOI
TL;DR: In this paper, a machine learning algorithm based on deep artificial neural networks is proposed to approximate the underlying input-parameters-to-observable map from a few training samples (computed realizations of this map).

Journal ArticleDOI
TL;DR: An in-depth discussion of a recently introduced method for the inverse quantification of spatial interval uncertainty is provided, and its performance is illustrated using a case study taken from the literature.
Abstract: This paper gives an overview of recent advances in the field of non-probabilistic uncertainty quantification. Techniques for both the forward propagation and the inverse quantification of interval and fuzzy uncertainty are discussed, as is the modeling of spatial uncertainty in an interval and fuzzy context. An in-depth discussion of a recently introduced method for the inverse quantification of spatial interval uncertainty is provided, and its performance is illustrated using a case study taken from the literature. It is shown that the method enables an accurate quantification of spatial uncertainty under very low data availability and with a very limited number of assumptions on the underlying uncertainty. Finally, a conceptual comparison with the class of Bayesian methods for uncertainty quantification is provided.
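
Forward propagation of interval uncertainty reduces to bounding the response over the input box. Below is a minimal sketch, assuming a toy response function in place of a real simulation model: vertex evaluation (exact for responses monotonic in each input) is supplemented with a grid search for the non-monotonic direction.

```python
# Forward interval propagation sketch: bounds on a model response are
# obtained by searching over the input box, here by brute-force vertex
# and grid evaluation. The response function g is an illustrative stand-in.
import itertools
import numpy as np

def g(x):  # hypothetical response of two interval-valued parameters
    return x[0] ** 2 - x[0] * x[1] + 3.0

intervals = [(-1.0, 2.0), (0.5, 1.5)]  # input intervals [lower, upper]

# Vertex method: exact for responses monotonic in each input ...
vertices = [g(v) for v in itertools.product(*intervals)]
# ... supplemented with a grid search since g is non-monotonic in x[0].
grid = [g(v) for v in itertools.product(
    *[np.linspace(lo, hi, 101) for lo, hi in intervals])]

lo, hi = min(vertices + grid), max(vertices + grid)
print(f"response interval ~ [{lo:.3f}, {hi:.3f}]")
```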

Journal ArticleDOI
TL;DR: In this paper, a neural network is used to represent the unknown constitutive relations; compared with piecewise linear functions, radial basis functions, and radial basis function networks, the neural network outperforms the alternatives in certain cases.

Journal ArticleDOI
TL;DR: In this article, the authors investigate the transient vibrations of a rotor system during run-up under both random and uncertain-but-bounded parameters, using Polynomial Chaos Expansion (PCE) coupled with the Chebyshev Surrogate Method (CSM) to analyze the propagation of the two categories of uncertainty.
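
For the random-parameter side of such an analysis, a minimal non-intrusive PCE sketch is shown below, assuming a toy scalar response of one standard-normal input in place of the rotor model: the response is regressed onto probabilists' Hermite polynomials, whose orthogonality yields the output mean and variance directly from the coefficients.

```python
# Non-intrusive polynomial chaos sketch (numpy): a scalar response of a
# standard-normal parameter is expanded in Hermite polynomials fitted by
# least squares. The rotor response is replaced by a toy function f.
from math import factorial
import numpy as np
from numpy.polynomial.hermite_e import hermevander

rng = np.random.default_rng(0)
f = lambda xi: np.exp(0.3 * xi) + 0.1 * xi**2   # stand-in stochastic response

xi = rng.standard_normal(200)                   # samples of the random input
Psi = hermevander(xi, 5)                        # probabilists' Hermite basis
coef, *_ = np.linalg.lstsq(Psi, f(xi), rcond=None)

# Orthogonality of He_n under the Gaussian measure gives moments directly:
# mean = c_0, variance = sum_{n>=1} c_n^2 * n!
mean = coef[0]
var = sum(coef[n] ** 2 * factorial(n) for n in range(1, 6))
print(mean, var)
```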

Proceedings ArticleDOI
02 Apr 2020
TL;DR: It is shown that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty than ensembles, and model uncertainty is analyzed across individual input features and patient subgroups.
Abstract: In medicine, both ethical and monetary costs of incorrect predictions can be significant, and the complexity of the problems often necessitates increasingly complex models. Recent work has shown that changing just the random seed is enough for otherwise well-tuned deep neural networks to vary in their individual predicted probabilities. In light of this, we investigate the role of model uncertainty methods in the medical domain. Using RNN ensembles and various Bayesian RNNs, we show that population-level metrics, such as AUC-PR, AUC-ROC, log-likelihood, and calibration error, do not capture model uncertainty. Meanwhile, the presence of significant variability in patient-specific predictions and optimal decisions motivates the need for capturing model uncertainty. Understanding the uncertainty for individual patients is an area with clear clinical impact, such as determining when a model decision is likely to be brittle. We further show that RNNs with only Bayesian embeddings can be a more efficient way to capture model uncertainty compared to ensembles, and we analyze how model uncertainty is impacted across individual input features and patient subgroups.

Journal ArticleDOI
03 Apr 2020
TL;DR: This work proposes a scalable neural network framework with quantification of decomposed uncertainty using a bootstrap ensemble and demonstrates that the proposed uncertainty quantification method provides additional insight to the crowd counting problem and is simple to implement.
Abstract: Research in neural networks in the field of computer vision has achieved remarkable accuracy for point estimation. However, the uncertainty in the estimation is rarely addressed. Uncertainty quantification accompanied by point estimation can lead to a more informed decision, and even improve the prediction quality. In this work, we focus on uncertainty estimation in the domain of crowd counting. With increasing occurrences of heavily crowded events such as political rallies, protests, and concerts, automated crowd analysis is becoming an increasingly crucial task. The stakes can be very high in many of these real-world applications. We propose a scalable neural network framework with quantification of decomposed uncertainty using a bootstrap ensemble. We demonstrate that the proposed uncertainty quantification method provides additional insight to the crowd counting problem and is simple to implement. We also show that our proposed method exhibits state-of-the-art performance on many benchmark crowd counting datasets.

Proceedings Article
04 Mar 2020
TL;DR: DUQ as discussed by the authors is a deterministic deep model that can find and reject out-of-distribution data points at test time with a single forward pass, based on the idea of RBF networks.
Abstract: We propose a method for training a deterministic deep model that can find and reject out of distribution data points at test time with a single forward pass. Our approach, deterministic uncertainty quantification (DUQ), builds upon ideas of RBF networks. We scale training in these with a novel loss function and centroid updating scheme and match the accuracy of softmax models. By enforcing detectability of changes in the input using a gradient penalty, we are able to reliably detect out of distribution data. Our uncertainty quantification scales well to large datasets, and using a single model, we improve upon or match Deep Ensembles in out of distribution detection on notably difficult dataset pairs such as FashionMNIST vs. MNIST and CIFAR-10 vs. SVHN.
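
The core of DUQ is an RBF-style kernel distance between a feature embedding and per-class centroids; low correlation with every centroid signals an out-of-distribution input. The sketch below uses a random linear encoder and random centroids as placeholders (in the paper the centroids are updated by an exponential moving average and a gradient penalty regularizes the encoder).

```python
# DUQ-flavored sketch: uncertainty as RBF distance to learned class
# centroids; a test point far from every centroid is flagged as OOD.
# The feature extractor, centroids, and length scale are illustrative.
import torch

n_cls, d_feat, sigma = 5, 16, 0.5
encoder = torch.nn.Linear(8, d_feat)            # stand-in feature extractor
centroids = torch.randn(n_cls, d_feat)          # learned via EMA in the paper

def rbf_scores(x):
    z = encoder(x)                               # (N, d_feat)
    d2 = ((z[:, None, :] - centroids[None]) ** 2).sum(-1)
    return torch.exp(-d2 / (2 * sigma ** 2))     # correlation with each class

x = torch.randn(3, 8)
scores = rbf_scores(x)
uncertainty = 1.0 - scores.max(dim=1).values     # low max score => likely OOD
```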

Journal ArticleDOI
TL;DR: In this paper, a new method called eigenvector continuation (EC) is used for constructing an efficient and accurate emulator for nuclear many-body observables, thereby enabling uncertainty quantification in multi-nucleon systems.
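
Eigenvector continuation is compact enough to sketch directly: exact ground states computed at a few training values of the coupling span a subspace, and at a new coupling the Hamiltonian is projected onto that subspace and a small generalized eigenproblem is solved. Random symmetric matrices stand in for nuclear Hamiltonians here, with an assumed affine form H(c) = H0 + c·H1.

```python
# Eigenvector-continuation sketch (numpy/scipy): ground states computed at
# a few training couplings span a subspace in which the eigenproblem is
# solved cheaply at new couplings. H0, H1 are random stand-ins.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200)); H0 = (A + A.T) / 2
B = rng.standard_normal((200, 200)); H1 = (B + B.T) / 2
H = lambda c: H0 + c * H1

# Training: exact ground states at a few coupling values.
train_c = [0.0, 0.5, 1.0]
V = np.column_stack([eigh(H(c))[1][:, 0] for c in train_c])

def ec_energy(c):
    # Project onto the snapshot subspace; solve the small generalized problem.
    h = V.T @ H(c) @ V
    n = V.T @ V                     # snapshots are not orthogonal
    return eigh(h, n)[0][0]

c_test = 0.75
print(ec_energy(c_test), eigh(H(c_test))[0][0])  # emulator vs exact
```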

Journal ArticleDOI
TL;DR: A comprehensive survey of computational intelligence techniques for wind power uncertainty quantification in smart grids is provided, and methods of incorporating wind power forecast uncertainties into power system decision-making processes are investigated.
Abstract: The high penetration level of renewable energy is thought to be one of the basic characteristics of future smart grids. Wind power, one of the fastest-growing renewable energy sources, has introduced a large number of uncertainties into power systems. These uncertainties require system operators to change their traditional ways of decision-making. This article provides a comprehensive survey of computational intelligence techniques for wind power uncertainty quantification in smart grids. First, prediction intervals (PIs) are introduced as a means to quantify the uncertainties in wind power forecasts. Various PI evaluation indices, including the latest trends in comprehensive evaluation techniques, are compared. Furthermore, computational intelligence-based PI construction methods are summarized and classified into traditional (parametric) methods and direct (nonparametric) PI construction methods. In the second part of this article, methods of incorporating wind power forecast uncertainties into power system decision-making processes are investigated. Three techniques, namely stochastic models, fuzzy logic models, and robust optimization, and different power system applications using these techniques are reviewed. Finally, future research directions, such as spatiotemporal and hierarchical forecasting, deep learning-based methods, and integration of predictive uncertainty estimates into the decision-making process, are discussed. This survey can benefit readers by providing a complete technical summary of wind power uncertainty quantification and decision-making in smart grids.
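
The two most common PI evaluation indices referred to above, coverage probability (PICP) and normalized average width (PINAW), are straightforward to compute; the sketch below uses simulated observations and intervals in place of real wind-power forecasts.

```python
# The two standard PI evaluation indices mentioned above, coverage (PICP)
# and normalized average width (PINAW), on simulated wind-power intervals.
import numpy as np

rng = np.random.default_rng(0)
y = rng.uniform(0, 100, 500)                     # observed wind power (MW)
lower = y - rng.uniform(5, 15, 500)
upper = y + rng.uniform(5, 15, 500)
lower[::10] += 30                                # inject some interval misses

picp = np.mean((y >= lower) & (y <= upper))      # fraction of points covered
pinaw = np.mean(upper - lower) / (y.max() - y.min())
print(f"PICP = {picp:.3f}, PINAW = {pinaw:.3f}")
```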

Posted Content
TL;DR: The authors find that existing UQ methods are not sufficient for all common use cases and that further research is needed, but conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.
Abstract: Uncertainty quantification (UQ) is an important component of molecular property prediction, particularly for drug discovery applications where model predictions direct experimental design and where unanticipated imprecision wastes valuable time and resources. The need for UQ is especially acute for neural models, which are becoming increasingly standard yet are challenging to interpret. While several approaches to UQ have been proposed in the literature, there is no clear consensus on the comparative performance of these models. In this paper, we study this question in the context of regression tasks. We systematically evaluate several methods on five benchmark datasets using multiple complementary performance metrics. Our experiments show that none of the methods we tested is unequivocally superior to all others, and none produces a particularly reliable ranking of errors across multiple datasets. While we believe these results show that existing UQ methods are not sufficient for all common use-cases and demonstrate the benefits of further research, we conclude with a practical recommendation as to which existing techniques seem to perform well relative to others.

Journal ArticleDOI
TL;DR: A variance-reduction approach for Monte Carlo (MC) sampling is described and analyzed that accelerates the estimation of statistics of computationally expensive simulation models by using an ensemble of lower-cost models in a non-recursive fashion.
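
The underlying idea can be shown with a classical control-variate estimator, the building block of such multi-model approaches: a cheap low-fidelity model whose mean is estimated from many inexpensive samples corrects the high-fidelity Monte Carlo estimate. The model pair, sample sizes, and coefficient below are illustrative assumptions, not the paper's estimator.

```python
# Control-variate sketch of the variance-reduction idea above: a cheap
# correlated low-fidelity model with a (nearly exact) pre-estimated mean
# corrects the high-fidelity Monte Carlo estimate.
import numpy as np

rng = np.random.default_rng(0)
f_hi = lambda x: np.sin(x) + 0.05 * x**2        # expensive model (few runs)
f_lo = lambda x: np.sin(x)                      # cheap correlated surrogate

x = rng.standard_normal(50)                     # shared high-fidelity inputs
y_hi, y_lo = f_hi(x), f_lo(x)
mu_lo = f_lo(rng.standard_normal(100000)).mean()  # cheap, nearly exact mean

alpha = np.cov(y_hi, y_lo)[0, 1] / y_lo.var(ddof=1)   # optimal coefficient
est_cv = y_hi.mean() + alpha * (mu_lo - y_lo.mean())
print(y_hi.mean(), est_cv)  # the CV estimate has lower variance
```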

Journal ArticleDOI
TL;DR: In this paper, the authors develop a methodology for intelligent mission planning using the digital twin approach, with the objective of performing the required work while meeting the damage tolerance requirement; the methodology includes quantification of the uncertainty in diagnosis, prognosis, and optimization, considering both aleatory and epistemic uncertainty sources.

Journal ArticleDOI
TL;DR: A sequential model calibration and validation (SeCAV) framework is proposed to improve the efficacy of both model parameter calibration and bias correction for the purpose of uncertainty quantification and reduction.

Posted Content
TL;DR: This work proposes a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework, achieving the desired coverage with reasonably short intervals.
Abstract: Evaluating treatment effect heterogeneity widely informs treatment decision making. At the moment, much emphasis is placed on the estimation of the conditional average treatment effect via flexible machine learning algorithms. While these methods enjoy some theoretical appeal in terms of consistency and convergence rates, they generally perform poorly in terms of uncertainty quantification. This is troubling since assessing risk is crucial for reliable decision-making in sensitive and uncertain environments. In this work, we propose a conformal inference-based approach that can produce reliable interval estimates for counterfactuals and individual treatment effects under the potential outcome framework. For completely randomized or stratified randomized experiments with perfect compliance, the intervals have guaranteed average coverage in finite samples regardless of the unknown data generating mechanism. For randomized experiments with ignorable compliance and general observational studies obeying the strong ignorability assumption, the intervals satisfy a doubly robust property which states the following: the average coverage is approximately controlled if either the propensity score or the conditional quantiles of potential outcomes can be estimated accurately. Numerical studies on both synthetic and real datasets empirically demonstrate that existing methods suffer from a significant coverage deficit even in simple models. In contrast, our methods achieve the desired coverage with reasonably short intervals.
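
In the simplest setting covered above, a completely randomized experiment with perfect compliance, a split-conformal interval for the potential outcome Y(1) can be built from residuals on held-out treated units. The simulation, the polynomial regression mu1, and the 90% level below are illustrative assumptions; the paper's weighted-conformal machinery for observational data is not shown.

```python
# Split-conformal interval for a potential outcome Y(1) in a completely
# randomized experiment, the simplest setting covered above. Data and the
# regression mu1 are simulated stand-ins.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
X = rng.uniform(-2, 2, n)
T = rng.binomial(1, 0.5, n)                      # randomized treatment
Y = X + T * (1 + 0.5 * X) + rng.normal(0, 0.3, n)

treated = np.flatnonzero(T == 1)
fit_idx, cal_idx = treated[: len(treated) // 2], treated[len(treated) // 2:]

coef = np.polyfit(X[fit_idx], Y[fit_idx], 1)     # toy regression for mu1
mu1 = lambda x: np.polyval(coef, x)

alpha = 0.1
res = np.abs(Y[cal_idx] - mu1(X[cal_idx]))       # calibration residuals
q = np.quantile(res, np.ceil((len(res) + 1) * (1 - alpha)) / len(res))

x_new = 0.5                                      # interval for Y(1) at x_new
print(mu1(x_new) - q, mu1(x_new) + q)
```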