
Showing papers on "Conditional probability distribution" published in 2018


Proceedings ArticleDOI
15 Oct 2018
TL;DR: This paper proposes a Manifold Embedded Distribution Alignment (MEDA) approach, which learns a domain-invariant classifier in the Grassmann manifold with structural risk minimization, while performing dynamic distribution alignment to quantitatively account for the relative importance of the marginal and conditional distributions.
Abstract: Visual domain adaptation aims to learn robust classifiers for the target domain by leveraging knowledge from a source domain. Existing methods either attempt to align the cross-domain distributions or perform manifold subspace learning. However, there are two significant challenges: (1) degenerated feature transformation, meaning that distribution alignment is often performed in the original feature space, where feature distortions are hard to overcome, while subspace learning alone is not sufficient to reduce the distribution divergence; and (2) unevaluated distribution alignment, meaning that existing methods align the marginal and conditional distributions with equal importance and fail to evaluate their different importance in real applications. In this paper, we propose a Manifold Embedded Distribution Alignment (MEDA) approach to address these challenges. MEDA learns a domain-invariant classifier in the Grassmann manifold with structural risk minimization, while performing dynamic distribution alignment to quantitatively account for the relative importance of the marginal and conditional distributions. To the best of our knowledge, MEDA is the first attempt to perform dynamic distribution alignment for manifold domain adaptation. Extensive experiments demonstrate that MEDA achieves significant improvements in classification accuracy over state-of-the-art traditional and deep methods.

503 citations
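For readers who want the gist of dynamic distribution alignment, here is a minimal numerical sketch, not the authors' implementation: it combines a marginal and a class-conditional discrepancy with a single balance factor. The arrays Xs, ys, Xt, yt_pseudo and the linear-kernel MMD are illustrative assumptions; MEDA estimates the factor mu from data and works in a manifold feature space rather than the raw one.

```python
import numpy as np

def mmd(a, b):
    """Linear-kernel MMD: squared distance between sample means."""
    return float(np.sum((a.mean(axis=0) - b.mean(axis=0)) ** 2))

def dynamic_alignment_loss(Xs, ys, Xt, yt_pseudo, mu):
    """Weighted sum of marginal and class-conditional discrepancies.

    mu in [0, 1] trades off the two terms; which term carries mu is a
    convention of this sketch, and MEDA estimates mu from the data
    rather than fixing it. yt_pseudo are pseudo-labels for the
    unlabeled target domain.
    """
    marginal = mmd(Xs, Xt)
    per_class = [mmd(Xs[ys == c], Xt[yt_pseudo == c])
                 for c in np.unique(ys) if np.any(yt_pseudo == c)]
    conditional = float(np.mean(per_class)) if per_class else 0.0
    return mu * marginal + (1.0 - mu) * conditional
```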


Book ChapterDOI
08 Sep 2018
TL;DR: This work proposes an end-to-end conditional invariant deep domain generalization approach that leverages deep neural networks for domain-invariant representation learning, and demonstrates the effectiveness of the proposed method experimentally.
Abstract: Domain generalization aims to learn a classification model from multiple source domains and generalize it to unseen target domains. A critical problem in domain generalization involves learning domain-invariant representations. Let X and Y denote the features and the labels, respectively. Under the assumption that the conditional distribution P(Y|X) remains unchanged across domains, earlier approaches to domain generalization learned the invariant representation T(X) by minimizing the discrepancy of the marginal distribution P(T(X)). However, such an assumption of stable P(Y|X) does not necessarily hold in practice. In addition, the representation learning function T(X) is usually constrained to a simple linear transformation or shallow networks. To address the above two drawbacks, we propose an end-to-end conditional invariant deep domain generalization approach by leveraging deep neural networks for domain-invariant representation learning. The domain-invariance property is guaranteed through a conditional invariant adversarial network that can learn domain-invariant representations w.r.t. the joint distribution P(T(X), Y) if the target domain data are not severely class unbalanced. We perform various experiments to demonstrate the effectiveness of the proposed method.

490 citations


Journal ArticleDOI
TL;DR: In this paper, the authors propose a new framework of "model-X" knockoffs, which recasts, from a different perspective, the knockoff procedure originally designed for controlling the false discovery rate in linear models.
Abstract: Many contemporary large-scale applications involve building interpretable models linking a large set of potential covariates to a response in a non-linear fashion, such as when the response is binary. Although this modelling problem has been extensively studied, it remains unclear how to control the fraction of false discoveries effectively even in high dimensional logistic regression, not to mention general high dimensional non-linear models. To address such a practical problem, we propose a new framework of 'model-X' knockoffs, which recasts, from a different perspective, the knockoff procedure that was originally designed for controlling the false discovery rate in linear models. Whereas the knockoffs procedure is constrained to homoscedastic linear models with n⩾p, the key innovation here is that model-X knockoffs provide valid inference from finite samples in settings in which the conditional distribution of the response is arbitrary and completely unknown. Furthermore, this holds no matter the number of covariates. Correct inference in such a broad setting is achieved by constructing knockoff variables probabilistically instead of geometrically. To do this, our approach requires that the covariates are random (independent and identically distributed rows) with a distribution that is known, although we provide preliminary experimental evidence that our procedure is robust to unknown or estimated distributions. To our knowledge, no other procedure solves the controlled variable selection problem in such generality but, in the restricted settings where competitors exist, we demonstrate the superior power of knockoffs through simulations. Finally, we apply our procedure to data from a case–control study of Crohn's disease in the UK, making twice as many discoveries as the original analysis of the same data.

371 citations
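In the special case of Gaussian covariates the probabilistic construction is explicit. Below is a sketch of the equi-correlated Gaussian construction from the knockoffs literature, under assumed standardized features; it is not code from the paper.

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, seed=None):
    """Equi-correlated model-X knockoffs for rows X_i ~ N(0, Sigma).

    (X, Xk) is jointly Gaussian with covariance [[S, S - D], [S - D, S]],
    where S = Sigma and D = diag(s); we sample Xk from its Gaussian
    conditional distribution given X. Assumes Sigma has unit diagonal.
    """
    rng = np.random.default_rng(seed)
    p = Sigma.shape[0]
    s = np.full(p, min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min()))
    D = np.diag(s)
    Sigma_inv_D = np.linalg.solve(Sigma, D)        # Sigma^{-1} D
    mean = X - X @ Sigma_inv_D                     # E[Xk | X]
    cov = 2.0 * D - D @ Sigma_inv_D                # Cov[Xk | X]
    L = np.linalg.cholesky(cov + 1e-10 * np.eye(p))
    return mean + rng.standard_normal(X.shape) @ L.T
```

Feature-importance statistics computed on the augmented matrix [X, Xk] then feed the knockoff filter to control the false discovery rate.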


Posted Content
TL;DR: This paper proposes a novel transfer learning approach, named Balanced Distribution Adaptation (BDA), which can adaptively leverage the importance of the marginal and conditional distribution discrepancies; several existing methods can be treated as special cases of BDA.
Abstract: Transfer learning has achieved promising results by leveraging knowledge from the source domain to annotate the target domain which has few or no labels. Existing methods often seek to minimize the distribution divergence between domains, such as the marginal distribution, the conditional distribution or both. However, these two distances are often treated equally in existing algorithms, which will result in poor performance in real applications. Moreover, existing methods usually assume that the dataset is balanced, which also limits their performance on imbalanced tasks that are quite common in real problems. To tackle the distribution adaptation problem, in this paper, we propose a novel transfer learning approach, named Balanced Distribution Adaptation (BDA), which can adaptively leverage the importance of the marginal and conditional distribution discrepancies, and several existing methods can be treated as special cases of BDA. Based on BDA, we also propose a novel Weighted Balanced Distribution Adaptation (W-BDA) algorithm to tackle the class imbalance issue in transfer learning. W-BDA not only considers the distribution adaptation between domains but also adaptively changes the weight of each class. To evaluate the proposed methods, we conduct extensive experiments on several transfer learning tasks, which demonstrate the effectiveness of our proposed algorithms over several state-of-the-art methods.

259 citations
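The balancing idea can be stated in one line. A sketch, with D a distribution distance (e.g., MMD), C the number of shared classes, and balance factor μ; which term carries μ is illustrative here, as the paper fixes its own convention:

```latex
D(\mathcal{D}_s, \mathcal{D}_t) \;\approx\;
\mu\, D\!\left(P(\mathbf{x}_s), P(\mathbf{x}_t)\right)
+ (1-\mu)\sum_{c=1}^{C} D\!\left(P(\mathbf{x}_s \mid y_s = c),\, P(\mathbf{x}_t \mid y_t = c)\right),
\qquad \mu \in [0, 1].
```

Fixing μ = 1/2 recovers the equal-weight treatment the abstract criticizes, which is how earlier methods arise as special cases of BDA.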


Proceedings Article
21 Feb 2018
TL;DR: In this article, instancewise feature selection is introduced as a methodology for model interpretation, based on learning a function to extract a subset of features that are most informative for each given example.
Abstract: We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show the effectiveness of our method on a variety of synthetic and real data sets using both quantitative metrics and human evaluation.

257 citations
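The training objective described above admits a compact statement. With $\mathcal{E}$ the feature selector, $X_S$ the selected sub-vector, and $q$ a variational model of the response (notation assumed here, not taken from the paper), the mutual information is lower-bounded by a tractable expected log-likelihood:

```latex
I(X_S; Y) \;=\; H(Y) - H(Y \mid X_S)
\;\ge\; \mathbb{E}_{X,\, S \sim \mathcal{E}(X)}\,\mathbb{E}_{Y \mid X}
\left[\log q\!\left(Y \mid X_S\right)\right] \;+\; H(Y),
```

and since $H(Y)$ does not depend on the selector, training maximizes only the expectation term over $\mathcal{E}$ and $q$.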


Journal ArticleDOI
TL;DR: In this article, a test that evaluates nonlinear causalities and possible causal relations in all conditional quantiles is proposed, which provides a sufficient condition for Granger-causality when all quantiles are considered.
Abstract: This paper proposes a consistent parametric test of Granger-causality in quantiles. Although the concept of Granger-causality is defined in terms of the conditional distribution, most articles have tested Granger-causality using conditional mean regression models in which the causal relations are linear. Rather than focusing on a single part of the conditional distribution, we develop a test that evaluates nonlinear causalities and possible causal relations in all conditional quantiles, which provides a sufficient condition for Granger-causality when all quantiles are considered. The proposed test statistic has correct asymptotic size, is consistent against fixed alternatives, and has power against Pitman deviations from the null hypothesis. As the proposed test statistic is asymptotically nonpivotal, we tabulate critical values via a subsampling approach. We present Monte Carlo evidence and an application considering the causal relation between the gold price, the USD/GBP exchange rate, and the o...

163 citations
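The null hypothesis being tested can be sketched as follows (notation assumed): with $\mathcal{I}_{t-1}^{(y)}$ the past of $y$ alone and $\mathcal{I}_{t-1}^{(y,x)}$ the past of both series, $x$ does not Granger-cause $y$ in quantiles when

```latex
H_0:\quad Q_\tau\!\left(y_t \,\middle|\, \mathcal{I}_{t-1}^{(y)}\right)
= Q_\tau\!\left(y_t \,\middle|\, \mathcal{I}_{t-1}^{(y,x)}\right)
\quad \text{for all } \tau \in \mathcal{T} \subset (0,1),
```

and covering all quantile levels $\tau$ is what yields a sufficient condition for Granger-causality in the conditional distribution.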


Proceedings Article
01 Jan 2018
TL;DR: This work proposes an approach for solving causal domain adaptation problems that exploits causal inference and does not rely on prior knowledge of the causal graph, the type of interventions or the intervention targets, and demonstrates a possible implementation on simulated and real world data.
Abstract: An important goal common to domain adaptation and causal inference is to make accurate predictions when the distributions for the source (or training) domain(s) and target (or test) domain(s) differ. In many cases, these different distributions can be modeled as different contexts of a single underlying system, in which each distribution corresponds to a different perturbation of the system, or in causal terms, an intervention. We focus on a class of such causal domain adaptation problems, where data for one or more source domains are given, and the task is to predict the distribution of a certain target variable from measurements of other variables in one or more target domains. We propose an approach for solving these problems that exploits causal inference and does not rely on prior knowledge of the causal graph, the type of interventions or the intervention targets. We demonstrate our approach by evaluating a possible implementation on simulated and real world data.

159 citations


Journal ArticleDOI
TL;DR: In this article, the authors proposed a new class of copula-based dynamic models for high-dimensional conditional distributions, facilitating the estimation of a wide variety of measures of systemic risk.
Abstract: This article proposes a new class of copula-based dynamic models for high-dimensional conditional distributions, facilitating the estimation of a wide variety of measures of systemic risk. Our proposed models draw on successful ideas from the literature on modeling high-dimensional covariance matrices and on recent work on models for general time-varying distributions. Our use of copula-based models enables the estimation of the joint model in stages, greatly reducing the computational burden. We use the proposed new models to study a collection of daily credit default swap (CDS) spreads on 100 U.S. firms over the period 2006 to 2012. We find that while the probability of distress for individual firms has greatly reduced since the financial crisis of 2008–2009, the joint probability of distress (a measure of systemic risk) is substantially higher now than in the precrisis period. Supplementary materials for this article are available online.

144 citations
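The stage-wise estimation rests on the standard copula factorization (Sklar's theorem), recalled here for orientation:

```latex
f(y_1, \dots, y_d) \;=\; c\!\left(F_1(y_1), \dots, F_d(y_d)\right)\,\prod_{i=1}^{d} f_i(y_i),
```

so the $d$ marginal models can be fitted first and the dependence (copula) parameters second, which is what reduces the computational burden for the 100 CDS spread series studied here.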


Journal Article
TL;DR: In this article, the authors relax the usual covariate shift assumption and assume that it holds true for a subset of predictor variables: the conditional distribution of the target variable given this subset of predictors is invariant over all tasks.
Abstract: Methods of transfer learning try to combine knowledge from several related tasks (or domains) to improve performance on a test task. Inspired by causal methodology, we relax the usual covariate shift assumption and assume that it holds true for a subset of predictor variables: the conditional distribution of the target variable given this subset of predictors is invariant over all tasks. We show how this assumption can be motivated from ideas in the field of causality. We focus on the problem of Domain Generalization, in which no examples from the test task are observed. We prove that in an adversarial setting using this subset for prediction is optimal in Domain Generalization; we further provide examples, in which the tasks are sufficiently diverse and the estimator therefore outperforms pooling the data, even on average. If examples from the test task are available, we also provide a method to transfer knowledge from the training tasks and exploit all available features for prediction. However, we provide no guarantees for this method. We introduce a practical method which allows for automatic inference of the above subset and provide corresponding code. We present results on synthetic data sets and a gene deletion data set.

143 citations
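The relaxed assumption has a one-line statement. With $d$ indexing tasks and $S \subseteq \{1, \dots, p\}$ the invariant subset of predictors (notation assumed):

```latex
P^{(d)}\!\left(Y \,\middle|\, X_S\right) \;=\; P^{(d')}\!\left(Y \,\middle|\, X_S\right)
\qquad \text{for all tasks } d, d',
```

while the marginal distribution of the covariates, and the conditional given predictors outside $S$, may change freely across tasks.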


Proceedings Article
29 Apr 2018
TL;DR: This paper proposes to learn a feature representation with domain-invariant class conditional distributions P(h(X)|Y); with such a conditional invariant representation, invariance of the joint distribution P(h(X), Y) can be guaranteed if the class prior P(Y) does not change across training and test domains.
Abstract: Domain generalization aims to apply knowledge gained from multiple labeled source domains to unseen target domains. The main difficulty comes from the dataset bias: training data and test data have different distributions, and the training set contains heterogeneous samples from different distributions. Let X denote the features, and Y be the class labels. Existing domain generalization methods address the dataset bias problem by learning a domain-invariant representation h(X) that has the same marginal distribution P(h(X)) across multiple source domains. The functional relationship encoded in P(Y|X) is usually assumed to be stable across domains such that P(Y|h(X)) is also invariant. However, it is unclear whether this assumption holds in practical problems. In this paper, we consider the general situation where both P(X) and P(Y|X) can change across all domains. We propose to learn a feature representation which has domain-invariant class conditional distributions P(h(X)|Y). With the conditional invariant representation, the invariance of the joint distribution P(h(X),Y) can be guaranteed if the class prior P(Y) does not change across training and test domains. Extensive experiments on both synthetic and real data demonstrate the effectiveness of the proposed method.

127 citations
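The guarantee quoted in the abstract is just the product rule. With $d$ indexing domains:

```latex
P^{(d)}\!\left(h(X) \mid Y\right) \text{ invariant in } d
\;\;\text{and}\;\;
P^{(d)}(Y) \text{ invariant in } d
\;\Longrightarrow\;
P^{(d)}\!\left(h(X), Y\right) = P\!\left(h(X) \mid Y\right) P(Y) \text{ invariant in } d.
```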


Journal ArticleDOI
TL;DR: In this article, the authors derived the coverage probability of a typical receiver, which is an arbitrarily chosen receiving node, assuming independent Nakagami-$m$ fading over all wireless channels.
Abstract: In this paper, we consider a vehicular network in which the wireless nodes are located on a system of roads. We model the roadways, which are predominantly straight and randomly oriented, by a Poisson line process (PLP) and the locations of nodes on each road as a homogeneous 1D Poisson point process. Assuming that each node transmits independently, the locations of transmitting and receiving nodes are given by two Cox processes driven by the same PLP. For this setup, we derive the coverage probability of a typical receiver, which is an arbitrarily chosen receiving node, assuming independent Nakagami-$m$ fading over all wireless channels. Assuming that the typical receiver connects to its closest transmitting node in the network, we first derive the distribution of the distance between the typical receiver and the serving node to characterize the desired signal power. We then characterize coverage probability for this setup, which involves two key technical challenges. First, we need to handle several cases as the serving node can possibly be located on any line in the network and the corresponding interference experienced at the typical receiver is different in each case. Second, conditioning on the serving node imposes constraints on the spatial configuration of lines, which requires careful analysis of the conditional distribution of the lines. We address these challenges in order to characterize the interference experienced at the typical receiver. We then derive an exact expression for the coverage probability in terms of the derivative of the Laplace transform of the interference power distribution. We analyze the trends in coverage probability as a function of the network parameters: line density and node density. We also provide some theoretical insights by studying the asymptotic characteristics of coverage probability.

Posted Content
TL;DR: This paper showed that the conditional distribution of GDP growth depends on financial conditions, with growth-at-risk (GaR)-defined as conditional growth at the lower 5th percentile-more responsive than the median or upper percentiles.
Abstract: Using panel quantile regressions, we show that the conditional distribution of GDP growth depends on financial conditions, with growth-at-risk (GaR)-defined as conditional growth at the lower 5th percentile-more responsive than the median or upper percentiles. The term structure of GaR features an intertemporal tradeoff: GaR is higher in the short run but lower in the medium run when initial financial conditions are loose relative to typical levels. This shift in the growth distribution generally is not incorporated when solving dynamic stochastic general equilibrium models with macrofinancial linkages, which suggests downside risks to GDP growth are systematically underestimated.
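A toy sketch of the quantile-regression machinery behind growth-at-risk, on synthetic data; the paper uses panel quantile regressions, and the variable names and data-generating process below are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
fci = rng.standard_normal(500)                # financial conditions index
# Looser conditions widen the downside: the scale of the negative shock
# grows with fci, so low quantiles respond more than the median.
gdp_growth = (2.0 - 0.3 * fci
              - (1.0 + 0.8 * np.maximum(fci, 0.0))
              * np.abs(rng.standard_normal(500)))

X = sm.add_constant(fci)
gar = sm.QuantReg(gdp_growth, X).fit(q=0.05)  # growth-at-risk: 5th percentile
med = sm.QuantReg(gdp_growth, X).fit(q=0.50)
print(gar.params[1], med.params[1])           # lower-tail slope is steeper
```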

Journal ArticleDOI
TL;DR: In this article, Gaussian mixture models (GMM) are proposed to represent endmember variability in hyperspectral unmixing; the resulting model can estimate not only the abundances and distribution parameters but also the distinct endmember set for each pixel.
Abstract: Hyperspectral unmixing while considering endmember variability is usually performed by the normal compositional model, where the endmembers for each pixel are assumed to be sampled from unimodal Gaussian distributions. However, in real applications, the distribution of a material is often not Gaussian. In this paper, we use Gaussian mixture models (GMM) to represent endmember variability. We show, given the GMM starting premise, that the distribution of the mixed pixel (under the linear mixing model) is also a GMM (and this is shown from two perspectives). The first perspective originates from random variable transformations and gives a conditional density function of the pixels given the abundances and GMM parameters. With proper smoothness and sparsity prior constraints on the abundances, the conditional density function leads to a standard maximum a posteriori (MAP) problem which can be solved using generalized expectation maximization. The second perspective originates from marginalizing over the endmembers in the GMM, which provides us with a foundation to solve for the endmembers at each pixel. Hence, compared to the other distribution based methods, our model can not only estimate the abundances and distribution parameters, but also the distinct endmember set for each pixel. We tested the proposed GMM on several synthetic and real datasets, and showed its potential by comparing it to current popular methods.
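The closure property asserted above ("the mixed pixel is also a GMM") can be sketched. Assume independent endmembers $\mathbf{m}_j \sim \sum_k \pi_{jk}\,\mathcal{N}(\boldsymbol{\mu}_{jk}, \boldsymbol{\Sigma}_{jk})$, the linear mixing model $\mathbf{y} = \sum_{j=1}^{M} a_j \mathbf{m}_j + \mathbf{n}$, and Gaussian noise $\mathbf{n} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}_n)$ (the explicit noise term is an assumption of this sketch). Then, conditioning on the abundances,

```latex
p(\mathbf{y} \mid \mathbf{a}) \;=\; \sum_{k_1, \dots, k_M} \Bigl(\prod_{j=1}^{M} \pi_{j k_j}\Bigr)\,
\mathcal{N}\!\Bigl(\mathbf{y} \,\Bigm|\, \sum_{j=1}^{M} a_j \boldsymbol{\mu}_{j k_j},\;
\sum_{j=1}^{M} a_j^2 \boldsymbol{\Sigma}_{j k_j} + \boldsymbol{\Sigma}_n\Bigr),
```

a GMM whose components enumerate the combinations of endmember components.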

Journal ArticleDOI
TL;DR: Quantile regression quantifies the association of explanatory variables with a conditional quantile of a dependent variable without assuming any specific conditional distribution.
Abstract: Quantile regression quantifies the association of explanatory variables with a conditional quantile of a dependent variable without assuming any specific conditional distribution. It hence...
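For completeness, the estimator behind this distribution-free property is the Koenker–Bassett check-loss minimization:

```latex
\hat{\beta}(\tau) \;=\; \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\!\left(y_i - x_i^{\top}\beta\right),
\qquad
\rho_\tau(u) \;=\; u\left(\tau - \mathbf{1}\{u < 0\}\right),
```

which targets the conditional $\tau$-quantile directly, with no parametric form imposed on the conditional distribution.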

Journal ArticleDOI
TL;DR: In this article, a hidden Markov model (HMM) and Gaussian Mixture Regression (GMR) were combined for probabilistic monthly streamflow forecasting, and the performance of HMM-GMR was verified based on the mean square error and continuous ranked probability score skill scores.

Journal ArticleDOI
TL;DR: A novel PLF method that leverages existing point load forecasts by first conducting point forecasting, then modeling the conditional distribution of the forecast residual given the point forecast, and finally integrating the two into the final probabilistic forecast.
Abstract: Probabilistic load forecasting (PLF) has gained widespread attention in recent years because it presents more uncertainty information about the future loads. To further improve the PLF performance, this letter proposes a novel PLF method to leverage existing point load forecasts by modeling the conditional forecast residual. Specifically, the method firstly conducts point forecasting using the historical load data and related factors to obtain the point forecast. Then, this point forecast is used as an additional input feature to describe the conditional distribution of the residual on the point forecast. Finally, the point forecast and conditional distribution of the residual are integrated together to produce the final probabilistic forecast. By comparing different point forecasting and quantile regression models, comprehensive case studies obtained from a publicly available load dataset with multiple zones demonstrate the advantages of our proposed method. This letter also informatively reveals the relationship between point and probabilistic forecast accuracies.
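A sketch of the three-step recipe in the letter, on synthetic data; the concrete model choices (a linear point forecaster, gradient-boosted quantile regression for the residual) are illustrative assumptions, not the authors'.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.standard_normal((2000, 5))            # weather/calendar features
y = (X @ np.array([3.0, 1.0, 0.0, 2.0, -1.0])
     + (1.0 + np.abs(X[:, 0])) * rng.standard_normal(2000))  # heteroscedastic load

# Step 1: point forecast from any existing model.
point = LinearRegression().fit(X, y).predict(X)

# Step 2: residual quantiles conditional on the point forecast,
# with the point forecast itself used as an extra input feature.
feats = np.column_stack([X, point])
resid = y - point
q_models = {q: GradientBoostingRegressor(loss="quantile", alpha=q).fit(feats, resid)
            for q in (0.1, 0.5, 0.9)}

# Step 3: probabilistic forecast = point forecast + residual quantile.
forecast_q90 = point + q_models[0.9].predict(feats)
```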

Journal ArticleDOI
TL;DR: In this paper, a Gaussian mixture model is used to construct analytical conditional distributions of forecast errors for multiple wind farms with respect to different forecast values, and a sampling method is proposed to generate scenarios from the conditional distributions which are non-Gaussian and interdependent.

Posted Content
TL;DR: This paper develops a Bayesian framework for placing priors over normalizing-flow conditional density estimators using variational Bayesian neural networks, and presents an efficient method for fitting them to complex densities.
Abstract: Modeling complex conditional distributions is critical in a variety of settings. Despite a long tradition of research into conditional density estimation, current methods employ either simple parametric forms or are difficult to learn in practice. This paper employs normalising flows as a flexible likelihood model and presents an efficient method for fitting them to complex densities. These estimators must trade off between modeling distributional complexity, functional complexity and heteroscedasticity without overfitting. We recognize these trade-offs as modeling decisions and develop a Bayesian framework for placing priors over these conditional density estimators using variational Bayesian neural networks. We evaluate this method on several small benchmark regression datasets, on some of which it obtains state-of-the-art performance. Finally, we apply the method to two spatial density modeling tasks with over 1 million datapoints, using the New York City yellow taxi dataset and the Chicago crime dataset.
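The likelihood model referred to here is the standard change-of-variables identity for an $x$-conditioned invertible map $f_\theta(\cdot\,; x)$ onto a simple base density $p_Z$:

```latex
\log p_\theta(y \mid x) \;=\; \log p_Z\!\left(f_\theta(y; x)\right)
+ \log\left|\det \frac{\partial f_\theta(y; x)}{\partial y}\right|,
```

and the Bayesian layer of the paper places variational priors over the network weights that determine $\theta$ as a function of the conditioning input.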

Posted Content
TL;DR: A new causal discovery method based on a game between players, each estimating the distribution of one variable conditionally on the others with a neural net, and an adversary aimed at discriminating between the resulting joint conditional distribution and that of the original data.
Abstract: A new causal discovery method, Structural Agnostic Modeling (SAM), is presented in this paper. Leveraging both conditional independencies and distributional asymmetries in the data, SAM aims to find the underlying causal structure from observational data. The approach is based on a game between different players, each estimating the distribution of one variable conditionally on the others with a neural net, and an adversary aimed at discriminating between the resulting overall joint conditional distribution and that of the original data. A learning criterion combining distribution estimation, sparsity and acyclicity constraints is used to enforce the end-to-end optimization of the graph structure and parameters through stochastic gradient descent. Besides a theoretical analysis of the approach in the large sample limit, SAM is extensively validated experimentally on synthetic and real data.

Journal ArticleDOI
01 Mar 2018-Extremes
TL;DR: In this paper, a general point process model for extreme episodes in data is derived, and it is shown how conditioning the distribution of extreme episodes on threshold exceedance gives four basic representations of the family of generalized Pareto distributions.
Abstract: Multivariate peaks over thresholds modelling based on generalized Pareto distributions has up to now only been used in few and mostly two-dimensional situations. This paper contributes theoretical understanding, models which can respect physical constraints, inference tools, and simulation methods to support routine use, with an aim at higher dimensions. We derive a general point process model for extreme episodes in data, and show how conditioning the distribution of extreme episodes on threshold exceedance gives four basic representations of the family of generalized Pareto distributions. The first representation is constructed on the real scale of the observations. The second one starts with a model on a standard exponential scale which is then transformed to the real scale. The third and fourth representations are reformulations of a spectral representation proposed in Ferreira and de Haan (Bernoulli 20(4), 1717–1737, 2014). Numerically tractable forms of densities and censored densities are found and give tools for flexible parametric likelihood inference. New simulation algorithms, explicit formulas for probabilities and conditional probabilities, and conditions which make the conditional distribution of weighted component sums generalized Pareto are derived.
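For orientation, the univariate version of the threshold-exceedance conditioning reads as follows (the paper develops its multivariate generalization): for a high threshold $u$,

```latex
P\!\left(X - u \le y \,\middle|\, X > u\right) \;\approx\;
1 - \left(1 + \xi\, y / \sigma\right)_{+}^{-1/\xi},
```

the generalized Pareto distribution with scale $\sigma$ and shape $\xi$.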

Journal ArticleDOI
TL;DR: In this paper, a new empirical method was developed to model the prediction uncertainty of solar irradiance forecasts based on numerical weather prediction. The proposed method comprises four steps: first, predicted and measured solar irradiances were transformed into Gaussian random variables using data observed in a modeling window in the near past.

Journal ArticleDOI
TL;DR: In this paper, a cascade of increasingly complex transformation models that can be estimated, compared and analysed in the maximum likelihood framework is presented, and the asymptotic normality of the proposed estimators is established for discrete and continuous responses.
Abstract: We propose and study properties of maximum likelihood estimators in the class of conditional transformation models. Based on a suitable explicit parameterization of the unconditional or conditional transformation function, we establish a cascade of increasingly complex transformation models that can be estimated, compared and analysed in the maximum likelihood framework. Models for the unconditional or conditional distribution function of any univariate response variable can be set up and estimated in the same theoretical and computational framework simply by choosing an appropriate transformation function and parameterization thereof. The ability to evaluate the distribution function directly allows us to estimate models based on the exact likelihood, especially in the presence of random censoring or truncation. For discrete and continuous responses, we establish the asymptotic normality of the proposed estimators. A reference software implementation of maximum likelihood-based estimation for conditional transformation models that allows the same flexibility as the theory developed here was employed to illustrate the wide range of possible applications.
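The model class has a compact form worth stating: with $F_Z$ a fixed reference distribution and $h(\cdot \mid x)$ a monotone, parameterized transformation,

```latex
F_{Y \mid X = x}(y) \;=\; F_Z\!\left(h(y \mid x)\right),
\qquad
f_{Y \mid X = x}(y) \;=\; f_Z\!\left(h(y \mid x)\right)\,\frac{\partial}{\partial y}\, h(y \mid x),
```

and because the distribution function itself is directly available, exact likelihood contributions for censored or truncated observations are simply differences of $F_Z \circ h$ evaluated at the interval endpoints.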

Journal ArticleDOI
TL;DR: A new criterion of domain-shared group sparsity, shown to be an equivalent condition for conditional distribution alignment, is proposed and combined with marginal distribution alignment to develop a domain-shared group-sparse dictionary learning model that learns domain-shared representations with aligned joint distributions.

Posted Content
TL;DR: A novel Contextual Generative Adversarial Nets (C-GANs) framework is proposed to specifically capture the transition patterns of face aging, and it produces appealing results in comparison with the state of the art and ground truth.
Abstract: Face aging, which renders aging faces for an input face, has attracted extensive attention in multimedia research. Recently, several conditional Generative Adversarial Nets (GANs) based methods have achieved great success. They can generate images fitting the real face distributions conditioned on each individual age group. However, these methods fail to capture the transition patterns, e.g., the gradual shape and texture changes between adjacent age groups. In this paper, we propose a novel Contextual Generative Adversarial Nets (C-GANs) to take these transition patterns into consideration. The C-GANs framework consists of a conditional transformation network and two discriminative networks. The conditional transformation network imitates the aging procedure with several specially designed residual blocks. The age discriminative network guides the synthesized face to fit the real conditional distribution. The transition pattern discriminative network is novel, aiming to distinguish the real transition patterns from the fake ones. It serves as an extra regularization term for the conditional transformation network, ensuring the generated image pairs fit the corresponding real transition pattern distribution. Experimental results demonstrate that the proposed framework produces appealing results in comparison with the state of the art and ground truth. We also observe a performance gain for cross-age face verification.
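All three networks build on the standard conditional-GAN template, recalled here for orientation; $c$ stands for the conditioning information (such as the age group), and the paper's specialized discriminators and residual-block generator are refinements of this objective, not reproduced here:

```latex
\min_{G}\,\max_{D}\;
\mathbb{E}_{(x, c)}\!\left[\log D(x, c)\right]
+ \mathbb{E}_{z, c}\!\left[\log\!\left(1 - D\!\left(G(z, c), c\right)\right)\right].
```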

Posted Content
TL;DR: This paper introduces the R-package cAIC4, which allows for the computation of the conditional Akaike Information Criterion (cAIC), and presents a fast and stable implementation for the calculation of the cAIC for linear mixed models estimated with lme4 and additive mixed models estimated with gamm4.
Abstract: Model selection in mixed models based on the conditional distribution is appropriate for many practical applications and has been a focus of recent statistical research. In this paper we introduce the R-package cAIC4 that allows for the computation of the conditional Akaike Information Criterion (cAIC). Computation of the conditional AIC needs to take into account the uncertainty of the random-effects variance and is therefore not straightforward. We introduce a fast and stable implementation for the calculation of the cAIC for linear mixed models estimated with lme4 and additive mixed models estimated with gamm4. Furthermore, cAIC4 offers a stepwise function that allows for a fully automated stepwise selection scheme for mixed models based on the conditional AIC. Examples of many possible applications are presented to illustrate the practical impact and easy handling of the package.
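For orientation, the criterion has the familiar form; this is a sketch, with $\Phi$ standing for the bias-correction (effective degrees of freedom) whose computation is the nontrivial part:

```latex
\mathrm{cAIC} \;=\; -2 \log f\!\left(y \,\middle|\, \hat{\theta}, \hat{b}\right) + 2\,\Phi,
```

where the conditional likelihood is evaluated at the estimated parameters $\hat{\theta}$ and predicted random effects $\hat{b}$; as the abstract notes, $\Phi$ must account for the uncertainty in the random-effects variance.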

Journal ArticleDOI
TL;DR: All max-linear models which are generated by a recursive structural equation model are characterized, and it is shown that the max-linear coefficient matrix is the solution of a fixed point equation.
Abstract: We consider a new structural equation model in which all random variables can be written as a max-linear function of their parents and independent noise variables. For the corresponding graph we assume that it is a directed acyclic graph. We show that the model is max-linear and detail the relation between the weights of the structural equation model and the max-linear coefficients. We characterize all max-linear models which are generated by this structural equation model. This leads to the presentation of a max-linear structural equation model as the solution of a fixed point equation and to a unique minimal DAG describing the relationships between the variables. The model structure introduces an order between the random variables, which yields certain model reductions, represented by subgraphs of the DAG which we call order DAGs. This also results in a reduced form for the regular conditional distributions compared to previous representations.
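The structural equations and their max-linear solution can be written compactly (notation assumed; $\vee$ denotes maximum and $\mathrm{pa}(i)$ the parents of node $i$ in the DAG):

```latex
X_i \;=\; \bigvee_{j \in \mathrm{pa}(i)} c_{ij} X_j \;\vee\; c_{ii} Z_i,
\qquad\text{with max-linear solution}\qquad
X_i \;=\; \bigvee_{j=1}^{d} b_{ij} Z_j,
```

where the coefficient matrix $B = (b_{ij})$ is the fixed point referred to in the abstract, its entries accumulating the weights along directed paths of the DAG.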

Journal ArticleDOI
TL;DR: In this paper, the authors proposed a matrix variate regression model for high dimensional data, where the response on each unit is a random matrix and the predictor X can be either a scalar, a vector or a matrix, treated as non-stochastic in terms of the conditional distribution Y|X.
Abstract: Modern technology often generates data with complex structures in which both response and explanatory variables are matrix valued. Existing methods in the literature can tackle matrix-valued predictors but are rather limited for matrix-valued responses. We study matrix variate regressions for such data, where the response Y on each experimental unit is a random matrix and the predictor X can be either a scalar, a vector or a matrix, treated as non-stochastic in terms of the conditional distribution Y|X. We propose models for matrix variate regressions and then develop envelope extensions of these models. Under the envelope framework, redundant variation can be eliminated in estimation and the number of parameters can be notably reduced when the matrix variate dimension is large, possibly resulting in significant gains in efficiency. The methods proposed are applicable to high dimensional settings.

Posted Content
TL;DR: This work proposes constructing conformal prediction sets which contain a set of labels rather than a single label, and demonstrates the performance on the ImageNet ILSVRC dataset and the CelebA and IMDB-Wiki facial datasets using high dimensional features obtained from state of the art convolutional neural networks.
Abstract: Most classifiers operate by selecting the maximum of an estimate of the conditional distribution $p(y|x)$, where $x$ stands for the features of the instance to be classified and $y$ denotes its label. This often results in a "hubristic bias": overconfidence in the assignment of a definite label. Usually, the observations are concentrated on a small volume but the classifier provides definite predictions for the entire space. We propose constructing conformal prediction sets which contain a set of labels rather than a single label. These conformal prediction sets contain the true label with probability $1-\alpha$. Our construction is based on $p(x|y)$ rather than $p(y|x)$, which results in a classifier that is very cautious: it outputs the null set, meaning "I don't know", when the object does not resemble the training examples. An important property of our approach is that adversarial attacks are likely to be predicted as the null set or would also include the true label. We demonstrate the performance on the ImageNet ILSVRC dataset and the CelebA and IMDB-Wiki facial datasets using high-dimensional features obtained from state-of-the-art convolutional neural networks.
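A minimal split-conformal sketch of the $p(x|y)$ idea, with a kernel density estimate standing in for the paper's deep-feature densities; all function names and the KDE choice are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_class_densities(X_train, y_train, bandwidth=1.0):
    """One density estimate of p(x | y = c) per class c."""
    return {c: KernelDensity(bandwidth=bandwidth).fit(X_train[y_train == c])
            for c in np.unique(y_train)}

def calibrate_thresholds(dens, X_cal, y_cal, alpha=0.1):
    """Per-class log-density threshold from a held-out calibration split."""
    return {c: np.quantile(kde.score_samples(X_cal[y_cal == c]), alpha)
            for c, kde in dens.items()}

def predict_set(dens, thr, x):
    """All labels whose class density at x clears that class's threshold.

    The set may be empty ("I don't know") far from the training data,
    or contain several labels where classes overlap.
    """
    x = np.asarray(x).reshape(1, -1)
    return {c for c, kde in dens.items() if kde.score_samples(x)[0] >= thr[c]}
```

By the usual split-conformal argument, each class's prediction set contains the true label with probability roughly $1-\alpha$.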

Proceedings Article
03 Dec 2018
TL;DR: In this article, the model's input is augmented with latent variables, and Gaussian processes (GP) map this augmented input onto samples from the conditional distribution, enabling conditional density estimation even in sparse data regions.
Abstract: Conditional Density Estimation (CDE) models deal with estimating conditional distributions. The conditions imposed on the distribution are the inputs of the model. CDE is a challenging task as there is a fundamental trade-off between model complexity, representational capacity and overfitting. In this work, we propose to extend the model's input with latent variables and use Gaussian processes (GP) to map this augmented input onto samples from the conditional distribution. Our Bayesian approach allows for the modeling of small datasets, but we also provide the machinery for it to be applied to big data using stochastic variational inference. Our approach can be used to model densities even in sparse data regions, and allows for sharing learned structure between conditions. We illustrate the effectiveness and wide-reaching applicability of our model on a variety of real-world problems, such as spatio-temporal density estimation of taxi drop-offs, non-Gaussian noise modeling, and few-shot learning on Omniglot images.

Journal ArticleDOI
TL;DR: This work proposes an event-triggered control scheme for discrete-time linear systems subject to Gaussian white noise disturbances; the scheme outperforms traditional periodic control at the same average transmission rate and generates no transmissions in the absence of disturbances.