Showing papers on "Conditional probability distribution published in 2019"


Journal ArticleDOI
TL;DR: The methods proposed are not meant to replace the well-established quantile regression estimator, but provide an additional tool that can allow the estimation of regression quantiles in settings where otherwise that would be difficult or even impossible.

584 citations


Journal ArticleDOI
TL;DR: This paper provides a methodology that incorporates the governing equations of the physical model into the loss/likelihood function, formulating training as a minimization of the reverse Kullback-Leibler (KL) divergence between the model predictive density and the reference conditional density.

560 citations


Proceedings Article
01 Jan 2019
TL;DR: This paper proposes to incorporate attention into NPs, allowing each input location to attend to the relevant context points for the prediction, which greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
Abstract: Neural Processes (NPs) (Garnelo et al 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context input-output pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
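
To make the attention mechanism concrete, here is a minimal NumPy sketch (not the paper's implementation; the embedding and per-context representation inputs are stubbed out with random arrays) of the cross-attention step: each target input attends over the context inputs to build a target-specific summary, in place of the mean-pooled summary a vanilla NP would use.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x_target, x_context, r_context):
    """Scaled dot-product attention: queries are target inputs,
    keys are context inputs, values are per-context representations."""
    scale = np.sqrt(x_context.shape[-1])
    logits = x_target @ x_context.T / scale          # (n_target, n_context)
    weights = softmax(logits, axis=-1)
    return weights @ r_context                        # (n_target, d_r)

# toy usage: 5 context points and 3 targets with 4-d input embeddings
rng = np.random.default_rng(0)
x_c, r_c = rng.normal(size=(5, 4)), rng.normal(size=(5, 8))
x_t = rng.normal(size=(3, 4))
r_t = cross_attention(x_t, x_c, r_c)   # target-specific context summaries
print(r_t.shape)                        # (3, 8)
```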

125 citations


Posted Content
TL;DR: It is shown that simple addition of the proposed regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed in each individual task.
Abstract: We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in the latent code. To address this issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on the generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed for each individual task.
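
A minimal sketch of the kind of diversity term the abstract describes, under illustrative assumptions of my own (a generic generator output, a clipping threshold tau; the exact form and weighting in the paper may differ): the ratio of output distance to latent distance is maximized so that different latent codes cannot collapse onto a single output.

```python
import numpy as np

def diversity_regularizer(g_out1, g_out2, z1, z2, tau=10.0):
    """Latent-conditioned diversity term: ratio of output distance to latent
    distance, clipped at tau. The generator would be trained to *maximize*
    this (e.g., subtract lambda * reg from its loss), so that distinct latent
    codes map to distinct outputs for the same condition."""
    num = np.linalg.norm((g_out1 - g_out2).reshape(len(g_out1), -1), axis=1)
    den = np.linalg.norm(z1 - z2, axis=1) + 1e-8
    return np.minimum(num / den, tau).mean()

# toy usage with a fake "generator" that ignores z (i.e., mode collapse):
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
collapsed = np.ones((4, 3, 8, 8))
print(diversity_regularizer(collapsed, collapsed, z1, z2))  # 0.0 -> no diversity
```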

118 citations


Posted Content
TL;DR: Attention is incorporated into NPs, allowing each input location to attend to the relevant context points for the prediction, which greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
Abstract: Neural Processes (NPs) (Garnelo et al 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context input-output pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.

117 citations


Proceedings ArticleDOI
Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, Eduard Hovy
05 Sep 2019
TL;DR: This paper turns to generative flow, an elegant technique for modeling complex distributions with neural networks, and designs several layers of flow tailored for modeling the conditional density of sequential latent variables, achieving performance comparable with state-of-the-art non-autoregressive NMT models.
Abstract: Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures accuracy lags significantly behind autoregressive models. In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique to model complex distributions using neural networks, and design several layers of flow tailored for modeling the conditional density of sequential latent variables. We evaluate this model on three neural machine translation (NMT) benchmark datasets, achieving comparable performance with state-of-the-art non-autoregressive NMT models and almost constant decoding time w.r.t the sequence length.
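
As a small illustration of the generative-flow building block the paper relies on (this is not the paper's architecture; the shift and scale "networks" below are hypothetical linear maps), an affine coupling layer transforms half of the variables conditioned on the other half and tracks the log-determinant needed for the change-of-variables density.

```python
import numpy as np

def affine_coupling(z, shift_net, scale_net):
    """One affine coupling step: keep the first half of the dimensions and
    transform the second half conditioned on the first. Returns the new
    variable and log|det Jacobian| for the change of variables
    log p(z) = log p_base(f(z)) + log|det df/dz|."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    log_s = scale_net(z1)
    y2 = z2 * np.exp(log_s) + shift_net(z1)
    return np.concatenate([z1, y2], axis=-1), log_s.sum(axis=-1)

# toy usage with linear "networks" for shift and scale
rng = np.random.default_rng(0)
W_scale, W_shift = rng.normal(size=(4, 4)) * 0.1, rng.normal(size=(4, 4)) * 0.1
z = rng.normal(size=(2, 8))
y, log_det = affine_coupling(z, lambda h: h @ W_shift, lambda h: h @ W_scale)
print(y.shape, log_det.shape)   # (2, 8) (2,)
```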

101 citations


Journal ArticleDOI
TL;DR: This work proposes computationally efficient approaches to conducting inference in the distributed estimation setting and proves that the proposed procedure does not sacrifice any statistical inferential accuracy provided that the number of distributed computing units and quantile levels are chosen properly.
Abstract: The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different levels in a parallel computing environment; (ii) construct a conditional quantile regression process through projection based on these estimated quantile curves. Our general quantile regression framework covers both linear models with fixed or growing dimension and series approximation models. We prove that the proposed procedure does not sacrifice any statistical inferential accuracy provided that the number of distributed computing units and quantile levels are chosen properly. In particular, a sharp upper bound for the former and a sharp lower bound for the latter are derived to capture the minimal computational cost from a statistical perspective. As an important application, the statistical inference on conditional distribution functions is considered. Moreover, we propose computationally efficient approaches to conducting inference in the distributed estimation setting described above. Those approaches directly utilize the availability of estimators from subsamples and can be carried out at almost no additional computational cost. Simulations confirm our statistical inferential theory.
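
A toy NumPy sketch of step (i), the divide-and-conquer part: fit the conditional quantile at several levels on each subsample and aggregate by averaging. The crude pinball-loss subgradient fit and the plain averaging are simplifications of my own, not the paper's estimator or its projection step.

```python
import numpy as np

def fit_quantile(X, y, tau, lr=0.05, n_iter=2000):
    """Crude pinball-loss subgradient fit of a linear conditional quantile."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = y - X @ beta
        grad = -X.T @ (tau - (u < 0)) / len(y)
        beta -= lr * grad
    return beta

def distributed_quantiles(X, y, taus, n_machines):
    """Schematic divide-and-conquer: fit each quantile level on each
    subsample (in parallel in practice), then aggregate by averaging."""
    X_chunks, y_chunks = np.array_split(X, n_machines), np.array_split(y, n_machines)
    per_machine = np.array([[fit_quantile(Xc, yc, t) for t in taus]
                            for Xc, yc in zip(X_chunks, y_chunks)])
    return per_machine.mean(axis=0)        # (len(taus), n_features)

rng = np.random.default_rng(0)
X = np.c_[np.ones(4000), rng.normal(size=4000)]
y = X @ np.array([1.0, 2.0]) + rng.normal(size=4000)
print(distributed_quantiles(X, y, taus=[0.25, 0.5, 0.75], n_machines=8))
```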

94 citations


Posted Content
TL;DR: This work proposes a new method, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches.
Abstract: We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume access to one or more, potentially black box, stochastic "oracle" predictive functions, each of which maps from input (e.g., protein sequences) design space to a distribution over a property of interest (e.g. protein fluorescence). At first glance, this problem can be framed as one of optimizing the oracle(s) with respect to the input. However, many state-of-the-art predictive models, such as neural networks, are known to suffer from pathologies, especially for data far from the training distribution. Thus we need to modulate the optimization of the oracle inputs with prior knowledge about what makes `realistic' inputs (e.g., proteins that stably fold). Herein, we propose a new method to solve this problem, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches. Formally, our method achieves its success by using model-based adaptive sampling to estimate the conditional distribution of the input sequences given the desired properties.
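
A hypothetical one-dimensional toy version of model-based adaptive sampling in the spirit of the abstract (the real method works with learned generative models over sequences; the Gaussian search model, oracle, and threshold below are illustrative assumptions): samples are reweighted by the oracle probability of the desired property times a prior-to-search-density ratio, and the search distribution is refit to the weighted samples.

```python
import numpy as np
from scipy.stats import norm

def oracle_prob_desired(x, threshold=3.0, noise_sd=0.5):
    """Hypothetical stochastic oracle: P(property >= threshold | x),
    assuming the property is Normal(f(x), noise_sd) with f(x) = x."""
    return 1.0 - norm.cdf(threshold, loc=x, scale=noise_sd)

def adaptive_sampling(n_iter=20, n_samples=500, seed=0):
    """Toy 1-d sketch: repeatedly sample from the current search model,
    weight samples by oracle probability times the prior/search density
    ratio, and refit the search model to the weighted samples."""
    rng = np.random.default_rng(seed)
    mu0, sd0 = 0.0, 1.0          # prior over designs
    mu, sd = mu0, sd0            # current search distribution
    for _ in range(n_iter):
        x = rng.normal(mu, sd, size=n_samples)
        w = oracle_prob_desired(x) * norm.pdf(x, mu0, sd0) / norm.pdf(x, mu, sd)
        w = w / w.sum()
        mu = np.sum(w * x)
        sd = max(np.sqrt(np.sum(w * (x - mu) ** 2)), 1e-3)
    return mu, sd

print(adaptive_sampling())   # the search model drifts toward promising designs
```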

89 citations


Proceedings Article
01 Jan 2019
TL;DR: In this paper, the authors propose to explicitly regularize the generator to produce diverse outputs depending on latent codes, which can be easily integrated into most conditional GAN objectives to control a balance between visual quality and diversity.
Abstract: We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in the latent code. To address this issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on the generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed for each individual task.

76 citations


Journal ArticleDOI
TL;DR: It is shown that marginal and conditional distributions have different contributions to the domain divergence, and the proposed Dynamic Distribution Adaptation (DDA) is able to provide good quantitative evaluation of their relative importance, which leads to better performance.
Abstract: Transfer learning aims to learn robust classifiers for the target domain by leveraging knowledge from a source domain. Since the source and the target domains are usually from different distributions, existing methods mainly focus on adapting the cross-domain marginal or conditional distributions. However, in real applications, the marginal and conditional distributions usually have different contributions to the domain discrepancy. Existing methods fail to quantitatively evaluate the different importance of these two distributions, which will result in unsatisfactory transfer performance. In this paper, we propose a novel concept called Dynamic Distribution Adaptation (DDA), which is capable of quantitatively evaluating the relative importance of each distribution. DDA can be easily incorporated into the framework of structural risk minimization to solve transfer learning problems. On the basis of DDA, we propose two novel learning algorithms: (1) Manifold Dynamic Distribution Adaptation (MDDA) for traditional transfer learning, and (2) Dynamic Distribution Adaptation Network (DDAN) for deep transfer learning. Extensive experiments demonstrate that MDDA and DDAN significantly improve the transfer learning performance and set up a strong baseline over the latest deep and adversarial methods on digit recognition, sentiment analysis, and image classification. More importantly, it is shown that marginal and conditional distributions have different contributions to the domain divergence, and our DDA is able to provide a good quantitative evaluation of their relative importance, which leads to better performance. We believe this observation can be helpful for future research in transfer learning.
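
A rough sketch of the quantity being adapted, assuming MMD as the discrepancy measure and treating the balance factor mu as given (the paper estimates it, e.g. from proxy distances, and embeds the whole term in structural risk minimization): a weighted sum of the marginal discrepancy and the average per-class conditional discrepancy.

```python
import numpy as np

def mmd(X, Y, gamma=1.0):
    """Squared MMD with an RBF kernel (biased estimate)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def dynamic_distribution_distance(Xs, ys, Xt, yt_pseudo, mu):
    """Weighted combination of the marginal and per-class conditional
    discrepancies; mu trades off their relative importance."""
    marginal = mmd(Xs, Xt)
    classes = np.unique(ys)
    conditional = np.mean([mmd(Xs[ys == c], Xt[yt_pseudo == c])
                           for c in classes if (yt_pseudo == c).any()])
    return (1 - mu) * marginal + mu * conditional

rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(60, 5)), rng.integers(0, 2, 60)
Xt, yt = rng.normal(0.5, 1.0, size=(60, 5)), rng.integers(0, 2, 60)
for mu in (0.0, 0.5, 1.0):   # in the paper mu is estimated, not swept by hand
    print(mu, dynamic_distribution_distance(Xs, ys, Xt, yt, mu))
```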

73 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, a dissimilarity coefficient based probabilistic learning objective is proposed to minimize the difference between an annotation agnostic prediction distribution and an annotation aware conditional distribution, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object.
Abstract: We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. In order to model the uncertainty in the location of the objects, we employ a dissimilarity coefficient based probabilistic learning objective. The learning objective minimizes the difference between an annotation agnostic prediction distribution and an annotation aware conditional distribution. The main computational challenge is the complex nature of the conditional distribution, which consists of terms over hundreds or thousands of variables. The complexity of the conditional distribution rules out the possibility of explicitly modeling it. Instead, we exploit the fact that deep learning frameworks rely on stochastic optimization. This allows us to use a state of the art discrete generative model that can provide annotation consistent samples from the conditional distribution. Extensive experiments on PASCAL VOC 2007 and 2012 data sets demonstrate the efficacy of our proposed approach.

Book
07 May 2019
TL;DR: This book develops probability from first principles, covering models of probability, the single event model, Bernoulli trials, random variables with their means and expected values, continuous sample spaces, maximum entropy, and limit theorems.
Abstract: Probability * Introduction * Models in General * The Frequency Approach Rejected * The Single Event Model * Symmetry as the Measure of Probability * Independence * Subsets of a Sample Space * Conditional Probability * Randomness * Critique of the Model * Some Mathematical Tools * Permutations * Combinations * The Binomial Distribution - Bernoulli Trials * Random Variables, Mean and the Expected Value * The Variance * The Generating Function * The Weak Law of Large Numbers * The Statistical Assignment of Probability * The Representation of Information * Methods for Solving Problems * The Five Methods * The Total Sample Space and Fair Games * Enumeration * Historical Approach * Recursive Approach * The Method of Random Variables * Critique of the Notion of a Fair Game * Bernoulli Evaluation * Robustness * Inclusion-Exclusion Principle * Countably Infinite Sample Spaces * Introduction * Bernoulli Trials * On the Strategy to be Adopted * State Diagrams * Generating Functions of State Diagrams * Expanding a Rational Generating Function * Checking the Solution * Paradoxes * Continuous Sample Spaces * A Philosophy of the Real Number System * Some First Examples * Some Paradoxes * The Normal Distribution * The Distribution of Numbers * Convergence to the Reciprocal Distribution * Random Times * Dead Times * Poisson Distribution in Time * Queuing Theory * Birth and Death Systems * Summary * Uniform Probability Assignments * Maximum Entropy * What is Entropy? * Shannon's Entropy * Some Mathematical Properties of the Entropy Function * Some Simple Applications * The Maximum Entropy Principle * Models of Probability * General Remarks * Maximum Likelihood in a Binary Choice * Von Mises Probability * The Mathematical Approach * The Statistical Approach * When the Mean Does Not Exist * Probability as an Extension of Logic * De Finetti * Subjective Probability * Fuzzy Probability * Probability in Science * Complex Probability * Some Limit Theorems * The Binomial Approximation for the Case p=1/2 * Approximation by the Normal Distribution * Another Derivation of the Normal Distribution * Random Times * The Zipf Distribution * Summary * An Essay on Simulation

Posted Content
Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, Eduard Hovy
TL;DR: This paper proposes a generative-flow-based model for non-autoregressive sequence generation using latent variable models, achieving performance comparable to state-of-the-art non-autoregressive models on NMT benchmark datasets.
Abstract: Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures accuracy lags significantly behind autoregressive models. In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique to model complex distributions using neural networks, and design several layers of flow tailored for modeling the conditional density of sequential latent variables. We evaluate this model on three neural machine translation (NMT) benchmark datasets, achieving comparable performance with state-of-the-art non-autoregressive NMT models and almost constant decoding time w.r.t the sequence length.

Posted Content
TL;DR: This work develops a prediction method that works in conjunction with many powerful classical and modern high-dimensional methods for estimating conditional distributions, establishing approximate conditional validity under consistent estimation and approximate unconditional validity under model misspecification, overfitting, and with time series data.
Abstract: We propose a robust method for constructing conditionally valid prediction intervals based on models for conditional distributions such as quantile and distribution regression. Our approach can be applied to important prediction problems including cross-sectional prediction, k-step-ahead forecasts, synthetic controls and counterfactual prediction, and individual treatment effects prediction. Our method exploits the probability integral transform and relies on permuting estimated ranks. Unlike regression residuals, ranks are independent of the predictors, allowing us to construct conditionally valid prediction intervals under heteroskedasticity. We establish approximate conditional validity under consistent estimation and provide approximate unconditional validity under model misspecification, overfitting, and with time series data. We also propose a simple "shape" adjustment of our baseline method that yields optimal prediction intervals.
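
A simplified split-style sketch of the rank/PIT idea (the conditional-CDF model below is a hypothetical Gaussian linear stand-in for the quantile or distribution regression the method allows, and the full method permutes estimated ranks rather than using this plain split construction): the calibration PIT values determine how far from 1/2 the PIT of a new response may fall.

```python
import numpy as np
from scipy.stats import norm

def conformal_interval_from_cdf(x_new, F, X_cal, y_cal, alpha=0.1, grid=None):
    """Compute PIT values u_i = F(y_i | x_i) on a calibration set, take the
    (1 - alpha) quantile of |u_i - 1/2|, and return the y-range on a grid
    whose PIT falls within that band."""
    u = np.array([F(y, x) for x, y in zip(X_cal, y_cal)])
    q = np.quantile(np.abs(u - 0.5), 1 - alpha)
    grid = np.linspace(y_cal.min() - 3, y_cal.max() + 3, 400) if grid is None else grid
    mask = np.abs(np.array([F(y, x_new) for y in grid]) - 0.5) <= q
    return grid[mask].min(), grid[mask].max()

# toy conditional-CDF estimate: linear mean, constant variance (an assumption,
# standing in for the quantile/distribution regression the method allows)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1)); y = 2 * X[:, 0] + rng.normal(size=500)
X_tr, y_tr, X_cal, y_cal = X[:250], y[:250], X[250:], y[250:]
beta = np.linalg.lstsq(np.c_[np.ones(250), X_tr], y_tr, rcond=None)[0]
sigma = np.std(y_tr - np.c_[np.ones(250), X_tr] @ beta)
F = lambda yy, xx: norm.cdf((yy - beta[0] - beta[1] * xx[0]) / sigma)
print(conformal_interval_from_cdf(np.array([1.0]), F, X_cal, y_cal))
```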

Proceedings Article
24 May 2019
TL;DR: In this paper, the authors propose a new method for design problems where the goal is to maximize or specify the value of one or more properties of interest (e.g., protein fluorescence).
Abstract: We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume access to one or more, potentially black box, stochastic "oracle" predictive functions, each of which maps from input (e.g., protein sequences) design space to a distribution over a property of interest (e.g. protein fluorescence). At first glance, this problem can be framed as one of optimizing the oracle(s) with respect to the input. However, many state-of-the-art predictive models, such as neural networks, are known to suffer from pathologies, especially for data far from the training distribution. Thus we need to modulate the optimization of the oracle inputs with prior knowledge about what makes `realistic' inputs (e.g., proteins that stably fold). Herein, we propose a new method to solve this problem, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches. Formally, our method achieves its success by using model-based adaptive sampling to estimate the conditional distribution of the input sequences given the desired properties.

Journal ArticleDOI
TL;DR: The package MSGARCH as discussed by the authors implements Markov-switching GARCH (generalized autoregressive conditional heteroscedasticity) models in R with efficient C++ object-oriented programming.
Abstract: We describe the package MSGARCH, which implements Markov-switching GARCH (generalized autoregressive conditional heteroscedasticity) models in R with efficient C++ object-oriented programming. Markov-switching GARCH models have become popular methods to account for regime changes in the conditional variance dynamics of time series. The package MSGARCH allows the user to perform simulations as well as maximum likelihood and Bayesian Markov chain Monte Carlo estimations of a very large class of Markov-switching GARCH-type models. The package also provides methods to make single-step and multi-step ahead forecasts of the complete conditional density of the variable of interest. Risk management tools to estimate conditional volatility, value-at-risk, and expected-shortfall are also available. We illustrate the broad functionality of the MSGARCH package using exchange rate and stock market return data.
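
For readers unfamiliar with the model class the package estimates, here is a hypothetical NumPy simulation (not the package's R interface, and only one of several possible Markov-switching GARCH specifications) of a two-regime Markov-switching GARCH(1,1).

```python
import numpy as np

def simulate_ms_garch(T=1000, seed=0):
    """Simulate a 2-regime Markov-switching GARCH(1,1): the regime follows a
    Markov chain, and each regime has its own (omega, alpha, beta) driving a
    single conditional-variance path h_t = omega_s + alpha_s * r_{t-1}^2 + beta_s * h_{t-1}
    (one of several specifications used in the literature)."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.98, 0.02],     # regime transition matrix
                  [0.05, 0.95]])
    omega = np.array([0.02, 0.20])  # calm regime vs turbulent regime
    alpha = np.array([0.05, 0.15])
    beta  = np.array([0.90, 0.80])
    s = 0
    h = omega[s] / (1 - alpha[s] - beta[s])   # start at the stationary variance
    r = np.zeros(T); states = np.zeros(T, dtype=int)
    for t in range(T):
        s = rng.choice(2, p=P[s])
        h = omega[s] + alpha[s] * r[t - 1] ** 2 + beta[s] * h
        r[t] = np.sqrt(h) * rng.standard_normal()
        states[t] = s
    return r, states

r, states = simulate_ms_garch()
print(r.std(), states.mean())   # overall volatility and fraction of time in regime 1
```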

Journal ArticleDOI
TL;DR: A novel deep Emotion-Conditional Adaption Network (ECAN) is proposed to learn domain-invariant and discriminative feature representations, which can match both the marginal and the conditional distributions across domains simultaneously.
Abstract: Datasets play an important role in the progress of facial expression recognition algorithms, but they may suffer from obvious biases caused by different cultures and collection conditions. To look deeper into this bias, we first conduct comprehensive experiments on dataset recognition and crossdataset generalization tasks, and for the first time explore the intrinsic causes of the dataset discrepancy. The results quantitatively verify that current datasets have a strong buildin bias and corresponding analyses indicate that the conditional probability distributions between source and target datasets are different. However, previous researches are mainly based on shallow features with limited discriminative ability under the assumption that the conditional distribution remains unchanged across domains. To address these issues, we further propose a novel deep Emotion-Conditional Adaption Network (ECAN) to learn domain-invariant and discriminative feature representations, which can match both the marginal and the conditional distributions across domains simultaneously. In addition, the largely ignored expression class distribution bias is also addressed by a learnable re-weighting parameter, so that the training and testing domains can share similar class distribution. Extensive cross-database experiments on both lab-controlled datasets (CK+, JAFFE, MMI and Oulu-CASIA) and real-world databases (AffectNet, FER2013, RAF-DB 2.0 and SFEW 2.0) demonstrate that our ECAN can yield competitive performances across various facial expression transfer tasks and outperform the state-of-theart methods.

Journal ArticleDOI
TL;DR: The proposed hybrid approach effectively quantifies the uncertainty involved in the extreme learning machine network by applying an ensemble structure and a logistic distribution-based ensemble model output statistics technique, providing superior full distributional forecasting skill over existing approaches.
Abstract: In recent years, probabilistic forecast of electricity price has become of particular interests to market participants as it can effectively model the uncertainties due to competitive market behaviors. Decision makers heavily rely on such forecast to formulate optimal strategies with minimal risk and maximum profits to deal with stochasticity in market and system operation. Different from the widely used volatility models with least square or maximum likelihood techniques in probabilistic forecast of prices, this paper proposes a reliable continuous ranked probability score-oriented predictive density construction strategy for day-ahead electricity prices. The proposed method effectively quantifies the uncertainty involved in extreme learning machine network by applying an ensemble structure and a logistic distribution-based ensemble model output statistics technique. Moreover, an efficient covariance structure directly determined by the empirical correlations of observed probabilistic forecast series is developed to capture the essential temporal interdependence thus to facilitate the operational scenarios’ generation. Through validating on the real day-ahead market in Sweden, the proposed hybrid approach proves to provide superior full distributional forecasting skill over the existing approaches.
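
A small sketch of the score being optimized, under illustrative assumptions (a logistic predictive distribution centred on the ensemble mean with scale set from the ensemble spread; the paper's ensemble model output statistics step is more elaborate): the continuous ranked probability score of a logistic forecast is computed by numerical integration.

```python
import numpy as np

def crps_logistic(y_obs, mu, s, n=20001):
    """Numerical CRPS for a logistic predictive distribution with location mu
    and scale s: the integral of (F(x) - 1{x >= y_obs})^2 over x, via a plain
    Riemann sum. Lower is better; sharp, calibrated forecasts minimise it."""
    lo, hi = min(mu, y_obs) - 40 * s, max(mu, y_obs) + 40 * s
    x = np.linspace(lo, hi, n)
    F = 1.0 / (1.0 + np.exp(-(x - mu) / s))
    H = (x >= y_obs).astype(float)
    return float(np.sum((F - H) ** 2) * (x[1] - x[0]))

# hypothetical ensemble-statistics step: centre the logistic at the ensemble
# mean and convert the ensemble standard deviation to a logistic scale
ensemble = np.array([41.2, 44.0, 42.5, 47.3, 43.1])   # e.g. price forecasts
mu, s = ensemble.mean(), max(ensemble.std(), 1e-6) * np.sqrt(3) / np.pi
print(crps_logistic(y_obs=45.0, mu=mu, s=s))
```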

Journal ArticleDOI
TL;DR: A novel distribution-discrepancy evaluation method called auto-balanced high-order Kullback-Leibler (AHKL) divergence is proposed, which evaluates both first- and higher-order moment discrepancies and adapts the weights between them per dimension and automatically; in addition, smooth conditional distribution alignment (SCDA) is developed.

Proceedings Article
24 May 2019
TL;DR: The Strong "No Free Lunch" Theorem as mentioned in this paperawzi et al. showed that any classifier can be adversarially fooled with high probability once the perturbations are slightly greater than the natural noise level in the problem.
Abstract: This manuscript presents some new impossibility results on adversarial robustness in machine learning, a very important yet largely open problem. We show that if, conditioned on a class label, the data distribution satisfies the $W_2$ Talagrand transportation-cost inequality (for example, this condition is satisfied if the conditional distribution has a density which is log-concave, or is the uniform measure on a compact Riemannian manifold with positive Ricci curvature), then any classifier can be adversarially fooled with high probability once the perturbations are slightly greater than the natural noise level in the problem. We call this result The Strong "No Free Lunch" Theorem as some recent results (Tsipras et al. 2018, Fawzi et al. 2018, etc.) on the subject can be immediately recovered as very particular cases. Our theoretical bounds are demonstrated on both simulated and real data (MNIST). We conclude the manuscript with some speculation on possible future research directions.

Journal ArticleDOI
TL;DR: A vine copula regression method that uses regular vines and handles mixed continuous and discrete variables and can efficiently compute the conditional distribution of the response variable given the explanatory variables is proposed.

Posted Content
TL;DR: This work introduces Conditional Flow Variational Autoencoders (CF-VAE) and proposes two novel regularization schemes that stabilize training, deal with posterior collapse, and yield a better fit to the target data distribution.
Abstract: Prediction of future states of the environment and interacting agents is a key competence required for autonomous agents to operate successfully in the real world. Prior work for structured sequence prediction based on latent variable models imposes a uni-modal standard Gaussian prior on the latent variables. This induces a strong model bias which makes it challenging to fully capture the multi-modality of the distribution of the future states. In this work, we introduce Conditional Flow Variational Autoencoders (CF-VAE) using our novel conditional normalizing flow based prior to capture complex multi-modal conditional distributions for effective structured sequence prediction. Moreover, we propose two novel regularization schemes which stabilize training, deal with posterior collapse, and yield a better fit to the target data distribution. Our experiments on three multi-modal structured sequence prediction datasets -- MNIST Sequences, Stanford Drone and HighD -- show that the proposed method obtains state-of-the-art results across different evaluation metrics.

25 Sep 2019
TL;DR: In this article, a conditional normalizing flow based prior is proposed to capture complex multi-modal conditional distributions for effective structured sequence prediction, and two novel regularization schemes are proposed to stabilize training, deal with posterior collapse, and better fit the target data distribution.
Abstract: Prediction of future states of the environment and interacting agents is a key competence required for autonomous agents to operate successfully in the real world. Prior work for structured sequence prediction based on latent variable models imposes a uni-modal standard Gaussian prior on the latent variables. This induces a strong model bias which makes it challenging to fully capture the multi-modality of the distribution of the future states. In this work, we introduce Conditional Flow Variational Autoencoders (CF-VAE) using our novel conditional normalizing flow based prior to capture complex multi-modal conditional distributions for effective structured sequence prediction. Moreover, we propose two novel regularization schemes which stabilize training, deal with posterior collapse, and yield a better fit to the target data distribution. Our experiments on three multi-modal structured sequence prediction datasets -- MNIST Sequences, Stanford Drone and HighD -- show that the proposed method obtains state-of-the-art results across different evaluation metrics.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new approach based on the bootstrap which overcomes these difficulties and analyzed and compared the performances of LSCV techniques with their bootstrap approach in finite samples.

Journal ArticleDOI
TL;DR: A probabilistic model is proposed for fatigue damage diagnosis and prognosis of an OSD by integrating the physical model with field inspections while accounting for the associated uncertainties, using a dynamic Bayesian network (DBN).

Journal ArticleDOI
TL;DR: In this paper, the authors use the deterministic information bottleneck (DIB) to perform geometric clustering by choosing cluster labels that preserve information about data point location on a smoothed data set.
Abstract: The information bottleneck (IB) approach to clustering takes a joint distribution P(X, Y) and maps the data X to cluster labels T, which retain maximal information about Y (Tishby, Pereira, & Bialek, 1999). This objective results in an algorithm that clusters data points based on the similarity of their conditional distributions P(Y | X). This is in contrast to classic geometric clustering algorithms such as k-means and Gaussian mixture models (GMMs), which take a set of observed data points {x_i}_{i=1:N} and cluster them based on their geometric (typically Euclidean) distance from one another. Here, we show how to use the deterministic information bottleneck (DIB) (Strouse & Schwab, 2017), a variant of IB, to perform geometric clustering by choosing cluster labels that preserve information about data point location on a smoothed data set. We also introduce a novel intuitive method to choose the number of clusters via kinks in the information curve. We apply this approach to a variety of simple clustering problems, showing that DIB with our model selection procedure recovers the generative cluster labels. We also show that, in particular limits of our model parameters, clustering with DIB and IB is equivalent to k-means and EM fitting of a GMM with hard and soft assignments, respectively. Thus, clustering with (D)IB generalizes and provides an information-theoretic perspective on these classic algorithms.
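
A compact NumPy sketch of the procedure described above, with simplifications of my own (uniform p(x), a fixed Gaussian smoothing width s, random initialization, no model selection over the number of clusters): smooth the data to obtain p(y|x) over data-point indices, then iterate the DIB hard-assignment update f(x) = argmax_t [log q(t) - beta * KL(p(y|x) || q(y|t))].

```python
import numpy as np

def dib_geometric_clustering(X, n_clusters, beta=20.0, s=1.0, n_iter=50, seed=0):
    """DIB-style hard clustering on a smoothed data set (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    p_y_x = np.exp(-d2 / (2 * s ** 2))
    p_y_x /= p_y_x.sum(axis=1, keepdims=True)         # rows: p(y | x)
    labels = rng.integers(0, n_clusters, n)
    eps = 1e-12
    for _ in range(n_iter):
        q_t = np.array([(labels == t).mean() for t in range(n_clusters)]) + eps
        q_y_t = np.array([p_y_x[labels == t].mean(axis=0)
                          if (labels == t).any() else np.full(n, 1.0 / n)
                          for t in range(n_clusters)]) + eps
        kl = (p_y_x[:, None, :] *
              (np.log(p_y_x[:, None, :] + eps) - np.log(q_y_t[None, :, :]))).sum(-1)
        labels = np.argmax(np.log(q_t)[None, :] - beta * kl, axis=1)
    return labels

# toy data: three well-separated blobs in 2-d
rng_demo = np.random.default_rng(1)
X = np.concatenate([rng_demo.normal(m, 0.3, size=(30, 2)) for m in (-2, 0, 2)])
print(np.bincount(dib_geometric_clustering(X, 3)))
```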

Journal ArticleDOI
TL;DR: In this paper, a maximum likelihood approach is proposed to jointly estimate marginal conditional quantiles of multivariate response variables in a linear regression framework, where the authors consider a slight reparameterization of the multivariate asymmetric Laplace distribution proposed by Kotz et al. and exploit its location-scale mixture representation.

Journal ArticleDOI
TL;DR: In this article, the problem of testing a parameter change in general nonlinear integer-valued time series models where the conditional distribution of current observations is assumed to follow a one-parameter exponential family is considered.
Abstract: This study considers the problem of testing a parameter change in general nonlinear integer-valued time series models where the conditional distribution of current observations is assumed to follow a one-parameter exponential family. We consider score-, (standardized) residual-, and estimate-based CUSUM tests and show that their limiting null distributions take the form of the functions of Brownian bridges. Based on the obtained results, we then conduct a comparison study of the performance of CUSUM tests through the use of Monte Carlo simulations. Our findings demonstrate that the standardized residual-based CUSUM test largely outperforms the others.
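
A toy sketch of a residual-based CUSUM statistic of the kind compared in the study, under simplifying assumptions (an i.i.d.-style variance estimate and a constant-mean Poisson null model rather than a fitted nonlinear INGARCH model): the maximum of the bridged partial sums of standardized residuals is compared with Brownian-bridge critical values (about 1.358 at the 5% level).

```python
import numpy as np

def cusum_statistic(e):
    """Residual-based CUSUM: e should be (approximately) standardized
    residuals, e.g. (y_t - lambda_t) / sqrt(lambda_t) for a Poisson model
    fitted under the null of no parameter change. Under the null the
    statistic behaves like the sup of a Brownian bridge."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    partial = np.cumsum(e)
    bridge = partial - np.arange(1, n + 1) / n * partial[-1]
    tau = e.std(ddof=1)                      # simple (i.i.d.-style) scale estimate
    return np.max(np.abs(bridge)) / (tau * np.sqrt(n))

# toy check: Poisson counts with a mean shift halfway through the sample
rng = np.random.default_rng(0)
y = np.concatenate([rng.poisson(2.0, 300), rng.poisson(3.0, 300)])
lam_hat = y.mean()                           # null model: constant mean
e = (y - lam_hat) / np.sqrt(lam_hat)
print(cusum_statistic(e), "> 1.358 suggests a change point")
```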

Posted Content
12 Oct 2019
TL;DR: Two conformal methods based on conditional density estimators, Dist-split and CD-split, are introduced that obtain asymptotic conditional coverage without depending on this type of assumption.
Abstract: Conformal methods create prediction bands that control average coverage under no assumptions besides i.i.d. data. Besides average coverage, one might also desire to control conditional coverage, that is, coverage for every new testing point. However, without strong assumptions, conditional coverage is unachievable. Given this limitation, the literature has focused on methods with asymptotical conditional coverage. In order to obtain this property, these methods require strong conditions on the dependence between the target variable and the features. We introduce two conformal methods based on conditional density estimators that do not depend on this type of assumption to obtain asymptotic conditional coverage: Dist-split and CD-split. While Dist-split asymptotically obtains optimal intervals, which are easier to interpret than general regions, CD-split obtains optimal size regions, which are smaller than intervals. CD-split also obtains local coverage by creating a data-driven partition of the feature space that scales to high-dimensional settings and by generating prediction bands locally on the partition elements. In a wide variety of simulated scenarios, our methods have a better control of conditional coverage and have smaller length than previously proposed methods.
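
A simplified sketch in the spirit of density-based conformal sets (this is not the paper's CD-split: the data-driven partition of the feature space is omitted, and the conditional density estimate below is a hypothetical Gaussian linear model): the estimated conditional density at the observed response serves as the conformity score, and the prediction set keeps every y whose density exceeds the calibration threshold.

```python
import numpy as np
from scipy.stats import norm

def hpd_prediction_set(x_new, cde, X_cal, y_cal, alpha=0.1, grid=None):
    """Keep all y on a grid whose estimated conditional density exceeds the
    alpha-quantile of the calibration scores cde(y_i, x_i)."""
    scores = np.array([cde(y, x) for x, y in zip(X_cal, y_cal)])
    t = np.quantile(scores, alpha)
    grid = np.linspace(y_cal.min() - 3, y_cal.max() + 3, 500) if grid is None else grid
    return grid[np.array([cde(y, x_new) for y in grid]) >= t]

# toy conditional density estimate (an assumption): Gaussian with linear mean
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1)); y = np.sin(2 * X[:, 0]) + 0.3 * rng.normal(size=400)
X_tr, y_tr, X_cal, y_cal = X[:200], y[:200], X[200:], y[200:]
beta = np.linalg.lstsq(np.c_[np.ones(200), X_tr], y_tr, rcond=None)[0]
sigma = np.std(y_tr - np.c_[np.ones(200), X_tr] @ beta)
cde = lambda yy, xx: norm.pdf(yy, loc=beta[0] + beta[1] * xx[0], scale=sigma)
region = hpd_prediction_set(np.array([0.5]), cde, X_cal, y_cal)
print(region.min(), region.max())
```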

Proceedings ArticleDOI
01 Apr 2019
TL;DR: Calibrate is designed to incorporate the prior knowledge about the noise and the true item frequencies as two probability distributions, respectively, via statistical inference and significantly outperforms state-of-the-art LDP algorithms for frequency estimation and heavy hitter identification.
Abstract: Estimating frequencies of certain items among a population is a basic step in data analytics, which enables more advanced data analytics (e.g., heavy hitter identification, frequent pattern mining), client software optimization, and detecting unwanted or malicious hijacking of user settings in browsers. Frequency estimation and heavy hitter identification with local differential privacy (LDP) protect user privacy as well as the data collector. Existing LDP algorithms cannot leverage 1) prior knowledge about the noise in the estimated item frequencies and 2) prior knowledge about the true item frequencies. As a result, they achieve suboptimal performance in practice. In this work, we aim to design LDP algorithms that can leverage such prior knowledge. Specifically, we design Calibrate to incorporate the prior knowledge via statistical inference. Calibrate can be appended to an existing LDP algorithm to reduce its estimation errors. We model the prior knowledge about the noise and the true item frequencies as two probability distributions, respectively. Given the two probability distributions and an estimated frequency of an item produced by an existing LDP algorithm, our Calibrate computes the conditional probability distribution of the item’s frequency and uses the mean of the conditional probability distribution as the calibrated frequency for the item. It is challenging to estimate the two probability distributions due to data sparsity. We address the challenge via integrating techniques from statistics and machine learning. Our empirical results on two real-world datasets show that Calibrate significantly outperforms state-of-the-art LDP algorithms for frequency estimation and heavy hitter identification.
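
A toy sketch of the calibration idea in the last step, under illustrative assumptions (a Gaussian noise model with known standard deviation and a hypothetical power-law prior; the paper estimates both distributions from data): form the conditional distribution of the true frequency given the LDP estimate on a grid and report its mean.

```python
import numpy as np

def calibrate_frequency(f_est, prior_pdf, noise_sd, grid=None):
    """Model the true frequency f with a prior, model the LDP estimation noise
    as Gaussian with known standard deviation, form p(f | f_est) on a grid via
    Bayes' rule, and return the posterior mean as the calibrated frequency."""
    grid = np.linspace(0.0, 1.0, 2001) if grid is None else grid
    likelihood = np.exp(-0.5 * ((f_est - grid) / noise_sd) ** 2)
    post = prior_pdf(grid) * likelihood
    post /= post.sum()
    return float(np.sum(grid * post))

# hypothetical prior: most item frequencies are tiny (power-law-ish shape)
prior = lambda f: (f + 1e-4) ** -1.5
# a raw LDP estimate can even be negative; calibration pulls it back into [0, 1]
for raw in (-0.003, 0.001, 0.02):
    print(raw, "->", round(calibrate_frequency(raw, prior, noise_sd=0.005), 5))
```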