Showing papers on "Conditional probability distribution published in 2019"


Journal ArticleDOI
TL;DR: The methods proposed are not meant to replace the well-established quantile regression estimator, but provide an additional tool that can allow the estimation of regression quantiles in settings where otherwise that would be difficult or even impossible.

584 citations


Journal ArticleDOI
TL;DR: This paper provides a methodology that incorporates the governing equations of the physical model into the loss/likelihood function, formulating training as a minimization of the reverse Kullback-Leibler (KL) divergence between the model predictive density and the reference conditional density.

560 citations


Proceedings Article
01 Jan 2019
TL;DR: This paper proposes to incorporate attention into NPs, allowing each input location to attend to the relevant context points for the prediction, which greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
Abstract: Neural Processes (NPs) (Garnelo et al 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context input-output pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
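
To make the attention mechanism concrete, here is a minimal NumPy sketch (not the paper's implementation; the embedding and per-context representation inputs are stubbed out with random arrays) of the cross-attention step: each target input attends over the context inputs to build a target-specific summary, in place of the mean-pooled summary a vanilla NP would use.

```python
import numpy as np

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(x_target, x_context, r_context):
    """Scaled dot-product attention: queries are target inputs,
    keys are context inputs, values are per-context representations."""
    scale = np.sqrt(x_context.shape[-1])
    logits = x_target @ x_context.T / scale          # (n_target, n_context)
    weights = softmax(logits, axis=-1)
    return weights @ r_context                        # (n_target, d_r)

# toy usage: 5 context points and 3 targets with 4-d input embeddings
rng = np.random.default_rng(0)
x_c, r_c = rng.normal(size=(5, 4)), rng.normal(size=(5, 8))
x_t = rng.normal(size=(3, 4))
r_t = cross_attention(x_t, x_c, r_c)   # target-specific context summaries
print(r_t.shape)                        # (3, 8)
```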

125 citations


Posted Content
TL;DR: It is shown that simple addition of the proposed regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed in each individual task.
Abstract: We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in the latent code. To address this issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on the generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed for each individual task.
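
A minimal sketch of the kind of diversity term the abstract describes, under illustrative assumptions of my own (a generic generator output, a clipping threshold tau; the exact form and weighting in the paper may differ): the ratio of output distance to latent distance is maximized so that different latent codes cannot collapse onto a single output.

```python
import numpy as np

def diversity_regularizer(g_out1, g_out2, z1, z2, tau=10.0):
    """Latent-conditioned diversity term: ratio of output distance to latent
    distance, clipped at tau. The generator would be trained to *maximize*
    this (e.g., subtract lambda * reg from its loss), so that distinct latent
    codes map to distinct outputs for the same condition."""
    num = np.linalg.norm((g_out1 - g_out2).reshape(len(g_out1), -1), axis=1)
    den = np.linalg.norm(z1 - z2, axis=1) + 1e-8
    return np.minimum(num / den, tau).mean()

# toy usage with a fake "generator" that ignores z (i.e., mode collapse):
rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=(4, 16)), rng.normal(size=(4, 16))
collapsed = np.ones((4, 3, 8, 8))
print(diversity_regularizer(collapsed, collapsed, z1, z2))  # 0.0 -> no diversity
```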

118 citations


Posted Content
TL;DR: Attention is incorporated into NPs, allowing each input location to attend to the relevant context points for the prediction, which greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.
Abstract: Neural Processes (NPs) (Garnelo et al 2018a;b) approach regression by learning to map a context set of observed input-output pairs to a distribution over regression functions. Each function models the distribution of the output given an input, conditioned on the context. NPs have the benefit of fitting observed data efficiently with linear complexity in the number of context input-output pairs, and can learn a wide family of conditional distributions; they learn predictive distributions conditioned on context sets of arbitrary size. Nonetheless, we show that NPs suffer a fundamental drawback of underfitting, giving inaccurate predictions at the inputs of the observed data they condition on. We address this issue by incorporating attention into NPs, allowing each input location to attend to the relevant context points for the prediction. We show that this greatly improves the accuracy of predictions, results in noticeably faster training, and expands the range of functions that can be modelled.

117 citations


Proceedings ArticleDOI
Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, Eduard Hovy
05 Sep 2019
TL;DR: This paper turns to generative flow, an elegant technique for modeling complex distributions with neural networks, and designs several layers of flow tailored for modeling the conditional density of sequential latent variables, achieving performance comparable with state-of-the-art non-autoregressive NMT models.
Abstract: Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures accuracy lags significantly behind autoregressive models. In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique to model complex distributions using neural networks, and design several layers of flow tailored for modeling the conditional density of sequential latent variables. We evaluate this model on three neural machine translation (NMT) benchmark datasets, achieving comparable performance with state-of-the-art non-autoregressive NMT models and almost constant decoding time w.r.t the sequence length.
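
As a small illustration of the generative-flow building block the paper relies on (this is not the paper's architecture; the shift and scale "networks" below are hypothetical linear maps), an affine coupling layer transforms half of the variables conditioned on the other half and tracks the log-determinant needed for the change-of-variables density.

```python
import numpy as np

def affine_coupling(z, shift_net, scale_net):
    """One affine coupling step: keep the first half of the dimensions and
    transform the second half conditioned on the first. Returns the new
    variable and log|det Jacobian| for the change of variables
    log p(z) = log p_base(f(z)) + log|det df/dz|."""
    d = z.shape[-1] // 2
    z1, z2 = z[..., :d], z[..., d:]
    log_s = scale_net(z1)
    y2 = z2 * np.exp(log_s) + shift_net(z1)
    return np.concatenate([z1, y2], axis=-1), log_s.sum(axis=-1)

# toy usage with linear "networks" for shift and scale
rng = np.random.default_rng(0)
W_scale, W_shift = rng.normal(size=(4, 4)) * 0.1, rng.normal(size=(4, 4)) * 0.1
z = rng.normal(size=(2, 8))
y, log_det = affine_coupling(z, lambda h: h @ W_shift, lambda h: h @ W_scale)
print(y.shape, log_det.shape)   # (2, 8) (2,)
```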

101 citations


Journal ArticleDOI
TL;DR: This work proposes computationally efficient approaches to conducting inference in the distributed estimation setting and proves that the proposed procedure does not sacrifice any statistical inferential accuracy provided that the number of distributed computing units and quantile levels are chosen properly.
Abstract: The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big data, we propose a two-step procedure: (i) estimate conditional quantile functions at different levels in a parallel computing environment; (ii) construct a conditional quantile regression process through projection based on these estimated quantile curves. Our general quantile regression framework covers both linear models with fixed or growing dimension and series approximation models. We prove that the proposed procedure does not sacrifice any statistical inferential accuracy provided that the number of distributed computing units and quantile levels are chosen properly. In particular, a sharp upper bound for the former and a sharp lower bound for the latter are derived to capture the minimal computational cost from a statistical perspective. As an important application, the statistical inference on conditional distribution functions is considered. Moreover, we propose computationally efficient approaches to conducting inference in the distributed estimation setting described above. Those approaches directly utilize the availability of estimators from subsamples and can be carried out at almost no additional computational cost. Simulations confirm our statistical inferential theory.
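
A toy NumPy sketch of step (i), the divide-and-conquer part: fit the conditional quantile at several levels on each subsample and aggregate by averaging. The crude pinball-loss subgradient fit and the plain averaging are simplifications of my own, not the paper's estimator or its projection step.

```python
import numpy as np

def fit_quantile(X, y, tau, lr=0.05, n_iter=2000):
    """Crude pinball-loss subgradient fit of a linear conditional quantile."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        u = y - X @ beta
        grad = -X.T @ (tau - (u < 0)) / len(y)
        beta -= lr * grad
    return beta

def distributed_quantiles(X, y, taus, n_machines):
    """Schematic divide-and-conquer: fit each quantile level on each
    subsample (in parallel in practice), then aggregate by averaging."""
    X_chunks, y_chunks = np.array_split(X, n_machines), np.array_split(y, n_machines)
    per_machine = np.array([[fit_quantile(Xc, yc, t) for t in taus]
                            for Xc, yc in zip(X_chunks, y_chunks)])
    return per_machine.mean(axis=0)        # (len(taus), n_features)

rng = np.random.default_rng(0)
X = np.c_[np.ones(4000), rng.normal(size=4000)]
y = X @ np.array([1.0, 2.0]) + rng.normal(size=4000)
print(distributed_quantiles(X, y, taus=[0.25, 0.5, 0.75], n_machines=8))
```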

94 citations


Posted Content
TL;DR: This work proposes a new method, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches.
Abstract: We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume access to one or more, potentially black box, stochastic "oracle" predictive functions, each of which maps from input (e.g., protein sequences) design space to a distribution over a property of interest (e.g. protein fluorescence). At first glance, this problem can be framed as one of optimizing the oracle(s) with respect to the input. However, many state-of-the-art predictive models, such as neural networks, are known to suffer from pathologies, especially for data far from the training distribution. Thus we need to modulate the optimization of the oracle inputs with prior knowledge about what makes `realistic' inputs (e.g., proteins that stably fold). Herein, we propose a new method to solve this problem, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches. Formally, our method achieves its success by using model-based adaptive sampling to estimate the conditional distribution of the input sequences given the desired properties.
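
A hypothetical one-dimensional toy version of model-based adaptive sampling in the spirit of the abstract (the real method works with learned generative models over sequences; the Gaussian search model, oracle, and threshold below are illustrative assumptions): samples are reweighted by the oracle probability of the desired property times a prior-to-search-density ratio, and the search distribution is refit to the weighted samples.

```python
import numpy as np
from scipy.stats import norm

def oracle_prob_desired(x, threshold=3.0, noise_sd=0.5):
    """Hypothetical stochastic oracle: P(property >= threshold | x),
    assuming the property is Normal(f(x), noise_sd) with f(x) = x."""
    return 1.0 - norm.cdf(threshold, loc=x, scale=noise_sd)

def adaptive_sampling(n_iter=20, n_samples=500, seed=0):
    """Toy 1-d sketch: repeatedly sample from the current search model,
    weight samples by oracle probability times the prior/search density
    ratio, and refit the search model to the weighted samples."""
    rng = np.random.default_rng(seed)
    mu0, sd0 = 0.0, 1.0          # prior over designs
    mu, sd = mu0, sd0            # current search distribution
    for _ in range(n_iter):
        x = rng.normal(mu, sd, size=n_samples)
        w = oracle_prob_desired(x) * norm.pdf(x, mu0, sd0) / norm.pdf(x, mu, sd)
        w = w / w.sum()
        mu = np.sum(w * x)
        sd = max(np.sqrt(np.sum(w * (x - mu) ** 2)), 1e-3)
    return mu, sd

print(adaptive_sampling())   # the search model drifts toward promising designs
```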

89 citations


Proceedings Article
01 Jan 2019
TL;DR: In this paper, the authors propose to explicitly regularize the generator to produce diverse outputs depending on latent codes, which can be easily integrated into most conditional GAN objectives to control a balance between visual quality and diversity.
Abstract: We propose a simple yet highly effective method that addresses the mode-collapse problem in the Conditional Generative Adversarial Network (cGAN). Although conditional distributions are multi-modal (i.e., having many modes) in practice, most cGAN approaches tend to learn an overly simplified distribution where an input is always mapped to a single output regardless of variations in the latent code. To address this issue, we propose to explicitly regularize the generator to produce diverse outputs depending on latent codes. The proposed regularization is simple, general, and can be easily integrated into most conditional GAN objectives. Additionally, explicit regularization on the generator allows our method to control a balance between visual quality and diversity. We demonstrate the effectiveness of our method on three conditional generation tasks: image-to-image translation, image inpainting, and future video prediction. We show that simple addition of our regularization to existing models leads to surprisingly diverse generations, substantially outperforming the previous approaches for multi-modal conditional generation specifically designed for each individual task.

76 citations


Journal ArticleDOI
TL;DR: It is shown that marginal and conditional distributions have different contributions to the domain divergence, and the proposed Dynamic Distribution Adaptation (DDA) is able to provide good quantitative evaluation of their relative importance, which leads to better performance.
Abstract: Transfer learning aims to learn robust classifiers for the target domain by leveraging knowledge from a source domain. Since the source and the target domains are usually from different distributions, existing methods mainly focus on adapting the cross-domain marginal or conditional distributions. However, in real applications, the marginal and conditional distributions usually have different contributions to the domain discrepancy. Existing methods fail to quantitatively evaluate the different importance of these two distributions, which will result in unsatisfactory transfer performance. In this paper, we propose a novel concept called Dynamic Distribution Adaptation (DDA), which is capable of quantitatively evaluating the relative importance of each distribution. DDA can be easily incorporated into the framework of structural risk minimization to solve transfer learning problems. On the basis of DDA, we propose two novel learning algorithms: (1) Manifold Dynamic Distribution Adaptation (MDDA) for traditional transfer learning, and (2) Dynamic Distribution Adaptation Network (DDAN) for deep transfer learning. Extensive experiments demonstrate that MDDA and DDAN significantly improve the transfer learning performance and set up a strong baseline over the latest deep and adversarial methods on digit recognition, sentiment analysis, and image classification. More importantly, it is shown that marginal and conditional distributions have different contributions to the domain divergence, and our DDA is able to provide a good quantitative evaluation of their relative importance, which leads to better performance. We believe this observation can be helpful for future research in transfer learning.
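
A rough sketch of the quantity being adapted, assuming MMD as the discrepancy measure and treating the balance factor mu as given (the paper estimates it, e.g. from proxy distances, and embeds the whole term in structural risk minimization): a weighted sum of the marginal discrepancy and the average per-class conditional discrepancy.

```python
import numpy as np

def mmd(X, Y, gamma=1.0):
    """Squared MMD with an RBF kernel (biased estimate)."""
    def k(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

def dynamic_distribution_distance(Xs, ys, Xt, yt_pseudo, mu):
    """Weighted combination of the marginal and per-class conditional
    discrepancies; mu trades off their relative importance."""
    marginal = mmd(Xs, Xt)
    classes = np.unique(ys)
    conditional = np.mean([mmd(Xs[ys == c], Xt[yt_pseudo == c])
                           for c in classes if (yt_pseudo == c).any()])
    return (1 - mu) * marginal + mu * conditional

rng = np.random.default_rng(0)
Xs, ys = rng.normal(size=(60, 5)), rng.integers(0, 2, 60)
Xt, yt = rng.normal(0.5, 1.0, size=(60, 5)), rng.integers(0, 2, 60)
for mu in (0.0, 0.5, 1.0):   # in the paper mu is estimated, not swept by hand
    print(mu, dynamic_distribution_distance(Xs, ys, Xt, yt, mu))
```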

73 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: In this article, a dissimilarity coefficient based probabilistic learning objective is proposed to minimize the difference between an annotation agnostic prediction distribution and an annotation aware conditional distribution, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object.
Abstract: We consider the problem of weakly supervised object detection, where the training samples are annotated using only image-level labels that indicate the presence or absence of an object category. In order to model the uncertainty in the location of the objects, we employ a dissimilarity coefficient based probabilistic learning objective. The learning objective minimizes the difference between an annotation agnostic prediction distribution and an annotation aware conditional distribution. The main computational challenge is the complex nature of the conditional distribution, which consists of terms over hundreds or thousands of variables. The complexity of the conditional distribution rules out the possibility of explicitly modeling it. Instead, we exploit the fact that deep learning frameworks rely on stochastic optimization. This allows us to use a state of the art discrete generative model that can provide annotation consistent samples from the conditional distribution. Extensive experiments on PASCAL VOC 2007 and 2012 data sets demonstrate the efficacy of our proposed approach.

Book
07 May 2019
TL;DR: This book develops probability from first principles, covering models of probability, the single event model, Bernoulli trials, random variables with their means and expected values, continuous sample spaces, maximum entropy, and limit theorems.
Abstract: Probability * Introduction * Models in General * The Frequency Approach Rejected * The Single Event Model * Symmetry as the Measure of Probability * Independence * Subsets of a Sample Space * Conditional Probability * Randomness * Critique of the Model * Some Mathematical Tools * Permutations * Combinations * The Binomial Distribution - Bernoulli Trials * Random Variables, Mean and the Expected Value * The Variance * The Generating Function * The Weak Law of Large Numbers * The Statistical Assignment of Probability * The Representation of Information * Methods for Solving Problems * The Five Methods * The Total Sample Space and Fair Games * Enumeration * Historical Approach * Recursive Approach * The Method of Random Variables * Critique of the Notion of a Fair Game * Bernoulli Evaluation * Robustness * Inclusion-Exclusion Principle * Countably Infinite Sample Spaces * Introduction * Bernoulli Trials * On the Strategy to be Adopted * State Diagrams * Generating Functions of State Diagrams * Expanding a Rational Generating Function * Checking the Solution * Paradoxes * Continuous Sample Spaces * A Philosophy of the Real Number System * Some First Examples * Some Paradoxes * The Normal Distribution * The Distribution of Numbers * Convergence to the Reciprocal Distribution * Random Times * Dead Times * Poisson Distribution in Time * Queuing Theory * Birth and Death Systems * Summary * Uniform Probability Assignments * Maximum Entropy * What is Entropy? * Shannon's Entropy * Some Mathematical Properties of the Entropy Function * Some Simple Applications * The Maximum Entropy Principle * Models of Probability * General Remarks * Maximum Likelihood in a Binary Choice * Von Mises Probability * The Mathematical Approach * The Statistical Approach * When the Mean Does Not Exist * Probability as an Extension of Logic * De Finetti * Subjective Probability * Fuzzy Probability * Probability in Science * Complex Probability * Some Limit Theorems * The Binomial Approximation for the Case p=1/2 * Approximation by the Normal Distribution * Another Derivation of the Normal Distribution * Random Times * The Zipf Distribution * Summary * An Essay on Simulation

Posted Content
Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, Eduard Hovy
TL;DR: This paper proposes a generative-flow-based model for non-autoregressive sequence generation using latent variable models, achieving performance comparable to state-of-the-art non-autoregressive models on NMT benchmark datasets.
Abstract: Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased efficiency through parallel processing on hardware such as GPUs. However, directly modeling the joint distribution of all tokens simultaneously is challenging, and even with increasingly complex model structures accuracy lags significantly behind autoregressive models. In this paper, we propose a simple, efficient, and effective model for non-autoregressive sequence generation using latent variable models. Specifically, we turn to generative flow, an elegant technique to model complex distributions using neural networks, and design several layers of flow tailored for modeling the conditional density of sequential latent variables. We evaluate this model on three neural machine translation (NMT) benchmark datasets, achieving comparable performance with state-of-the-art non-autoregressive NMT models and almost constant decoding time w.r.t the sequence length.

Posted Content
TL;DR: This work develops a prediction method that works in conjunction with many powerful classical and modern high-dimensional methods for estimating conditional distributions, establishing approximate conditional validity under consistent estimation and approximate unconditional validity under model misspecification, overfitting, and with time series data.
Abstract: We propose a robust method for constructing conditionally valid prediction intervals based on models for conditional distributions such as quantile and distribution regression. Our approach can be applied to important prediction problems including cross-sectional prediction, k-step-ahead forecasts, synthetic controls and counterfactual prediction, and individual treatment effects prediction. Our method exploits the probability integral transform and relies on permuting estimated ranks. Unlike regression residuals, ranks are independent of the predictors, allowing us to construct conditionally valid prediction intervals under heteroskedasticity. We establish approximate conditional validity under consistent estimation and provide approximate unconditional validity under model misspecification, overfitting, and with time series data. We also propose a simple "shape" adjustment of our baseline method that yields optimal prediction intervals.
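
A simplified split-style sketch of the rank/PIT idea (the conditional-CDF model below is a hypothetical Gaussian linear stand-in for the quantile or distribution regression the method allows, and the full method permutes estimated ranks rather than using this plain split construction): the calibration PIT values determine how far from 1/2 the PIT of a new response may fall.

```python
import numpy as np
from scipy.stats import norm

def conformal_interval_from_cdf(x_new, F, X_cal, y_cal, alpha=0.1, grid=None):
    """Compute PIT values u_i = F(y_i | x_i) on a calibration set, take the
    (1 - alpha) quantile of |u_i - 1/2|, and return the y-range on a grid
    whose PIT falls within that band."""
    u = np.array([F(y, x) for x, y in zip(X_cal, y_cal)])
    q = np.quantile(np.abs(u - 0.5), 1 - alpha)
    grid = np.linspace(y_cal.min() - 3, y_cal.max() + 3, 400) if grid is None else grid
    mask = np.abs(np.array([F(y, x_new) for y in grid]) - 0.5) <= q
    return grid[mask].min(), grid[mask].max()

# toy conditional-CDF estimate: linear mean, constant variance (an assumption,
# standing in for the quantile/distribution regression the method allows)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1)); y = 2 * X[:, 0] + rng.normal(size=500)
X_tr, y_tr, X_cal, y_cal = X[:250], y[:250], X[250:], y[250:]
beta = np.linalg.lstsq(np.c_[np.ones(250), X_tr], y_tr, rcond=None)[0]
sigma = np.std(y_tr - np.c_[np.ones(250), X_tr] @ beta)
F = lambda yy, xx: norm.cdf((yy - beta[0] - beta[1] * xx[0]) / sigma)
print(conformal_interval_from_cdf(np.array([1.0]), F, X_cal, y_cal))
```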

Proceedings Article
24 May 2019
TL;DR: In this paper, the authors propose a new method for design problems where the goal is to maximize or specify the value of one or more properties of interest (e.g., protein fluorescence).
Abstract: We present a new method for design problems wherein the goal is to maximize or specify the value of one or more properties of interest. For example, in protein design, one may wish to find the protein sequence that maximizes fluorescence. We assume access to one or more, potentially black box, stochastic "oracle" predictive functions, each of which maps from input (e.g., protein sequences) design space to a distribution over a property of interest (e.g. protein fluorescence). At first glance, this problem can be framed as one of optimizing the oracle(s) with respect to the input. However, many state-of-the-art predictive models, such as neural networks, are known to suffer from pathologies, especially for data far from the training distribution. Thus we need to modulate the optimization of the oracle inputs with prior knowledge about what makes `realistic' inputs (e.g., proteins that stably fold). Herein, we propose a new method to solve this problem, Conditioning by Adaptive Sampling, which yields state-of-the-art results on a protein fluorescence problem, as compared to other recently published approaches. Formally, our method achieves its success by using model-based adaptive sampling to estimate the conditional distribution of the input sequences given the desired properties.

Journal ArticleDOI
TL;DR: The package MSGARCH as discussed by the authors implements Markov-switching GARCH (generalized autoregressive conditional heteroscedasticity) models in R with efficient C++ object-oriented programming.
Abstract: We describe the package MSGARCH, which implements Markov-switching GARCH (generalized autoregressive conditional heteroscedasticity) models in R with efficient C++ object-oriented programming. Markov-switching GARCH models have become popular methods to account for regime changes in the conditional variance dynamics of time series. The package MSGARCH allows the user to perform simulations as well as maximum likelihood and Bayesian Markov chain Monte Carlo estimations of a very large class of Markov-switching GARCH-type models. The package also provides methods to make single-step and multi-step ahead forecasts of the complete conditional density of the variable of interest. Risk management tools to estimate conditional volatility, value-at-risk, and expected-shortfall are also available. We illustrate the broad functionality of the MSGARCH package using exchange rate and stock market return data.
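
For readers unfamiliar with the model class the package estimates, here is a hypothetical NumPy simulation (not the package's R interface, and only one of several possible Markov-switching GARCH specifications) of a two-regime Markov-switching GARCH(1,1).

```python
import numpy as np

def simulate_ms_garch(T=1000, seed=0):
    """Simulate a 2-regime Markov-switching GARCH(1,1): the regime follows a
    Markov chain, and each regime has its own (omega, alpha, beta) driving a
    single conditional-variance path h_t = omega_s + alpha_s * r_{t-1}^2 + beta_s * h_{t-1}
    (one of several specifications used in the literature)."""
    rng = np.random.default_rng(seed)
    P = np.array([[0.98, 0.02],     # regime transition matrix
                  [0.05, 0.95]])
    omega = np.array([0.02, 0.20])  # calm regime vs turbulent regime
    alpha = np.array([0.05, 0.15])
    beta  = np.array([0.90, 0.80])
    s = 0
    h = omega[s] / (1 - alpha[s] - beta[s])   # start at the stationary variance
    r = np.zeros(T); states = np.zeros(T, dtype=int)
    for t in range(T):
        s = rng.choice(2, p=P[s])
        h = omega[s] + alpha[s] * r[t - 1] ** 2 + beta[s] * h
        r[t] = np.sqrt(h) * rng.standard_normal()
        states[t] = s
    return r, states

r, states = simulate_ms_garch()
print(r.std(), states.mean())   # overall volatility and fraction of time in regime 1
```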

Journal ArticleDOI
TL;DR: A novel deep Emotion-Conditional Adaption Network (ECAN) is proposed to learn domain-invariant and discriminative feature representations, which can match both the marginal and the conditional distributions across domains simultaneously.
Abstract: Datasets play an important role in the progress of facial expression recognition algorithms, but they may suffer from obvious biases caused by different cultures and collection conditions. To look deeper into this bias, we first conduct comprehensive experiments on dataset recognition and crossdataset generalization tasks, and for the first time explore the intrinsic causes of the dataset discrepancy. The results quantitatively verify that current datasets have a strong buildin bias and corresponding analyses indicate that the conditional probability distributions between source and target datasets are different. However, previous researches are mainly based on shallow features with limited discriminative ability under the assumption that the conditional distribution remains unchanged across domains. To address these issues, we further propose a novel deep Emotion-Conditional Adaption Network (ECAN) to learn domain-invariant and discriminative feature representations, which can match both the marginal and the conditional distributions across domains simultaneously. In addition, the largely ignored expression class distribution bias is also addressed by a learnable re-weighting parameter, so that the training and testing domains can share similar class distribution. Extensive cross-database experiments on both lab-controlled datasets (CK+, JAFFE, MMI and Oulu-CASIA) and real-world databases (AffectNet, FER2013, RAF-DB 2.0 and SFEW 2.0) demonstrate that our ECAN can yield competitive performances across various facial expression transfer tasks and outperform the state-of-theart methods.

Journal ArticleDOI
TL;DR: The proposed hybrid approach effectively quantifies the uncertainty involved in the extreme learning machine network by applying an ensemble structure and a logistic distribution-based ensemble model output statistics technique, providing superior full distributional forecasting skill over existing approaches.
Abstract: In recent years, probabilistic forecast of electricity price has become of particular interests to market participants as it can effectively model the uncertainties due to competitive market behaviors. Decision makers heavily rely on such forecast to formulate optimal strategies with minimal risk and maximum profits to deal with stochasticity in market and system operation. Different from the widely used volatility models with least square or maximum likelihood techniques in probabilistic forecast of prices, this paper proposes a reliable continuous ranked probability score-oriented predictive density construction strategy for day-ahead electricity prices. The proposed method effectively quantifies the uncertainty involved in extreme learning machine network by applying an ensemble structure and a logistic distribution-based ensemble model output statistics technique. Moreover, an efficient covariance structure directly determined by the empirical correlations of observed probabilistic forecast series is developed to capture the essential temporal interdependence thus to facilitate the operational scenarios’ generation. Through validating on the real day-ahead market in Sweden, the proposed hybrid approach proves to provide superior full distributional forecasting skill over the existing approaches.
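
A small sketch of the score being optimized, under illustrative assumptions (a logistic predictive distribution centred on the ensemble mean with scale set from the ensemble spread; the paper's ensemble model output statistics step is more elaborate): the continuous ranked probability score of a logistic forecast is computed by numerical integration.

```python
import numpy as np

def crps_logistic(y_obs, mu, s, n=20001):
    """Numerical CRPS for a logistic predictive distribution with location mu
    and scale s: the integral of (F(x) - 1{x >= y_obs})^2 over x, via a plain
    Riemann sum. Lower is better; sharp, calibrated forecasts minimise it."""
    lo, hi = min(mu, y_obs) - 40 * s, max(mu, y_obs) + 40 * s
    x = np.linspace(lo, hi, n)
    F = 1.0 / (1.0 + np.exp(-(x - mu) / s))
    H = (x >= y_obs).astype(float)
    return float(np.sum((F - H) ** 2) * (x[1] - x[0]))

# hypothetical ensemble-statistics step: centre the logistic at the ensemble
# mean and convert the ensemble standard deviation to a logistic scale
ensemble = np.array([41.2, 44.0, 42.5, 47.3, 43.1])   # e.g. price forecasts
mu, s = ensemble.mean(), max(ensemble.std(), 1e-6) * np.sqrt(3) / np.pi
print(crps_logistic(y_obs=45.0, mu=mu, s=s))
```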

Journal ArticleDOI
TL;DR: A novel distribution-discrepancy evaluation method called auto-balanced high-order Kullback-Leibler (AHKL) divergence is proposed, which evaluates both first- and higher-order moment discrepancies and adapts the weights between them per dimension and automatically; in addition, smooth conditional distribution alignment (SCDA) is developed.

Proceedings Article
24 May 2019
TL;DR: The Strong "No Free Lunch" Theorem as mentioned in this paperawzi et al. showed that any classifier can be adversarially fooled with high probability once the perturbations are slightly greater than the natural noise level in the problem.
Abstract: This manuscript presents some new impossibility results on adversarial robustness in machine learning, a very important yet largely open problem. We show that if, conditioned on a class label, the data distribution satisfies the $W_2$ Talagrand transportation-cost inequality (for example, this condition is satisfied if the conditional distribution has a density which is log-concave, or is the uniform measure on a compact Riemannian manifold with positive Ricci curvature), then any classifier can be adversarially fooled with high probability once the perturbations are slightly greater than the natural noise level in the problem. We call this result The Strong "No Free Lunch" Theorem as some recent results (Tsipras et al. 2018, Fawzi et al. 2018, etc.) on the subject can be immediately recovered as very particular cases. Our theoretical bounds are demonstrated on both simulated and real data (MNIST). We conclude the manuscript with some speculation on possible future research directions.

Journal ArticleDOI
TL;DR: A vine copula regression method that uses regular vines and handles mixed continuous and discrete variables and can efficiently compute the conditional distribution of the response variable given the explanatory variables is proposed.

Posted Content
TL;DR: This work introduces Conditional Flow Variational Autoencoders (CF-VAE) and proposes two novel regularization schemes that stabilize training, deal with posterior collapse, and yield a better fit to the target data distribution.
Abstract: Prediction of future states of the environment and interacting agents is a key competence required for autonomous agents to operate successfully in the real world. Prior work for structured sequence prediction based on latent variable models imposes a uni-modal standard Gaussian prior on the latent variables. This induces a strong model bias which makes it challenging to fully capture the multi-modality of the distribution of the future states. In this work, we introduce Conditional Flow Variational Autoencoders (CF-VAE) using our novel conditional normalizing flow based prior to capture complex multi-modal conditional distributions for effective structured sequence prediction. Moreover, we propose two novel regularization schemes which stabilize training, deal with posterior collapse, and yield a better fit to the target data distribution. Our experiments on three multi-modal structured sequence prediction datasets -- MNIST Sequences, Stanford Drone and HighD -- show that the proposed method obtains state-of-the-art results across different evaluation metrics.

25 Sep 2019
TL;DR: In this article, a conditional normalizing flow based prior is proposed to capture complex multi-modal conditional distributions for effective structured sequence prediction, and two novel regularization schemes are proposed to stabilize training, deal with posterior collapse, and better fit the target data distribution.
Abstract: Prediction of future states of the environment and interacting agents is a key competence required for autonomous agents to operate successfully in the real world. Prior work for structured sequence prediction based on latent variable models imposes a uni-modal standard Gaussian prior on the latent variables. This induces a strong model bias which makes it challenging to fully capture the multi-modality of the distribution of the future states. In this work, we introduce Conditional Flow Variational Autoencoders (CF-VAE) using our novel conditional normalizing flow based prior to capture complex multi-modal conditional distributions for effective structured sequence prediction. Moreover, we propose two novel regularization schemes which stabilize training, deal with posterior collapse, and yield a better fit to the target data distribution. Our experiments on three multi-modal structured sequence prediction datasets -- MNIST Sequences, Stanford Drone and HighD -- show that the proposed method obtains state-of-the-art results across different evaluation metrics.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a new approach based on the bootstrap which overcomes these difficulties and analyzed and compared the performances of LSCV techniques with their bootstrap approach in finite samples.

Journal ArticleDOI
TL;DR: A probabilistic model is proposed for fatigue damage diagnosis and prognosis of an OSD by integrating the physical model with field inspections while accounting for the associated uncertainties, using a dynamic Bayesian network (DBN).

Journal ArticleDOI
TL;DR: In this paper, the authors use the deterministic information bottleneck (DIB) to perform geometric clustering by choosing cluster labels that preserve information about data point location on a smoothed data set.
Abstract: The information bottleneck (IB) approach to clustering takes a joint distribution P(X, Y) and maps the data X to cluster labels T, which retain maximal information about Y (Tishby, Pereira, & Bialek, 1999). This objective results in an algorithm that clusters data points based on the similarity of their conditional distributions P(Y | X). This is in contrast to classic geometric clustering algorithms such as k-means and Gaussian mixture models (GMMs), which take a set of observed data points {x_i}_{i=1:N} and cluster them based on their geometric (typically Euclidean) distance from one another. Here, we show how to use the deterministic information bottleneck (DIB) (Strouse & Schwab, 2017), a variant of IB, to perform geometric clustering by choosing cluster labels that preserve information about data point location on a smoothed data set. We also introduce a novel intuitive method to choose the number of clusters via kinks in the information curve. We apply this approach to a variety of simple clustering problems, showing that DIB with our model selection procedure recovers the generative cluster labels. We also show that, in particular limits of our model parameters, clustering with DIB and IB is equivalent to k-means and EM fitting of a GMM with hard and soft assignments, respectively. Thus, clustering with (D)IB generalizes and provides an information-theoretic perspective on these classic algorithms.
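
A compact NumPy sketch of the procedure described above, with simplifications of my own (uniform p(x), a fixed Gaussian smoothing width s, random initialization, no model selection over the number of clusters): smooth the data to obtain p(y|x) over data-point indices, then iterate the DIB hard-assignment update f(x) = argmax_t [log q(t) - beta * KL(p(y|x) || q(y|t))].

```python
import numpy as np

def dib_geometric_clustering(X, n_clusters, beta=20.0, s=1.0, n_iter=50, seed=0):
    """DIB-style hard clustering on a smoothed data set (sketch)."""
    rng = np.random.default_rng(seed)
    n = len(X)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    p_y_x = np.exp(-d2 / (2 * s ** 2))
    p_y_x /= p_y_x.sum(axis=1, keepdims=True)         # rows: p(y | x)
    labels = rng.integers(0, n_clusters, n)
    eps = 1e-12
    for _ in range(n_iter):
        q_t = np.array([(labels == t).mean() for t in range(n_clusters)]) + eps
        q_y_t = np.array([p_y_x[labels == t].mean(axis=0)
                          if (labels == t).any() else np.full(n, 1.0 / n)
                          for t in range(n_clusters)]) + eps
        kl = (p_y_x[:, None, :] *
              (np.log(p_y_x[:, None, :] + eps) - np.log(q_y_t[None, :, :]))).sum(-1)
        labels = np.argmax(np.log(q_t)[None, :] - beta * kl, axis=1)
    return labels

# toy data: three well-separated blobs in 2-d
rng_demo = np.random.default_rng(1)
X = np.concatenate([rng_demo.normal(m, 0.3, size=(30, 2)) for m in (-2, 0, 2)])
print(np.bincount(dib_geometric_clustering(X, 3)))
```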

Journal ArticleDOI
TL;DR: In this paper, a maximum likelihood approach is proposed to jointly estimate marginal conditional quantiles of multivariate response variables in a linear regression framework, where the authors consider a slight reparameterization of the multivariate asymmetric Laplace distribution proposed by Kotz et al. and exploit its location-scale mixture representation.

Journal ArticleDOI
TL;DR: In this article, the problem of testing a parameter change in general nonlinear integer-valued time series models where the conditional distribution of current observations is assumed to follow a one-parameter exponential family is considered.
Abstract: This study considers the problem of testing a parameter change in general nonlinear integer-valued time series models where the conditional distribution of current observations is assumed to follow a one-parameter exponential family. We consider score-, (standardized) residual-, and estimate-based CUSUM tests and show that their limiting null distributions take the form of the functions of Brownian bridges. Based on the obtained results, we then conduct a comparison study of the performance of CUSUM tests through the use of Monte Carlo simulations. Our findings demonstrate that the standardized residual-based CUSUM test largely outperforms the others.
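
A toy sketch of a residual-based CUSUM statistic of the kind compared in the study, under simplifying assumptions (an i.i.d.-style variance estimate and a constant-mean Poisson null model rather than a fitted nonlinear INGARCH model): the maximum of the bridged partial sums of standardized residuals is compared with Brownian-bridge critical values (about 1.358 at the 5% level).

```python
import numpy as np

def cusum_statistic(e):
    """Residual-based CUSUM: e should be (approximately) standardized
    residuals, e.g. (y_t - lambda_t) / sqrt(lambda_t) for a Poisson model
    fitted under the null of no parameter change. Under the null the
    statistic behaves like the sup of a Brownian bridge."""
    e = np.asarray(e, dtype=float)
    n = len(e)
    partial = np.cumsum(e)
    bridge = partial - np.arange(1, n + 1) / n * partial[-1]
    tau = e.std(ddof=1)                      # simple (i.i.d.-style) scale estimate
    return np.max(np.abs(bridge)) / (tau * np.sqrt(n))

# toy check: Poisson counts with a mean shift halfway through the sample
rng = np.random.default_rng(0)
y = np.concatenate([rng.poisson(2.0, 300), rng.poisson(3.0, 300)])
lam_hat = y.mean()                           # null model: constant mean
e = (y - lam_hat) / np.sqrt(lam_hat)
print(cusum_statistic(e), "> 1.358 suggests a change point")
```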

Posted Content
12 Oct 2019
TL;DR: Two conformal methods based on conditional density estimators, Dist-split and CD-split, are introduced that obtain asymptotic conditional coverage without depending on this type of assumption.
Abstract: Conformal methods create prediction bands that control average coverage under no assumptions besides i.i.d. data. Besides average coverage, one might also desire to control conditional coverage, that is, coverage for every new testing point. However, without strong assumptions, conditional coverage is unachievable. Given this limitation, the literature has focused on methods with asymptotical conditional coverage. In order to obtain this property, these methods require strong conditions on the dependence between the target variable and the features. We introduce two conformal methods based on conditional density estimators that do not depend on this type of assumption to obtain asymptotic conditional coverage: Dist-split and CD-split. While Dist-split asymptotically obtains optimal intervals, which are easier to interpret than general regions, CD-split obtains optimal size regions, which are smaller than intervals. CD-split also obtains local coverage by creating a data-driven partition of the feature space that scales to high-dimensional settings and by generating prediction bands locally on the partition elements. In a wide variety of simulated scenarios, our methods have a better control of conditional coverage and have smaller length than previously proposed methods.
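
A simplified sketch in the spirit of density-based conformal sets (this is not the paper's CD-split: the data-driven partition of the feature space is omitted, and the conditional density estimate below is a hypothetical Gaussian linear model): the estimated conditional density at the observed response serves as the conformity score, and the prediction set keeps every y whose density exceeds the calibration threshold.

```python
import numpy as np
from scipy.stats import norm

def hpd_prediction_set(x_new, cde, X_cal, y_cal, alpha=0.1, grid=None):
    """Keep all y on a grid whose estimated conditional density exceeds the
    alpha-quantile of the calibration scores cde(y_i, x_i)."""
    scores = np.array([cde(y, x) for x, y in zip(X_cal, y_cal)])
    t = np.quantile(scores, alpha)
    grid = np.linspace(y_cal.min() - 3, y_cal.max() + 3, 500) if grid is None else grid
    return grid[np.array([cde(y, x_new) for y in grid]) >= t]

# toy conditional density estimate (an assumption): Gaussian with linear mean
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 1)); y = np.sin(2 * X[:, 0]) + 0.3 * rng.normal(size=400)
X_tr, y_tr, X_cal, y_cal = X[:200], y[:200], X[200:], y[200:]
beta = np.linalg.lstsq(np.c_[np.ones(200), X_tr], y_tr, rcond=None)[0]
sigma = np.std(y_tr - np.c_[np.ones(200), X_tr] @ beta)
cde = lambda yy, xx: norm.pdf(yy, loc=beta[0] + beta[1] * xx[0], scale=sigma)
region = hpd_prediction_set(np.array([0.5]), cde, X_cal, y_cal)
print(region.min(), region.max())
```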

Proceedings ArticleDOI
01 Apr 2019
TL;DR: Calibrate is designed to incorporate the prior knowledge about the noise and the true item frequencies as two probability distributions, respectively, via statistical inference and significantly outperforms state-of-the-art LDP algorithms for frequency estimation and heavy hitter identification.
Abstract: Estimating frequencies of certain items among a population is a basic step in data analytics, which enables more advanced data analytics (e.g., heavy hitter identification, frequent pattern mining), client software optimization, and detecting unwanted or malicious hijacking of user settings in browsers. Frequency estimation and heavy hitter identification with local differential privacy (LDP) protect user privacy as well as the data collector. Existing LDP algorithms cannot leverage 1) prior knowledge about the noise in the estimated item frequencies and 2) prior knowledge about the true item frequencies. As a result, they achieve suboptimal performance in practice. In this work, we aim to design LDP algorithms that can leverage such prior knowledge. Specifically, we design Calibrate to incorporate the prior knowledge via statistical inference. Calibrate can be appended to an existing LDP algorithm to reduce its estimation errors. We model the prior knowledge about the noise and the true item frequencies as two probability distributions, respectively. Given the two probability distributions and an estimated frequency of an item produced by an existing LDP algorithm, our Calibrate computes the conditional probability distribution of the item’s frequency and uses the mean of the conditional probability distribution as the calibrated frequency for the item. It is challenging to estimate the two probability distributions due to data sparsity. We address the challenge via integrating techniques from statistics and machine learning. Our empirical results on two real-world datasets show that Calibrate significantly outperforms state-of-the-art LDP algorithms for frequency estimation and heavy hitter identification.
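
A toy sketch of the calibration idea in the last step, under illustrative assumptions (a Gaussian noise model with known standard deviation and a hypothetical power-law prior; the paper estimates both distributions from data): form the conditional distribution of the true frequency given the LDP estimate on a grid and report its mean.

```python
import numpy as np

def calibrate_frequency(f_est, prior_pdf, noise_sd, grid=None):
    """Model the true frequency f with a prior, model the LDP estimation noise
    as Gaussian with known standard deviation, form p(f | f_est) on a grid via
    Bayes' rule, and return the posterior mean as the calibrated frequency."""
    grid = np.linspace(0.0, 1.0, 2001) if grid is None else grid
    likelihood = np.exp(-0.5 * ((f_est - grid) / noise_sd) ** 2)
    post = prior_pdf(grid) * likelihood
    post /= post.sum()
    return float(np.sum(grid * post))

# hypothetical prior: most item frequencies are tiny (power-law-ish shape)
prior = lambda f: (f + 1e-4) ** -1.5
# a raw LDP estimate can even be negative; calibration pulls it back into [0, 1]
for raw in (-0.003, 0.001, 0.02):
    print(raw, "->", round(calibrate_frequency(raw, prior, noise_sd=0.005), 5))
```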