
Showing papers on "Probability mass function published in 2020"


Proceedings Article
30 Apr 2020
TL;DR: By sampling text from the dynamic nucleus of the probability distribution, which allows for diversity while effectively truncating the less reliable tail of the distribution, the resulting text better matches the quality of human text, yielding enhanced diversity without sacrificing fluency and coherence.
Abstract: Despite considerable advances in neural language modeling, it remains an open question what the best decoding strategy is for text generation from a language model (e.g. to generate a story). The counter-intuitive empirical observation is that even though the use of likelihood as training objective leads to high quality models for a broad range of language understanding tasks, maximization-based decoding methods such as beam search lead to degeneration — output text that is bland, incoherent, or gets stuck in repetitive loops. To address this we propose Nucleus Sampling, a simple but effective method to draw considerably higher quality text out of neural language models. Our approach avoids text degeneration by truncating the unreliable tail of the probability distribution, sampling from the dynamic nucleus of tokens containing the vast majority of the probability mass. To properly examine current maximization-based and stochastic decoding methods, we compare generations from each of these methods to the distribution of human text along several axes such as likelihood, diversity, and repetition. Our results show that (1) maximization is an inappropriate decoding objective for open-ended text generation, (2) the probability distributions of the best current language models have an unreliable tail which needs to be truncated during generation and (3) Nucleus Sampling is the best decoding strategy for generating long-form text that is both high-quality — as measured by human evaluation — and as diverse as human-written text.
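
As a reading aid, here is a minimal sketch of the top-p (nucleus) sampling step the abstract describes, applied to a toy next-token distribution; the vocabulary size and threshold are illustrative, not taken from the paper.

```python
import numpy as np

def nucleus_sample(probs, p=0.95, rng=None):
    """Draw one token index from the smallest set of tokens whose
    cumulative probability mass exceeds p (the 'nucleus')."""
    rng = rng or np.random.default_rng()
    order = np.argsort(probs)[::-1]                # tokens by descending probability
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, p) + 1    # size of the nucleus
    nucleus = order[:cutoff]
    renormalized = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=renormalized)

# toy next-token distribution over a 6-token vocabulary
probs = np.array([0.42, 0.30, 0.15, 0.08, 0.04, 0.01])
print(nucleus_sample(probs, p=0.9))
```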

682 citations


Journal ArticleDOI
TL;DR: An extension of Hamiltonian Monte Carlo that can efficiently explore target distributions with discontinuous densities and enables efficient sampling from ordinal parameters through embedding of probability mass functions into continuous spaces is presented.
Abstract: Hamiltonian Monte Carlo has emerged as a standard tool for posterior computation. In this article, we present an extension that can efficiently explore target distributions with discontinuous densities. Our extension in particular enables efficient sampling from ordinal parameters through embedding of probability mass functions into continuous spaces. We motivate our approach through a theory of discontinuous Hamiltonian dynamics and develop a corresponding numerical solver. The proposed solver is the first of its kind, with a remarkable ability to exactly preserve the Hamiltonian. We apply our algorithm to challenging posterior inference problems to demonstrate its wide applicability and competitive performance.
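
A hedged illustration of the embedding idea mentioned above: a pmf on {0, ..., K-1} can be turned into a piecewise-constant density on [0, K) so that flooring a continuous draw recovers the discrete distribution. The authors' actual embedding and the discontinuous Hamiltonian dynamics are more involved; the helper `embed_pmf` below is only a sketch of the basic device.

```python
import numpy as np

def embed_pmf(pmf):
    """Embed a pmf on {0, ..., K-1} into a piecewise-constant density on [0, K):
    the interval [k, k+1) carries density pmf[k], so flooring a continuous draw
    recovers the original discrete distribution."""
    pmf = np.asarray(pmf, dtype=float)

    def density(x):
        k = np.floor(x).astype(int)
        inside = (x >= 0) & (x < len(pmf))
        return np.where(inside, pmf[np.clip(k, 0, len(pmf) - 1)], 0.0)

    return density

density = embed_pmf([0.5, 0.3, 0.2])
xs = np.array([0.25, 1.9, 2.5, 3.1])
print(density(xs))    # [0.5, 0.3, 0.2, 0.0]
```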

69 citations


Posted Content
TL;DR: The GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that all else being equal, the GEBM will outperform a GAN of the same complexity.
Abstract: We introduce the Generalized Energy Based Model (GEBM) for generative modelling. These models combine two trained components: a base distribution (generally an implicit model), which can learn the support of data with low intrinsic dimension in a high dimensional space; and an energy function, to refine the probability mass on the learned support. Both the energy function and base jointly constitute the final model, unlike GANs, which retain only the base distribution (the "generator"). GEBMs are trained by alternating between learning the energy and the base. We show that both training stages are well-defined: the energy is learned by maximising a generalized likelihood, and the resulting energy-based loss provides informative gradients for learning the base. Samples from the posterior on the latent space of the trained model can be obtained via MCMC, thus finding regions in this space that produce better quality samples. Empirically, the GEBM samples on image-generation tasks are of much better quality than those from the learned generator alone, indicating that all else being equal, the GEBM will outperform a GAN of the same complexity. When using normalizing flows as base measures, GEBMs succeed on density modelling tasks, returning comparable performance to direct maximum likelihood of the same networks.

44 citations


Journal ArticleDOI
TL;DR: In this paper, a new two-parameter exponentiated discrete Lindley distribution is introduced and a wide range of its structural properties are investigated, including the shape of the probability mass function.
Abstract: This paper introduces a new two-parameter exponentiated discrete Lindley distribution. A wide range of its structural properties are investigated. This includes the shape of the probability mass function...

42 citations


Journal ArticleDOI
TL;DR: The proposed distributionally robust framework for solving the OPF problem benefits from the observed correlation amongst the uncertain parameters to mitigate branch flow limit violations while maintaining an acceptable worst-case expected operational cost as compared to RO and SO solutions.
Abstract: Recent research on optimal power flow (OPF) in networks with renewable power involves optimizing both first and second stage variables that adjust the decision once the uncertainty is revealed. In general, only partial information on the underlying probability distribution of renewable power production is available. This paper considers a distributionally robust framework for solving the OPF problem. The formulation stipulates a probability mass function of wind power production, whose probabilities and scenario locations vary in a box of ambiguity with bounds that can be tuned based on historical data. Distributionally robust optimization (DRO) is used to derive new conditional value-at-risk (CVaR) constraints that limit the frequency and severity of branch flow limit violations whenever the renewable power generation deviates from its forecast. Numerical results are reported on networks with up to 2736 nodes and contrasted with classical robust optimization (RO) and stochastic optimization (SO) solutions. The results show an advantage to adopting the proposed DRO for load flow control. In particular, the solution benefits from the observed correlation amongst the uncertain parameters to mitigate branch flow limit violations, and yet maintain an acceptable worst-case expected operational cost as compared to RO and SO solutions.
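
For readers unfamiliar with the CVaR constraints mentioned above, the sketch below computes the conditional value-at-risk of a discrete loss pmf (e.g., branch-flow violation costs under wind scenarios); the scenario values are hypothetical and the DRO formulation itself is not reproduced.

```python
import numpy as np

def cvar(losses, probs, alpha=0.95):
    """Conditional value-at-risk of a discrete loss distribution:
    the expected loss in the worst (1 - alpha) tail of the pmf."""
    order = np.argsort(losses)                         # ascending losses
    losses, probs = np.asarray(losses)[order], np.asarray(probs)[order]
    tail = 1.0 - alpha                                 # probability mass in the tail
    cum_from_top = np.cumsum(probs[::-1])[::-1]        # mass at or above each loss
    # weight each scenario by how much of its mass falls inside the tail
    w = np.clip(tail - (cum_from_top - probs), 0.0, probs)
    return float(np.dot(w, losses) / tail)

# hypothetical branch-flow violation costs under five wind scenarios
losses = [0.0, 2.0, 5.0, 9.0, 20.0]
probs = [0.4, 0.3, 0.15, 0.1, 0.05]
print(cvar(losses, probs, alpha=0.9))   # expected cost in the worst 10% of outcomes
```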

40 citations


Journal ArticleDOI
TL;DR: A histogram-based normalization method is put forward that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions and shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.
Abstract: As each imbalanced classification problem comes with its own set of challenges, the measure used to evaluate classifiers must be individually selected. To help researchers make this decision in an informed manner, experimental and theoretical investigations compare general properties of measures. However, existing studies do not analyze changes in measure behavior imposed by different imbalance ratios. Moreover, several characteristics of imbalanced data streams, such as the effect of dynamically changing class proportions, have not been thoroughly investigated from the perspective of different metrics. In this paper, we study measure dynamics by analyzing changes of measure values, distributions, and gradients with diverging class proportions. For this purpose, we visualize measure probability mass functions and gradients. In addition, we put forward a histogram-based normalization method that provides a unified, probabilistic interpretation of any measure over data sets with different class distributions. The results of analyzing eight popular classification measures show that the effect class proportions have on each measure is different and should be taken into account when evaluating classifiers. Apart from highlighting imbalance-related properties of each measure, our study shows a direct connection between class ratio changes and certain types of concept drift, which could be influential in designing new types of classifiers and drift detectors for imbalanced data streams.

37 citations


Journal ArticleDOI
TL;DR: The results reveal the significance of considering the inhomogeneity of S-MJLSs and the importance of the constructed Lyapunov function, which depends on both the current mode and the elapsed time in the current mode.

35 citations


Journal ArticleDOI
28 May 2020-Entropy
TL;DR: A new probability mass function is proposed by creating a natural discrete analog to the continuous Lindley distribution as a mixture of geometric and negative binomial distributions; the new distribution has many interesting properties.
Abstract: In this paper, we propose and study a new probability mass function by creating a natural discrete analog to the continuous Lindley distribution as a mixture of geometric and negative binomial distributions. The new distribution has many interesting properties that make it superior to many other discrete distributions, particularly in analyzing over-dispersed count data. Several statistical properties of the introduced distribution have been established, including moments and the moment generating function, residual moments, characterization, entropy, and estimation of the parameter by the maximum likelihood method. A bias reduction method is applied to the derived estimator; its existence and uniqueness are discussed. The goodness of fit of the proposed distribution has been examined and compared with other discrete distributions using three real data sets from the biological sciences.
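
As a rough illustration of building a discrete analog of a continuous law, the sketch below applies the generic survival-discretization P(X = k) = S(k) - S(k + 1) to the continuous Lindley distribution. Note that the paper's own construction is a mixture of geometric and negative binomial distributions and may differ from this generic device.

```python
import numpy as np

def lindley_survival(x, theta):
    """Survival function of the continuous Lindley(theta) distribution."""
    return (1.0 + theta + theta * x) / (1.0 + theta) * np.exp(-theta * x)

def discrete_lindley_pmf(k, theta):
    """Discrete analog via survival discretization: P(X = k) = S(k) - S(k + 1)."""
    k = np.asarray(k, dtype=float)
    return lindley_survival(k, theta) - lindley_survival(k + 1, theta)

k = np.arange(10)
pmf = discrete_lindley_pmf(k, theta=0.7)
print(pmf.round(4), pmf.sum().round(4))   # mass on {0,...,9}; tail mass remains beyond 9
```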

29 citations


Journal ArticleDOI
TL;DR: The probability mass functions of the system's lifetime, the time spent by the system in a perfectly functioning state, and the total time spent in partially working states are derived for the proposed model.

28 citations


Journal ArticleDOI
28 Feb 2020
TL;DR: The results show that the performance indicators considered in the literature such as average or peak AoI may give misleading insights into the real AoI performance.
Abstract: Age-of-information (AoI) is a metric quantifying information freshness at the receiver. It captures the delay together with packet loss and packet generation rate. However, the existing literature focuses on average or peak AoI and neglects the complete distribution. In this letter, we consider an $N$-hop network with time-invariant packet loss probabilities on each link. We derive closed-form equations for the probability mass function of AoI. We verify our findings with simulations. Our results show that the performance indicators considered in the literature such as average or peak AoI may give misleading insights into the real AoI performance.
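
A hedged Monte Carlo sketch of an AoI probability mass function for a heavily simplified slotted multi-hop link (one fresh update per slot, fixed per-hop loss, N-slot delivery delay); it is not the letter's model or its closed-form result, only an illustration of what an AoI pmf looks like.

```python
import numpy as np
from collections import Counter

def aoi_pmf_sim(loss_probs, horizon=200_000, seed=0):
    """Empirical AoI pmf for a simplified slotted N-hop link: a fresh update is
    injected every slot, reaches the monitor after N slots if it survives every
    hop, and is lost otherwise."""
    rng = np.random.default_rng(seed)
    n_hops = len(loss_probs)
    q = np.prod(1.0 - np.asarray(loss_probs))   # end-to-end success probability
    age, counts = n_hops, Counter()
    for _ in range(horizon):
        age = n_hops if rng.random() < q else age + 1
        counts[age] += 1
    return {a: c / horizon for a, c in sorted(counts.items())}

pmf = aoi_pmf_sim([0.1, 0.2])    # 2-hop link with per-hop loss 0.1 and 0.2
print({a: round(p, 4) for a, p in list(pmf.items())[:8]})
```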

24 citations


Journal ArticleDOI
TL;DR: To demonstrate the importance of the proposed distribution, three data sets on coronavirus, length of stay at psychiatric ward and monthly counts of larceny calls are analyzed and the maximum likelihood approach is used to estimate the model parameters.
Abstract: The aim of this article is to propose a new three-parameter discrete Lindley distribution. A wide range of its structural properties are investigated. This includes the shape of the probability mass function, hazard rate function, moments, skewness, kurtosis, index of dispersion, mean residual life, mean past life and stress-strength reliability. These properties are expressed in explicit forms. The maximum likelihood approach is used to estimate the model parameters. A detailed simulation study is carried out to examine the bias and mean square error of the estimators. Using the proposed distribution, a new first-order integer-valued autoregressive process is introduced for the over-dispersed, equi-dispersed and under-dispersed time series of counts. To demonstrate the importance of the proposed distribution, three data sets on coronavirus, length of stay at psychiatric ward and monthly counts of larceny calls are analyzed.
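
The abstract introduces a new first-order integer-valued autoregressive (INAR(1)) process; the sketch below simulates a generic INAR(1) series with binomial thinning. The `innovation_sampler` argument and the Poisson innovations are hypothetical stand-ins for the paper's discrete Lindley innovations.

```python
import numpy as np

def simulate_inar1(n, alpha, innovation_sampler, seed=0):
    """Simulate an INAR(1) count series X_t = alpha o X_{t-1} + eps_t, where
    'o' is binomial thinning and eps_t are i.i.d. counts drawn from
    innovation_sampler (any discrete innovation pmf can be plugged in)."""
    rng = np.random.default_rng(seed)
    x = np.zeros(n, dtype=int)
    for t in range(1, n):
        survivors = rng.binomial(x[t - 1], alpha)      # binomial thinning
        x[t] = survivors + innovation_sampler(rng)
    return x

# hypothetical innovations: Poisson(2) stands in for the paper's discrete Lindley counts
series = simulate_inar1(200, alpha=0.6, innovation_sampler=lambda rng: rng.poisson(2.0))
print(series[:20])
```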

Posted Content
TL;DR: The definition of anti-concentration is that the expected collision probability, that is, the probability that two independently drawn outcomes will agree, is only a constant factor larger than if the distribution were uniform, and it is shown that when the 2-local gates are each drawn from the Haar measure, at least $\Omega(n \log(n))$ gates are needed for this condition to be met on an $n$ qudit circuit.
Abstract: We consider quantum circuits consisting of randomly chosen two-local gates and study the number of gates needed for the distribution over measurement outcomes for typical circuit instances to be anti-concentrated, roughly meaning that the probability mass is not too concentrated on a small number of measurement outcomes. Understanding the conditions for anti-concentration is important for determining which quantum circuits are difficult to simulate classically, as anti-concentration has been in some cases an ingredient of mathematical arguments that simulation is hard and in other cases a necessary condition for easy simulation. Our definition of anti-concentration is that the expected collision probability, that is, the probability that two independently drawn outcomes will agree, is only a constant factor larger than if the distribution were uniform. We show that when the 2-local gates are each drawn from the Haar measure (or any two-design), at least $\Omega(n \log(n))$ gates (and thus $\Omega(\log(n))$ circuit depth) are needed for this condition to be met on an $n$ qudit circuit. In both the case where the gates are nearest-neighbor on a 1D ring and the case where gates are long-range, we show $O(n \log(n))$ gates are also sufficient, and we precisely compute the optimal constant prefactor for the $n \log(n)$. The technique we employ relies upon a mapping from the expected collision probability to the partition function of an Ising-like classical statistical mechanical model, which we manage to bound using stochastic and combinatorial techniques.
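
The anti-concentration condition defined above reduces to a simple computation on a pmf; the sketch below evaluates the collision probability and its ratio to the uniform value on toy distributions (the circuit-sampling context itself is not modelled).

```python
import numpy as np

def collision_probability(probs):
    """Probability that two independent draws from the pmf agree: sum_x p(x)^2."""
    probs = np.asarray(probs, dtype=float)
    return float(np.sum(probs ** 2))

def anticoncentration_ratio(probs):
    """Collision probability relative to the uniform distribution on the same
    support; a ratio close to 1 means the mass is well spread out."""
    return collision_probability(probs) * len(probs)

uniform = np.full(16, 1 / 16)
peaked = np.array([0.85] + [0.01] * 15)
print(anticoncentration_ratio(uniform))   # 1.0
print(anticoncentration_ratio(peaked))    # much larger than 1
```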

Posted Content
TL;DR: The results show that subcube conditioning, as a natural model for accessing high-dimensional distributions, enables significant savings in learning and testing junta distributions compared to the standard sampling model.
Abstract: We study the problems of learning and testing junta distributions on $\{-1,1\}^n$ with respect to the uniform distribution, where a distribution $p$ is a $k$-junta if its probability mass function $p(x)$ depends on a subset of at most $k$ variables. The main contribution is an algorithm for finding relevant coordinates in a $k$-junta distribution with subcube conditioning [BC18, CCKLW20]. We give two applications: 1. An algorithm for learning $k$-junta distributions with $\tilde{O}(k/\epsilon^2) \log n + O(2^k/\epsilon^2)$ subcube conditioning queries, and 2. An algorithm for testing $k$-junta distributions with $\tilde{O}((k + \sqrt{n})/\epsilon^2)$ subcube conditioning queries. All our algorithms are optimal up to poly-logarithmic factors. Our results show that subcube conditioning, as a natural model for accessing high-dimensional distributions, enables significant savings in learning and testing junta distributions compared to the standard sampling model. This addresses an open question posed by Aliakbarpour, Blais, and Rubinfeld [ABR17].
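
A small sketch of what a k-junta pmf on {-1,1}^n looks like under the definition above: the mass of x depends only on a set of at most k coordinates, with the remaining coordinates uniform and independent. The learning and testing algorithms themselves are not reproduced, and the coordinates and marginal values below are illustrative.

```python
import itertools
import numpy as np

def junta_pmf(n, relevant, marginal):
    """Build a k-junta pmf on {-1,1}^n: the probability of x depends only on the
    coordinates in `relevant` (with joint pmf `marginal` over their 2^k patterns);
    all other coordinates are uniform and independent."""
    k = len(relevant)
    patterns = {pat: marginal[i] for i, pat in
                enumerate(itertools.product((-1, 1), repeat=k))}
    pmf = {}
    for x in itertools.product((-1, 1), repeat=n):
        key = tuple(x[j] for j in relevant)
        pmf[x] = patterns[key] / 2 ** (n - k)   # uniform over irrelevant coordinates
    return pmf

# a 2-junta on {-1,1}^4 whose mass depends only on coordinates 0 and 3
p = junta_pmf(4, relevant=(0, 3), marginal=[0.4, 0.3, 0.2, 0.1])
print(sum(p.values()))   # 1.0
```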

Journal ArticleDOI
TL;DR: Tweedie's compound Poisson model is a popular method to model insurance claims with probability mass at zero and a nonnegative, highly right-skewed distribution, as discussed by the authors.
Abstract: Tweedie’s compound Poisson model is a popular method to model insurance claims with probability mass at zero and nonnegative, highly right-skewed distribution. In particular, it is not uncommon to ...
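
A minimal simulation of the compound Poisson-gamma structure described above, showing the point mass at zero (no claims) together with the right-skewed continuous part; parameter values are illustrative only.

```python
import numpy as np

def tweedie_compound_poisson(n, lam, shape, scale, seed=0):
    """Simulate a compound Poisson-gamma (Tweedie) claim amount:
    N ~ Poisson(lam) claims, each Gamma(shape, scale); the outcome is exactly
    zero whenever N = 0, giving a point mass at zero plus a right-skewed
    continuous part."""
    rng = np.random.default_rng(seed)
    counts = rng.poisson(lam, size=n)
    totals = np.array([rng.gamma(shape, scale, size=c).sum() for c in counts])
    return totals

claims = tweedie_compound_poisson(100_000, lam=0.8, shape=2.0, scale=500.0)
print("P(claim = 0) ~", np.mean(claims == 0).round(3))   # close to exp(-0.8) = 0.449
```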

Journal ArticleDOI
TL;DR: The closed-form pairwise error probability is derived for the first time, which also validates the superiority of the proposed PS scheme in comparison with the uniform schemes with elaborate rotations.
Abstract: Probabilistic shaping (PS) combined with bipolar amplitude shift keying modulation is a promising transmission scheme. However, such a 1-dimensional PS scheme inevitably suffers shaping loss in Rayleigh fading channels due to the lack of signal diversity. In this paper, we design a rotated quadrature amplitude modulation based PS strategy. With the discrete inputs, the ergodic constellation constrained (CC) capacity is derived based on the mutual information conditioned on full channel state information available at the receiver, which is found to be jointly influenced by the probability mass function (PMF) and the rotation angle (RA) of the input signals. Furthermore, the closed-form cutoff rate, which is a lower bound on the mutual information with finite decoding complexity, is also derived to simplify the analysis. Correspondingly, the optimizations of the RA and the PMF are both considered for maximizing the ergodic CC capacity and the cutoff rate, respectively: the single-parameter RA is first obtained by exhaustive search, and then the optimal PMFs of the subproblem for fixed RAs and a fixed constellation scaling factor are obtained. Finally, for this non-equiprobable transmission scheme, the closed-form pairwise error probability is derived for the first time, which also validates the superiority of the proposed PS scheme in comparison with uniform schemes with elaborate rotations.
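
As background for the shaping discussion above, the sketch below builds the Maxwell-Boltzmann family of pmfs commonly used in probabilistic shaping; the paper's joint optimization of the PMF and the rotation angle is not reproduced, and the amplitude set and shaping parameter are assumptions.

```python
import numpy as np

def maxwell_boltzmann_pmf(amplitudes, nu):
    """Maxwell-Boltzmann pmf commonly used in probabilistic shaping:
    p(a) proportional to exp(-nu * a^2), putting more mass on low-energy points."""
    a = np.asarray(amplitudes, dtype=float)
    w = np.exp(-nu * a ** 2)
    return w / w.sum()

amps = np.array([-3, -1, 1, 3])          # 4-ASK amplitudes
for nu in (0.0, 0.1, 0.3):
    print(nu, maxwell_boltzmann_pmf(amps, nu).round(3))
```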

Journal ArticleDOI
TL;DR: This paper proposes a novel evidential reasoning approach that combines the virtual linguistic trust degree with a basic unit-interval monotonic function and presents the whole framework of an extended evidential reasoning algorithm.

Posted Content
TL;DR: An asynchronous constrained batch-parallel Bayesian optimization method is proposed to efficiently solve the computationally-expensive simulation-based optimization problems on the HPC platform, with a budgeted computational resource, where the maximum number of simulations is a constant.
Abstract: High-fidelity complex engineering simulations are highly predictive, but also computationally expensive and often require substantial computational efforts. The mitigation of computational burden is usually enabled through parallelism in high-performance cluster (HPC) architecture. In this paper, an asynchronous constrained batch-parallel Bayesian optimization method is proposed to efficiently solve the computationally expensive simulation-based optimization problems on the HPC platform, with a budgeted computational resource, where the maximum number of simulations is a constant. The advantages of this method are three-fold. First, the efficiency of the Bayesian optimization is improved, where multiple input locations are evaluated in a massively parallel, asynchronous manner to accelerate the optimization convergence with respect to physical runtime. This efficiency feature is further improved so that when each of the inputs is finished, another input is queried without waiting for the whole batch to complete. Second, the method can handle both known and unknown constraints. Third, the proposed method considers several acquisition functions at the same time and samples based on an evolving probability mass function using a modified GP-Hedge scheme, whose parameters correspond to the performance of each acquisition function. The proposed framework is termed aphBO-2GP-3B, which corresponds to asynchronous parallel hedge Bayesian optimization with two Gaussian processes and three batches. The aphBO-2GP-3B framework is demonstrated using two high-fidelity expensive industrial applications, where the first one is based on finite element analysis (FEA) and the second one is based on computational fluid dynamics (CFD) simulations.
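
A hedged sketch of the Hedge-style idea described above: an evolving probability mass function over several acquisition functions, updated from their observed rewards. It is a generic GP-Hedge-type portfolio, not the aphBO-2GP-3B implementation, and the `HedgePortfolio` class, acquisition choices, and rewards are placeholders.

```python
import numpy as np

class HedgePortfolio:
    """Hedge-style selection among several acquisition functions: an evolving
    probability mass function over acquisitions, built from their cumulative
    rewards (a simplified sketch of a GP-Hedge-type scheme)."""

    def __init__(self, n_acq, eta=1.0, seed=0):
        self.gains = np.zeros(n_acq)     # cumulative reward of each acquisition
        self.eta = eta                   # learning rate
        self.rng = np.random.default_rng(seed)

    def pmf(self):
        z = self.eta * (self.gains - self.gains.max())   # stabilised softmax
        w = np.exp(z)
        return w / w.sum()

    def select(self):
        return self.rng.choice(len(self.gains), p=self.pmf())

    def update(self, chosen, reward):
        self.gains[chosen] += reward     # e.g. objective improvement it produced

portfolio = HedgePortfolio(n_acq=3)      # e.g. EI, PI, UCB
for step in range(5):
    a = portfolio.select()
    portfolio.update(a, reward=np.random.default_rng(step).random())
print(portfolio.pmf().round(3))
```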

Journal ArticleDOI
TL;DR: In this paper, a new two-parameter discrete distribution is introduced which may be useful for modeling count data; its probability mass function is very simple.
Abstract: In this paper we introduce a new two-parameter discrete distribution which may be useful for modeling count data. Additionally, the probability mass function is very simple and it may have a zero v...

Journal ArticleDOI
TL;DR: Computational means are used to demonstrate the fitness of each of these MTPT methods for simulating 1-D advective-dispersive transport with uniform coefficients and reveal that increased accuracy is not always justified relative to increased computational complexity.

Journal ArticleDOI
TL;DR: In this article, a methodology is developed for data analysis based on empirically constructed geodesic metric spaces, which reveal properties of the data based on geometry, such as those that are difficult to see from the raw Euclidean distances.
Abstract: A methodology is developed for data analysis based on empirically constructed geodesic metric spaces. For a probability distribution, the length along a path between two points can be defined as the amount of probability mass accumulated along the path. The geodesic, then, is the shortest such path and defines a geodesic metric. Such metrics are transformed in a number of ways to produce parametrised families of geodesic metric spaces, empirical versions of which allow computation of intrinsic means and associated measures of dispersion. These reveal properties of the data, based on geometry, such as those that are difficult to see from the raw Euclidean distances. Examples of application include clustering and classification. For certain parameter ranges, the spaces become CAT(0) spaces and the intrinsic means are unique. In one case, a minimal spanning tree of a graph based on the data becomes CAT(0). In another, a so-called “metric cone” construction allows extension to CAT(k) spaces. It is shown how to empirically tune the parameters of the metrics, making it possible to apply them to a number of real cases.

Journal ArticleDOI
TL;DR: The theoretical bounds on optimal bed demand prediction accuracy are defined, and a flexible statistical model is developed to approximate the probability mass function of future bed demand.
Abstract: Failing to match the supply of resources to the demand for resources in a hospital can cause non-clinical transfers, diversions, safety risks, and expensive under-utilized resource capacity. Forecasting bed demand helps achieve appropriate safety standards and cost management by proactively adjusting staffing levels and patient flow protocols. This paper defines the theoretical bounds on optimal bed demand prediction accuracy and develops a flexible statistical model to approximate the probability mass function of future bed demand. A case study validates the model using blinded data from a mid-sized Massachusetts community hospital. This approach expands upon similar work by forecasting multiple days in advance instead of a single day, providing a probability mass function of demand instead of a point estimate, using the exact surgery schedule instead of assuming a cyclic schedule, and using patient-level duration-varying length-of-stay distributions instead of assuming patient homogeneity and exponential length of stay distributions. The primary results of this work are an accurate and lengthy forecast, which provides managers better information and more time to optimize short-term staffing adaptations to stochastic bed demand, and a derivation of the minimum mean absolute error of an ideal forecast.
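
A simplified sketch in the spirit of the model above: if patient i is still hospitalized on the forecast day with probability p_i, the number of occupied beds follows a Poisson-binomial pmf obtained by sequential convolution. The per-patient probabilities below are hypothetical, and scheduled surgeries and new arrivals (which the paper incorporates) are ignored.

```python
import numpy as np

def occupancy_pmf(stay_probs):
    """Probability mass function of the number of occupied beds when patient i
    is still present on the forecast day with probability stay_probs[i]
    (a Poisson-binomial pmf built by sequential convolution)."""
    pmf = np.array([1.0])                      # zero patients -> demand 0 w.p. 1
    for p in stay_probs:
        new = np.zeros(len(pmf) + 1)
        new[:-1] += pmf * (1.0 - p)            # patient has left
        new[1:] += pmf * p                     # patient still occupies a bed
        pmf = new
    return pmf

# hypothetical per-patient probabilities of still being in hospital in 3 days
stay_probs = [0.9, 0.8, 0.75, 0.4, 0.3, 0.1]
pmf = occupancy_pmf(stay_probs)
print(pmf.round(4), "expected beds:", np.dot(np.arange(len(pmf)), pmf).round(2))
```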

Posted Content
10 Mar 2020
TL;DR: This work uses Legendre duality to provide a variational lower bound for the Kullback-Leibler divergence, shows that this estimator, the KL Approximate Lower-bound Estimate (KALE), provides a maximum likelihood estimate (MLE), and extends this procedure to adversarial training.

Posted Content
09 Feb 2020
TL;DR: An operational interpretation of the measure is given based on the decisions that an agent should take if given only the shared information, which lends itself for example to local learning in artificial neural networks.
Abstract: Partial information decomposition (PID) of the multivariate mutual information describes the distinct ways in which a set of source variables contains information about a target variable. The groundbreaking work of Williams and Beer has shown that this decomposition can not be determined from classic information theory without making additional assumptions, and several candidate measures have been proposed, often drawing on principles from related fields such as decision theory. None of these measures is differentiable with respect to the underlying probability mass function. We here present a novel measure that draws only on the principle linking the local mutual information to exclusion of probability mass. This principle is foundational to the original definition of the mutual information by Fano. We reuse this principle to define a measure of shared information based on the shared exclusion of probability mass by the realizations of source variables. Our measure is differentiable and well-defined for individual realizations of the random variables. Thus, it lends itself for example to local learning in artificial neural networks. We show that the measure can be interpreted as local mutual information with the help of an auxiliary variable. We also show that it has a meaningful Moebius inversion on a redundancy lattice and obeys a target chain rule. We give an operational interpretation of the measure based on the decisions that an agent should take if given only the shared information.

Journal ArticleDOI
TL;DR: The methods based on the Kummer confluent hypergeometric function are demonstrated to be computationally most efficient, and a new computationally efficient formula is derived for the probability mass function of the number of renewals by a given time.
Abstract: Convolutions of independent gamma variables are encountered in many applications such as insurance, reliability, and network engineering. Accurate and fast evaluations of their density and distribution functions are critical for such applications, but no open source, user-friendly software implementation has been available. We review several numerical evaluations of the density and distribution of convolutions of independent gamma variables and compare them with respect to their accuracy and speed. The methods that are based on the Kummer confluent hypergeometric function are computationally most efficient. The benefit of employing the confluent hypergeometric function is further demonstrated by a renewal process application, where the gamma variables in the convolution are Erlang variables. We derive a new computationally efficient formula for the probability mass function of the number of renewals by a given time. An R package, coga, provides efficient C++ based implementations of the discussed methods and is available on CRAN.
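
For the special case of i.i.d. Erlang inter-renewal times mentioned above, the pmf of the number of renewals by time t follows from gamma cdfs; the sketch below implements that textbook identity (the paper's Kummer-function formula for general gamma convolutions is not reproduced, and the parameter values are illustrative).

```python
import numpy as np
from scipy.stats import gamma

def renewal_count_pmf(t, k, rate, n_max):
    """pmf of the number of renewals by time t when inter-renewal times are
    i.i.d. Erlang(k, rate): P(N(t) = n) = P(S_n <= t) - P(S_{n+1} <= t),
    where S_n ~ Gamma(n * k, rate) is the time of the n-th renewal."""
    def cdf_sn(n):
        return 1.0 if n == 0 else gamma.cdf(t, a=n * k, scale=1.0 / rate)
    return np.array([cdf_sn(n) - cdf_sn(n + 1) for n in range(n_max + 1)])

pmf = renewal_count_pmf(t=10.0, k=2, rate=1.0, n_max=15)
print(pmf.round(4), pmf.sum().round(4))   # sums to ~1 once n_max is large enough
```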

Journal ArticleDOI
TL;DR: In this paper, the class of completely monotone probability mass functions (pmfs) is considered from a statistical perspective, and it is shown that the completely monotone least squares estimator exists, is strongly consistent, and converges weakly to the truth at the n-rate.

Journal ArticleDOI
TL;DR: The system is modelled as a discrete-time Markov chain, and a matrix-geometric method is used to obtain the probability mass function of the AoI.

Journal ArticleDOI
TL;DR: A novel probabilistic approach to dynamic image sampling (PADIS) is presented, based on a data-driven probability mass function that uses a local variance map; its runtime is faster than that of the other methods.
Abstract: Incremental sampling can be applied in scientific imaging techniques whenever the measurements are taken incrementally, i.e., one pixel position is measured at a time. It can be used to reduce the measurement time as well as the dose impinging onto a specimen. For incremental sampling, the choice of the sampling pattern plays a major role in order to achieve a high reconstruction quality. Besides using static incremental sampling patterns, it is also possible to dynamically adapt the sampling pattern based on the already measured data. This is called dynamic sampling and allows for a higher reconstruction quality, as the inhomogeneity of the sampled image content can be taken into account. Several approaches for dynamic sampling have been published in the literature. However, they share the common drawback that homogeneous regions are sampled too late. This reduces the reconstruction quality as fine details can be missed. We overcome this drawback using a novel probabilistic approach to dynamic image sampling (PADIS). It is based on a data driven probability mass function which uses a local variance map. In our experiments, we evaluate the reconstruction quality for scanning electron microscopy images as well as for natural image content. For scanning electron microscopy images with a sampling density of 35% and frequency selective reconstruction, our approach achieves a PSNR gain of +0.92 dB compared to other dynamic sampling approaches and +1.42 dB compared to the best static patterns. For natural images, even higher gains are achieved. Experiments with additional measurement noise show that for our method the sampling patterns are more stable. Moreover, the runtime is faster than for the other methods.
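
A hedged sketch of the variance-driven idea described above: a data-driven probability mass function over unmeasured pixel positions, proportional to a local variance map of the current reconstruction plus a small uniform floor so homogeneous regions are not starved. This illustrates the general mechanism, not the authors' PADIS algorithm; the `sampling_pmf` helper, window size, and floor are assumptions.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def sampling_pmf(reconstruction, measured_mask, floor=0.05, window=5):
    """Data-driven pmf over unmeasured pixel positions: probability mass is
    proportional to a local variance map of the current reconstruction, with a
    small uniform floor so homogeneous regions still receive samples."""
    mean = uniform_filter(reconstruction, window)
    mean_sq = uniform_filter(reconstruction ** 2, window)
    local_var = np.clip(mean_sq - mean ** 2, 0.0, None)
    weights = (local_var + floor * local_var.mean()) * (~measured_mask)
    return weights / weights.sum()

def draw_next_positions(pmf, n_new, rng):
    flat = rng.choice(pmf.size, size=n_new, replace=False, p=pmf.ravel())
    return np.unravel_index(flat, pmf.shape)

rng = np.random.default_rng(0)
image = rng.random((64, 64))                       # stand-in reconstruction
mask = np.zeros_like(image, dtype=bool)
mask[::4, ::4] = True                              # pixels measured so far
rows, cols = draw_next_positions(sampling_pmf(image, mask), 50, rng)
```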

Journal ArticleDOI
31 Dec 2020
TL;DR: An approximation is developed for the probability mass function of the binomial distribution $\mathrm{Bin}(\textcircled{1}, p)$ with $p = c/\textcircled{1}^{\alpha}$ and $1/2 < \alpha \le 1$.
Abstract: We study properties of two probability distributions defined on the infinite set $\{0,1,2,\ldots\}$ and generalizing the ordinary discrete uniform and binomial distributions. Both extensions use the grossone model of infinity. The first of the two distributions we study is uniform and assigns mass $1/\textcircled{1}$ to each point in the set $\{0,1,\ldots,\textcircled{1}-1\}$, where $\textcircled{1}$ denotes the grossone. For this distribution, we study the problem of decomposing a random variable $\xi$ with this distribution as a sum $\xi \stackrel{d}{=} \xi_1 + \cdots + \xi_m$, where $\xi_1,\ldots,\xi_m$ are independent non-degenerate random variables. Then, we develop an approximation for the probability mass function of the binomial distribution $\mathrm{Bin}(\textcircled{1}, p)$ with $p = c/\textcircled{1}^{\alpha}$ and $1/2 < \alpha \le 1$. The accuracy of this approximation is assessed using a numerical study.

Journal ArticleDOI
TL;DR: Methods are presented to analyze the time-series probability mass function produced by the IMM using the Viterbi and BCJR algorithms, which are in common use in the digital-communications discipline, to identify the “best path” through the space of available models over time, based on the likelihoods produced by the IMM and the Markovian transition probabilities.
Abstract: Lithium-ion battery modeling for use in battery management systems requires models that can adapt to the changing behavior of a cell as it ages. One method to enable this adaptivity is to select the most representative model from a set of “pre-aged” models that represent the characteristics of the cell as different cyclic and calendar aging processes occur. By modeling the aging of a cell as a Markovian process, an interacting multiple-model Kalman filter (IMM) can be utilized to determine a time-varying probability mass function that specifies the probability that each of the models under consideration is the best representation of the cell under observation. While the output of the IMM is useful by itself, its predictions can be improved by post-processing. In this paper, we present methods to analyze the time-series probability mass function produced by the IMM using the Viterbi and BCJR algorithms in common use in the digital-communications discipline. These algorithms seek to identify the “best path” through the space of available models over time, based on the likelihoods produced by the IMM and the Markovian transition probabilities. Through the use of these post-processing algorithms, confidence in the best-fitting model can be improved.
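
A minimal sketch of the Viterbi post-processing step described above: given per-step model log-likelihoods (as an IMM would produce) and Markovian switching probabilities, recover the most likely sequence of pre-aged models. The likelihoods and transition matrix here are toy values, not taken from the paper.

```python
import numpy as np

def viterbi(log_likelihoods, log_transitions, log_prior):
    """Most likely sequence of models given per-step log-likelihoods from a bank
    of model-matched filters (e.g. an IMM) and Markovian model-switching
    log-probabilities."""
    T, M = log_likelihoods.shape
    delta = log_prior + log_likelihoods[0]          # best log-score ending in each model
    back = np.zeros((T, M), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_transitions   # scores[i, j]: transition i -> j
        back[t] = np.argmax(scores, axis=0)
        delta = scores[back[t], np.arange(M)] + log_likelihoods[t]
    path = [int(np.argmax(delta))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return path[::-1]

# toy example: 3 pre-aged cell models, 6 time steps of IMM likelihoods
rng = np.random.default_rng(1)
loglik = np.log(rng.dirichlet(np.ones(3), size=6))
trans = np.log(np.array([[0.90, 0.08, 0.02],
                         [0.05, 0.90, 0.05],
                         [0.02, 0.08, 0.90]]))
print(viterbi(loglik, trans, log_prior=np.log(np.ones(3) / 3)))
```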

Posted Content
TL;DR: The framework of three new classes of exponential dispersion models of discrete probability distributions, defined by specifying their variance functions in their mean value parameterization, is developed; the classes are proved to be overdispersed and zero inflated, making them competitive with the statistical models currently in use for statistical modeling.
Abstract: We consider three new classes of exponential dispersion models of discrete probability distributions which are defined by specifying their variance functions in their mean value parameterization. In a previous paper (Bar-Lev and Ridder, 2020a), we have developed the framework of these classes and proved that they have some desirable properties. Each of these classes was shown to be overdispersed and zero inflated in ascending order, making them competitive alternatives to the statistical models currently in use for statistical modeling. In this paper we elaborate on the computational aspects of their probability mass functions. Furthermore, we apply these classes for fitting real data sets having overdispersed and zero-inflated statistics. Classic models based on Poisson or negative binomial distributions show poor fits, and therefore many alternatives have already been proposed in recent years. We execute an extensive comparison with these other proposals, from which we may conclude that our framework is a flexible tool that gives excellent results in all cases. Moreover, in most cases our model gives the best fit.